Carnegie Mellon University

How Statisticians Write and Why

July 06, 2021

By Sarah Voorhees

David Brown, an Associate Teaching Professor of English, and Michael Laudenbach, a Rhetoric PhD student, teamed up to investigate how statistics students write and sought to illuminate the critical “invisible choices” embedded in their papers.

When Brown invited Laudenbach to attend the weekly meetings of the Teaching Statistics Group in 2020, Laudenbach quickly discovered a way to bridge his interests in rhetoric and disciplinary writing.

Brown and Laudenbach enlisted help from faculty in the Department of Statistics and Data Science to compile a corpus of over 900 student papers. The pair used DocuScope, a text analysis tool created by Department of English professors David Kaufer and Suguru Ishizaki, to tag rhetorical and lexicogrammatical patterns in each paper of the corpus. Brown and Laudenbach then conducted statistical analyses of their own to identify differences between novice and expert papers and between client-facing and academic papers.

Brown and Laudenbach found that students in the courses 36-200 and 36-707 tended to write about human actors and actions and consistently used supporting verbs such as “indicated,” “include,” and “observe.” According to Brown and Laudenbach, the assignments in these courses ask students to translate data for non-expert audiences, which may explain why these papers show stronger narrative elements than others in the corpus.

In their report draft, Brown and Laudenbach confirmed that expert students “deploy a wider and more targeted vocabulary,” while novice students “rely on a narrower repertoire, not only of nouns, but also of verbs.” Novice students tend to use the verb “to be” more often than expert students and use high-confidence phrases such as “it is clear that” and “it is likely that.”
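For readers curious how patterns like these can be measured, here is a minimal sketch in Python. It is not DocuScope and not the researchers' method. It assumes a simple word tokenizer, a hand-picked list of “to be” forms, and two made-up sentences that do not come from the corpus. It computes the share of words that are forms of “to be” and a rough vocabulary-breadth score (distinct words divided by total words).

```python
import re

# Illustrative list of "to be" forms; an assumption, not DocuScope's tag set.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def be_rate(text):
    """Return the share of tokens that are forms of 'to be'."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in BE_FORMS for token in tokens) / len(tokens)


def type_token_ratio(text):
    """Return distinct words divided by total words (a rough breadth measure)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Made-up example sentences, not drawn from the student corpus.
novice = "It is clear that the data is skewed and the mean is high."
expert = "The residual plot suggests heteroskedasticity, so we fit a weighted model."

for label, text in [("novice", novice), ("expert", expert)]:
    print(label, round(be_rate(text), 3), round(type_token_ratio(text), 3))
```

The distinct-to-total ratio shrinks as texts get longer, so a comparison across real papers would need to account for document length.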

By connecting rhetorical tasks to the language patterns of these papers, Brown and Laudenbach can describe statistics as a disciplinary genre. Laudenbach explained that the authors of these papers made a series of “invisible choices” to accomplish the rhetorical tasks they were given. “We want to make these invisible choices visible to students and instructors,” he said.

In the future, Brown and Laudenbach hope to translate their discoveries into training materials for teaching assistants (TAs). With these materials, TAs and instructors would learn to recognize rhetorical and lexicogrammatical patterns in statistics students' papers and determine which patterns are most effective for each type of writing task. Brown and Laudenbach will continue collecting texts for their corpus and may begin analyzing non-academic texts. With the field of statistics and data science growing rapidly, their research is more important than ever.

Pictured above: A boxplot from Laudenbach and Brown's research revealing the "invisible choices" in statistics students' papers