Close Statistical Reading: Rhetorical Interpretation of Computational Analyses of Texts at Scale
Author: Hannah Ringler
Degree: Ph.D. in Rhetoric, Carnegie Mellon University, 2022
In recent years, computational text analysis has become remarkably accurate at text classification, such as distinguishing texts by categories like genre based on their word frequencies. In rhetorical studies, though, something like genre would instead be distinguished in terms of the “social action” it performs in the world, not differences in word frequencies. And while we might imagine how rhetorical actions could link to specific word frequency patterns, the incongruence between these approaches to concepts like genre raises provocative questions about how rhetorical action is accomplished through recurring linguistic form. However, because many computational methods were developed with prediction as their goal, the research models around them are not necessarily built to allow this kind of interpretation: it would be infeasible to closely read a large corpus for every weighted feature used in classification in order to understand why the classification works. Theories of textual interpretation are at the core of rhetorical studies, and as such can offer a unique lens on this problem.
To address this challenge, this dissertation takes a rhetorical approach to large-scale textual interpretation by offering a new hermeneutic, or interpretive theory and argumentative strategy, called “close statistical reading” that allows for defensible and insightful rhetorical interpretation. In developing interpretations of corpora, I argue that the goal is not to produce one “correct” interpretation. Rather, we can interpret tables of data about a corpus not by reading every text and data point, but by strategically analyzing how many different pieces of data all point to the same understanding of the corpus. As such, close statistical reading provides both a way of understanding interpretation in a corpus context and the argumentative strategy the analyst needs to make the case for their interpretation. Compared to past work on computational hermeneutics, it highlights the utility of synthesizing analyses toward understanding, and it takes a rhetorical approach to hermeneutics by foregrounding the argumentative strategies that justify interpretations of large corpora of texts. This new approach thus does not rely only upon the traditional mode of closely reading each text, but is instead conducive to working with large amounts of data.

To demonstrate this theory, I offer an extended case study using stylometric methods to classify academic writing by discipline. Because this classification relies on function word frequencies, it raises the question: where do function word frequencies fit into a conceptualization of disciplines as “ways of knowing and doing”? The case study synthesizes analyses of particular function words in action, through different analytical techniques, to develop rhetorical theory about how function words play a constitutive role in the creation of meaning, particularly in the context of academic knowledge representation.
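To make the stylometric method concrete for readers unfamiliar with it, the following is a minimal sketch, not the dissertation’s actual code: texts are reduced to relative frequencies of a small set of function words, each discipline is summarized by the mean profile (centroid) of its texts, and a new text is assigned to the nearest centroid. The word list, texts, and labels here are invented for illustration.

```python
# Illustrative stylometric classification by function-word frequencies.
# (Hypothetical word list and toy data; real stylometry uses far more
# function words and larger corpora.)

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "is", "we"]

def profile(text):
    """Relative frequency of each function word in a text."""
    tokens = text.lower().split()
    n = len(tokens)
    return [tokens.count(w) / n for w in FUNCTION_WORDS]

def centroid(profiles):
    """Mean function-word profile of a set of texts (one discipline)."""
    return [sum(col) / len(col) for col in zip(*profiles)]

def classify(text, centroids):
    """Assign a text to the discipline whose centroid is nearest (Euclidean)."""
    p = profile(text)
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(p, centroids[label])) ** 0.5
    return min(centroids, key=dist)
```

In practice one would build a centroid per discipline from labeled corpora, then classify held-out texts; the interpretive work the dissertation describes begins after this step, in explaining why such frequency patterns separate disciplines at all.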