Carnegie Mellon University

DocuScope: Computer-aided Rhetorical Analysis

What is DocuScope?

DocuScope is a text analysis environment with a suite of interactive visualization tools for corpus-based rhetorical analysis. The DocuScope Project began in 1998 as a result of a collaboration between David Kaufer and Suguru Ishizaki at Carnegie Mellon University. David created what we call the generic (default) dictionary, consisting of over 40 million linguistic patterns of English classified into over 100 categories of rhetorical effects. Suguru designed and implemented the analysis and visualization software, which can annotate a corpus of text against any dictionary of regular strings that are classified into a hierarchy of rhetorical effects. While we designed DocuScope as a tool for rhetorical analysis, we also found that it was extremely effective for developing the dictionary in a systematic fashion.
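The idea of annotating text against a dictionary of strings classified into a hierarchy can be illustrated with a small sketch. This is not DocuScope's implementation: the toy dictionary, its two-level cluster and category names, the function names, and the greedy longest-match strategy are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: each pattern (a tuple of lowercase words)
# maps to an invented (cluster, category) pair. Not the real dictionary.
TOY_DICTIONARY = {
    ("i", "think"): ("Interior", "Uncertainty"),
    ("i",): ("Interior", "SelfReference"),
    ("in", "the", "future"): ("Temporal", "LookingAhead"),
    ("will",): ("Temporal", "LookingAhead"),
}

# Longest pattern length, in words, so we know how far ahead to look.
MAX_LEN = max(len(pattern) for pattern in TOY_DICTIONARY)


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def annotate(text):
    """Return (matched string, cluster, category) tuples, in text order.

    Assumption: at each position the longest matching pattern wins,
    and matched words are not reused by later patterns.
    """
    tokens = tokenize(text)
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            pattern = tuple(tokens[i:i + n])
            if pattern in TOY_DICTIONARY:
                matches.append((" ".join(pattern), *TOY_DICTIONARY[pattern]))
                i += n
                break
        else:
            i += 1  # no pattern starts at this word; move on
    return matches


def count_by_cluster(matches):
    """Tally matches at the top level of the (assumed) hierarchy."""
    return Counter(cluster for _, cluster, _ in matches)
```

For example, `annotate("I think I will travel in the future.")` tags "i think" as a single pattern rather than tagging "i" on its own, which is the point of preferring the longest match; `count_by_cluster` then summarizes the annotations per top-level group.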

Theoretical Background

The default dictionary was developed based on David's and Brian Butler's earlier theoretical work in rhetoric (Kaufer & Butler 1996) and their applied work in representational theories of language (Kaufer & Butler 2000). Our latest theoretical framework, along with an overview of the generic dictionary, is presented in The Power of Words: Unveiling the Speaker and Writer's Hidden Craft (Kaufer, Ishizaki, Butler, & Collins 2004). See Collins, Kaufer, Vlachos, Butler, & Ishizaki, 2004; Kaufer & Hariman, 2008; Kaufer & Al-Malki, 2009a; and Kaufer & Ishizaki, 2006 for projects that used the generic dictionary.

Domain-Specific Custom Dictionaries

Analysts can also use DocuScope with a domain-specific (i.e., custom) dictionary. They may customize the generic dictionary by using only subsets of it, or they can create a completely new dictionary. DocuScope allows analysts to systematically explore and create a new domain-specific dictionary. See Al-Malki, Kaufer, Ishizaki, & Dreher, 2012 and Kaufer & Al-Malki, 2009b for example projects that used custom dictionaries.

Shakespeare Project

Michael Witmore, Director of the Folger Shakespeare Library, and Jonathan Hope of the University of Strathclyde have used DocuScope for years to analyze Shakespeare and other early modern texts. You can read more about this project in the Early Modern Literary Studies journal article "The Very Large Textual Object: A Prosthetic Reading of Shakespeare." See also the article in Forbes Magazine and the Digital History blog post.

What if I want to use DocuScope on my own Textual Corpus?

Unfortunately, we don't have the resources to support the use of DocuScope outside of our research group and our students. However, the Working Group for Digital Inquiry at the University of Wisconsin-Madison has received funds from the Mellon Foundation to construct an environment that will allow scholars to have their texts analyzed by a variety of methods, including the default dictionaries of the DocuScope environment. We will notify people on this site when that infrastructure is ready. If you have an interesting data set that you'd like to analyze with DocuScope, you can contact David Kaufer by email, and he will assess whether the default dictionaries can add value to your analysis.

Books



Al-Malki, A. M., Kaufer, D., Ishizaki, S., & Dreher, K. (2012). Arab Women in Arab News: Old Stereotypes and New Media. Bloomsbury Academic.

Kaufer, D. & Butler, B. (2000). Designing Interactive Worlds with Words: Principles of Writing as Representational Composition. Routledge.

Kaufer, D. & Butler, B. (1996). Rhetoric and the Arts of Design. Routledge.

Kaufer, D., Ishizaki, S., Butler, B., & Collins, J. (2004). The Power of Words: Unveiling the Speaker and Writer's Hidden Craft. Routledge.

Journal Articles


Collins, J., Kaufer, D., Vlachos, P., Butler, B., & Ishizaki, S. (2004). Detecting collaborations in text: comparing the authors' rhetorical language choices in the Federalist Papers. Computers and the Humanities, 38(1), 15-36.

Geisler, C., Kaufer, D. & Itext Working Group. (2001). Future directions for research on the relationship between information technology and writing. Journal of Business and Technical Communication, Part I, 270-308.

Kaufer, D. (2006). Genre variation and minority ethnic identity: exploring the personal profile in Indian American community publications. Discourse & Society, 17(6), 761-784.

Kaufer, D. & Al-Malki, A. M. (2009). A "first" for women in the kingdom: Arab/West representations of female trendsetters in Saudi Arabia. Journal of Arab and Muslim Media Research, 2(2), 113-133.

Kaufer, D. & Al-Malki, A. M. (2009). The War on Terror through Arab-American eyes: the Arab-American press as a rhetorical counterpublic. Rhetoric Review, 28(1), 47-65.

Kaufer, D. & Hariman, R. (2008). A corpus analysis evaluating Hariman's theory of political style. Text & Talk, 28(4), 475-500.

Kaufer, D. & Ishizaki, S. (2006). A corpus study of canned letters: mining the latent rhetorical proficiencies marketed to writers in a hurry and non-writers. IEEE Transactions on Professional Communication, 49(3), 254-266.

Kaufer, D., Ishizaki, S., Collins, J., & Vlachos, P. (2004). Teaching language awareness in rhetorical choice using Itext and visualization in classroom genre assignments. Journal of Business and Technical Communication, 18(3), 361-402.

Kaufer, D., Parry-Giles, S., & Klebanov, B. B. (forthcoming). Tracking "image bites" across the public/private divide: NBC News coverage of Hillary Clinton from scorned wife to senate candidate. Journal of Language and Politics.

Klebanov, B. B., Kaufer, D., & Franklin, H. (forthcoming). A figure in a field: semantic field-based analysis of antithesis. Journal of Cognitive Semiotics.

Parry-Giles, S. & Kaufer, D. (forthcoming). Lincoln reminiscences and nineteenth-century portraiture: the private virtues of presidential character. Rhetoric and Public Affairs.

Chapters in Edited Volumes

Hu, Y., Kaufer, D., & Ishizaki, S. (2010). Genre and Instinct. Computing with Instinct, Lecture Notes in Artificial Intelligence, LNAI 5897, ed. Cai, Y. Springer.

Ishizaki, S. & Kaufer, D. (2011). The DocuScope Text Analysis and Visualization Environment. Invited chapter for Applied Natural Language Processing and Content Analysis: Identification, Investigation, and Resolution, ed. McCarthy, P. & Boonthum, C.

Kaufer, D. (2004). Public vs. Private Rhetoric: An Analysis of the NY Times Writers on Writing Series. The Public in Rhetorical Theory, ed. Kent, T. & Couture, B. Utah State Press, 163-185.

Kaufer, D., Geisler, C., Ishizaki, S., & Vlachos, P. (2005). Computer-Support for Genre Analysis and Discovery. Ambient Intelligence for Scientific Discovery, ed. Cai, Y. Springer, 129-151.

Kaufer, D., Geisler, C., Vlachos, P., & Ishizaki, S. (2006). Mining Textual Knowledge for Writing Research and Education. Writing & Digital Media, ed. Waes, L. V., Leijten, M., & Neuwirth, C. Amsterdam: Elsevier, 115-129.

Kaufer, D., Ishizaki, S., & Al-Malki, A. M. (2007). A Framework for Training Writing Teachers in the Discourse Patterns Underlying Cross-institutional Writing Assignments. Sustaining Excellence in Communicating Across the Curriculum: Cross-institutional Experiences and Best Practices. Cambridge Scholars Press, UK.

Oakley, T. & Kaufer, D. (2007). Designing Clinical Experiences with Words: The Three Layers of Analysis in Clinical Reports; A Dilemma for Mental Spaces and Genre Theory. Mental Spaces in Discourse and Interaction, ed. Hougaard, A. & Oakley, T. John Benjamins Publishing Company.