The DocuScope Project began in 1998 at Carnegie Mellon University as an interdisciplinary collaboration between David Kaufer, Professor of English, and Suguru Ishizaki, then a professor in the School of Design and now Professor of English. DocuScope’s natural language processing capability draws on a proprietary dictionary of millions of English phrases that David has collected and classified over 20 years. DocuScope consists of an analytic engine, a suite of interactive visualizations, and a dictionary authoring tool.

Our earliest dictionary was based on David and Brian Butler's earlier theoretical work in rhetoric (Kaufer & Butler, 1996) and their applied work in representational theories of language (Kaufer & Butler, 2000). Our latest theoretical framework, along with an overview of the generic dictionary, is presented in The Power of Words: Unveiling the Speaker and Writer's Hidden Craft (Kaufer, Ishizaki, Butler, & Collins, 2004). (See Collins, Kaufer, Vlachos, Butler, & Ishizaki, 2004; Kaufer & Hariman, 2008; Kaufer & Al-Malki, 2009a; and Kaufer & Ishizaki, 2006 for projects that used the generic dictionary.)

DocuScope was initially developed as an educational tool for David’s writing course, Narrative & Argument. We wished to create a studio-like writing course that allowed students to “see” and critique their drafts publicly. But we soon found that DocuScope was also a useful tool for corpus-based rhetorical analysis.

For more than 20 years, with the support of numerous scholars and students, we have continued to improve DocuScope and its theoretical framework. In the past few years, our project has expanded significantly to encompass multiple applications.

Through our research we have created a range of tools for computer-aided text analysis and technology-enhanced writing instruction: 


DocuScope Global


DocuScope Global is a text analysis environment with a suite of interactive visualization tools for corpus-based rhetorical analysis. The core elements of DocuScope Global are (1) a dictionary, created by David, of tens of millions of English linguistic patterns, each uniquely classified by its effect on readers, and (2) analysis and visualization software, designed and implemented by Suguru.
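To give a rough sense of how a phrase dictionary can drive this kind of analysis, here is a minimal sketch of longest-match dictionary tagging in Python. It is not DocuScope's actual engine or dictionary: the entries, the category names, and the `tag_text` function are all hypothetical and invented for illustration, and the whitespace tokenization is a simplifying assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: phrase (tuple of lowercase tokens) -> category.
# Entries and category names are invented; they are NOT from DocuScope.
TOY_DICTIONARY: Dict[Tuple[str, ...], str] = {
    ("i", "think"): "FirstPersonReasoning",
    ("i",): "FirstPerson",
    ("in", "the", "past"): "PastReference",
    ("because",): "Reasoning",
}

# Longest phrase length in the dictionary, used to bound the search window.
MAX_PHRASE_LEN = max(len(phrase) for phrase in TOY_DICTIONARY)


def tag_text(text: str) -> List[Tuple[str, str]]:
    """Tag text by always preferring the longest matching dictionary phrase."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first, then shrink the window.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in TOY_DICTIONARY:
                tagged.append((" ".join(candidate), TOY_DICTIONARY[candidate]))
                i += n
                matched = True
                break
        if not matched:
            # No dictionary phrase starts here; keep the token as untagged.
            tagged.append((tokens[i], "Untagged"))
            i += 1
    return tagged


if __name__ == "__main__":
    for phrase, category in tag_text("I think in the past"):
        print(f"{phrase!r}: {category}")
```

Preferring the longest match means each stretch of text receives exactly one category, so "I think" is tagged as a unit rather than as "I" followed by an untagged "think".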

Download the latest release of DocuScope Global

DocuScope Classroom


DocuScope Classroom is an online text analysis and visualization environment that helps students see how writing strategies are used in their drafts and how those strategies resemble or differ from the strategies of their classmates. The visualizations enhance students’ awareness of their composing decisions and the relationship of those choices to their writing context and intended genre. DocuScope Classroom has been used in certain sections of Carnegie Mellon’s Writing & Communication program.

OnTopic

OnTopic is a revision environment made up of interactive visualizations designed to help students keep their writing coherent and on topic. OnTopic uses natural language processing algorithms to visualize the topical organization of the student’s draft by highlighting salient topics within each paragraph, as well as in the text as a whole. At the sentence level, OnTopic allows students to study their sentence proportionality and “flow” by tracking the number of noun phrases their readers must process before and after a sentence’s main verb.
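The sentence-level measure described above can be sketched in a few lines of Python. This is only an illustration, not OnTopic's implementation: it assumes the sentence has already been split into labeled chunks by some parser, and the labels `NP`, `MAIN_VERB`, and `OTHER`, as well as the function name `noun_phrase_balance`, are hypothetical.

```python
from typing import List, Tuple

# Each chunk is (text, label). The labels are hypothetical: "NP" marks a noun
# phrase, "MAIN_VERB" the sentence's main verb, "OTHER" everything else.
Chunk = Tuple[str, str]


def noun_phrase_balance(chunks: List[Chunk]) -> Tuple[int, int]:
    """Return (noun phrases before the main verb, noun phrases after it)."""
    labels = [label for _, label in chunks]
    if "MAIN_VERB" not in labels:
        raise ValueError("sentence has no chunk labeled MAIN_VERB")
    verb_index = labels.index("MAIN_VERB")  # first main verb in the sentence
    before = sum(1 for label in labels[:verb_index] if label == "NP")
    after = sum(1 for label in labels[verb_index + 1:] if label == "NP")
    return before, after


if __name__ == "__main__":
    sentence: List[Chunk] = [
        ("The committee", "NP"),
        ("on", "OTHER"),
        ("curriculum reform", "NP"),
        ("approved", "MAIN_VERB"),
        ("the proposal", "NP"),
    ]
    print(noun_phrase_balance(sentence))
```

A sentence with many noun phrases before its main verb asks readers to hold more in mind before they learn what the sentence asserts, which is the kind of imbalance this count is meant to surface.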

DocuScope 6.0


The latest incarnation of DocuScope supports student writers who want to inspect both their topical organization and the rhetorical experiences they create, whether localized to specific topics or ambient across the whole text. DocuScope 6.0 combines the basic functionality of OnTopic and DocuScope Classroom in a seamless interactive visualization environment.

Watch DocuScope 6.0 Video Tutorials

DiaGrammar

DiaGrammar is an online learning environment for practicing sentence diagramming. The technology is based on form-function sentence diagrams popularized in Paul Hopper’s book, A Short Course in Grammar. Following the instructional approach developed by Hopper, DiaGrammar provides an online practice environment with automated feedback, designed to support hybrid (virtual/in-person) grammar courses.

Use Cases

RAND Corporation

The Pardee RAND Graduate School provides DocuScope to its quantitative analysts as a tool for analyzing social media and other documents. RAND has published a series of technical reports that use DocuScope to show the declining objectivity of journalism. RAND has also incorporated a prior version of the DocuScope dictionaries into its in-house product RAND-Lex, which it uses to conduct language analysis for clients. RAND was sufficiently impressed with DocuScope that it invested heavily in machine learning methods to produce DocuScope versions in Arabic, Russian, and Chinese.

Shakespeare Studies

Shakespeare scholars have used DocuScope to find that the Bard’s history plays are organized around a single narrator, while his comedies and tragedies are organized around lighthearted and darker-spirited character dialogue, respectively. After writing his famous comedies and tragedies, Shakespeare composed his late “tragicomedies” or “problem plays,” a unique hybrid genre. Shakespeareans have long debated whether these plays are actually comedies, tragedies, or truly a novel combination. Using DocuScope, a team of leading Shakespeare scholars confirmed that these late plays are indeed genre-bending mixtures, expertly blended by an artist at the height of his craft.

You can read more about this project in the Early Modern Literary Studies journal article "The Very Large Textual Object: A Prosthetic Reading of Shakespeare." See also the article in Forbes Magazine and the “Digital History” blog post.

Educational Testing Service

ETS is often criticized on the grounds that its timed writing tests lack ecological validity when compared with authentic writing tasks. A group of ETS scholars used DocuScope to compare hundreds of GRE “arguments” that test-takers wrote under timed conditions (45 minutes) with scores of “arguments” that graduate students had three weeks to draft and redraft with feedback. The researchers found that the “core” patterns that constitute argument did not differ between the two populations of writers. ETS recently won a general patent for testing the validity of its timed writing tests using this method.

Alexander Hamilton and James Madison

Historians have long debated which of the 85 Federalist Papers were authored by Alexander Hamilton and which by James Madison. All prior studies assumed that any given paper was written by either Hamilton or Madison, but not both. Using DocuScope, a research team discovered that some of the disputed papers were clearly co-authored, bearing the stamp of both writers. A look into the historical record confirmed that Hamilton and Madison did physically convene to plan a new block of papers. All of the texts that DocuScope found to be co-authored were the first paper in each block, very likely planned and at least partially drafted during Hamilton and Madison’s time together.


Clinton vs. Trump, 2016

Although Donald Trump polled poorly in the run-up to the 2016 presidential election, Hillary Clinton’s public image was similarly negative. Clinton was perceived to be aloof, guarded, scripted, and inauthentic. To investigate the roots of this negative image, a team used DocuScope to analyze Clinton’s two memoirs: a personal memoir written after her years as First Lady (2003) and a policy memoir describing her time as President Obama’s Secretary of State (2014). The DocuScope analysis revealed, as predicted, that the personal memoir relied much more on the language of disclosure than the policy memoir did. But where Clinton disclosed herself was revealing. Her most disclosive passages and chapters appeared when she talked about the trials of her parents and grandparents. She was less disclosive and more guarded when talking about herself, her marriage, and her husband’s infidelity. This rhetorical choice may have contributed to perceptions of Clinton as a personally remote candidate, especially when juxtaposed with Donald Trump’s style of brash, unedited speech.

Media Representation of Arab Women in Arab News

Western social science has documented a long tradition of Western media representing Arab women as passive and voiceless, yet there has been little research on how Arab women are represented in Arab media. In DocuScope-based research funded by the Qatar National Research Fund, researchers found that at least some liberal Arab media outlets based in London do represent Arab women in more complex ways. The study was published in 2012 as a book titled Arab Women in Arab News: Old Stereotypes and New Media (Bloomsbury).


This project has been partially funded by:

  • A.W. Mellon Foundation
  • Macaulay Family Foundation
  • Simon Initiative Seed Grant, Carnegie Mellon University
  • Berkman Faculty Development Fund, Carnegie Mellon University


  • Eberly Center, Teaching Excellence & Educational Innovation, Carnegie Mellon University
  • Open Learning Initiative, Carnegie Mellon University
  • Howard Seltman, Department of Statistics & Data Science


