Carnegie Mellon University

CASOS Center

Center for Computational Analysis of Social and Organizational Systems

AutoMap


AutoMap is a text mining tool developed by CASOS at Carnegie Mellon that enables the extraction of information from texts using Network Text Analysis methods. AutoMap supports the extraction of several types of data from unstructured documents. The types of information that can be extracted include content analytic data (words and frequencies), semantic network data (the network of concepts), meta-network data (the cross-classification of concepts into ontological categories such as people, places, and things, and the connections among these classified concepts), and sentiment data (attitudes, beliefs). Extraction of each type of data assumes that the previously listed type has already been extracted.
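The step from semantic network data to meta-network data can be illustrated with a minimal Python sketch. It is not AutoMap's implementation: the thesaurus contents, the category names, and the function `classify_links` are hypothetical and exist only to show how concept links might be cross-classified by ontological category.

```python
from collections import defaultdict

# Hypothetical meta-network thesaurus: concept -> ontological category.
META_THESAURUS = {
    "alice": "agent",
    "bob": "agent",
    "pittsburgh": "location",
    "report": "resource",
}


def classify_links(links, thesaurus):
    """Group concept-to-concept links by the category pair of their endpoints."""
    meta = defaultdict(list)
    for source, target in links:
        src_cat = thesaurus.get(source)
        tgt_cat = thesaurus.get(target)
        if src_cat is None or tgt_cat is None:
            continue  # concepts without a category stay out of the meta-network
        meta[(src_cat, tgt_cat)].append((source, target))
    return dict(meta)


# A small semantic network, assumed to have been extracted already.
semantic_links = [
    ("alice", "bob"),
    ("alice", "pittsburgh"),
    ("bob", "report"),
    ("alice", "weather"),  # "weather" is unclassified and is dropped
]
meta_network = classify_links(semantic_links, META_THESAURUS)
```

Each key of `meta_network` is a pair of categories (for example agent to location), and each value lists the concept links that fall into that pair.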

AutoMap is part of a text mining suite that includes a series of pre-processors for cleaning raw texts so that they can be processed, and a set of post-processors that employ semantic inferencing to improve the coding and deduce missing information. The pre-processors include sub-tools such as a PDF-to-text converter, non-printing character removal, and limited types of deduplication. Text pre-processing condenses data into concepts, which capture the features of the texts relevant to the user. Statement formation rules determine how to link extracted concepts into networks. The post-processors include procedures that link to gazetteers and augment the coding with latitude and longitude, belief inference procedures, and secondary data cleaning tools. In addition, there is a series of support tools for creating, maintaining, and editing delete lists, generalization thesauri, and meta-network thesauri.
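How a delete list and a generalization thesaurus condense text into concepts can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AutoMap's file formats or code: the list contents, the tokenization rule, and the function `preprocess` are all hypothetical.

```python
import re

# Hypothetical delete list: words dropped because they carry no content.
DELETE_LIST = {"the", "a", "of", "in", "at"}

# Hypothetical generalization thesaurus: phrase -> higher-level concept.
GENERALIZATION_THESAURUS = {
    "carnegie mellon university": "cmu",
    "carnegie mellon": "cmu",
    "united states": "usa",
}


def preprocess(text, delete_list, thesaurus):
    """Apply the thesaurus (longest phrases first), then the delete list."""
    lowered = text.lower()
    for phrase in sorted(thesaurus, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        lowered = re.sub(pattern, thesaurus[phrase], lowered)
    tokens = re.findall(r"[a-z0-9_]+", lowered)
    return [token for token in tokens if token not in delete_list]


sentence = ("Researchers at Carnegie Mellon University in the United States "
            "study the structure of organizations.")
concepts = preprocess(sentence, DELETE_LIST, GENERALIZATION_THESAURUS)
```

Applying the thesaurus before the delete list is a design choice of this sketch: it keeps multi-word phrases intact until they have been replaced by a single concept.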

AutoMap uses part-of-speech tagging and proximity analysis to perform computer-assisted Network Text Analysis (NTA). NTA encodes the links among words in a text and constructs a network of the linked words.
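Proximity analysis can be illustrated with a sliding window over an ordered list of concepts. The sketch below is a simplified assumption, not AutoMap's algorithm: the function `window_links`, the directed links, and the convention that `window_size` counts both endpoints are choices made here for illustration.

```python
from collections import Counter


def window_links(concepts, window_size=2):
    """Link each concept to the concepts that follow it inside the window."""
    links = Counter()
    for i, source in enumerate(concepts):
        # A window of size n covers the source plus the next n - 1 concepts.
        for target in concepts[i + 1 : i + window_size]:
            if source != target:
                links[(source, target)] += 1
    return links


concepts = ["alice", "met", "bob", "pittsburgh"]
links = window_links(concepts, window_size=3)
```

With a window of three, "alice" is linked to "met" and "bob" but not to "pittsburgh", which lies outside the window. The resulting counter can be read as a weighted edge list of the network of linked words.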

AutoMap subsumes classical Content Analysis by analyzing the existence, frequencies, and covariance of terms and themes.
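The three quantities named above (existence, frequency, and covariance of terms) can be computed with the Python standard library alone. This is a minimal sketch with hypothetical sample texts and helper names; it uses whitespace tokenization and the population covariance of per-text term counts, which may differ from what AutoMap reports.

```python
from collections import Counter
from statistics import mean


def term_counts(texts):
    """Return one term-frequency Counter per text (whitespace tokenization)."""
    return [Counter(text.lower().split()) for text in texts]


def covariance(counts, term_a, term_b):
    """Population covariance of two terms' frequencies across the texts."""
    xs = [c[term_a] for c in counts]
    ys = [c[term_b] for c in counts]
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


texts = [
    "network text analysis",
    "network network analysis",
    "text mining",
]
counts = term_counts(texts)

exists_in_third = counts[2]["network"] > 0            # existence in one text
total_network = sum(c["network"] for c in counts)     # frequency over all texts
cov_network_analysis = covariance(counts, "network", "analysis")
```

A positive covariance indicates that the two terms tend to be frequent in the same texts.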

AutoMap has been implemented in Java 1.7.

It can operate either as a front end with a GUI or in back-end mode.

The main functions of AutoMap are:

  • Extract, analyze, and compare mental models of individuals and groups.
  • Reveal the structure of social and organizational systems from texts.

AutoMap also offers a variety of techniques for pre-processing Natural Language:

  • Named-Entity Recognition
  • Stemming (Porter, KStem)
  • Collocation (Bigram) Detection
  • Extraction routines for dates, events, parts of speech
  • Deletion
  • Thesaurus development and application
  • Flexible ontology usage
  • Parts of Speech Tagging
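Of the techniques listed above, collocation (bigram) detection is the simplest to illustrate. The sketch below flags adjacent word pairs that recur at least `min_count` times. A raw frequency threshold is an assumption made here for brevity; it is not necessarily the criterion AutoMap applies, and the function name is hypothetical.

```python
from collections import Counter


def frequent_bigrams(tokens, min_count=2):
    """Return adjacent word pairs that occur at least min_count times."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


tokens = "social network analysis uses social network data".split()
bigrams = frequent_bigrams(tokens)
```

Detected pairs such as ("social", "network") are candidates for treatment as a single concept, for example through a generalization thesaurus entry.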

The map analysis algorithm is based on Carley's approach to coding texts as cognitive maps and Danowski's approach to proximity analysis. AutoMap is designed to work seamlessly with ORA and ORA-LITE.

Hardware Requirements

  • CPU with 500 megahertz or higher processor clock speed recommended
    * Intel Pentium/Celeron family, or AMD K6/Athlon/Duron family, or compatible processor recommended
  • 512 MB of RAM or higher recommended (1 GB preferred)
  • 1 GB of available hard disk space

Notes on Multi-Core Processors

  • AutoMap3 is multi-threaded and can use all the cores in your machine.

Partial support for this tool has been provided by:

  • The National Science Foundation under grants ITR/IM IIS-0081219, NSF 0201706 (doctoral dissertation award from NSF Sociology), and NSF IGERT 9972762
  • The U.S. Army Research Institute (ARI)
  • The U.S. Army Research Laboratory (ARL)
  • ARL-CTA