Carnegie Mellon University

CASOS Center

Center for Computational Analysis of Social and Organizational Systems

DyNetML: Interchange Format for Rich Social Network Data

Authors:
Maksim Tsvetovat, Jeff Reminga, Kathleen M. Carley
Carnegie Mellon University, Pittsburgh, PA

To facilitate cooperation between research groups within the field of social network analysis and the exchange of social network data, it is essential to provide a data interchange language that (a) provides sufficient expressive power to represent rich datasets, (b) is human-readable, and (c) can be used with a large number of programming languages and architectures. In this paper, we present DyNetML: an XML-based data interchange language that combines the ability to express complex dynamic SNA datasets with ease of integration within a variety of software platforms.

Faced with a pressing need for tool interoperability within our laboratory, we have developed DyNetML, an XML derivative language that addresses the above requirements.

Figure 1: Structure of DyNetML

Figure 1 shows the hierarchical structure of DyNetML files. The <DynamicNetwork> element encapsulates all time periods within a dynamic network. Each time period is represented by a <MetaMatrix> element, which encapsulates the network data for a single time period, including multiple matrices, nodes and their properties. The optional "timePeriod" attribute identifies the time at which a given metamatrix was collected.

The optional <measures> element encapsulates a set of MetaMatrix-level measures that have been computed on the given time period:

<measure name="sampleMeasure" type="double" value="1">

Each measure is specified with a unique name, a type (double, string or boolean) and a value.

The <nodes> element encapsulates all of the nodesets in a given MetaMatrix:

<nodeset id="nodeset1" type="agent">

A nodeset is a grouping of nodes by type; types include agent, knowledge, resource, task, organization and location. More than one nodeset of the same type can be defined, but each nodeset ID must be unique. Each <node> within a <nodeset> must be given a unique ID and can contain an arbitrary number of innate <properties> or computed <measures>. This allows data collectors to specify arbitrarily complex data about nodes while separating collected data from the results of analysis.

The <networks> element encapsulates network data stored as graph connection lists. Each <graph> element is specified with a unique ID and the IDs of its source and target nodesets. Each graph contains a collection of <edge> elements whose source and target are nodes previously declared in a nodeset. This allows the user to specify an arbitrary number of networks involving the same types of actors (e.g. friendship and advice networks) or different types of actors (e.g. communication and resource distribution networks):

<edge source="node1" target="node2" type="double" value="1">

Edges are represented by specifying the source and target of the edge. Each edge also has a value and a value type (double, string or boolean). Each graph and edge can also be followed by a set of innate properties and computed measures. For more information, please refer to the Document Type Definition (DTD) and the sample dataset in the appendix of this paper.
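
The hierarchy described above can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. This is only an illustration under stated assumptions, not the CASOS library: the element nesting is inferred from the prose, and the graph attribute names sourceNodeset and targetNodeset as well as the graph ID "friendship" are hypothetical. The DTD remains the authoritative definition.

```python
import xml.etree.ElementTree as ET


def build_sample_dynetml():
    """Build a one-period tree that follows the hierarchy described above."""
    root = ET.Element("DynamicNetwork")

    # One <MetaMatrix> per time period; the timePeriod attribute is optional.
    meta = ET.SubElement(root, "MetaMatrix", timePeriod="1")

    # MetaMatrix-level measures computed on this time period.
    measures = ET.SubElement(meta, "measures")
    ET.SubElement(measures, "measure",
                  name="sampleMeasure", type="double", value="1")

    # All nodesets live under <nodes>; a nodeset groups nodes of one type.
    nodes = ET.SubElement(meta, "nodes")
    agents = ET.SubElement(nodes, "nodeset", id="nodeset1", type="agent")
    for node_id in ("node1", "node2"):
        ET.SubElement(agents, "node", id=node_id)

    # Graphs live under <networks>. The attribute names sourceNodeset and
    # targetNodeset are assumptions; the DTD defines the real ones.
    networks = ET.SubElement(meta, "networks")
    graph = ET.SubElement(networks, "graph", id="friendship",
                          sourceNodeset="nodeset1", targetNodeset="nodeset1")
    ET.SubElement(graph, "edge",
                  source="node1", target="node2", type="double", value="1")
    return root


if __name__ == "__main__":
    tree = build_sample_dynetml()
    ET.indent(tree)  # pretty-printing helper, available from Python 3.9
    print(ET.tostring(tree, encoding="unicode"))
```

Running the script prints a small document with one time period, one agent nodeset and one edge, which can serve as a fixture when experimenting with readers and writers.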

Support of DyNetML

DyNetML is currently supported through C and Java libraries that are part of the CASOS software suite. Since XML parsers exist for practically all platforms and languages, integration of DyNetML into existing tools can be completed in one day or less.
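
To give a sense of what such an integration might involve, the following sketch reads edge lists with a stock XML parser (Python's xml.etree.ElementTree) rather than the CASOS libraries. The SAMPLE document, the graph ID "advice" and the function names are made up for illustration, and the element nesting is assumed from the structure described earlier. Only values typed "double" are converted; other types are left as strings because their lexical form is not specified here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; real files should be validated against the DTD.
SAMPLE = """<DynamicNetwork>
  <MetaMatrix timePeriod="1">
    <nodes>
      <nodeset id="nodeset1" type="agent">
        <node id="node1"/>
        <node id="node2"/>
        <node id="node3"/>
      </nodeset>
    </nodes>
    <networks>
      <graph id="advice">
        <edge source="node1" target="node2" type="double" value="1"/>
        <edge source="node1" target="node3" type="double" value="0.5"/>
      </graph>
    </networks>
  </MetaMatrix>
</DynamicNetwork>"""


def convert_value(raw, value_type):
    """Convert 'double' values to float; leave other types as raw strings."""
    return float(raw) if value_type == "double" else raw


def read_edge_lists(xml_text):
    """Return {(timePeriod, graph id): {source: [(target, value), ...]}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for meta in root.iter("MetaMatrix"):
        period = meta.get("timePeriod")  # None when the attribute is omitted
        for graph in meta.iter("graph"):
            adjacency = defaultdict(list)
            for edge in graph.iter("edge"):
                value = convert_value(edge.get("value"), edge.get("type"))
                adjacency[edge.get("source")].append((edge.get("target"), value))
            result[(period, graph.get("id"))] = dict(adjacency)
    return result
```

Calling read_edge_lists(SAMPLE) yields one adjacency list per graph and time period, which an existing tool could map onto its own internal network representation.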

Analysis Toolchains: A Vision of the Future

While the research community has developed a number of very powerful data gathering, analysis and visualization tools, the tools rarely operate well with each other. While file import/export options make it possible to use multiple analysis tools within a single project, a lack of automation and scripting features does not allow for batch-processing of data and report generation, thus vastly increasing labour requirements for analysis of complex datasets. In our vision, the future of social network analysis lies in creating a seamless toolchain, enabling researchers to mix and match data gathering, analysis and visualization tools and to create analysis scripts for batch-mode processing of large datasets or for repeating the same analysis on different datasets. Publishing analysis scripts would allow the research community to more easily reproduce and verify experimental or empirical results. The toolchain shall meet the following requirements:

  • Each tool shall take the accepted data interchange format (such as DyNetML) as input and produce it as output (with the exception of conversion tools)
  • Analysis tools shall integrate computational results into the dataset, using accepted measure identifiers
  • Each tool that modifies the dataset shall mark its modifications with tool name or ID.
  • Each tool shall provide a command-line interface that allows full access to its features via a scripting language
  • A C-like scripting language shall be developed for integration of tools within the toolchain. Alternatively, existing scripting languages such as Java, Perl or Python can be used.
  • Visual analysis builder tools shall be developed to allow creation of analysis scripts by non-programmers
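
As a rough sketch of what a tool meeting the first three requirements might look like, the following Python function takes DyNetML-like text as input, adds an out-degree measure to every node, marks each added measure with the tool's name, and returns the same format as output. Everything specific here is an assumption made for illustration: the tool name "outdegree-sketch", the measure identifier "outDegree", the computedBy attribute used for marking, and the premise that node IDs are unique within a MetaMatrix. None of these come from the DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TOOL_NAME = "outdegree-sketch"  # hypothetical tool identifier


def annotate_out_degree(xml_text):
    """Read DyNetML-like text, add an out-degree measure per node, return XML text."""
    root = ET.fromstring(xml_text)
    for meta in root.iter("MetaMatrix"):
        # Count outgoing edges per source node across all graphs in this period.
        out_degree = Counter(edge.get("source") for edge in meta.iter("edge"))
        for node in meta.iter("node"):
            # Reuse the node's <measures> container if one already exists.
            measures = node.find("measures")
            if measures is None:
                measures = ET.SubElement(node, "measures")
            ET.SubElement(measures, "measure",
                          name="outDegree", type="double",
                          value=str(float(out_degree[node.get("id")])),
                          computedBy=TOOL_NAME)  # assumed attribute name
    return ET.tostring(root, encoding="unicode")
```

A command-line wrapper could pass the contents of standard input to annotate_out_degree and write the result to standard output, so that several such tools can be chained in a batch script.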

Conclusion

An integrated toolchain such as the one outlined above can only be created through cooperation of members of the research community through an open-source development process, but the first step is to create a uniform data interchange language. In this paper, we proposed one such language: DyNetML, an XML-derived language for specification of rich social network data. It is important to note that since DyNetML is intended as a service to the social network analysis and simulation community, comments and requests for revisions are welcome at any time. Once the project has considerable community support, we shall establish a revision process that will respond to the requirements of the community while maintaining backward compatibility with existing software.

This work was supported in part by the Department of Defense, the NSF ITR 1040059, the Office of Naval Research N00014-02-1-0973, and the National Science Foundation under the IGERT program for training and research in CASOS. Additional support was provided by CASOS, the Center for Computational Analysis of Social and Organizational Systems at Carnegie Mellon University. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Department of Defense, the Office of Naval Research, the National Science Foundation, or the U.S. government.

Maksim Tsvetovat, Jeff Reminga and Kathleen M. Carley, 2003, "DyNetML: Interchange Format for Rich Social Network Data," NAACSOS conference proceedings, Pittsburgh, PA.