Carnegie Mellon University

Data Literacy 

Gartner defines data literacy as "the ability to read, write, and communicate data in context." Understanding your data sources, analytical methods, and applied techniques is important to tell an effective data story. Use this resource to guide your data research project. 

Find

The first step in any project is identifying your research questions. The second is to determine what information you can use to answer those questions. While you may have access to this information, that data may not be available for public use. Follow these tips to help identify data that can be freely used to support your research.

Finding Open Data Sources

To identify possible data sources for a research project, visit the Carnegie Mellon University (CMU) Libraries Finding Data Guide. The US Census site is useful for government data, and the Western Pennsylvania Regional Data Center is useful for regional institutional data. These sources help answer demographic questions and are generally open to the public. Private sources also exist, such as data collected by, or intended for use by, private companies like Twitter or Zillow. Always check permissions before using data.

Finding CMU Data Sources

Institutional data for research purposes, integrations, and administrative needs can be requested through our request process.

Protecting Data

Licensing lets you protect the data you collect or create, and licenses govern how you may use information generated or maintained by others. The Creative Commons Licenses are examples that allow for open use. Citing the license of your data is an important step in ensuring fair use.

Collect

How data is collected can impact data analysis and findings. As a result, it is imperative to ensure that data collection is done ethically, consistently, and without bias. The following section describes data collection, the difference between data and information, quantitative and qualitative data, and ethical and consistent data collection.

Data Collection

Data collection is the process of gathering facts to answer research or business questions. It can take the form of observational studies, survey responses, or other primary sources.

Data vs. Information

Data are facts, while information is data within a specific context. Information is built from data and cannot exist without it; however, data can be valuable without being connected to information. Data are raw observations, values, and facts; information is what provides insight once the data are analyzed.

Ethical Data Collection

According to Cote, “data ethics encompasses the moral obligations of gathering, protecting, and using personally identifiable information and how it affects individuals” (Cote, 2021). To be ethical, data collection must include consent, transparency, accountability, anonymity, and the removal of bias. For more information, visit Ethics of Data Collection.
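Anonymity often starts with removing direct identifiers before analysis. The Python sketch below assumes hypothetical survey records with an `email` field and replaces each address with a keyed hash (HMAC-SHA-256 from the standard `hmac` and `hashlib` modules). The key, field names, and record layout are illustrative assumptions. This is pseudonymization rather than full anonymization: the remaining fields can still re-identify people, and the key must be stored apart from the data.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice, store it separately from the data.
SECRET_KEY = b"replace-with-a-secret-key"

def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed SHA-256 digest in place of a direct identifier."""
    normalized = identifier.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def strip_identifiers(records, id_field="email"):
    """Replace the identifier field in each record with a pseudonymous ID."""
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the original records are left untouched
        row["participant_id"] = pseudonymize(row.pop(id_field))
        cleaned.append(row)
    return cleaned

# Hypothetical survey responses.
survey = [
    {"email": "a.student@example.edu", "rating": 4},
    {"email": "A.Student@example.edu ", "rating": 5},
]
for row in strip_identifiers(survey):
    print(row["participant_id"][:12], row["rating"])
```

Because case and whitespace are normalized before hashing, both spellings of the same address map to one participant ID, so repeat responses can still be linked without storing the email itself.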

Consistent Data Collection

Collecting data consistently is important for data quality and clarity in datasets. Consistency is also vital for accurate data analysis. Visit Consistent Data Collection and Recording for additional guidance.
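One simple way to keep collection consistent is to check each record against the collection protocol as it is entered. The sketch below assumes a hypothetical protocol, `SCHEMA`, that lists the required fields and their Python types; it reports missing, mistyped, or unexpected fields instead of silently accepting them.

```python
from datetime import date

# Hypothetical collection protocol: every record needs these fields and types.
SCHEMA = {"site": str, "collected_on": date, "temperature_c": float}

def check_record(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record is consistent."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems

good = {"site": "A", "collected_on": date(2023, 5, 1), "temperature_c": 18.5}
bad = {"site": "B", "collected_on": "05/01/2023", "temp_f": 65.3}
print(check_record(good))  # []
print(check_record(bad))
```

The second record shows typical drift between collectors: a date typed as text, a different unit, and a renamed field. Catching these at entry time is usually cheaper than reconciling them during cleaning.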

Institutional Review Board (IRB) and Data Stewards

To ensure that the data is collected and used ethically, the Institutional Review Board (IRB) should be consulted for research projects involving human subject research. The appropriate data stewards should be consulted for projects utilizing institutional administrative data.

Qualitative vs. Quantitative Data

Quantitative data consists of numerical facts, while qualitative data consists of descriptive facts. The two types of data are also collected differently. Visit Quantitative vs. Qualitative Research to understand the difference.

Manage

It is always important to keep track of your information, though it is not always clear how to do so. Institutions, data repositories, or other data users might have specific requirements for how data is kept and described, but there are also general best practices for managing your data.

Keeping Track of Data

When working on a project involving data, researchers are encouraged to use a Data Management Plan (DMP). This plan involves keeping a record of the data you use and how you have worked with or stored it. You can learn more about managing your data through the CMU Libraries Data Management Guide.

Using Metadata

Metadata is information about your data. Collect it so that future users of your project data can quickly understand what the data contains (e.g., variable types) and how it is structured and organized (e.g., file size or naming conventions).
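As an illustration, the sketch below builds a small metadata record (file name, size, row count, and a rough type for each variable) for a CSV file and saves it as a JSON file next to the data. The file names, the `.metadata.json` suffix, and the numeric/text type guess are assumptions for the example; many repositories specify their own metadata standards, which take precedence.

```python
import csv
import json
import os
import tempfile

def infer_type(values):
    """Rough type guess: 'numeric' if every non-empty value parses as a number."""
    for value in values:
        if value == "":
            continue  # blanks say nothing about the variable's type
        try:
            float(value)
        except ValueError:
            return "text"
    return "numeric"

def describe_csv(path):
    """Collect basic metadata for a CSV file that has a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    columns = list(rows[0].keys()) if rows else []
    return {
        "file_name": os.path.basename(path),
        "file_size_bytes": os.path.getsize(path),
        "row_count": len(rows),
        "variables": {col: infer_type(row[col] for row in rows) for col in columns},
    }

# Hypothetical example: write a tiny CSV, describe it, and save a JSON sidecar.
with tempfile.TemporaryDirectory() as folder:
    data_path = os.path.join(folder, "survey_2023.csv")
    with open(data_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["respondent", "age", "major"])
        writer.writerow(["r1", "20", "History"])
        writer.writerow(["r2", "22", "Physics"])

    metadata = describe_csv(data_path)
    sidecar_path = os.path.splitext(data_path)[0] + ".metadata.json"
    with open(sidecar_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)

print(metadata["variables"])  # {'respondent': 'text', 'age': 'numeric', 'major': 'text'}
```

Generated metadata like this covers structure only; descriptions of what each variable means, units, and collection methods still need to be written by the people who know the data.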

Clean and Verify

Incorrect, duplicated, missing, or corrupted data can profoundly impact data analysis and subsequent findings. It is important to cleanse the data before analysis. The section below provides more information on data cleaning and its importance when working with data.

Clean Data

Clean data is important because it ensures reliability and data quality. It also makes the data easier to use for answering particular research questions, and it helps other researchers reuse the data or reproduce results for projects involving it.
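Duplicated records are one common cleaning problem. The sketch below, assuming hypothetical records and a chosen set of key fields, trims whitespace and lowercases text so trivially different copies match, then keeps the first record for each key. Lowercasing is a design choice for matching; if original capitalization matters for your project, keep the raw values and normalize only the comparison keys.

```python
def normalize(record):
    """Trim whitespace and standardize case so trivially different copies match."""
    return {key: value.strip().lower() if isinstance(value, str) else value
            for key, value in record.items()}

def deduplicate(records, key_fields):
    """Keep the first normalized record for each combination of key fields."""
    seen = set()
    unique = []
    for record in records:
        clean = normalize(record)
        key = tuple(clean.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(clean)
    return unique

# Hypothetical raw records; the first two describe the same person.
raw = [
    {"name": "Ada Lovelace ", "dept": "Math"},
    {"name": "ada lovelace", "dept": "math"},
    {"name": "Alan Turing", "dept": "Computer Science"},
]
print(deduplicate(raw, key_fields=["name", "dept"]))
```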

Term Consistency

Often, the same term is used with different meanings. It is important to understand the context and definition of each term; otherwise, bias or quality issues may be introduced into the results. For CMU institutional data, an institutional data glossary provides standard data definitions.
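One way to apply a glossary is to map every variant of a term to a single agreed definition before analysis. The glossary entries below are hypothetical and are not taken from the CMU institutional data glossary. Values the glossary does not cover are flagged for review rather than guessed.

```python
# Hypothetical glossary mapping variant spellings to one agreed-upon term.
GLOSSARY = {
    "ft": "full-time",
    "full time": "full-time",
    "full-time": "full-time",
    "pt": "part-time",
    "part time": "part-time",
    "part-time": "part-time",
}

def standardize_terms(values, glossary=GLOSSARY):
    """Map each value to its glossary term and collect values the glossary lacks."""
    standardized, unknown = [], set()
    for value in values:
        key = value.strip().lower()
        if key in glossary:
            standardized.append(glossary[key])
        else:
            standardized.append(None)  # leave unmapped rather than guess
            unknown.add(value)
    return standardized, unknown

terms, unmatched = standardize_terms(["FT", "Part time", "full-time", "Adjunct"])
print(terms)      # ['full-time', 'part-time', 'full-time', None]
print(unmatched)  # {'Adjunct'}
```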

Missing Data

Missing data can skew analysis or introduce bias to the results. Moreover, analysis results may be invalid if data is missing. The article How to Deal with Missing Data offers helpful tips.
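The small made-up example below shows how missing data can skew a result. Two respondents did not report study hours, and both are older than the rest, so dropping incomplete rows (listwise deletion) lowers the average age from 24.25 to 20. The data and field names are hypothetical.

```python
from statistics import mean

# Hypothetical survey responses; None marks a missing value.
responses = [
    {"age": 19, "hours_studied": 10},
    {"age": 22, "hours_studied": None},
    {"age": 35, "hours_studied": None},
    {"age": 21, "hours_studied": 12},
    {"age": None, "hours_studied": 8},
]

def missing_share(rows):
    """Fraction of rows with a missing (None) value, per field."""
    fields = rows[0].keys()
    return {field: sum(row[field] is None for row in rows) / len(rows)
            for field in fields}

all_ages = [r["age"] for r in responses if r["age"] is not None]
complete = [r for r in responses if all(v is not None for v in r.values())]

print(missing_share(responses))          # {'age': 0.2, 'hours_studied': 0.4}
print(mean(all_ages))                    # 24.25
print(mean(r["age"] for r in complete))  # 20
```

Reporting how much is missing per field, and checking whether the missing rows differ from the others, helps decide whether deletion is safe or another approach is needed.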

Software

There are many software options available to support data cleaning. Some of the most common include OpenRefine and Tableau Prep. Contact one of our data-focused librarians to learn more about these or other data-cleaning tools.

Data Patterns

Patterns can be found in both quantitative and qualitative data. These videos explain how to identify them:

How to Identify Patterns and Trends in Data
Patterns, Themes, and Arguments in Qualitative Data Analysis
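For quantitative data, a moving average is one simple way to make a trend easier to see by smoothing short-term fluctuations. The weekly figures below are hypothetical, and the window size is an assumption you would choose for your own data.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values to smooth short-term noise."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical weekly counts that bounce up and down around a rising trend.
weekly_visits = [120, 95, 130, 110, 140, 125, 150]
print(moving_average(weekly_visits, 3))
```

The raw counts alternate up and down, but the three-week averages rise from 115 to about 138, which makes the overall upward trend easier to see.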

Variables

An independent variable is the variable a researcher controls or changes in a study. A dependent variable is the outcome being measured; it changes in response to the independent variable. USC Libraries offers an Independent and Dependent Variables guide to help identify variables.

Avoid Bias

Bias can be introduced in any study area, including data collection and analysis. Visit 9 Types of Research Bias and How to Avoid Them to learn more.

Do No Harm Guide
Tableau Do No Harm Guide video 

Analyze

Data can be analyzed in many ways, depending on the data collected. During data analysis, it is important that the results are accurate and that the findings are reproducible.

Testing Data

Quantitative and qualitative data call for different types of analysis. This short video on Choosing Which Statistical Test to Use can help you decide. This Qualitative Data Analysis video provides a tutorial on six different methods.

Ensuring Accurate Results

Accurate data is essential to avoid bias in the results. Accurate data also allows studies to be reproduced and replicated. This short video from Statistics Canada offers steps to ensure Data Accuracy and Validation.

Reproducibility and Replicability

Reproducibility means obtaining the same results when the same data and analysis are used again. Replicability means repeating the study and arriving at the same findings. Visit Reproducibility and Replicability from Frontiers to understand the difference.
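Fixing the random seed is one common practice that supports reproducibility when an analysis involves randomness, such as drawing a sample. The sketch below uses a dedicated `random.Random` generator with a seed; the participant IDs and seed value are hypothetical. Python's documentation notes that most `random` algorithms may change between Python versions, so record the Python version alongside the seed.

```python
import random

def sample_participants(ids, k, seed):
    """Draw a random sample with a dedicated, seeded generator so reruns match."""
    rng = random.Random(seed)
    return rng.sample(ids, k)

# Hypothetical participant IDs P001 through P100.
participants = [f"P{n:03d}" for n in range(1, 101)]
first_run = sample_participants(participants, 5, seed=2024)
second_run = sample_participants(participants, 5, seed=2024)
print(first_run == second_run)  # True
```

Using a separate generator instead of the module-level `random.seed` keeps other code that draws random numbers from changing this sample.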

Avoiding Unsupported Conclusions

Conclusions should be based on analysis results that can be verified. It is also important to avoid confirmation bias. Visit Confirmation Bias and the Power of Disconfirming Evidence for more information.

Visualize

Creating stories around information can include techniques to build research narratives, identify trends, and communicate about data. This can also extend to visualizing data. Data visualization is a helpful tool for quickly explaining large amounts of data in digestible form.

Data Visualization and Stories

The data and any visualizations should be able to tell a story to the audience. This story helps the audience understand the results of the research conducted. The Harvard Business Review offers tips on How to Tell a Story with Data.

Design Guidelines

Follow the Data Visualization Design Guidelines when working with data visualization at CMU.

Visualization Tools

There are many tools you can use to visualize information. While CMU does not recommend one specific software or tool, the CMU Libraries provide support for Tableau, Excel, Power BI, Python, and R. The Libraries offer Data Office Hours to the campus community. Make an appointment today to meet one of our data specialists. If you are a faculty or staff member, you can also find support through the Data Analytics Community of Practice.

Need Additional Help?

Data for university business needs

Contact the Data Governance Office or reference the Data @ CMU site.

Research or other projects

Contact the CMU Libraries Data Services team or register for a data consultation.