Carnegie Mellon University
May 22, 2025

Faculty Spotlight: Weijing Tang

By Stefanie Johndrow

Weijing Tang is an assistant professor in the Department of Statistics & Data Science whose research focuses on developing statistical methodology and theory for analyzing massive and complex data from interdisciplinary research.

Tell me about your scholarly work.

My recent research focuses on developing latent variable models and theory to understand data with complex dependency structures. By embedding observed data into an unobserved latent space, we gain the flexibility to provide interpretable insights into observed dependencies and to uncover hidden patterns that can guide further analysis. Along this line, I have been working on applications involving network data and electronic health records.

In network analysis, data points are not isolated; instead, they are connected by edges representing relationships, such as friendships in social networks. These relationships between nodes do not form independently of one another. For example, two nodes sharing many common neighbors are more likely to be connected. My research develops new statistical methods and theory to analyze complex, heterogeneous real-world network data to understand their formation mechanisms.

In electronic health records (EHRs), many features are highly granular and redundant because of complicated clinical coding systems. Latent variable models naturally embed features with similar semantic meaning into nearby locations in the latent space, which leads to interpretable knowledge discovery from massive EHR data.

How is your scholarly work adding to the greater field?

The statistical tools I develop for understanding the formation mechanisms behind complex data structures are essential for interpretable and effective data analysis. For example, while EHRs provide rich longitudinal patient information and complement clinical trial data for clinical research, EHR data are often messy. Differences in coding languages across health systems, version inconsistencies over time and feature redundancy bring challenges for downstream analysis. Mapping these EHR features into a universal low-dimensional latent space through latent variable models creates a more efficient foundation for data analysis and offers a transferable framework for integrative knowledge discovery. Similarly, for network data, our methods uncover interpretable structures within heterogeneous complex systems, which provide insights into social, health and behavioral dynamics.

How did you become interested in this topic?

I studied mathematics as an undergraduate, and during my Ph.D., I had the opportunity to collaborate closely with clinicians. It was a very rewarding experience: sitting down with domain experts, understanding the questions that matter to them, translating those into mathematical problems, and then bringing solutions back to the table. During the COVID-19 pandemic, this understanding became even more personal. Together with colleagues, I participated in a data challenge organized by the American Heart Association to assess the effectiveness of public health policies. That experience reinforced how powerful and necessary it is to use data and statistical tools to answer urgent questions in health and society.

What are you most excited to accomplish as a faculty member at CMU?

Looking ahead, I want to build an interdisciplinary research program that advances statistical methodology for uncovering interpretable patterns in complex data and for addressing important questions in the health and social sciences. CMU’s strong tradition of fostering interdisciplinary collaboration provides an ideal environment for this goal. I look forward to working with faculty across campus to tackle challenges at the intersection of human behavior, health outcomes and AI. So far, I have established collaborations with faculty members in psychology and computer science. I am eager to continue expanding these collaborations and to contribute to CMU’s vibrant interdisciplinary community.

What are your goals for the next generation of scholars?

I want to help students build strong technical skills and cultivate the ability to think critically about how methods can be adapted and applied across diverse domains. My goal is to support them in becoming not only excellent statisticians but also thoughtful, collaborative researchers, prepared to tackle complex challenges and make meaningful contributions across a variety of fields.