Research Areas
The department is home to cutting-edge research programs that tackle a wide range of problems.
Our faculty and students are advancing Statistics and related fields through a combination of theoretical, methodological, and applied work. We prioritize interdisciplinary collaborations, working closely with scientists and researchers across diverse disciplines.
Department members work "in the trenches" in fields such as genetics, neuroscience, cosmology, demography, epidemiology, finance, and forensics. This hands-on approach allows us to address significant scientific questions while simultaneously developing new statistical theories and methods.
By integrating our expertise with real-world challenges, we drive progress in both statistical science and the fields we support.
AI-SDM
The AI Institute for Societal Decision Making (AI-SDM) combines AI and the social sciences to develop human-centric AI for complex societal decisions in areas like public health and disaster management. The institute focuses on creating ethical AI tools, training an interdisciplinary workforce, and raising awareness about AI's societal benefits.
Biostatistics & Epidemiology
Biostatistics and epidemiology applies statistical methods to understand public health issues and disease patterns within populations. By combining statistical analysis with biological and health-related data, researchers aim to identify risk factors, evaluate interventions, and inform health policy decisions.
Causal Inference
The causal inference research area develops statistical methodology and theory to accurately model cause-effect relationships from complex observational and experimental data in order to inform effective decision-making across healthcare, technology, and policy domains. The Causal Inference Working Group at CMU started in 2016. We meet weekly to discuss our own research or interesting papers, both new and old; members come from communities in Statistics & Data Science, Machine Learning, Information Systems & Public Policy, Philosophy, Epidemiology, and beyond.
Computational Finance
Computational finance leverages vast amounts of financial data and advanced data analysis techniques to model market behaviors and inform investment strategies. By utilizing big data, machine learning, and statistical methods, it aims to uncover patterns and insights that drive decision-making in finance.
Computational Neuroscience
Researchers throughout the world who investigate neural networks in the brain are trying to answer detailed questions using data sets that are large but noisy, creating new challenges for statistics and machine learning. This group contributes by focusing especially on methods for reliably identifying coordinated neural activity across multiple brain areas.
CSAFE
The Center for Statistics and Applications in Forensic Evidence (CSAFE) is a nationally recognized, NIST-funded center comprising multiple universities and over 60 researchers, working to build a statistically sound foundation for interpreting forensic evidence. CSAFE focuses on developing new methods for analyzing pattern and digital evidence, while also providing education and training to forensic practitioners and stakeholders.
DELPHI
The DELPHI group aims to make epidemiological forecasting as widely accepted and useful as weather forecasting, with a focus on developing forecasting methods for high-value targets like influenza and dengue. Their work includes creating baseline forecasting methods, establishing accuracy metrics, estimating forecastability limits, and identifying new data sources to improve predictions for public health decision-making.
Foundations of Inference
Inference is the process of drawing logical conclusions from known or assumed premises, studied in the field of logic and applied in various disciplines. There are different types of inference, including human inference studied in cognitive psychology, artificial inference developed in AI, and statistical inference, which uses mathematics to draw conclusions in the presence of uncertainty.
Genomics & Genetics
The genomics & genetics research team develops statistical and computational models for multi-omic data to uncover complex associations between biological entities and understand the genetic basis of neuropsychiatric disorders. Their focus is on characterizing cellular functions, particularly in the brain, using single-cell omics studies, with the goal of translating findings into clinical applications.
Graphical Models & Networks
Statistical network analysis provides statistical tools for studying, understanding, and drawing inferences from network data. Our group's research interests span a range of contemporary topics such as graphons, irregularly sampled networks, temporally evolving networks, network embeddings, and various applications.
High-Dimensional Statistics
This group focuses on data whose dimension is larger than the dimensions typically considered in classical multivariate analysis. High-dimensional statistics relies on the theory of random vectors. In many applications, the dimension of the data vectors may even exceed the sample size.
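As a small illustration of why the case of dimension exceeding sample size is difficult, the sketch below draws n = 20 observations in p = 50 dimensions and checks the rank of the sample covariance matrix. The sizes, the Gaussian data, and the variable names are assumptions chosen only for illustration; they do not come from the group's work.

```python
import numpy as np

# Hypothetical sizes: fewer observations (n) than variables (p).
n, p = 20, 50
rng = np.random.default_rng(0)
X = rng.normal(size=(n, p))      # rows are observations, columns are variables

# Sample covariance matrix of the p variables (p x p).
S = np.cov(X, rowvar=False)

# After centering, the n rows span at most n - 1 directions,
# so S cannot have full rank p and is not invertible.
rank = np.linalg.matrix_rank(S)
print(f"shape of S: {S.shape}, numerical rank: {rank}")
```

Because the sample covariance is singular here, classical procedures that invert it do not apply directly, which is one reason this setting calls for dedicated theory and methods.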
Natural Language Processing & Large Language Models
Natural language processing (NLP) extracts information from large amounts of unstructured text, making it possible to build statistical models using text data. Faculty apply NLP techniques to interdisciplinary problems ranging from understanding the discourse of war in social media posts to comparing styles of writing in student reports. Additionally, faculty are actively engaged in research with large language models (LLMs), both using them as tools for understanding text and studying the text that LLMs generate.
Nonparametric Methods
Nonparametric (or distribution-free) statistical methods are procedures for hypothesis testing, estimation, and inference which, unlike parametric methods, make few or no assumptions about the form of the probability distributions of the variables being assessed. These methods are particularly useful for analyzing complex datasets where traditional assumptions may not hold, allowing for greater flexibility and robustness in estimation and inference.
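One familiar example of a distribution-free procedure is the two-sample permutation test. The sketch below is a minimal version, assuming the two groups are exchangeable under the null hypothesis; the function name `permutation_test`, its parameters, and the sample data are hypothetical and chosen only for illustration.

```python
import numpy as np

def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an approximate p-value without assuming any particular
    distribution for the data, only exchangeability under the null.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)           # random relabeling of groups
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            count += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return (count + 1) / (n_permutations + 1)

# Hypothetical data: two clearly separated groups.
group_a = [0, 1, 2, 3, 4, 5, 6, 7]
group_b = [100, 101, 102, 103, 104, 105, 106, 107]
p_value = permutation_test(group_a, group_b)
print(f"approximate p-value: {p_value:.4f}")
```

The reference distribution is built from the data themselves by reshuffling group labels, so no normality or other parametric assumption is needed.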
Optimal Transport
Optimal transport is the problem of mapping one distribution to another with minimal cost. It is used in many problems, such as domain transfer, image processing, modeling cell trajectories, and reconstructing the early universe. Our group focuses on the problem of estimating the transport map from data. We also look at different forms of transport that depart from the classical approach.
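In one dimension with squared-distance cost, the optimal map between two samples of equal size has a simple form: the i-th smallest source point is sent to the i-th smallest target point. The sketch below illustrates that special case only; the function name `ot_1d_sorted_matching` and the simulated data are assumptions for illustration, not the group's estimators.

```python
import numpy as np

def ot_1d_sorted_matching(source, target):
    """Optimal transport between equal-size 1D samples under squared cost.

    Returns the image of each source point and the mean squared
    transport cost of the rank-matching assignment.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    if source.ndim != 1 or source.shape != target.shape:
        raise ValueError("expects two 1D samples of equal size")

    src_order = np.argsort(source)
    tgt_order = np.argsort(target)

    mapped = np.empty_like(source)
    # i-th smallest source point is sent to the i-th smallest target point.
    mapped[src_order] = target[tgt_order]

    cost = float(np.mean((source - mapped) ** 2))
    return mapped, cost

# Hypothetical samples from two different normal distributions.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=500)
y = rng.normal(3.0, 2.0, size=500)
mapped, cost = ot_1d_sorted_matching(x, y)
print(f"mean squared transport cost: {cost:.3f}")
```

In higher dimensions no such sorting shortcut exists, which is part of what makes estimating the transport map from data a research problem.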
Optimization & Algorithms
This research area advances computational methods and theory in areas such as algorithm design, combinatorial optimization, and convex optimization, with the aim of analyzing complex datasets efficiently and scaling solutions to high-dimensional problems in statistics and machine learning.
Public Policy & Social Sciences
Faculty and students regularly work with collaborators on every stage of social science research, from questionnaire development to the selection of probability samples to the design of social experiments. They develop new methods for analyzing the resulting data and apply up-to-date methods for drawing inferences from diverse social science data sources, ranging from large-scale sample surveys to social networks to educational experiments. A number of statistics graduate students work directly in joint programs bringing statistics to bear on problems in education and public policy.
Sports Analytics/CMSAC
The Carnegie Mellon Sports Analytics Center (CMSAC), a national sports analytics hub, aims to advance the field through cutting-edge research, innovative educational experiences, and community outreach events. Its mission encompasses pushing the boundaries of sports analytics, creating educational opportunities like sponsored projects and fellowships, and hosting accessible events such as the annual CMSAConference.
STAMPS
The Statistical Methods for the Physical Sciences (STAMPS) center provides foundational methodology in statistics, data science, machine learning, and artificial intelligence for two distinct branches of physical science: (1) Astronomy and Particle Physics and (2) Climate and Environmental Science, which include applications in, for example, Oceanography, Meteorology, and Remote Sensing.
Statistical Machine Learning
The statistical machine learning research area advances the theoretical foundations of complex machine learning models using statistical learning, nonparametric Bayes, causal inference, and reinforcement learning principles to enable robust, trustworthy predictions.
Statistical Pedagogy & Educational Research
This group focuses on modernizing statistics education through research, curriculum development, and innovative teaching practices. Their work includes designing effective assessment methods, studying statistical writing skills, and developing interactive learning tools to enhance student engagement and understanding of statistics.
Time Series Forecasting
Time series forecasting is a research area focused on predicting future values based on previously observed data points collected over time, leveraging statistical and machine learning techniques. It plays a crucial role in various fields, including finance, economics, weather forecasting, and supply chain management, by enabling organizations to make informed decisions based on trends and patterns.