Carnegie Mellon University

STAMPS@CMU

STAtistical Methods for the Physical Sciences Research Center

Spring 2021

January 22: David John Gagne (National Center for Atmospheric Research)

Title: Machine Learning Emulation across the Earth System
[Gagne Talk Recording] [Gagne Talk Slides]

Abstract: Earth system processes can be explicitly modeled to a high degree of complexity and realism. The most complex models are also the most computationally expensive, so in practice they are not used within large weather and climate simulations. Machine learning emulation of these complex models promises to approximate their output at a small fraction of the original computational cost. If the performance is satisfactory, then the computational budget could be steered toward other priorities. The NCAR Analytics and Integrative Machine Learning group is currently working on machine learning emulation problems for microphysics, atmospheric chemistry, and the processing of holographic observations of raindrops. We will discuss our successes as well as challenges in ensuring robust online performance and incorporating emulators within existing simulations.
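
To give a flavor of the general idea (a minimal illustrative sketch, not the NCAR group's actual code), the example below trains a small neural-network emulator on input/output pairs produced by a stand-in "expensive" parameterization. The function expensive_microphysics, its inputs, and all hyperparameters are hypothetical.

```python
# Minimal sketch of ML emulation of an expensive model component.
# "expensive_microphysics" is a hypothetical stand-in for a costly
# parameterization; in practice the training data would come from the
# full-complexity scheme itself.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def expensive_microphysics(x):
    # Placeholder for a costly physical parameterization:
    # maps three model-state inputs to a single tendency.
    return np.sin(x[:, 0]) * np.exp(-x[:, 1]) + 0.1 * x[:, 2] ** 2

X = rng.uniform(-1.0, 1.0, size=(10_000, 3))   # sampled model states
y = expensive_microphysics(X)                  # "truth" from the costly scheme

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
emulator.fit(X_train, y_train)

# Offline skill; robust *online* performance inside the host simulation is
# the harder problem discussed in the talk.
print("R^2 on held-out states:", emulator.score(X_test, y_test))
```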

Bio: David John Gagne is a Machine Learning Scientist and head of the Analytics and Integrative Machine Learning group at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado. His research focuses on developing machine learning systems to improve the prediction and understanding of high impact weather and to enhance weather and climate models. He received his Ph.D. in meteorology from the University of Oklahoma in 2016 and completed an Advanced Study Program postdoctoral fellowship at NCAR in 2018.

He has collaborated with interdisciplinary teams to produce machine learning systems for hail, tornadoes, hurricanes, and renewable energy. In order to educate atmospheric science students and scientists about machine learning, he has led a series of interactive short courses and hackathons.


February 12: Robert Cousins (Department of Physics and Astronomy, UCLA)

Title: Testing a sharp null hypothesis versus a continuous alternative: Deep issues regarding this everyday problem in high energy physics
[Cousins Talk Recording] [Cousins Talk Slides]

Abstract: In high energy physics, it is extremely common to test a well-specified null hypothesis (such as the Standard Model of elementary particle physics) that is nested within an alternative hypothesis with unspecified value(s) of parameter(s) of interest (such as the Standard Model plus a new force of nature with unknown strength). As widely discussed in the context of the Jeffreys-Lindley paradox, two experiments with the same p-value for testing the null hypothesis can have differing results for the Bayesian probability that the null hypothesis is true (and for the Bayes factor), since the latter depends on both the sample size and the width of the prior probability density in the parameter(s) of the sought-for discovery. After a reminder of relevant methods for hypothesis testing and the paradox, I will note that the issues are particularly apparent when there are three well-separated independent scales for the parameter of interest, namely (in increasing order) the small (or negligible) width of the null hypothesis, the width of the measurement resolution, and the width of the prior probability density. After giving examples with this hierarchy, I will quote various statements in the statistics literature and discuss their relevance (or not) to usual practice in high energy physics. Much of the talk will draw on arXiv:1310.3791.
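
As a toy numerical illustration of the paradox (not drawn from the talk or from arXiv:1310.3791), consider a single Gaussian measurement with a sharp null at zero and a Gaussian prior of fixed width tau under the alternative. Holding the observed significance at "3 sigma" while shrinking the measurement resolution sigma keeps the p-value fixed but drives the Bayes factor toward favoring the null; the values of z, tau, and sigma below are assumptions chosen for illustration.

```python
# Toy Jeffreys-Lindley illustration: a fixed "3 sigma" p-value, but a
# Bayes factor B01 that increasingly favors the sharp null as the
# measurement resolution sigma shrinks relative to a fixed prior width tau.
import numpy as np
from scipy.stats import norm

z = 3.0      # observed significance; the measured value is x = z * sigma
tau = 1.0    # assumed prior width for the parameter of interest under H1

p_value = 2 * norm.sf(z)   # two-sided p-value, the same for every sigma below
print(f"two-sided p-value for z = {z}: {p_value:.4f}")

for sigma in [0.3, 0.1, 0.01, 0.001]:                            # resolution
    x = z * sigma                                                 # observed estimate
    m0 = norm.pdf(x, loc=0.0, scale=sigma)                        # p(x | H0)
    m1 = norm.pdf(x, loc=0.0, scale=np.sqrt(sigma**2 + tau**2))   # p(x | H1)
    print(f"sigma = {sigma:6.3f}  Bayes factor B01 = {m0 / m1:10.3f}")
```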

Bio: Robert (Bob) Cousins is Distinguished Professor Emeritus in the Department of Physics and Astronomy at UCLA, where he was on the faculty from 1981 through 2020. He completed his A.B. at Princeton in 1976, obtained his Stanford Ph.D. under Mel Schwartz while collaborating on a kaon experiment at Fermilab, and then held a position at CERN in 1981 before joining UCLA. Throughout his career, he has worked on experiments measuring or searching for rare processes, at Brookhaven National Lab with kaons, at CERN with neutrinos, and since 2000 on the CMS Experiment at CERN's Large Hadron Collider. This has motivated his career-long interest in statistical data analysis.

Cousins has held various high-level leadership positions in his collaborations, and served on a number of ad hoc and standing advisory and review committees for laboratories and funding agencies. Recent such service included the Particle Physics Project Prioritization Panel (P5) in the U.S. (2013-2014), and CERN’s Scientific Policy Committee (2018-2023).


March 12: Raphaël Huser (Extreme Statistics Research Group, King Abdullah University of Science and Technology)

Title: High-resolution Modeling and Estimation of Extreme Red Sea Surface Temperature Hotspots
[Huser Talk Recording] [Huser Talk Slides]

Abstract: Modeling, estimation, and prediction of spatial extremes are key for risk assessment in a wide range of geo-environmental, geo-physical, and climate science applications. In this talk, we will first introduce state-of-the-art models based on extreme-value theory, and discuss their statistical and computational limitations. We will then discuss an alternative flexible approach for modeling and estimating extreme sea surface temperature (SST) hotspots, i.e., high threshold exceedance regions, for the whole Red Sea, a vital region of high biodiversity. In a nutshell, our proposed model is a semiparametric Bayesian spatial mixed-effects linear model with a flexible mean structure to capture spatially varying trend and seasonality, while the residual spatial variability is modeled through a Dirichlet process mixture of low-rank spatial Student-t processes to efficiently handle high-dimensional data with strong tail dependence. With our model, the bulk of the SST residuals influence tail inference and hotspot estimation only moderately, while our approach can automatically identify spatial extreme events without any arbitrary threshold selection. Posterior inference can be drawn efficiently through Gibbs sampling. Moreover, we will show how hotspots can be estimated from the fitted model, and how to make high-resolution projections until the year 2100, based on the Representative Concentration Pathways 4.5 and 8.5. Our results show that the estimated 95% credible region for joint high threshold exceedances includes large areas covering major endangered coral reefs in the southern Red Sea.
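
The following is a simplified schematic of how a hotspot might be read off posterior samples of a fitted spatial model, using marginal exceedance probabilities rather than the joint credible region described in the abstract. The synthetic posterior draws, the threshold, and the credibility level are all assumptions; the talk's actual model (a Dirichlet process mixture of low-rank Student-t processes fitted by Gibbs sampling) is far richer.

```python
# Schematic hotspot estimation from posterior draws of a spatial field
# (illustrative only; values are synthetic stand-ins for Gibbs-sampler output).
import numpy as np

rng = np.random.default_rng(1)

n_draws, n_sites = 2000, 500
# Stand-in for posterior samples of SST residuals at each grid cell.
posterior_field = rng.standard_t(df=5, size=(n_draws, n_sites))

threshold = 2.5   # high threshold defining an "extreme" residual (assumed)
level = 0.95      # credibility level for the hotspot region (assumed)

# Posterior probability that each site exceeds the threshold ...
exceed_prob = (posterior_field > threshold).mean(axis=0)

# ... and a simple plug-in hotspot estimate: sites whose posterior
# exceedance probability is at least 1 - level.
hotspot = exceed_prob >= 1 - level
print(f"{hotspot.sum()} of {n_sites} sites flagged as part of the hotspot")
```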

Bio: Raphaël Huser is an Assistant Professor of Statistics at the King Abdullah University of Science and Technology (KAUST), where he leads the Extreme Statistics (extSTAT) research group. He obtained his PhD degree in Statistics in 2013 from the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland, and he also holds a BS degree in Mathematics and an MS degree in Applied Mathematics from the same institution. His research mainly focuses on the development of novel statistical methodology for the modeling, prediction and assessment of risk related to spatio-temporal extremes arising in a wide range of geo-environmental applications, although he also has interests in other application areas.


April 9: Patrick Heimbach (Oden Institute for Computational Engineering and Sciences, University of Texas at Austin)

Title: Augmenting a sea of data with dynamics: the global ocean parameter and state estimation problem
[Heimbach Talk Recording] [Heimbach Talk Slides]

Abstract: Because of the formidable challenge of observing the full-depth global ocean circulation in its spatial detail and the many time scales of oceanic motions, numerical simulations play an essential role in quantifying patterns of climate variability and change. For the same reason, predictive capabilities are confounded by the high-dimensional space of uncertain inputs required to perform such simulations (initial conditions, model parameters, and external forcings). Inverse methods optimally extract and blend information from observations and models. Parameter and state estimation, in particular, enables rigorously calibrated and initialized predictive models to optimally learn from sparse, heterogeneous data while satisfying fundamental equations of motion. A key enabling computational approach is the use of derivative information (adjoints and Hessians) for solving nonlinear least-squares optimization problems. An emerging capability is the propagation of uncertainty from the observations through the model to key oceanic metrics such as equator-to-pole oceanic mass and heat transport. A related use of the adjoint method is the use of the time-evolving dual state as a sensitivity kernel for dynamical attribution studies. I will give examples of the power of (i) property-conserving data assimilation for reconstruction, (ii) adjoint-based dynamical attribution, and (iii) the use of Hessian information for uncertainty quantification and observing system design.
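
A minimal sketch of the adjoint idea, under invented assumptions: for a toy linear model x_{k+1} = M x_k with direct observations of the state, the gradient of the least-squares model-data misfit with respect to the initial state is accumulated by sweeping the transpose (adjoint) dynamics backwards in time. The dynamics operator, observations, and dimensions below are illustrative, not any ocean model.

```python
# Minimal adjoint sketch for a toy linear model x_{k+1} = M x_k with
# observations y_k of the state.  The gradient of the misfit J with respect
# to the initial state x0 is computed by a backwards (adjoint) sweep.
import numpy as np

rng = np.random.default_rng(2)
n, n_steps = 4, 20

M = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # toy dynamics operator
x0_true = rng.standard_normal(n)

def forward(x0):
    """Integrate the model forward and return the full trajectory."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(M @ traj[-1])
    return np.array(traj)

y = forward(x0_true) + 0.01 * rng.standard_normal((n_steps + 1, n))  # synthetic data

def cost_and_adjoint_gradient(x0):
    """J(x0) = 0.5 * sum_k ||x_k - y_k||^2 and its gradient via the adjoint."""
    traj = forward(x0)
    residuals = traj - y
    J = 0.5 * np.sum(residuals**2)
    # Adjoint sweep: lambda_k = residual_k + M^T lambda_{k+1}, backwards in time.
    lam = residuals[-1].copy()
    for k in range(n_steps - 1, -1, -1):
        lam = residuals[k] + M.T @ lam
    return J, lam            # lam is dJ/dx0

# Check the adjoint gradient against a finite difference in one direction.
x0_guess = np.zeros(n)
J, grad = cost_and_adjoint_gradient(x0_guess)
eps, e0 = 1e-6, np.eye(n)[0]
J_eps, _ = cost_and_adjoint_gradient(x0_guess + eps * e0)
print("adjoint dJ/dx0[0]:", grad[0], " finite difference:", (J_eps - J) / eps)
```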

Bio: Patrick Heimbach is a computational oceanographer at the University of Texas at Austin, with joint appointments in the Jackson School of Geosciences, the Institute for Geophysics, and the Oden Institute for Computational Engineering and Sciences. His research focuses on ocean and ice dynamics and their role in the global climate system. He specializes in the use of inverse methods applied to ocean and ice model parameter and state estimation, uncertainty quantification and observing system design.

Patrick earned his Ph.D. in 1998 from the Max-Planck-Institute for Meteorology and the University of Hamburg, Germany. Among his professional activities, Patrick serves on the National Academy of Sciences’ Ocean Studies Board, the CLIVAR/CliC Northern Ocean Regional Panel, and the US CLIVAR Ocean Uncertainty Quantification working group.


May 7: Daniela Huppenkothen (SRON Netherlands Institute for Space Research)

Title: Unravelling the Physics of Black Holes Using Astronomical Time Series
[Huppenkothen Talk Recording] [Huppenkothen Talk Slides]

Abstract: Black holes are at the heart of many open questions in astrophysics. They are prime laboratories to study the effects of strong gravity, and are thought to play a significant role in the evolution of the universe. Much of our knowledge of these sources comes from studies of black holes in X-ray binaries, where a black hole exists in a binary system with a star, and is observed through the radiation emitted by stellar material as it falls into the black hole. Of particular interest are their time series, measurements of their brightness as a function of time. Connecting properties of these (often stochastic) time series to physical models of how matter falls into black holes enables probes of fundamental physics, but requires sophisticated statistical methods commonly grouped under the term “spectral timing”. In addition, data analysis is often complicated by systematic biases introduced by the detectors used to gather the data.

In this talk, I will introduce black holes as important astrophysical sources and survey the types of data we observe from them with X-ray telescopes. I will then give an overview of spectral timing as an approach to characterizing the information about the physical system contained in these data sets, and present both the state of the art and future directions of time series analysis for black holes. I will also present recent work on mitigating systematic biases in X-ray detectors using simulation-based inference and deep neural networks.
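
As a bare-bones illustration of the first step in many spectral-timing analyses, the sketch below turns an evenly sampled synthetic X-ray light curve into a Leahy-normalized power spectrum with the FFT. The light curve, count rate, and injected 2 Hz signal are invented; real analyses would typically use a dedicated package such as Stingray, mentioned in the bio below.

```python
# Power spectrum of a synthetic, evenly sampled X-ray light curve
# (illustrative only; the injected signal and count rate are assumptions).
import numpy as np

rng = np.random.default_rng(3)

dt = 0.01                      # time resolution in seconds (assumed)
t = np.arange(0, 100, dt)      # 100 s of data
mean_rate = 500.0              # mean count rate in counts / s (assumed)

# Synthetic light curve: a quasi-periodic 2 Hz signal plus Poisson noise.
expected_counts = mean_rate * dt * (1 + 0.1 * np.sin(2 * np.pi * 2.0 * t))
counts = rng.poisson(expected_counts)

# Leahy-normalized periodogram: P = 2 |FFT|^2 / N_photons, so that pure
# Poisson noise has an expected power of 2.
n_photons = counts.sum()
freqs = np.fft.rfftfreq(len(counts), d=dt)[1:]          # drop the zero frequency
power = 2 * np.abs(np.fft.rfft(counts))[1:] ** 2 / n_photons

peak = freqs[np.argmax(power)]
print(f"strongest variability at ~{peak:.2f} Hz (injected: 2.00 Hz)")
```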

Bio: Daniela Huppenkothen is a staff scientist at the SRON Netherlands Institute for Space Research. Previously, she was Associate Director of the Center for Data-Intensive Research in Astrophysics and Cosmology (DIRAC) at the University of Washington. Before that, she spent time at New York University as a Moore-Sloan Data Science Postdoctoral Fellow, after receiving her PhD in Astronomy from the University of Amsterdam in 2014.

Daniela is interested in leveraging new statistical and computational methods to improve inference within astronomy and space science. Her current research focuses mostly on time series analysis across all parts of astronomy, including asteroids, neutron stars and black holes. She is interested in how we can use machine learning and statistics to mitigate biases introduced into our data by detectors and telescopes. She is lead developer of the open-source software project Stingray, which implements a collection of commonly used time series methods in astronomy. She is interested in finding new ways to teach data science to astronomers (often with candy), and she develops new strategies for facilitating interdisciplinary collaborations in her role as co-organizer of Astro Hack Week.