Spring/Summer 2022
January 21: Amanda Lenzi (Argonne National Laboratory)
Title: Can Neural Networks Be Used for Parameter Estimation?
[Lenzi Talk Recording] [Lenzi Talk Slides]
Abstract: Neural networks have proved successful at approximating nonlinear maps from training data in a wide range of applications. Can they also be used to estimate parameters in statistical models when standard likelihood or Bayesian methods are not (computationally) feasible? In this talk, I will discuss this question with the aim of estimating parameters in a model for multivariate extremes, where inference is exceptionally challenging but simulation from the model is easy and fast. I will demonstrate that in this example, neural networks provide a competitive alternative to current approaches, with considerable improvements in accuracy and computational time. A key ingredient for this result is to actively use our statistical knowledge about the parameters and data to make the problem more palatable for the neural network.
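The simulate-then-invert idea in the abstract can be sketched in a few lines: draw parameters from a broad prior, simulate datasets, and train a network to map summary statistics back to the parameters. The Gaussian model, the summaries, and the network below are illustrative choices for the sketch, not the model or architecture from the talk.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def simulate(theta, n=100):
    """Draw one dataset from N(mu, sigma) and reduce it to summary statistics."""
    mu, sigma = theta
    x = rng.normal(mu, sigma, size=n)
    return np.array([x.mean(), x.std()])

# Training set: parameters drawn from a broad prior, paired with summaries
# of datasets simulated at those parameters.
thetas = np.column_stack([rng.uniform(-5, 5, 5000),
                          rng.uniform(0.5, 3.0, 5000)])
summaries = np.array([simulate(t) for t in thetas])

# The network learns the inverse map: summaries -> parameters.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                   random_state=0).fit(summaries, thetas)

# "Observed" data generated with true parameters (2.0, 1.5).
obs = simulate((2.0, 1.5))
mu_hat, sigma_hat = net.predict(obs.reshape(1, -1))[0]
```

Once trained, the network amortizes inference: estimating parameters for a new dataset costs a single forward pass, with no likelihood evaluation.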
Bio: Amanda Lenzi is a Postdoctoral Appointee at Argonne National Laboratory. Before coming to Argonne, she was a Postdoctoral Fellow at King Abdullah University of Science and Technology (KAUST). She obtained her PhD in Statistics from the Technical University of Denmark in 2017 and her BS and MS degrees from the University of Campinas, São Paulo, Brazil. Her main research interests concern statistical modeling, prediction, simulation, and uncertainty quantification of spatiotemporal data from applications in energy and environmental science. She is also interested in computational methods for large datasets and in the use of machine learning to improve the modeling of these complex spatiotemporal processes.
February 18: Nicholas Wardle (Department of Physics, Imperial College London)
Title: The Discrete Profiling Method: Handling Uncertainties in Background Shapes
[Wardle Talk Recording] [Wardle Talk Slides]
Abstract: Model selection is a huge topic in statistics, and in HEP experiments we often do not know the exact model appropriate for a particular process. Typically, HEP experiments rely on data to directly constrain or choose which (parametric) models are best suited to extract the underlying physics; however, this choice naturally represents a systematic uncertainty in the analysis of the data. While there are several methods to incorporate uncertainties related to choices of continuous parameter values, the uncertainty associated with the choice of discrete model is less clear. In this presentation, Nicholas will describe a method developed in the context of the search for the Higgs boson at CMS that aims to incorporate the uncertainty related to model selection into the statistical analysis of data: the "discrete profiling" method. Nicholas will discuss various studies on the bias and coverage properties of the method and open extensions where further work is needed.
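A minimal sketch of the discrete-profiling idea on a toy spectrum (not the CMS diphoton analysis): the background family, here the polynomial order, is treated as a discrete nuisance parameter and minimised over at each value of the signal strength, with a per-parameter penalty so that more flexible models are not automatically favoured. All shapes and constants below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(100, 180, 81)                    # toy "mass" bins
xs = (x - 140) / 40                              # scaled axis for stable fits
signal = np.exp(-0.5 * ((x - 125) / 2.0) ** 2)   # fixed signal template
truth = 50 * np.exp(-0.02 * (x - 100)) + 10 * signal
counts = rng.poisson(truth).astype(float)

def chi2_for_order(mu, order):
    """Best-fit chi2 with a polynomial background of a given order, at fixed mu."""
    coeffs = np.polyfit(xs, counts - mu * signal, order)
    model = np.polyval(coeffs, xs) + mu * signal
    return np.sum((counts - model) ** 2 / np.maximum(model, 1.0))

mu_grid = np.linspace(0, 30, 61)
orders = [1, 2, 3, 4]
penalty = 1.0    # chi2 penalty per free background parameter

# Envelope: at each mu, profile over the discrete model choice.
envelope = np.array([min(chi2_for_order(mu, k) + penalty * (k + 1)
                         for k in orders) for mu in mu_grid])
mu_hat = mu_grid[np.argmin(envelope)]
```

The resulting envelope curve plays the role of a profile likelihood in mu, with the model-choice uncertainty absorbed into its shape.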
Bio: Nicholas did his Ph.D. at Imperial College, where he started working on early W/Z cross-section measurements with electrons at CMS and then moved on to searching for the Higgs boson in the diphoton decay channel; the discovery in that channel formed his thesis in 2013. After that he held a fellowship at CERN, where he spent most of his time on searches for dark matter and H->invisible decays. He moved back to London in 2017 as an STFC fellow at Imperial College, where he is now a lecturer. He mainly focuses on Higgs combinations and interpretations of precision Higgs boson measurements in the search for physics beyond the SM, and teaches postgraduate courses on statistics and machine learning for physicists.
March 18: Derek Bingham (Department of Statistics and Actuarial Science, Simon Fraser University)
Title: Computer Model Emulation and Uncertainty Quantification Using a Deep Gaussian Process
[Bingham Talk Recording] [Bingham Talk Slides]
Abstract: Computer models are often used to explore physical systems. Increasingly, there are cases where the model is fast, the code is not readily accessible to scientists, but a large suite of model evaluations is available. In these cases, an "emulator" is used to stand in for the computer model. This work was motivated by a simulator for the chirp mass of binary black hole mergers, where no output is observed for large portions of the input space and more than 10^6 simulator evaluations are available. This poses two problems: (i) the need to address the discontinuity when no chirp mass is observed; and (ii) performing statistical inference with a large number of simulator evaluations. The traditional approach to emulation is to use a stationary Gaussian process (GP), because it provides a foundation for uncertainty quantification for deterministic systems. We instead use a deep GP emulator, explore the impact of the choices made when setting it up on posterior inference, and apply the proposed approach to the real application.
Bio: Derek is a Professor of Statistics and Actuarial Science at Simon Fraser University. He completed his PhD in Statistics in 1999 with Randy Sitter at SFU on the design and analysis of fractional factorial split-plot experiments. After graduating, he moved to the Department of Statistics at the University of Michigan as an Assistant Professor. In 2003, he joined the Department of Statistics and Actuarial Science at Simon Fraser as the Canada Research Chair in Industrial Statistics.
The focus of his current research is developing statistical methods for combining physical observations with large-scale computer simulators. This includes new methodology for Bayesian computer model calibration, emulation, uncertainty quantification, and experimental design. His work is generally motivated by real-world applications. His recent collaborations have been with scientists at U.S. national laboratories (Argonne National Laboratory and Los Alamos National Laboratory) and on U.S. Department of Energy sponsored projects (Center for Radiative Shock Hydrodynamics; Center for Exascale Radiation Transport).
April 22: Jakob Runge (Institute of Data Science, German Aerospace Center)
Title: Causal Inference and Discovery with Perspectives in Earth Sciences
[Runge Talk Recording] [Runge Talk Slides]
Abstract: The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In disciplines dealing with complex dynamical systems, such as the Earth system, replicated real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated data opens up the use of novel data-driven causal inference methods beyond the commonly adopted correlation techniques. Causal inference provides the theory and methods to learn and utilize qualitative knowledge about causal relations that is often available in Earth sciences. In this talk I will present an overview of this exciting and widely applicable framework and illustrate it with some examples from Earth sciences. I will also present recent work on statistically optimal estimators of causal effects.
Bio: Jakob Runge has headed the Causal Inference group at the German Aerospace Center’s Institute of Data Science in Jena since 2017 and has been a guest professor of computer science at TU Berlin since 2021. His group combines innovative data science methods from different fields (graphical models, causal inference, nonlinear dynamics, deep learning) and works closely with experts in the climate sciences and beyond. Jakob studied physics at Humboldt University Berlin and finished his Ph.D. project at the Potsdam Institute for Climate Impact Research in 2014. His studies were funded by the German Academic Scholarship Foundation (Studienstiftung), and his thesis was awarded the Carl-Ramsauer Prize by the Berlin Physical Society.
In 2014 he won a $200,000 Fellowship Award in Studying Complex Systems from the James S. McDonnell Foundation and was at the Grantham Institute, Imperial College London, from 2016 to 2017. In 2020 he won an ERC Starting Grant for his interdisciplinary project CausalEarth.
He provides Tigramite, a Python module for causal inference on time series, at https://github.com/jakobrunge/tigramite.git. For more details, see www.climateinformaticslab.com.
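The covariate-adjustment idea underlying such causal-effect estimators can be illustrated in a toy linear model; this is a sketch of the general principle, not Tigramite's API, and all coefficients are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
z = rng.normal(size=n)                        # confounder: z -> x and z -> y
x = 0.8 * z + rng.normal(size=n)              # "treatment" variable
y = 1.5 * x + 2.0 * z + rng.normal(size=n)    # true causal effect of x on y: 1.5

# Naive regression of y on x alone is biased by the open backdoor path via z.
naive = np.linalg.lstsq(np.column_stack([x, np.ones(n)]), y,
                        rcond=None)[0][0]

# Adjusting for the confounder (regressing y on x and z) recovers the effect.
adjusted = np.linalg.lstsq(np.column_stack([x, z, np.ones(n)]), y,
                           rcond=None)[0][0]
```

In practice the hard part is knowing which variables to adjust for; that is exactly the qualitative causal knowledge, encoded in a graph, that the methods in the talk exploit.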
June 16: ISSI-STAMPS Joint Seminar - Ann Lee (Department of Statistics and Data Science, Carnegie Mellon University)
Title: Likelihood-Free Frequentist Inference: Confidence Sets with Correct Conditional Coverage
[Lee Talk Recording] [Lee Talk Slides]
Abstract: Many areas of science make extensive use of computer simulators that implicitly encode likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings outside the asymptotic and low-dimensional regimes. Although new machine learning methods, such as normalizing flows, have revolutionized the sample efficiency and capacity of LFI methods, it remains an open question whether they produce confidence sets with correct conditional coverage. In this talk, I will describe our group’s recent and ongoing research on developing scalable and modular procedures for (i) constructing Neyman confidence sets with finite-sample guarantees of nominal coverage, and (ii) computing diagnostics that estimate conditional coverage over the entire parameter space. We refer to our framework as likelihood-free frequentist inference (LF2I). Any method that defines a test statistic, like the likelihood ratio, can be adapted to LF2I to create valid confidence sets and diagnostics, without costly Monte Carlo samples at fixed parameter settings. In my talk, I will discuss where we stand with LF2I and the challenges that still remain. (Parts of these efforts are joint work with Niccolo Dalmasso, Rafael Izbicki, Luca Masserano, Tommaso Dorigo, Mikael Kuusela, and David Zhao. The original LF2I framework is described in https://arxiv.org/abs/2107.03920, with a recent version in https://arxiv.org/abs/2205.15680.)
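A toy sketch of the test-inversion recipe described in the abstract, under a tractable Gaussian model (every modeling choice below is illustrative, not from the papers): simulate (theta, data) pairs across the parameter space, learn the 95% quantile of a test statistic as a function of theta via quantile regression rather than per-theta Monte Carlo, then keep every theta the test does not reject.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
n_obs = 10  # observations per simulated dataset

def stat(xbar, theta):
    """Toy test statistic: scaled squared distance of the sample mean to theta."""
    return n_obs * (xbar - theta) ** 2

# Simulate (theta, data) pairs across the whole parameter space.
thetas = rng.uniform(-5, 5, 20000)
xbars = rng.normal(thetas, 1 / np.sqrt(n_obs))   # sufficient statistic per dataset
stats = stat(xbars, thetas)

# Quantile regression of the statistic on theta estimates the 95% critical
# value everywhere at once, replacing costly per-theta Monte Carlo.
qreg = GradientBoostingRegressor(loss="quantile", alpha=0.95,
                                 n_estimators=100, random_state=0)
qreg.fit(thetas.reshape(-1, 1), stats)

# Invert the test: keep every theta whose statistic is below its critical value.
obs_xbar = rng.normal(1.0, 1 / np.sqrt(n_obs))   # observed data, true theta = 1.0
grid = np.linspace(-5, 5, 1001)
cutoffs = qreg.predict(grid.reshape(-1, 1))
conf_set = grid[stat(obs_xbar, grid) <= cutoffs]
```

In this Gaussian toy the true critical value is the constant chi-square quantile, so the learned cutoff can be checked directly; the point of the framework is that the same recipe applies when the likelihood is only accessible through the simulator.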
Bio: Ann Lee is a professor in the Department of Statistics & Data Science at Carnegie Mellon University (CMU), with a joint appointment in the Machine Learning Department. Dr. Lee's interests are in developing statistical methodology for complex data and problems in the physical and environmental sciences. She co-directs the Statistical Methods for the Physical Sciences (STAMPS) research group at CMU, and is senior personnel in the NSF AI Planning Institute for Data-Driven Discovery in Physics at CMU.
Prior to joining CMU in 2005, Dr. Lee was the J.W. Gibbs Assistant Professor in the Department of Mathematics at Yale University, and before that she spent a year as a visiting research associate in the Department of Applied Mathematics at Brown University. She received her Ph.D. in Physics from Brown University, and her BSc and MS degrees in Engineering Physics from Chalmers University of Technology in Sweden.