Summer/Fall 2020
July 10: Adam Sykulski (Department of Mathematics and Statistics, Lancaster University)
Title: Stochastic modeling of the ocean using drifters: The Lagrangian perspective
[Sykulski Talk Recording]
Abstract: Drifter deployments continue to be a popular observational method for understanding ocean currents and circulation, with numerous recent regional deployments, as well as the continued growth of the Global Drifter Program. Drifter data, however, is highly heterogeneous, prone to measurement error, and captures an array of physical processes that are difficult to disentangle. Moreover, the data is “Lagrangian” in that each drifter moves through space and time, thus posing a unique statistical and physical modelling challenge. In this talk I will start by overviewing some novel techniques for preprocessing and interpolating noisy GPS data using smoothing splines and non-Gaussian error structures. We then examine how the interpolated data can be uniquely visualised and interpreted using time-varying spectral densities. Finally we highlight some parametric stochastic models which separate physical processes such as diffusivity, inertial oscillations and tides from the background flow.
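A minimal sketch of the smoothing-spline interpolation step mentioned above, on an invented drifter track (the sampling times, noise level, and smoothing factor are all assumptions, and the non-Gaussian error structures from the talk are not modeled here):

```python
# Minimal sketch (not the speaker's code): smoothing-spline interpolation of a
# noisy, irregularly sampled drifter trajectory. All numbers are invented.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)

# Irregular observation times (hours) and a noisy longitude track with an
# inertial-like oscillation superimposed on a slow drift.
t_obs = np.sort(rng.uniform(0.0, 240.0, size=200))
lon_true = 0.01 * t_obs + 0.05 * np.sin(2 * np.pi * t_obs / 17.5)
lon_obs = lon_true + rng.normal(scale=0.02, size=t_obs.size)

# Fit a cubic smoothing spline; `s` trades smoothness against fidelity and
# would in practice be tied to the assumed GPS error variance.
spline = UnivariateSpline(t_obs, lon_obs, k=3, s=t_obs.size * 0.02**2)

# Evaluate the interpolated track (and its derivative, a velocity proxy)
# on a regular hourly grid.
t_grid = np.arange(0.0, 240.0, 1.0)
lon_smooth = spline(t_grid)
u_proxy = spline.derivative()(t_grid)
```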
Bio: Adam is a Lecturer in Data Science at Lancaster University in the UK. Adam’s research interests are in time series analysis and spatial statistics, with a focus on spectral techniques using Fourier transforms. Adam’s main application area is in oceanography, but he also studies problems more broadly across geophysical and medical applications.
August 14: Tommaso Dorigo (INFN-Padova)
Title: Frequentist Statistics, the Particle Physicists’ Way: How To Claim Discovery or Rule Out Theories
[Dorigo Talk Recording] [Dorigo Talk Slides]
Abstract: Fundamental research in particle physics progresses by investigating the merits of theories that describe matter and its interactions at the smallest distance scales, as well as by looking for new phenomena in high-energy particle collisions. The large datasets today commonly handled by experiments at facilities such as the CERN Large Hadron Collider, together with the well-defined nature of the questions posed to the data, have fostered the development of an arsenal of specialized Frequentist methods for hypothesis testing and parameter estimation, which strive for severity and calibrated coverage, and which enforce type-I error rates below 3 × 10⁻⁷ for discovery claims. In this lecture I will describe the generalities and needs of inference problems at particle physics experiments, and examine the statistical procedures that allow us to rule out or confirm new phenomena.
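For context on the 3 × 10⁻⁷ figure: it is the one-sided tail probability of a five-standard-deviation (“5σ”) fluctuation under a Gaussian approximation, which a short generic calculation confirms (not tied to any particular analysis in the talk):

```python
# One-sided Gaussian tail probability at 5 sigma: the conventional
# particle-physics discovery threshold quoted as ~3e-7 in the abstract.
from scipy.stats import norm

p_value = norm.sf(5.0)  # P(Z > 5) for a standard normal Z
print(f"5-sigma one-sided p-value: {p_value:.2e}")  # ~2.87e-07
```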
Bio: Tommaso Dorigo is an experimental particle physicist who works as a First Researcher at the INFN in Italy. He obtained his Ph.D. in Physics in 1999 with a thesis on data analysis for the CDF experiment at the Fermilab Tevatron. After two years as a post-doctoral fellow at Harvard University, during which he contributed to the upgrade of the muon system of the CDF-II experiment, he has worked as a researcher for INFN in Padova, Italy.
He collaborates with the CMS experiment at the CERN LHC, where he is a member (formerly chair) of the Statistics Committee of the experiment. He is the author of several innovative algorithms and machine learning tools for data analysis in particle physics. From 2014 to 2019 Dorigo was the founder and scientific coordinator of the ETN “AMVA4NewPhysics”, which focused on training Ph.D. students in machine learning applications to physics. His current interests focus on end-to-end optimization of physics experiments and measurements with machine learning. He is also very active in science outreach with a blog, and in 2016 he published the book “Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab”.
September 11: Parker Holzer (Department of Statistics & Data Science, Yale University)
Title: Discovering Exoplanets With Hermite-Gaussian Linear Regression
[Holzer Talk Recording] [Holzer Talk Slides]
Abstract: One growing focus in modern astronomy is the discovery of exoplanets through the radial velocity (or Doppler) method. This method aims to detect an oscillation in the motion of distant stars, indicating the presence of orbiting planetary companions. Since the radial velocity imposed on a star by a planetary companion is small, however, such a signal is often difficult to detect. By assuming the relative radial velocity is small and using Hermite-Gaussian functions, we show that the problem of detecting the signal of exoplanets can be formulated as simple (weighted) linear regression. We also demonstrate the new Hermite-Gaussian Radial Velocity (HGRV) method on recently collected data for the star 51 Pegasi. In this demonstration, as well as in simulation studies, the HGRV approach is found to outperform the traditional cross-correlation function approach.
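A toy sketch of the central idea as stated in the abstract, with invented line parameters and noise (this illustrates small-shift linear regression in general, not the HGRV implementation): a small Doppler shift perturbs the observed spectrum approximately linearly, with a regressor proportional to the derivative of the template, which for a Gaussian line is proportional to a first Hermite-Gaussian function.

```python
# Toy illustration (assumed setup, not the HGRV code): estimate a tiny shift
# of a Gaussian absorption line by weighted linear regression on a regressor
# proportional to the template's derivative (a first Hermite-Gaussian).
import numpy as np

rng = np.random.default_rng(1)

lam = np.linspace(5000.0, 5002.0, 400)   # wavelength grid (Angstroms)
lam0, sigma, depth = 5001.0, 0.1, 0.4    # made-up line parameters

def template(x):
    return 1.0 - depth * np.exp(-0.5 * ((x - lam0) / sigma) ** 2)

true_shift = 2e-4                        # true wavelength shift (Angstroms)
noise_sd = 0.005
spec = template(lam - true_shift) + rng.normal(scale=noise_sd, size=lam.size)

# First-order expansion: spec - template(lam) ~ shift * (-d template / d lam).
residual = spec - template(lam)
regressor = -np.gradient(template(lam), lam)

# Weighted least squares (equal weights of 1 / noise variance here).
w = np.full(lam.size, 1.0 / noise_sd**2)
shift_hat = np.sum(w * regressor * residual) / np.sum(w * regressor**2)
print(f"true shift {true_shift:.1e}, estimated shift {shift_hat:.1e}")
```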
Bio: I am a current Ph.D. student in the Department of Statistics & Data Science at Yale University. I got my undergraduate at the University of Utah as a double-major in Mathematics and Applied Physics. My research primarily centers on applying statistics to astronomy, with a current focus on exoplanet detection. I am married with a 1-year-old son and another son expected in January.
October 9: Amy Braverman (Jet Propulsion Laboratory, California Institute of Technology)
Title: Post-hoc Uncertainty Quantification for Remote Sensing Observing Systems
[Braverman Talk Recording] [Braverman Talk Slides]
Abstract: The ability of spaceborne remote sensing data to address important Earth and climate science problems rests crucially on how well the underlying geophysical quantities can be inferred from these observations. Remote sensing instruments measure parts of the electromagnetic spectrum and use computational algorithms to infer the unobserved true physical states. However, the accompanying uncertainties, if they are provided at all, are usually incomplete. There are many reasons for this, including but not limited to unknown physics, computational artifacts and compromises, and unknown uncertainties in the inputs.
In this talk I will describe a practical methodology for uncertainty quantification of physical state estimates derived from remote sensing observing systems. The method we propose combines Monte Carlo simulation experiments with statistical modeling to approximate conditional distributions of unknown true states given point estimates produced by imperfect operational algorithms. Our procedure is carried out post-hoc, that is, after the operational processing step, because it is not feasible to redesign and rerun operational code. I demonstrate the procedure using four months of data from NASA’s Orbiting Carbon Observatory-2 mission, and compare our results to those obtained by validation against data from the Total Carbon Column Observing Network where such data exist.
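The simulate-then-model strategy can be sketched generically; everything below is synthetic, and the stand-in “operational algorithm” is a crude placeholder rather than the OCO-2 retrieval:

```python
# Generic post-hoc UQ sketch (synthetic data, hypothetical surrogate retrieval):
# approximate the conditional distribution of the true state given the
# operational point estimate from paired Monte Carlo samples.
import numpy as np

rng = np.random.default_rng(2)

# 1) Simulate "true" states from an assumed distribution.
x_true = rng.normal(loc=400.0, scale=10.0, size=5000)

# 2) Push them through a stand-in for the imperfect operational algorithm
#    (bias + noise here; in reality this is the full retrieval chain).
x_hat = 0.9 * x_true + 42.0 + rng.normal(scale=3.0, size=x_true.size)

# 3) Model truth given estimate: here a simple linear regression with a
#    constant residual spread, giving an approximate conditional distribution.
A = np.column_stack([np.ones_like(x_hat), x_hat])
coef, *_ = np.linalg.lstsq(A, x_true, rcond=None)
resid_sd = np.std(x_true - A @ coef)

def conditional_truth(x_hat_new):
    """Approximate mean and spread of the true state given a point estimate."""
    return coef[0] + coef[1] * x_hat_new, resid_sd

print(conditional_truth(402.0))
```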
Bio: Amy Braverman is Principal Statistician at the Jet Propulsion Laboratory, California Institute of Technology. She received her Ph.D. in Statistics from UCLA in 1999. Prior to that she earned an M.A. in Mathematics, also from UCLA (1992), and a B.A. in Economics from Swarthmore College in 1982. From 1983 to 1990, she worked in litigation support consulting for two different firms in Los Angeles. Her research interests include massive data set analysis, spatial and spatio-temporal statistics, data fusion, decision making in complex systems, and uncertainty quantification.
October 23: Collin Politsch (Machine Learning Department, Carnegie Mellon University)
Title: Three-dimensional cosmography of the high redshift Universe using intergalactic absorption
[Politsch Talk Recording] [Politsch Talk Slides]
Abstract: The Lyman-α forest – a dense series of hydrogen absorptions seen in the spectra of distant quasars – provides a unique observational probe of the redshift z>2 Universe. The density of spectroscopically measured quasars across the sky has recently risen to a level that has enabled secure measurements of large-scale structure in the three-dimensional distribution of intergalactic gas using the inhomogeneous hydrogen absorption patterns imprinted in the densely sampled quasar sightlines. In principle, these modern Lyman-α forest observations can be used to statistically reconstruct three-dimensional density maps of the intergalactic medium over the massive cosmological volumes illuminated by current spectroscopic quasar surveys. However, until now, such maps have been impossible to produce without the development of scalable and statistically rigorous spatial modeling techniques. Using a sample of approximately 160,000 quasar sightlines measured across 25 percent of the sky by the SDSS-III Baryon Oscillation Spectroscopic Survey, here we present a 154 Gpc³ large-scale structure map of the redshift 1.98≤z≤3.15 intergalactic medium — the largest volume large-scale structure map of the Universe to date — accompanied by rigorous quantification of the statistical uncertainty in the reconstruction.
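The abstract does not spell out the reconstruction machinery, so the following is only a generic one-dimensional smoothing-with-uncertainty sketch on made-up sightline data, meant to convey the flavor of turning noisy absorption samples into a smooth field with error bars; it is not the method behind the map:

```python
# Generic sketch (made-up data, not the mapping method from the talk): smooth
# noisy absorption samples along a single sightline and attach bootstrap
# uncertainty to the reconstruction.
import numpy as np

rng = np.random.default_rng(3)

z = np.sort(rng.uniform(2.0, 3.0, size=300))      # redshift samples
flux_true = 0.8 + 0.1 * np.sin(8 * np.pi * z)     # toy underlying field
flux_obs = flux_true + rng.normal(scale=0.05, size=z.size)

def kernel_smooth(z_obs, y_obs, z_eval, bandwidth=0.02):
    """Nadaraya-Watson estimate with a Gaussian kernel."""
    w = np.exp(-0.5 * ((z_eval[:, None] - z_obs[None, :]) / bandwidth) ** 2)
    return (w * y_obs).sum(axis=1) / w.sum(axis=1)

z_grid = np.linspace(2.0, 3.0, 200)
fit = kernel_smooth(z, flux_obs, z_grid)

# Bootstrap the sightline to get a pointwise uncertainty band.
boots = np.empty((200, z_grid.size))
for b in range(200):
    idx = rng.integers(0, z.size, size=z.size)
    boots[b] = kernel_smooth(z[idx], flux_obs[idx], z_grid)
lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)
print(f"95% band width near z=2.5: {hi[100] - lo[100]:.3f}")
```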
Bio: Collin is a Postdoctoral Fellow in the Machine Learning Department at Carnegie Mellon University. He received his joint Ph.D. in Statistics and Machine Learning from CMU in the summer of 2020 with his thesis titled "Statistical Astrophysics: From Extrasolar Planets to the Large-scale Structure of the Universe." Prior to that, he received a M.Sc. in Machine Learning from CMU in 2017 and a B.Sc. in Mathematics from the University of Kansas in 2014. His research interests include applications of statistical machine learning methods to problems in astrophysics, spatio-temporal data analysis, uncertainty quantification, and forecasting COVID-19.
November 13: Murali Haran (Department of Statistics, Pennsylvania State University)
Title: Statistical Methods for Ice Sheet Model Calibration
[Haran Talk Recording] [Haran Talk Slides]
Abstract: In this talk I will consider the scientifically challenging task of understanding the past and projecting the future dynamics of the Antarctic ice sheet; this ice sheet is of particular interest as its melting may lead to drastic sea level rise. The scientific questions lead to the following statistical and computational question: How do we combine information from noisy observations of an ice sheet with a physical model of the ice sheet to learn about the parameters governing the dynamics of the ice sheet? I will discuss two classes of methods: (i) approaches that perform inference based on an emulator, which is a stochastic approximation of the ice sheet model, and (ii) an inferential approach based on a heavily parallelized sequential Monte Carlo algorithm. I will explain how the choice of method depends on the particulars of the questions we are trying to answer, the data we use, and the complexity of the ice sheet model we work with. This talk is based on joint work with Ben Lee (George Mason U.), Won Chang (U of Cincinnati), Klaus Keller, Rob Fuller, Dave Pollard, and Patrick Applegate (Penn State Geosciences).
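As a generic illustration of the emulator idea in (i), one can fit a Gaussian-process surrogate to a handful of model runs and then score candidate parameter values against an observation; the “simulator”, parameter range, and noise level below are all invented:

```python
# Toy emulator-based calibration sketch (invented simulator and data):
# replace an expensive model with a Gaussian-process surrogate, then score
# candidate parameter values against an observation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(4)

def simulator(theta):
    """Stand-in for an expensive model run (scalar output)."""
    return np.sin(3.0 * theta) + 0.5 * theta

# A small design of "expensive" runs.
theta_design = np.linspace(0.0, 2.0, 12).reshape(-1, 1)
y_design = simulator(theta_design).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6)
gp.fit(theta_design, y_design)

# One noisy "observation" of the real system at an unknown parameter value.
theta_true, obs_sd = 1.3, 0.1
y_obs = simulator(theta_true) + rng.normal(scale=obs_sd)

# Approximate log-likelihood over a parameter grid using the emulator mean
# and its predictive standard deviation.
theta_grid = np.linspace(0.0, 2.0, 400).reshape(-1, 1)
mean, sd = gp.predict(theta_grid, return_std=True)
total_var = sd**2 + obs_sd**2
loglik = -0.5 * ((y_obs - mean) ** 2 / total_var + np.log(2 * np.pi * total_var))
print("highest-scoring theta on the grid:", theta_grid[np.argmax(loglik), 0])
```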
Bio: Murali Haran is Professor and Head of the Department of Statistics at Penn State University. He has a PhD in Statistics from the University of Minnesota, and a BS in Computer Science (with minors in Statistics, Mathematics and Film Studies) from Carnegie Mellon University. His research interests are in Monte Carlo algorithms, spatial models, the statistical analysis of complex computer models, and interdisciplinary research in climate science and infectious diseases.
December 4: Jenni Evans (Department of Meteorology & Atmospheric Science, Pennsylvania State University)
Title: Unscrambling ensemble simulations to improve hurricane forecasts
[Evans Talk Recording]
Abstract: In November 2020, Hurricane Iota made landfall in Nicaragua, 15 miles south of where Hurricane Eta had crossed the coast less than 2 weeks earlier. Like Eta, Iota was a Category 4 hurricane at landfall, with maximum sustained winds near 155 mph. In a situation like Eta or Iota, devastation follows landfall due to a combination of winds, rainfall, flooding and mudslides. The storm’s ultimate impact depends on its track, its intensity and its structure. An accurate hurricane forecast can save countless lives. In the drive to produce accurate hurricane forecasts, meteorologists developed detailed deterministic models and refined them endlessly, but large forecast errors still occurred. Modelers began running permutations of the deterministic models tens, or even hundreds, of times. These ensemble simulations of hurricane evolution provide a measure of the uncertainty in the forecast, but translating this into a forecast can mean that much information is lost. I will discuss how we can synthesize the information in the ensemble objectively, and show that the resulting partition distinguishes between different synoptic situations, preserving information on the sources of the uncertainty in the forecast. Examples will be drawn from two US landfalling hurricanes: Hurricane Sandy (October/November 2012) and Hurricane Harvey (August 2017).
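One generic way to partition such an ensemble (not necessarily the approach taken in the talk; the synthetic tracks, features, and cluster count below are arbitrary) is to reduce each member’s forecast track to a feature vector and cluster the members into a small number of scenarios:

```python
# Generic ensemble-partitioning sketch (synthetic tracks, arbitrary settings;
# not necessarily the method described in the talk): cluster ensemble members
# into a few synoptic scenarios via PCA + k-means.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)

n_members, n_times = 60, 24
# Two synthetic "scenarios": a recurving track and a straight-running track,
# with per-member perturbations.
base_recurve = np.column_stack([np.linspace(-80, -70, n_times),
                                np.linspace(25, 40, n_times) ** 1.1])
base_straight = np.column_stack([np.linspace(-80, -90, n_times),
                                 np.linspace(25, 30, n_times)])
members = []
for i in range(n_members):
    base = base_recurve if i % 2 == 0 else base_straight
    members.append(base + rng.normal(scale=0.8, size=base.shape))
X = np.stack(members).reshape(n_members, -1)  # one (lon, lat) row per member

# Reduce dimension, then partition the ensemble into candidate scenarios.
X_low = PCA(n_components=3).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_low)
print("members per scenario:", np.bincount(labels))
```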
Bio: Jenni L. Evans is the Director of Penn State’s Institute for Computational and Data Sciences (ICDS), a Professor of Meteorology & Atmospheric Science, and served as Centennial President of the American Meteorological Society (AMS) in 2019. Evans earned both her undergraduate and doctoral degrees in applied mathematics at Monash University. ICDS is a pan-university research institute and the home of Penn State’s high-performance computing facility; it jointly employs over 30 tenure-track faculty and supports researchers across the disciplinary spectrum.
Dr. Evans is a Fellow of the American Association for the Advancement of Science and of the American Meteorological Society. She has served on numerous national and international committees and has long been the meteorologist on an interdisciplinary team of scientists and actuaries advising the State of Florida by auditing catastrophe risk models for hurricanes and floods.
Evans’ research spans tropical climate, climate change, and hurricane lifecycles in the tropics, as well as hurricanes that undergo “extratropical transition” (like Hurricane Sandy in 2012) and sonification, the “music of hurricanes.” She uses high-performance computing for hurricane simulations, together with machine learning and advanced statistical techniques, to study the formation of hurricanes in the tropics and subtropics, methods for improving hurricane forecasts, theory for the limiting intensity of hurricanes and how it could change with climate change, and the use of climate models to understand the impacts of climate change on our daily lives.