Carnegie Mellon University

The Data Analytics for Science Immersion Experience (DASIE)

June 5-June 14, 2022

CMU campus in Pittsburgh, PA and Dow Headquarters in Midland, MI


There is a widespread need for scientists with advanced data skills and a national need to recruit and retain emerging scientists from underrepresented groups.

DASIE, in partnership with Dow and Microsoft, addresses this skills gap by bringing together students from outside of Carnegie Mellon University to start building a pipeline of future science leaders with advanced data skills and to bring awareness of opportunities that exist at CMU, the partner organizations and the industry as a whole.

Participants will spend one week on campus at CMU, experiencing some of the curriculum for the newly launched MS in Data Analytics for Science program, and then travel to Midland, MI for hands-on experiences at Global Dow Center. In addition to their time in the classroom, participants will have many opportunities for connection, including on-the-job shadowing and training from Dow and Microsoft industry experts and mentoring from CMU faculty and staff.

THIS IS A FULLY FUNDED PROGRAM for underrepresented undergraduate students in their sophomore, junior, or senior year. STUDENTS SELECTED FOR THE PROGRAM WILL HAVE TRAVEL AND LODGING COVERED AND RECEIVE A STIPEND.

® The DOW Diamond is a trademark of The Dow Chemical Company (“Dow”) or an affiliated company of Dow


DASIE Schedule


Shape Optimization: From Soap Films to Acoustics to Crystals
Facilitator: Dr. Robin Neumayer

The task of optimizing an object's shape to make it the most efficient, least costly, or most streamlined arises in nature as well as across engineering and design fields. For instance, due to surface tension, soap bubbles take the shape with the least surface area among all possible shapes enclosing a fixed amount of air — a sphere! We will introduce the mathematical study of so-called “shape optimization problems” and survey some classical and contemporary results in the field.
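The soap bubble example can be stated as an inequality. This is a sketch of the classical isoperimetric inequality in three dimensions, where the symbols $A$ (surface area), $V$ (enclosed volume) and $r$ (radius) are notation introduced here for illustration and are not taken from the session materials:

```latex
% Isoperimetric inequality in three dimensions:
% any closed surface with area A enclosing volume V satisfies
\[
  A^{3} \;\ge\; 36\pi\, V^{2},
\]
% with equality exactly for the sphere. Check for a sphere of radius r:
\[
  A = 4\pi r^{2}, \qquad V = \tfrac{4}{3}\pi r^{3}
  \quad\Longrightarrow\quad
  A^{3} = 64\pi^{3} r^{6} = 36\pi \left(\tfrac{4}{3}\pi r^{3}\right)^{2} = 36\pi\, V^{2}.
\]
```

In other words, for a fixed amount of enclosed air, no shape can have less surface area than the sphere.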

Research in the Future Lab - Science in the Cloud
Facilitator: Dr. Subha Das

You can be at home or anywhere in the world and still run experiments in a lab. These sessions will introduce an automated, remote cloud lab facility (Emerald Cloud Lab in San Francisco). Operations in the cloud lab are conducted through a computer console with internet access, which allows the user to program equipment, set up experiments and analyze data. The sessions will offer a virtual tour of the cloud lab and explain, at a high level, how the Emerald Cloud Lab (ECL) works and the key advantages of such a facility. The Cloud Lab Command Center interface, which is based on Mathematica and the Wolfram symbolic language, allows one to interact remotely with the facilities and laboratory instruments in the cloud lab. Students will see how experiments are planned, set up and executed and how the data can be plotted. Students will also gain an understanding of the metadata collection in the cloud lab.

Working with molecular data for chemical informatics applications
Facilitator: Dr. Olexandr Isayev

Machine learning (ML) has shown outstanding abilities to uncover patterns and relationships hiding within datasets, providing a transformative technology for advancing science. ML techniques have predicted reaction sequences for synthesis within the broad domain of chemistry, mapped out massive chemical spaces of molecules, materials, and catalysts, and greatly accelerated ab initio modeling tools. How do you integrate ML toolkits into research workflows? This workshop offers an introduction and hands-on tutorials to showcase how ML can be applied to chemical data. It will be tailored to working with chemical data and the RDKit library. No prior background is assumed beyond one prerequisite: all workshop participants must have minimal Python 3 programming skills and be able to manipulate data in CSV/XLS files.

Big Data and Data Science, and Deep Learning and AI
Facilitator: Dr. John Urbanic

Big Data and Data Science: We will discuss why data and data analytics have become central to so many domains of science. Then we will discuss how "Big Data", the enormous and complex data sets that are proving so fruitful to analysis, has its own challenges and tools. We will introduce some Spark platform basics as an entry to this world and the field of machine learning applied to data.

Deep Learning and AI: We will venture into the domain of Artificial Intelligence, and in particular the use of Artificial Neural Networks, also known as Deep Learning. We will look at how these networks function and see why there is so much excitement about their application to science. Is it hype or reality? We will make the case that it is very real by discussing many recent, and revolutionary, successes.

Using Randomization for Scientific Data-Discovery
Facilitator: Dr. Hayden Schaeffer

Artificial intelligence for scientific discovery has recently gained attention as a way to address various modeling and computational issues in the sciences. Some of its goals are to provide automated approaches for supporting and accelerating growth in data-based discovery, high-consequence decision making, and prototyping. In particular, for high-dimensional datasets, direct methods often suffer from the curse of dimensionality. This workshop will cover a broad set of techniques for approximating high-dimensional data using randomization, with the goal of applying these methods to scientific modeling, engineering design problems, and signal processing.
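As one illustration of approximating high-dimensional data with randomization, the sketch below uses a Gaussian random projection, which tends to roughly preserve distances between points when mapping them to a lower dimension. This is only an assumed example of the general idea, not necessarily a technique covered in the workshop; the function names, the dimensions `d` and `k`, and the synthetic data are all hypothetical.

```python
import math
import random


def random_projection_matrix(d, k, rng):
    """Return a k x d matrix with independent N(0, 1/k) entries."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Multiply the projection matrix by the vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in matrix]


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical setup: two synthetic points in d = 2000 dimensions,
# projected down to k = 200 dimensions.
rng = random.Random(0)
d, k = 2000, 200
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y = [rng.gauss(0.0, 1.0) for _ in range(d)]

P = random_projection_matrix(d, k, rng)
ratio = distance(project(P, x), project(P, y)) / distance(x, y)

# A ratio close to 1 means the distance survived the projection.
print(f"projected distance / original distance = {ratio:.3f}")
```

The scaling by `1 / sqrt(k)` is chosen so that squared distances are preserved in expectation; a larger `k` tightens the approximation at the cost of a larger projected representation.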