Carnegie Mellon University

Data Analytics for Science Immersion Experience (DASIE)

There is a widespread need for scientists with advanced data skills and a national need to recruit and retain emerging scientists from underrepresented groups.

DASIE, in partnership with Dow and Microsoft, addresses this skills gap by bringing together students from outside Carnegie Mellon University. The program aims to start building a pipeline of future science leaders with advanced data skills and to raise awareness of the opportunities that exist at CMU, at the partner organizations, and across the industry as a whole.

Participants will spend one week on campus at CMU, experiencing some of the curriculum for the newly launched MS in Data Analytics for Science program, and then travel to Midland, MI for hands-on experiences at the Global Dow Center. In addition to their time in the classroom, participants will have many opportunities for connection, including on-the-job shadowing and training from Dow and Microsoft industry experts and mentoring from CMU faculty and staff.

Important Dates and Details

  • 2023 Application season opens: December 16, 2022
  • 2023 Application deadline: February 28, 2023 11:59 p.m. EST
  • Stipend: $500, plus room, board, and travel
  • Program duration: 2 weeks
  • Two campuses: Pittsburgh, PA & Midland, MI

2022 Admissions Data

  • Applications: 131
  • Awards: 21
  • Profile, Class of 2022 (colleges represented): Florida International University; Johnson C. Smith University; Alabama Agricultural and Mechanical University; Spelman College; Georgia State University; Florida A&M University; California State Polytechnic University, Pomona; Elizabeth City State University; San Francisco State University; The University of Texas at El Paso; Sonoma State University; University of Puerto Rico at Mayagüez; California State University, Fresno; Clark Atlanta University; Philander Smith College; Florida Memorial University; Tuskegee University; Florida Agricultural & Mechanical University

Quick Facts

Program Dates - 2023, subject to change

  • Application Deadline: Tuesday, February 28
  • Notification of Admission: No later than March 31
  • Saturday, June 11 - Arrive at Carnegie Mellon University, Pittsburgh, PA
  • Sunday, June 18 - Depart for Dow corporate headquarters, Midland, MI
  • Sunday, June 25 - Depart from Dow

Details

  • Stipend: $500
  • Duration: Up to 2 weeks
  • Travel: Round trip airfare or mileage
  • Room: Mix of doubles, triples, quads
  • Board: Daily breakfast, lunch, dinner
  • Daily Transportation: Pittsburgh - Students walk to lectures and events; Midland - Transportation provided to students
  • Hours: Lectures and planned activities are scheduled from 8:30 a.m. to 7:00 p.m., Monday through Friday.
  • Sponsored Trip (examples): White water river rafting, Pittsburgh Pirates baseball game, or amusement park
  • Locations: CMU and Microsoft in Pittsburgh, PA; Dow in Midland, MI
  • Who should apply? DASIE is for undergraduates majoring in STEM. Applicants must be enrolled in an accredited university at the time of submission, submit a short essay, and name one recommender who can speak to their academic success. There is no GPA minimum and no citizenship requirement.

— SUMMER 2022 CURRICULUM —

Shape Optimization: From Soap Films to Acoustics to Crystals
Facilitator: Dr. Robin Neumayer

The task of optimizing an object's shape to make it the most efficient, least costly, or most streamlined arises in nature as well as across engineering and design fields. For instance, due to surface tension, soap bubbles take the shape with the least surface area among all possible shapes enclosing a fixed amount of air — a sphere! We will introduce the mathematical study of so-called “shape optimization problems” and survey some classical and contemporary results in the field.
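
The soap bubble claim can be checked numerically. The following is a minimal sketch, not part of the workshop material: it compares the surface area of a sphere, a cube, and a closed cylinder that all enclose the same volume. The function names and the choice of comparison shapes are illustrative assumptions.

```python
import math


def sphere_area(volume):
    """Surface area of the sphere that encloses the given volume."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 4.0 * math.pi * radius ** 2


def cube_area(volume):
    """Surface area of the cube that encloses the given volume."""
    side = volume ** (1.0 / 3.0)
    return 6.0 * side ** 2


def cylinder_area(volume, aspect=2.0):
    """Surface area of a closed cylinder whose height is `aspect` times its radius."""
    radius = (volume / (math.pi * aspect)) ** (1.0 / 3.0)
    height = aspect * radius
    # Two circular caps plus the curved side wall.
    return 2.0 * math.pi * radius ** 2 + 2.0 * math.pi * radius * height


if __name__ == "__main__":
    volume = 1.0  # any fixed volume works; the ranking does not depend on it
    for name, area in [
        ("sphere", sphere_area(volume)),
        ("cylinder (height = diameter)", cylinder_area(volume)),
        ("cube", cube_area(volume)),
    ]:
        print(f"{name:30s} area = {area:.4f}")
```

Comparing three shapes does not prove that the sphere is optimal among all shapes; that is the content of the isoperimetric inequality, which is the kind of result the sessions survey.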

Research in the Future Lab - Science in the Cloud
Facilitator: Dr. Subha Das

You can be at home or anywhere in the world and still run experiments in a lab. These sessions will introduce an automated, remote cloud lab facility (Emerald Cloud Lab, or ECL, in San Francisco). Operations in the cloud lab are conducted through a computer console with internet access, which allows the user to program equipment, set up experiments, and analyze data. These sessions will offer a virtual tour of the cloud lab, explain how the ECL works at a high level, and outline the key advantages of such a cloud lab. The Cloud Lab Command Center interface, which is based on Mathematica and the Wolfram symbolic language, allows users to interact remotely with the facilities and laboratory instruments in the cloud lab. Students will see how experiments are planned, set up, and executed, and how the data can be plotted. Students will also gain an understanding of metadata collection in the cloud lab.

Working with molecular data for chemical informatics applications
Facilitator: Dr. Olexandr Isayev

Machine learning (ML) has shown outstanding abilities to uncover patterns and relationships hiding within datasets, providing a transformative technology for advancing science. ML techniques have predicted reaction sequences for synthesis within the broad domain of chemistry, mapped out massive chemical spaces of molecules, materials, and catalysts, and have greatly accelerated ab initio modeling tools. How do you integrate ML toolkits into research workflows? This workshop offers an introduction and hands-on tutorials that showcase how ML can be applied to chemical data, with a focus on working with chemical data and the RDKit library. We assume no prior knowledge beyond one prerequisite: all workshop participants must have basic Python 3 programming skills and be able to manipulate data in CSV/XLS files.

Big Data and Data Science, and Deep Learning and AI
Facilitator: Dr. John Urbanic

Big Data and Data Science: We will discuss why data and data analytics have become central to so many domains of science. Then we will discuss how "Big Data", the enormous and complex data sets that are proving so fruitful to analysis, has its own challenges and tools. We will introduce some Spark platform basics as an entry to this world and the field of machine learning applied to data.

Deep Learning and AI: We will venture into the domain of Artificial Intelligence, and in particular Artificial Neural Networks, the technology behind Deep Learning. We will look at how these networks function and see why there is so much excitement about their application to science. Is it hype or reality? We will make the case that it is very real by discussing many recent, and revolutionary, successes.
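
To give a feel for how such a network functions, here is a minimal sketch in plain Python, not taken from the session itself. It runs the forward pass of a tiny two-layer network that computes XOR. The weights are chosen by hand for illustration rather than learned from data, and the names `dense` and `forward` are hypothetical.

```python
def relu(x):
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return max(0.0, x)


def identity(x):
    return x


def dense(inputs, weights, biases, activation):
    """One fully connected layer: a weighted sum per neuron, then an activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked parameters (an assumption for illustration, not trained values).
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def forward(x1, x2):
    """Forward pass: inputs -> hidden layer (ReLU) -> single linear output."""
    hidden = dense([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES, relu)
    return dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, identity)[0]


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, forward(a, b))
```

In practice the weights are not set by hand but adjusted automatically from example data, and real networks stack many more layers and neurons than this sketch.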

Using Randomization for Scientific Data-Discovery
Facilitator: Dr. Hayden Schaeffer

Artificial intelligence for scientific discovery has recently gained attention as a way to address various modeling and computational issues in the sciences. Some of the goals are to provide automated approaches for supporting and accelerating growth in data-based discovery, high-consequence decision making, and prototyping. In high-dimensional datasets in particular, direct methods often suffer from the curse of dimensionality. This workshop will cover a broad set of techniques for approximating high-dimensional data using randomization, with the goal of applying these methods to scientific modeling, engineering design problems, and signal processing.
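
One standard example of approximating high-dimensional data with randomization is a Gaussian random projection. The abstract does not say which techniques the workshop covers, so the sketch below is an assumption chosen for illustration: it maps synthetic 1000-dimensional points down to 200 dimensions with a random matrix and checks how well pairwise distances survive. All names and dimensions are hypothetical.

```python
import math
import random


def random_projection_matrix(target_dim, source_dim, rng):
    """Gaussian matrix scaled so squared lengths are preserved in expectation."""
    scale = 1.0 / math.sqrt(target_dim)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]


def project(matrix, vector):
    """Multiply the projection matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)  # fixed seed so the run is reproducible
source_dim, target_dim, num_points = 1000, 200, 5

# Synthetic data standing in for a real high-dimensional dataset.
points = [[rng.uniform(-1.0, 1.0) for _ in range(source_dim)] for _ in range(num_points)]
matrix = random_projection_matrix(target_dim, source_dim, rng)
projected = [project(matrix, p) for p in points]

# Ratio of projected distance to original distance for every pair of points.
ratios = [
    distance(projected[i], projected[j]) / distance(points[i], points[j])
    for i in range(num_points)
    for j in range(i + 1, num_points)
]
print(f"distance ratios range from {min(ratios):.3f} to {max(ratios):.3f}")
```

Ratios close to 1 mean that the geometry of the data is roughly preserved even though each point is now described by a fifth as many numbers, which is what makes downstream computations cheaper.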