Curriculum

Program Overview

The Mellon College of Science is striving to ensure that students are better prepared for the next career step. As part of this commitment to training the next generation of scientific leaders, we have created the M.S. in Data Analytics for Science (MS-DAS) program.

The MS-DAS program is designed initially as a one-year program. Students will commence the program in the fall and take a rigorous set of courses through the spring in applied linear algebra, programming, machine learning, statistical methods and neural networks that will equip them with the data analytics techniques needed to solve modern scientific problems. Courses will be offered through the Mellon College of Science, the Department of Statistics, and the Pittsburgh Supercomputing Center, a world leader in high-performance computing and data analytics.

The program culminates in a semester-long capstone project in collaboration with industry partners, providing students with a concrete understanding of how to impact scientific discovery through applied data analytics skills. To ensure students are prepared for a data science career, a required 6-week mini course on communications and professional development will be completed in the Spring semester.

Students must complete a minimum of 99 units to meet the degree requirements.

Curriculum Overview

Fall

The first semester provides foundational mathematics, statistics and programming skills necessary to understand the basics of computational modeling, analytical tools, and machine learning. Students will take a combination of 6-week mini courses and semester-long courses to provide the depth and breadth needed to move into advanced coursework in the Spring.

21670 - Linear Algebra for Data Science*

This course is designed to present and discuss those aspects of Linear Algebra that are most important in Data Analytics. The emphasis will be on developing intuition and understanding how to use linear algebra, rather than on "proofs".

The main topics will include:

Basic matrix operations, linear transformations.
Subspaces, ranges and null spaces, linear combinations and spans, linear independence, bases, dimension, and the rank-nullity theorem.
Systems of linear equations, symmetric matrices, inverses, determinants, triangular matrices, trace, eigenvalues and eigenvectors.
Positive definite matrices, covariance matrices, minimization problems involving vectors and matrices, minimization and convex functions.
Orthogonal projections, Gram-Schmidt procedure, singular value decomposition.
Tensor structures.

*Students may place out of this course.

38613 - Communication Skills and Professional Development

Coming soon.

21671 - Computational Linear Algebra
38615 - Computational Modeling, Statistical Analysis and Machine Learning in Science
38614 - Large-Scale Computing in Data Science
36600 - Essentials of Statistical Practice for Graduate Students OR 36617 - Applied Linear Models

21671 - Computational Linear Algebra
This is a survey of methods in computational linear algebra. Topics covered in this course focus on algorithms for solving linear systems, both dense and large and sparse. Regularization and underdetermined systems will be discussed in detail. Rather than assuming prior knowledge of numerical analysis or matrix theory, we will introduce standard methods and results as needed, so much of the material is self-contained. Both theoretical and experimental results will be covered, with an emphasis on cost, stability, and convergence.

The main computational topics include:

QR (Gram-Schmidt, Modified Gram-Schmidt, Householder reflections, Givens rotations).
Singular value decomposition and its application to reduced order modeling.
Least squares problems (over-determined), regularization, and their application to data-driven inverse problems.
Cholesky factorization, incomplete factorization methods, and their application to scientific computing.
Algebraic eigenvalue problems.
Krylov methods (CG, etc.) and preconditioning.
Least squares problems (under-determined), sparsity, subspace approximations, LASSO, basis pursuit, with applications to model selection and parameter estimation.

38615 - Computational Modeling, Statistical Analysis and Machine Learning in Science
The purpose of this course is to provide a practical introduction to the core concepts and tools of machine learning (ML) in a manner that is intuitive and easily understood by STEM students. The course begins by covering fundamental concepts in ML, data science, and modern statistics, such as the bias-variance tradeoff, overfitting, regularization, and generalization, before moving on to more advanced topics in both supervised and unsupervised learning.

Students will choose a large dataset from a selection of biology, chemistry, math, or physics datasets hosted by PSC and use this dataset throughout the MS program. The course topics are taught as students analyze their chosen dataset. Extensive knowledge of Python or another programming language is not a prerequisite, since students are first given simple scripts to work with and then expand upon. This course is required for students enrolled in the MS program in Data Analytics for Science.

Potential topics include:

Efficient data structures (arrays, stacks, queues, lists, trees, heaps, graphs)
Data storage, sorting and searching (binary search trees, hash tables), efficient querying
Techniques for handling high-dimensional data (instances with many attributes), including variable selection and dimension reduction, ensemble methods (bagging and boosting)
Large-scale search algorithms, intro to databases
Model accuracy, prediction accuracy
Model selection, dimension reduction, and other high-dimensional considerations
Linear and nonlinear models
Classification, SVM, kernel methods
Decision trees and random forests
Probabilistic methods

38614 - Large-Scale Computing in Data Science
This course introduces students to the techniques necessary for manipulating and analyzing big data as encountered in modern scientific computing. This Python-based course will introduce modern software engineering techniques and tools, and data science frameworks, with an emphasis on large-scale problems and computing platforms. It is hands-on and will use the Spark framework for mining large and complex scientific datasets. Students will progress to scalable data analytics and eventually basic machine learning on various scalable platforms such as supercomputers and clouds. This will culminate with an introduction to the TensorFlow framework for deep learning. Lower-level concepts such as performance optimization and concurrent programming techniques will be introduced along the way. Exercises will be motivated by relevant scientific community datasets. This course is required for students enrolled in the MS program in Data Analytics for Science.

The main topics will include:

Introduction to modern software engineering techniques
Intro to Big Data with Spark (databases and formats: JSON, HDF5, XML, graph)
Intro to Spark data analytics (Clustering)
Intro to Dimensionality Reduction with Spark
Spark Machine Learning (Recommender system)
Cloud Computing (including VMs and containers). AWS, Azure, NCCP
HPC platforms (including GPUs)
Manual concurrency with Python MPI
Optimization and performance with Python (Cython, profilers, debugging)
Introduction to Python alternatives in the sciences (C, C++, Fortran, Java, Julia)

36600 - Essentials of Statistical Practice for Graduate Students
This is a first course in statistical practice, targeted specifically to CMU graduate students outside of statistics and machine learning. It is designed as a high-level introduction both to fundamental concepts of probability and statistics and to the ways by which statisticians go about approaching and analyzing data. The course will cover exploratory data analysis, parameter estimation and hypothesis testing, clustering, and common regression and classification models. If time permits, additional topics such as text mining, experimental design, and time series may be covered. Students will carry out all work using the R programming language.

36617 - Applied Linear Models
Upon successful completion of this course, students should be able to properly analyze real-world datasets using linear regression and related methods in both R and SAS; use exploratory data analysis (EDA) techniques to learn salient features of the data; build appropriate models based on their EDA; diagnose any possible violations of model assumptions and, if necessary, apply remedial measures to overcome them; perform appropriate analytical and inferential techniques to address the objectives of a client or colleague; and clearly communicate the results of an analysis to a layperson.

The main topics will include:

Simple linear regression models (inference, diagnostics, and remedial measures), multiple linear regression models (inference, diagnostics, and remedial measures), analysis of variance, analysis of covariance, variable selection, and extensions of traditional linear regression models (generalized linear models, penalized regression with ridge/LASSO, semiparametric regression/smoothing).

38612 - Information Visualization for Scientists

This course introduces the student to the concepts and tools of data visualization. Emphasis is placed on information visualization, with some exposure to visualization of scientific datasets with a spatial reference frame. The student will gain hands-on experience with a variety of visualization tools accessible from Python and R, including matplotlib, ggplot, and VisIt. This course is required for students enrolled in the MS program in Data Analytics for Science.

The main topics will include:

Understanding the structure of data and the way it relates to visual idiom. For example, simple tabular data requires a different presentation from data within a spatial reference frame.
Common visual idioms, and the tools to produce them in Python and R.
The encoding of relevant components of the data in the free parameters of the visual idiom.
The impact of color choice, data volume, and complexity on the ability to perceive patterns in data.
The distinction between information visualization and scientific visualization, and the boundary cases in between.

38616 - Neural Networks and Deep Learning in Science
Elective Course
38617 - MS-DAS Capstone Project Course

38616 - Neural Networks and Deep Learning in Science
The course focuses on the practice and applications of deep learning by exploring foundational concepts, structuring popular networks, and implementing models with modern technologies (Python, Jupyter notebooks, and PyTorch). Other topics may include image recognition, machine translation, natural language processing, parallelism, distributed GPU computing, cloud technologies, and inference and parameter fitting in deep networks. The course uses large datasets hosted by PSC.

Potential topics include:

Basic concepts: Model accuracy, prediction accuracy, interpretability, supervised and unsupervised training, regularization.
Artificial neural networks, feed-forward, activation functions, loss functions.
Non-linear optimization, gradient descent, back-propagation
Deep Learning tools: PyTorch, AWS cloud
Autoencoders, dense embedding, dimensionality reduction
Convolutional networks, transfer learning, applications in image processing and sciences
Recurrent networks, LSTM, GRU, applications in NLP
Other topics: GANs, Reinforcement Learning, Multitask Learning, advanced applications of deep learning in chemical and biological sciences.

38617 - MS-DAS Capstone Project Course
A crucial component of the MS-DAS program is the required semester-long Capstone Course, which will provide students the opportunity to apply skills developed in the classroom, such as programming, machine learning, and data science tools, to real-world scenarios from industry partners. Students will gain experience working collaboratively to evaluate and develop a solution in coordination with industry partners, following their guidelines. Through this course, students begin to make the transition from the academic world to industry and the marketplace, where the challenges of team building, resource development, client relations, limited information, and pressing deadlines are as real and important as the technical and managerial components of any task.

Elective Courses

02604 - Fundamentals of Bioinformatics
How do we find potentially harmful mutations in your genome? How can we reconstruct the Tree of Life? How do we compare similar genes from different species? These are just three of the many central questions of modern biology that can only be answered using computational approaches. This 12-unit course will delve into some of the fundamental computational ideas used in biology and let students apply existing resources that are used in practice every day by thousands of biologists. The course offers an opportunity for students who possess an introductory programming background to become more experienced coders within a biological setting.

02613 or 15650 - Algorithms & Advanced Data Structures
The objective of this course is to study algorithms for general computational problems, with a focus on the principles used to design those algorithms. Efficient data structures that support these algorithmic concepts will also be discussed. Topics include run-time analysis, divide-and-conquer algorithms, dynamic programming algorithms, network flow algorithms, linear and integer programming, large-scale search algorithms and heuristics, efficient data storage and query, and NP-completeness. Although this course may have a few programming assignments, it is not primarily a programming course. Instead, it focuses on the design and analysis of algorithms for general classes of problems.

02710 - Computational Genomics
Dramatic advances in experimental technology and computational analysis are fundamentally transforming the basic nature and goal of biological research. The emergence of new frontiers in biology, such as evolutionary genomics and systems biology, is demanding new methodologies that can confront quantitative issues of substantial computational and mathematical sophistication. On the computational side, this course focuses on modern machine learning methodologies for computational problems in molecular biology and genetics, including probabilistic modeling, inference and learning algorithms, data integration, time series analysis, active learning, etc.

09763 - Molecular Modeling & Computational Chemistry
Computer modeling is playing an increasingly important role in chemical, biological and materials research. This course provides an overview of computational chemistry techniques including molecular mechanics, molecular dynamics, electronic structure theory and continuum medium approaches. Sufficient theoretical background is provided for students to understand the uses and limitations of each technique. An integral part of the course is hands-on experience with state-of-the-art computational chemistry tools running on graphics workstations.

09860 - Digital Molecular Design Studio
Digital Molecular Design Studio is a Special Topics course at Carnegie Mellon University aimed at upper-level chemistry undergraduates and graduate students from physical and computer sciences, as well as engineering. The Studio moniker implies a hands-on class in which students perform computational experiments while learning the theoretical fundamentals. The ultimate goal of this course is to learn how to design molecules and reactions. The course will feature closed- and open-ended projects. It starts with the fundamentals of cheminformatics, quantum and computational chemistry, and data analysis with Python. Next, the class rapidly progresses to the exploration and understanding of real-world problems from the literature in homogeneous catalysis, materials, and engineering. In the third section, we will explore how different machine learning algorithms can be used to solve problems in chemistry. For example, we will use state-of-the-art ML projects such as AlphaFold2 for protein folding and explore its potential applications in biocatalysis. Lastly, students will propose their own projects using the tools introduced throughout the semester.

10605 - Machine Learning with Large Datasets
Large datasets are difficult to work with for several reasons. They are difficult to visualize, and it is difficult to understand what sort of errors and biases are present in them. They are computationally expensive to process, and often the cost of learning is hard to predict; for instance, an algorithm that runs quickly on a dataset that fits in memory may be exorbitantly expensive when the dataset is too large for memory. Large datasets may also display qualitatively different behavior in terms of which learning methods produce the most accurate predictions. This course is intended to provide students with practical knowledge of, and experience with, the issues involving large datasets. Among the issues considered are: scalable learning techniques, such as streaming machine learning techniques; parallel infrastructures such as map-reduce; practical techniques for reducing the memory requirements for learning methods, such as feature hashing and Bloom filters; and techniques for analysis of programs in terms of memory, disk usage, and (for parallel methods) communication complexity. The class will include programming assignments and a one-month short project chosen by the student. The project will be designed to compare the scalability of variant learning algorithms on datasets. An introductory course in machine learning, like 10-601 or 10-701, is a prerequisite or a co-requisite. If you plan to take this course and 10-601 concurrently, please tell the instructor. The course will include several substantial programming assignments, so an additional prerequisite is 15-211, or 15-214, or comparable familiarity with Python and good programming skills.

10708 - Probabilistic Graphical Models
Many of the problems in artificial intelligence, statistics, computer systems, computer vision, natural language processing, and computational biology, among many other fields, can be viewed as the search for a coherent global conclusion from local information. The probabilistic graphical models framework provides a unified view for this wide range of problems, enabling efficient inference, decision-making, and learning in problems with a very large number of attributes and huge datasets. This graduate-level course will provide you with a strong foundation for both applying graphical models to complex problems and for addressing core research topics in graphical models. The class will cover classical families of undirected and directed graphical models (i.e., Markov Random Fields and Bayesian Networks), modern deep generative models, as well as topics in causal inference. It will also cover the necessary algorithmic toolkit, including variational inference and Markov Chain Monte Carlo methods. Students entering the class should have a pre-existing working knowledge of probability, statistics, and algorithms, though the class has been designed to allow students with a strong mathematical background to catch up and fully participate.

10725 - Convex Optimization
Nearly every problem in machine learning can be formulated as the optimization of some function, possibly under some set of constraints. This universal reduction may seem to suggest that such optimization tasks are intractable. Fortunately, many real-world problems have special structure, such as convexity, smoothness, separability, etc., which allows us to formulate optimization problems that can often be solved efficiently. This course is designed to give a graduate-level student a thorough grounding in the formulation of optimization problems that exploit such structure, and in efficient solution methods for these problems. The main focus is on the formulation and solution of convex optimization problems, though we will discuss some recent advances in nonconvex optimization. These general concepts will also be illustrated through applications in machine learning and statistics. Students entering the class should have a pre-existing working knowledge of algorithms, though the class has been designed to allow students with a strong numerate background to catch up and fully participate.

11711 - Advanced Natural Language Processing
Advanced natural language processing is an introductory graduate-level course on natural language processing aimed at students who are interested in doing cutting-edge research in the field. In it, we describe fundamental tasks in natural language processing such as syntactic, semantic, and discourse analysis, as well as methods to solve these tasks. The course focuses on modern methods using neural networks, and covers the basic modeling and learning algorithms they require. The class culminates in a project in which students attempt to reimplement and improve upon a research paper in a topic of their choosing.

11775 - Large-Scale Multi-Media Analysis
Can a robot watch YouTube to learn about the world? What makes us laugh? How do you bake a cake? Why is Kim Kardashian famous? This 12-unit class covers the fundamentals of computer vision, audio and speech processing, multi-media files and streaming, multi-modal signal processing, video retrieval, semantics, and text (and possibly speech and music) generation. Instructors will give an overview of relevant recent work and benchmarking efforts (TRECVID, MediaEval, etc.). Students will work on research projects to explore these ideas and learn to perform multi-modal retrieval, summarization, and inference on large amounts of YouTube-style data. The experimental environment for the practical part of the course will be provided to students in the form of virtual machines.

16720 - Computer Vision
This course introduces the fundamental techniques used in computer vision, that is, the analysis of patterns in visual images to reconstruct and understand the objects and scenes that generated them. Topics covered include image formation and representation, camera geometry and calibration, computational imaging, multi-view geometry, stereo, 3D reconstruction from images, motion analysis, physics-based vision, image segmentation and object recognition. The material is based on graduate-level texts augmented with research papers, as appropriate. Evaluation is based on homework and a final project. The homework involves considerable MATLAB programming.

17628 - Applied Quantum Computing
In theory, quantum computers can solve specific problems more efficiently than their classical counterparts. In practice, however, today's quantum devices are too small and noisy to run many flagship algorithms. Still, with carefully crafted hardware-software-algorithm-application stacks, quantum computers can already be used as interesting scientific tools for applications including fundamental physics, chemistry, and machine learning. In this course, we will learn about quantum applications in practice by understanding each component of a full-stack quantum computer. First, we will survey potential applications for quantum computers and identify which ones are both feasible and useful in the near term. We will focus on applications involving simulation and machine learning, and then dive deep into relevant hybrid quantum-classical and all-quantum algorithms. Then, to achieve a deep understanding of the need for hardware-efficient algorithms, we will survey various physical platforms for quantum computers, coupled with strategies for mitigating quantum errors based on fundamental concepts in quantum information theory. Throughout the course, concepts will be reinforced through practical coding exercises using modern software tools, culminating in a final project to implement a quantum application at the cutting edge of the field.

NOTE: This is a 6-unit course. Students must take 9-12 units of elective coursework to meet degree requirements.

17630 - Prompt Engineering
Students in this course will learn a brief history of large language models and contemporary prompt engineering strategies and techniques. The course will cover in-context learning theory, with an emphasis on practice and on building intuition for prompt design and evaluation. Topics covered include chain-of-thought prompting, prompt tuning with hard and soft prompts, and self-consistency. Students will learn about standard prompt engineering benchmarks, evaluation metrics, and calibration for evaluating the efficacy of prompt designs. Finally, the course will cover alignment and the ethics of large language models while reviewing a sample and cross-section of domain-specific applications. Students in the course will need to purchase access to a cloud-based language model to complete coursework, at an estimated cost of $100-150. Various options exist, including GPT-3.5 by OpenAI or Claude by Anthropic, as well as running T5 on a Lambda server. Class tutorials exist to guide students in setting up and using one of these services.

21270 – Introduction to Mathematical Finance
This is a first course for those considering majoring or minoring in Computational Finance. The theme of this course is pricing derivative securities by replication. The simplest case of this idea, static hedging, is used to discuss net present value of a non-random cash flow, internal rate of return, and put-call option parity. Pricing by replication is then considered in a one-period random model. Risk-neutral probability measures, the Fundamental Theorems of Asset Pricing, and an introduction to expected utility maximization and mean-variance analysis are presented in this model. Finally, replication is studied in a multi-period binomial model. Within this model, the replicating strategies for European and American options are determined.

21690 – Methods of Optimization
An introduction to the theory and algorithms of linear and nonlinear programming with an emphasis on modern computational considerations. The simplex method and its variants, duality theory and sensitivity analysis. Large-scale linear programming. Optimality conditions for unconstrained nonlinear optimization. Newton's method, line searches, trust regions and convergence rates. Constrained problems, feasible-point methods, penalty and barrier methods, interior-point methods.

21765 – Intro to Parallel Computing & Scientific Computation
The objectives of this course are to:

develop structural intuition of how the hardware and the software work, starting from simple systems to complex shared resource architectures;
provide guidelines about how to write and document a software package;
familiarize the audience with the main parallel programming techniques and the common software packages/libraries.

33456 – Advanced Computational Physics
This course extends the study of the topics of 33-241, emphasizing practical numerical, symbolic, and data-driven computational techniques as applied to a selection of currently active research areas. It is taught by faculty and staff actively engaged in a variety of areas of computational science. Numerical methods may include singular value decomposition (SVD), chi-squared minimization, Fast Fourier Transforms, and Monte Carlo simulation of experiments. Applications may include data analysis, eigenvalue problems, and others depending on the research activities of the instructors. Students will be expected to become proficient in a specific programming language and to gain the ability to move to other languages and algorithms as their future computationally intensive efforts may require.

36662 – Methods of Statistical Learning
Data mining is the science of discovering patterns and learning structure in large data sets. Covered topics include information retrieval, clustering, dimension reduction, regression, classification, and decision trees.

42685 – Biostatistics
This course introduces statistical methods for making inferences in engineering, biology and medicine. Students will learn how to select the most appropriate methods, how to apply these methods to actual data, and how to read and interpret computer output from a commonly used statistical package. The topics covered are descriptive statistics; elementary probability; discrete and continuous random variables and their distributions; hypothesis testing involving interval (continuous and discrete) and categorical (nominal and ordinal) variables, for two and three or more treatments; simple and multiple linear regression; time-series analysis; clustering and classification; and time-to-event (survival) analysis. Students will also learn how to write the statistical component of a "Results" section for a scientific paper and learn about the limitations of the statistical analyses. Basic familiarity with probability and probability distributions is preferred but not required.

90834 - Health Care Geographical Info Systems
This course is taught with asynchronous video lectures and readings with in-person and remote office hours. A geographic information system (GIS) provides an effective way to visualize, organize and manage a wide variety of information including administrative and medical record data, environmental health, social services, and other location data. Public health departments, hospitals, and medical research agencies are using GIS to map health-related events, identify disease clusters, investigate environmental health problems, and understand the spread of disease. This course uses a unique approach for teaching GIS in health care. It involves learning how to use GIS software in the context of carrying out projects for visualizing and analyzing health-related data. Each week includes lectures and computer labs that focus on a health, technical, or policy issue which use Esri's ArcGIS Pro and Platform technologies to analyze data or solve a problem. Students learn to create Story Maps and Dashboards to convey their maps and associated text to the public and decision makers. Through assignments and projects students will not only learn how to use the software but will also learn the many distinctive advantages of using GIS for health care policy making and planning. By the end of the course, students will have sufficient background so that they can become expert users of GIS in health organizations - building, managing, and using GIS maps and health data. Prerequisites: 90-728 Introduction to Database Management, 91-802 Information Systems for Managers or permission of instructor.

94823 - Measuring Social
This class reflects an experiential learning environment where students will be placed into teams with students from across Heinz and CMU to work with clients on projects involving measuring social content and activity. Each semester (Spring and Fall), we bring in 7 companies to provide challenging projects for the student teams. Teams are provided with commercially available tools to listen to online social conversations, measure activity, assess different market segments, understand social influence and identify how information gets disseminated across social channels. Previous sponsors have included: Under Armour, Netflix, Target, The Washington Post, HBO, Daimler, eBay, Google, AT&T, The Pittsburgh Steelers, etc. The class is designed to teach social analytics, consulting methodologies, critical thinking to weed through ambiguity, project management, and team and relationship development. Lectures focus on how social is impacting different industries, culture and communication, as well as the future of work. Teams work with their clients throughout the semester and present their findings and deliverables during final presentations when all 7 sponsors come to CMU. In the past, teams have built social applications, social algorithms, experimental methodologies, crowdsourced campaigns and real-time information dashboards for their clients. This class offers an opportunity for students interested in analyzing social data, working in a consultative fashion with actual clients on real issues, and learning about global issues associated with an increasingly social culture.

94865 - Data Analytics for Decision Making
The rise in data availability and analytical tools has transformed the decision-making process in organizations. In this course we will explore achieving organizational goals through descriptive, predictive, and prescriptive analytics. The format for this course will be a combination of weekly lectures, interactive in-class activities, and case study team sessions. We will focus on the development of analytical models in the presence of conflicting and complementary objectives. We will also discuss how these models should be presented to various stakeholders. At the end of the course, you will be able to identify the links between data-driven decisions and the mission outcomes of an organization, propose and develop an analytical model that contributes to the mission, and communicate its benefits.

NOTE: This is a 6-unit course. MS-DAS students are required to complete a total of 9-12 units of elective coursework to complete degree requirements.

95835 - Time Series Forecasting in Python
Time Series Forecasting is something of a dark horse in the field of data science. It is one of the most widely applied data science techniques in business, used extensively in finance, supply chain management, and production and inventory planning. Moreover, it has a well-established theoretical grounding in statistics and dynamic systems theory. Yet it retains something of an outsider status in data science compared to more recent and popular machine learning applications such as image recognition and natural language processing. Consequently, Time Series Forecasting gets little or no treatment in introductory data science and machine learning courses. This course is intended to provide a comprehensive introduction to forecasting methods without deep diving into the theoretical details behind each method; the references at the end of each week, however, fill in many of those details. The course is intended for three audiences: graduate students studying in STEM or business fields; people doing forecasting in business who may not have had any formal training in the area; and MBA students taking a data elective. It is also relevant for those studying public policy, healthcare management, and related disciplines.

NOTE: This is a 6-unit course. MS-DAS students are required to complete a total of 9-12 units of elective coursework to complete degree requirements.

The required, semester-long capstone course will allow students to engage with industry partners and develop solutions to real-world, data-driven problems.

The MS-DAS program will establish partnerships with organizations from a variety of relevant industries. Teams of students will collaborate with industry partners and the faculty course lead to apply techniques learned in the program to complex scientific problems. Regular meetings with industry partners will allow students to receive feedback and strengthen their communication and presentation skills. Further, students will have the opportunity to connect with potential employers and expand their professional networks through the capstone project.

Are you an employer who is interested in becoming a capstone partner? Check out our Employer page for more information.
