38615 - Computational Modeling, Statistical Analysis and Machine Learning in Science - Mellon College of Science

This course provides a practical, intuitive introduction to the core concepts and tools of machine learning for STEM students. It begins with fundamental concepts in ML, data science, and modern statistics, such as the bias-variance tradeoff, overfitting, regularization, and generalization, before moving on to more advanced topics in both supervised and unsupervised learning.
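
As an informal illustration of overfitting and generalization, the sketch below fits a k-nearest-neighbor regressor to synthetic data. It is not taken from the course materials: the data (a noisy sine curve), the helper names `make_data`, `knn_predict`, and `mse`, and the choices of k are all assumptions made for this example. With k = 1 the model memorizes the training set, so its training error is zero while its error on held-out data is not; larger k averages over more neighbors, trading variance for bias.

```python
import math
import random


def make_data(rng, n):
    """Hypothetical dataset: y = sin(x) plus Gaussian noise (an assumption)."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of k-NN predictions (fit on train) over data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)        # fixed seed so the run is repeatable
train = make_data(rng, 40)    # data the model is fit on
test = make_data(rng, 40)     # held-out data used to estimate generalization

for k in (1, 5, 15):
    print(f"k={k:2d}  train MSE={mse(train, train, k):.3f}  "
          f"test MSE={mse(train, test, k):.3f}")
```

The gap between training and test error at k = 1 is the signature of overfitting; comparing the test errors across values of k is a simple form of model selection.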

Students will choose a large dataset from a selection of biology, chemistry, math, or physics datasets hosted by PSC and use this dataset throughout the MS program. Course topics are taught through analysis of the chosen dataset. Extensive knowledge of Python or another programming language is not a prerequisite, since students are first given simple scripts to work with and then expand upon. This course is required for students enrolled in the MS program in Data Analytics for Science.

Potential topics include:

Efficient data structures (arrays, stacks, queues, lists, trees, heaps, graphs)
Data storage, sorting and searching (binary search trees, hash tables), efficient query
Techniques for handling high-dimensional data (instances with many attributes), including variable selection, dimension reduction, and ensemble methods (bagging and boosting)
Large-scale search algorithms, intro to databases
Model accuracy, prediction accuracy
Model selection, dimension reduction, and other high-dimensional considerations
Linear and nonlinear models
Classification, support vector machines (SVM), kernel methods
Decision trees and random forests
Probabilistic methods