38610 - Modern Programming for Data Scientists - Mellon College of Science

38610 - Modern Programming for Data Scientists

A hands-on introduction to the fundamentals of Python programming for data science, intended for students with minimal or no programming experience. Students will learn while working on scientific problems and leveraging scientific datasets. The Python data science ecosystem includes easy-to-use packages for working with data and is the foundation for most deep learning frameworks, which will be used in subsequent courses. Students will develop skills in object-oriented programming in Python 3; using packages for working efficiently with scientific data; customizing their environment with Anaconda; developing electronic notebooks for reusing and sharing code; reading data specific to the sciences (Biology, Chemistry, Math, or Physics); improving the efficiency of Python code; and visualizing data. At the end of the course, students will have the skills to design and deploy a Python-based data science solution for a small scientific challenge. This course is required for students enrolled in the MS program in Data Analytics for Science.

The main topics will include:

Brief introduction to Python 3 and the development ecosystem
Introduction to object-oriented and procedural programming models and basic software architecture principles
Professional programming techniques for modern software development: version control and team development (Git and GitHub), coding standards, unit and regression testing (pytest), and continuous integration (Travis CI)
Introduction to R and RStudio
Developing reusable, sharable, and interactive electronic notebooks with Jupyter
Python environment management: Virtualenv and Anaconda
Fundamentals of data structures and their implementation in Python
Python packages for science and data science: NumPy, SciPy, Pandas, StatsModels
Data processing techniques for small, medium, and large datasets
Manual and programmatic metadata standards
Data Analytics with Python: Optimization, Linear and Non-Linear Regression, Mathematical Modeling, Monte Carlo Sampling, Distributions, and Clustering