Carnegie Mellon University
March 29, 2023

Data Science in Finance

What is Data Science?

Data science is the field of study that analyzes data to discover actionable insights. This multidisciplinary field combines skills from mathematics, statistics, programming, artificial intelligence, computer engineering, and domain expertise. Leveraging these capabilities, data scientists analyze large quantities of data to provide insights into potential business opportunities. These vast volumes of data are studied using modern tools that search for previously unseen patterns and meaningful information that can guide organizational decisions.

Data scientists create artificial intelligence (AI) systems designed to execute tasks that would previously have required human intelligence. They apply machine learning algorithms to substantial amounts of data in the form of text, numbers, images, audio, and video to create these new applications. AI systems can subsequently generate insights from these data sets, information that guides choices and adds tangible value to the organization.

Why is Data Science Important?

Data is omnipresent in today’s world. More information is collected each passing day and year from various sources including the Internet, social media, cell phones, medical records and financial transactions. This explosion of recorded data means that data science professionals are also needed everywhere, across all industries.

Big data is the common term that refers to the rise of larger, more complex sets of data, usually from new sources. Big data sets are behemoths - so enormous that they are unmanageable with traditional processing software. However, they are essential to decision-making and forecasting business situations that were previously unforeseeable with smaller data sets.

Businesses use big data for a variety of functions across their organizations. When properly analyzed, these data sets can show bottlenecks or failures in the current system and improve operations. They help provide better quality customer service and customer retention. Big data can also assist marketers in creating personalized, targeted campaigns for current and potential customers, increasing customer acquisition. The size and number of these data sets continue to grow exponentially in our technological age, along with the demand for professional data scientists. 

Data Science in Finance

While data science is pivotal to nearly every industry, a financial data scientist plays an especially vital role. This sector constitutes approximately 20-25% of the worldwide economy, making it a cornerstone of our global market. This vast industry deals with enormous quantities of extremely sensitive data. Consequently, the industry is heavily regulated to prevent illegal activities and misuse of data, safeguarding the privacy of both individuals and businesses.

The financial industry was one of the trailblazers in data science, putting it years ahead of other sectors. After a considerable initial investment of time and resources in identifying the correct data for new insights, many data science in finance efforts now focus on building and operationalizing new AI models. However, collecting, cleaning, and organizing data remains a critical and ongoing requirement for success.

There is huge demand for big data solutions in finance due to the nature of the industry players: banks and other financial institutions strive to profit from correct predictions of market performance or the price of goods. Banks have been attempting to anticipate financial and economic changes for decades to gain a competitive advantage and make better investments. When big data emerged, it became a primary focus in the industry - organizations now depend on it to gain and maintain a competitive edge.

Data Science in Finance: Opportunities and Challenges

Every industry must adapt to the technological age of big data. Despite its early adoption of data science, the finance industry faces a different set of opportunities and challenges than other sectors.

Short Feedback Loops

The finance industry is known for its fast, high-pressure work environment. Firms, financial managers, and banks must make many decisions every day, each weighing a multitude of factors. There is no other industry quite like it - you get positive or negative feedback on most decisions remarkably quickly, often on the same day. This provides a unique opportunity for data scientists in finance, who analyze real-time data in exceptionally short feedback loops to make impactful decisions.

Fraud Risk Analysis

One of the biggest challenges in the finance industry is the potential for fraud. Due to the enormous size of the industry and the value of the personal data held by financial institutions, it is highly likely that someone is always trying to take advantage of financial technology. Preventing illegal activities and safeguarding private data are paramount to every organization. Although fraud is a substantial threat to the industry, the potential for new data science in finance efforts to overcome it is significant. The opportunities to combat fraudulent activity using AI systems and data analysis are unique to this industry.

New Data Discovery

Financial data has been collected for a long time and is highly processed, cleaned, and organized compared with data in other industries. This poses a challenge - familiarity with the typical forms of data means it is rare to gain innovative insights. However, this can be an opportunity for the industry to expand its interest in new data types. New data can serve as economic and investment indicators, giving the companies that cultivate and use it a competitive edge. For example, satellite images of production areas can be used to gauge the output of a product, and Internet prices of goods can be studied to anticipate inflation. Every company and financial data scientist has access to the same data, but the key is finding an innovative way to analyze it in potentially new formats, reports, or analyses.

Explainable Models

The paradox of finance is that it is an exceptionally complex industry, yet its actions need to be easily explainable to non-technical employees and clients. Financial clients and institutions need to understand what their investors and the markets are doing. Unfortunately, because of this complexity, algorithms commonly have many variables and inputs. Often, the more accurate a prediction is, the more complicated and harder to explain the model becomes. This lack of intuitiveness is a challenge for the industry and an opportunity for future data scientists in finance to overcome.

ESG Commitments

Companies are increasingly focused on environmental, social, and governance (ESG) goals. These are non-financial commitments companies make regarding sustainability, social impact, and corporate governance. Investors increasingly use ESG factors as another indicator of a firm’s long-term viability, and this analysis helps identify material risks and growth opportunities. It is predicted that future investors will weigh ESG data as heavily as traditional financial data, which creates a significant opportunity for data scientists in finance to analyze new ESG data.

Increased Digitization

A potential space for growth in the finance industry is an increase in digitization. While some sectors, such as investing, rely heavily on legacy systems, spreadsheets, and the analog world, the industry generally is moving in a more technologically advanced direction. With a growing portion of technologically able workers, finance has significant opportunities to increase digitization in all aspects of the industry. Data scientists will be valuable in creating technologies that usher in this new age. 

Financial Data Science Use Cases

Data science has many applications in finance. The industry is becoming increasingly dependent on data scientists to use the latest, most powerful algorithms to analyze data. One of the significant requirements of the industry is general data management, as it deals with an ever-growing amount of data.

Specific applications of data science in finance include:

  • Real-time stock market insights
  • Automated risk management
  • Fraud detection
  • Algorithmic trading
  • Consumer analytics and personalized services
  • Financial product development, pricing, and revenue optimization

As a financial data scientist, the opportunities are endless - from hedge funds, banks, insurers, and brokerage firms to government agencies and many other financial institutions.

Financial Data Science Skill Sets

Wall Street is increasingly focused on using massive data sets for strategic goals such as forecasting future market behavior, identifying fraudulent trading activity, pricing sophisticated instruments, and modeling the effects of news events, to name a few.

This has spurred rapid growth in the development of data analysis methods in recent years. Whether it’s called data science, machine learning, or AI, these tools hold the potential to extract valuable insight from large, complex data sets.

Unfortunately, indiscriminate application of these data analysis methods often leads to the exact opposite result. For example, a deep neural network has great flexibility to fit even weak relationships found in a data set. But there are key questions one needs to ask:

  • Are those data set relationships real?
  • Will the relationships persist when exploring a new data set from a different time period?

In fact, this problem of overfitting (as it is called) of complex models has led some to conclude that such approaches are not appropriate in finance, where spurious correlations, if overinterpreted, can lead to massive losses.
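
The overfitting problem described above can be illustrated with a small simulation. The following is a minimal sketch, not part of any curriculum material: it swaps the neural network for an ordinary least-squares regression with many predictors, which is enough to show the effect. The data are synthetic, and every name in it (`in_and_out_of_sample_r2`, `n_signals`, and so on) is hypothetical. The "returns" and the candidate "signals" are independent random noise, so any relationship the model finds is spurious by construction.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination, measured against the sample mean of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def in_and_out_of_sample_r2(n_obs=60, n_signals=40, seed=0):
    """Fit least squares on one period of pure noise, then score a second period.

    Returns (in_sample_r2, out_of_sample_r2). All data are synthetic
    (an assumption for illustration): returns and signals are independent.
    """
    rng = np.random.default_rng(seed)

    # Period 1: the data the model is fitted to.
    x_train = rng.normal(size=(n_obs, n_signals))
    y_train = rng.normal(size=n_obs)

    # Period 2: a "different time period" drawn the same way.
    x_test = rng.normal(size=(n_obs, n_signals))
    y_test = rng.normal(size=n_obs)

    # Add an intercept column and solve the least-squares problem.
    a_train = np.column_stack([np.ones(n_obs), x_train])
    a_test = np.column_stack([np.ones(n_obs), x_test])
    beta, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)

    return r_squared(y_train, a_train @ beta), r_squared(y_test, a_test @ beta)


if __name__ == "__main__":
    r2_in, r2_out = in_and_out_of_sample_r2()
    print(f"in-sample R^2:     {r2_in:.2f}")
    print(f"out-of-sample R^2: {r2_out:.2f}")
```

With 40 noise signals and only 60 observations, the in-sample fit typically looks strong while the out-of-sample R² is negative, meaning the fitted model predicts the second period worse than that period's own average would. Both of the questions above are answered "no" here, yet the in-sample numbers alone would not reveal it.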

The issue is not in the method itself, but in its non-critical application. A neural network requires design decisions that can only be made appropriately when one has a strong understanding of its statistical and mathematical foundations.

The Carnegie Mellon University Master of Science in Computational Finance (MSCF) data science and machine learning curriculum is designed to address this issue of overfitting. It builds the math and statistics foundations underlying advanced data science methods, so that graduates will be capable of critically assessing implementations and their results.

This focus on foundational knowledge has the added benefit of making students better equipped to understand, adapt, and implement novel methods developed in the future, as these fields advance rapidly. With a strong framework, graduates will be able to pick up a paper and understand the newest methods, and their potential and limitations. They will also be able to identify where the method is best suited to help their firm, reducing the risk of falling into a trap of viewing machine learning methods as solely “black box” Python packages to be installed and fed data.

MSCF students learn practical issues of deploying data science and machine learning methods via linear and nonparametric regression models. The coursework also covers unsupervised learning, including clustering and dimension reduction approaches. MSCF students learn a host of other data science in finance topics such as decision trees, random forests, boosting, anomaly detection, Markov decision processes, reinforcement learning, and mixture & topic models. Overall, the MSCF data science curriculum provides the same statistical content/methods as a one-year data science master’s degree program.

Data Science in Quantitative Finance

MSCF alumni are successfully using data science skills in their current quant finance roles and are enjoying the many advantages of this dynamic industry: challenging & impactful work, a variety of roles/responsibilities, and the ability to innovate in their jobs.

Download the MSCF alumni career report for information on salary statistics, job locations, promotions, top skill sets and more!