May 12, 2023
Tepper School Researchers Team With Fidelity Investments’ AI Center of Excellence
In a long-term industrial-academic collaboration, researchers from Carnegie Mellon University’s Tepper School of Business have partnered with researchers from Fidelity Investments’ AI Center of Excellence on a study about pattern mining — an essential part of knowledge discovery and data analytics that is powerful, especially when combined with constraint reasoning.
In a recent study published in AI Magazine, researchers from the Tepper School and Fidelity Investments showcased Seq2Pat, a research library for sequence-to-pattern generation, to discover sequential patterns that occur frequently in large-scale sequence databases. In addition, the library supports constraint-based reasoning to specify the desired properties of patterns.
In a case study, the researchers used data from Spotify to demonstrate how Seq2Pat can find patterns that help explain why users skip songs. Other case studies include predicting shopper intent in e-commerce from browsing activity and detecting intrusions by hostile users in security applications. These case studies highlight the library's benefits in industrial settings where scalability, explainability, rapid experimentation, reusability, and reproducibility are of practical interest.
The study also bridged sequential pattern mining (SPM) with supervised machine learning via dichotomic pattern mining (DPM). DPM exploits the dichotomy between outcomes: it looks for the patterns that uniquely distinguish sequences with one outcome from sequences with the other. The authors presented an automatic feature extraction approach, powered by Seq2Pat and DPM, to produce insights and boost machine learning models in downstream applications.
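As a rough illustration of the idea behind DPM, the sketch below mines frequent patterns separately from sequences with a positive outcome and sequences with a negative outcome, keeps the patterns unique to each side, and turns them into binary features for a downstream model. It is written in plain Python and does not use Seq2Pat or its decision-diagram-based algorithm. The function names, the toy clickstream data, and the brute-force enumeration are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def subsequences(seq, max_len):
    """Distinct (not necessarily contiguous) subsequences of seq, up to max_len items."""
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), k):
            found.add(tuple(seq[i] for i in idx))
    return found


def frequent_patterns(sequences, min_frequency, max_len=3):
    """Patterns that occur in at least min_frequency of the given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(subsequences(seq, max_len))  # each sequence counts once per pattern
    return {p for p, c in counts.items() if c >= min_frequency}


def dichotomic_patterns(positive, negative, min_frequency, max_len=3):
    """Split frequent patterns into those unique to each outcome and those shared."""
    pos = frequent_patterns(positive, min_frequency, max_len)
    neg = frequent_patterns(negative, min_frequency, max_len)
    return {"positive_only": pos - neg, "negative_only": neg - pos, "shared": pos & neg}


def is_subsequence(pattern, seq):
    """True if pattern appears in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


# Hypothetical clickstream sessions: purchases (positive) vs. abandoned visits (negative).
positive = [["view", "cart", "buy"], ["view", "view", "cart", "buy"], ["search", "view", "cart"]]
negative = [["view", "exit"], ["search", "view", "exit"], ["view", "search", "exit"]]

result = dichotomic_patterns(positive, negative, min_frequency=2)

# One binary feature per outcome-specific pattern, one row per session.
feature_patterns = sorted(result["positive_only"] | result["negative_only"])
features = [[int(is_subsequence(p, s)) for p in feature_patterns] for s in positive + negative]
```

In this toy data, a pattern such as ("view", "cart", "buy") is frequent only among purchases, while ("view",) is frequent in both groups and so carries no signal. The brute-force enumeration grows exponentially with sequence length, which is why a dedicated miner is needed for large-scale data.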
“SPM can be used to analyze medical treatment history and customer purchases, among other applications,” notes Willem-Jan van Hoeve, Carnegie Bosch Professor of Operations Research and Senior Associate Dean of Education of the Tepper School.
The technical development of this work started at the Tepper School with a paper presented at the Association for the Advancement of Artificial Intelligence (AAAI) 2019 conference. The pattern mining algorithm was then embedded into the Seq2Pat Python library as a joint effort between Fidelity and Carnegie Mellon University, and the library was published at AAAI 2022. Later on, the method was extended into dichotomic pattern mining at the Knowledge Discovery from Unstructured Data in Financial Services workshop (KDF@AAAI’22) and shown to be effective for industrially relevant applications in digital behavior analysis (Frontiers in AI, 2022).
The Seq2Pat library takes advantage of multi-valued decision diagrams. It is based on the state-of-the-art approach for sequential pattern mining that Van Hoeve developed with two of his former Ph.D. students, Amin Hosseininasab (now at the University of Florida) and Andre Cire (now at the University of Toronto), and presented at AAAI 2019.
“Sequential data is ubiquitous in the industry ranging from digital activity in clickstream to user journeys across multiple touchpoints. However, despite the value of sequential information, there is a lack of established methodology and toolset to bridge the gap between machine learning and sequential pattern mining, and this poses a major barrier to developing large-scale applications in the industry,” suggests Serdar Kadioglu, Group Vice President of Artificial Intelligence at Fidelity Investments and Adjunct Associate Professor of Computer Science at Brown University.
“With our open-source Seq2Pat library, we contribute an efficient tool for sequence modeling, and with our DPM framework, we provide an effective methodology to associate frequent patterns with positive outcomes. We look forward to further innovative applications of this technology.”
The article, Seq2Pat: Sequence-to-Pattern Generation to Bridge Pattern Mining with Machine Learning, appears in AI Magazine and is authored by Kadioglu, S (Fidelity Investments and Brown University), Wang, X (Fidelity Investments), Hosseininasab, A (University of Florida), and van Hoeve, W-J (Carnegie Mellon University). Copyright 2023 The Authors. All rights reserved.