Carnegie Mellon University

Humans in Loop for Machine Learning in Financial Services and other E-Commerce Companies

iMerit is a for-profit social impact company that provides on-demand digital dataset services to global clients. It performs data tasks in the image and text domains, in application areas of e-commerce and machine learning. This project will select an iMerit project and build a usable filter algorithm for it. The work includes selecting the technique, building a minimal useful dataset, and integrating the algorithm with a workflow tool for performing tasks with a human in the loop. The potential candidates, drawn from iMerit projects, are listed below.

1. Entity Extraction and Classification Project

A major e-commerce company provides general descriptive text paragraphs (similar to Wiki entries). Several predefined entity categories are provided, for example “person”, “celebrity”, “object”, “place”, and “quantitative data”.

The categories can overlap, and they can be concrete or abstract concepts. The task is to select as many useful terms as possible from the text and to classify each word or term into as many applicable categories as possible.

For example, the term “Ten Dollars” would be classified as quantitative data and financial data, whereas “50 Cent” (the rapper) would, based on context, be a person and a celebrity.

This is a hard task for impact employees because it is open-ended and requires a large amount of language, world, and contextual knowledge. Because it is judgment-based, finding agreement and performing quality control (QC) on the results are equally challenging. An ML algorithm can ease this task by acting as a checker.
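As a rough illustration of the checker idea, the sketch below compares the categories a human assigned to one term with the categories a model scores highly, and flags the differences for QC. It assumes a multi-label model that returns one score per category; the function names, the 0.5 threshold, and the example scores are all hypothetical and do not come from an iMerit project.

```python
from typing import Dict, Set


def check_annotation(
    human_labels: Set[str],
    model_scores: Dict[str, float],
    threshold: float = 0.5,  # hypothetical cut-off, would need tuning
) -> Dict[str, Set[str]]:
    """Compare one term's human labels with the model's confident labels."""
    # Categories the (assumed) model considers applicable to the term.
    model_labels = {cat for cat, score in model_scores.items() if score >= threshold}
    return {
        # The model suggests these, but the human did not select them.
        "possibly_missing": model_labels - human_labels,
        # The human selected these, but the model does not support them.
        "possibly_wrong": human_labels - model_labels,
    }


def needs_review(report: Dict[str, Set[str]]) -> bool:
    """A term goes to QC when human and model disagree in either direction."""
    return bool(report["possibly_missing"] or report["possibly_wrong"])


# Made-up scores for the term "50 Cent" in a paragraph about music.
scores = {
    "person": 0.91,
    "celebrity": 0.84,
    "quantitative data": 0.12,
    "financial data": 0.08,
}
# The annotator read the term as an amount of money.
report = check_annotation({"quantitative data"}, scores)
print(report, needs_review(report))
```

In this setup the model never overrides the human. It only routes disagreements to a reviewer, which keeps the judgment with the human in the loop.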

2. Image Style Classification

An e-commerce retailer wants various home furnishing items classified into predetermined design styles, e.g. neo-classical, rustic, modern, baroque. Examples of each style category are given as a reference. The end result improves the searchability and recommendability of similar products, and with them the customer experience. The interesting challenge here is to find suitable features for a classifier that is good enough to act as a pre-filter, or else to narrow down the candidate styles so that the human in the loop needs to consider fewer categories when seeking the answer.
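The narrowing step could look roughly like the sketch below. It assumes a classifier that outputs one probability per style, and it keeps the highest-scoring styles until their combined probability reaches a chosen coverage level. The function name, the coverage value, and the example probabilities are hypothetical.

```python
from typing import Dict, List


def shortlist_styles(style_scores: Dict[str, float], min_coverage: float = 0.9) -> List[str]:
    """Return the top-scoring styles whose scores together reach min_coverage."""
    # Rank styles from most to least likely according to the (assumed) classifier.
    ranked = sorted(style_scores.items(), key=lambda item: item[1], reverse=True)
    shortlist: List[str] = []
    covered = 0.0
    for style, score in ranked:
        shortlist.append(style)
        covered += score
        if covered >= min_coverage:
            break  # enough probability mass; the human sees only these styles
    return shortlist


# Made-up classifier output for one furniture image.
scores = {"neo-classical": 0.05, "rustic": 0.55, "modern": 0.38, "baroque": 0.02}
candidates = shortlist_styles(scores, min_coverage=0.9)
print(candidates)
```

A higher coverage value shows the human more styles and lowers the risk of hiding the correct one; a lower value saves more effort per item. Which trade-off is acceptable would have to be measured on labeled examples.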

Rahul Telang

Project Lead

Sunder Kekre

Project Lead

Roni Rosenfeld

Project Lead