Research Spotlight: Quantifying Polarization through Machine Translation

In this paper, the authors propose a novel framework to quantify political polarization using machine translation, focusing on user comments on YouTube news videos from four prominent US cable news networks (CNN, Fox News, MSNBC, and One America News Network).

The Center for Informed Democracy and Social-cybersecurity (IDeaS), CMU's center for Disinformation, Hate Speech and Extremism Online, has a new publication:

We Don't Speak the Same Language: Interpreting Polarization through Machine Translation. Ashiqur R. KhudaBukhsh*, Rupak Sarkar*, Mark S. Kamlet, Tom M. Mitchell. 35th AAAI Conference on Artificial Intelligence (AAAI 2021).

Available online here: https://arxiv.org/pdf/2010.02339.pdf

Polarization among US political parties, media and elites is a widely studied topic. Prominent lines of prior research across multiple disciplines have observed and analyzed growing polarization in social media. In this paper, we present a new methodology that offers a fresh perspective on interpreting polarization through the lens of machine translation. With a novel proposition that two sub-communities are speaking in two different languages, we demonstrate that modern machine translation methods can provide a simple yet powerful and interpretable framework to understand the differences between two (or more) large-scale social media discussion data sets at the granularity of words.

This paper:

Develops a quantifiable framework to evaluate how similar or dissimilar the web-scale discussions of two sub-communities are, interpreting the linguistic manifestation of polarization through the lens of machine translation.
Presents an efficient way to identify and understand issue-centric differences by examining a few hundred salient translation pairs, rather than millions of social media posts.
Demonstrates that modern machine translation methods can provide a simple yet powerful and interpretable framework to understand the difference between two or more large-scale social media discussion data sets at the granularity of words.
Opens the possibility for using machine translation on other social media platforms and expanding use to the level of phrases and sentences.
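
To make the word-level idea concrete, here is a minimal sketch of one common way to "translate" between two word-embedding spaces: fit an orthogonal map on a set of anchor words assumed to mean the same thing in both communities, then look for words whose nearest neighbor on the other side is a different word. This is an illustration under stated assumptions, not necessarily the authors' exact pipeline. The vocabulary, the random toy vectors, and the helper names (fit_orthogonal_map, translate, misaligned_pairs) are all hypothetical; a real analysis would use embeddings trained separately on each community's comments.

```python
import numpy as np


def fit_orthogonal_map(src_anchor, tgt_anchor):
    """Orthogonal Procrustes: the orthogonal W minimizing ||src_anchor @ W - tgt_anchor||_F."""
    u, _, vt = np.linalg.svd(src_anchor.T @ tgt_anchor)
    return u @ vt


def translate(word, vocab, src_vecs, tgt_vecs, w):
    """Map a source-side word into the target space and return its nearest target word (cosine)."""
    mapped = src_vecs[vocab.index(word)] @ w
    tgt_unit = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    scores = tgt_unit @ (mapped / np.linalg.norm(mapped))
    return vocab[int(np.argmax(scores))]


def misaligned_pairs(vocab, src_vecs, tgt_vecs, w):
    """Words whose translation into the other community's space is a different word."""
    pairs = []
    for word in vocab:
        target = translate(word, vocab, src_vecs, tgt_vecs, w)
        if target != word:
            pairs.append((word, target))
    return pairs


# Toy data (hypothetical): 8 anchor words plus 2 content words, 5-dimensional vectors.
anchors = ["the", "and", "of", "to", "in", "a", "is", "that"]
vocab = anchors + ["word_a", "word_b"]

rng = np.random.default_rng(0)
tgt_vecs = rng.normal(size=(len(vocab), 5))

# Build the source space as a rotated copy of the target space ...
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
src_vecs = tgt_vecs @ rotation.T

# ... except that the two content words are used in swapped contexts.
src_vecs[[8, 9]] = src_vecs[[9, 8]]

# Fit the map on the anchor words only, then inspect every shared word.
w = fit_orthogonal_map(src_vecs[:8], tgt_vecs[:8])
pairs = misaligned_pairs(vocab, src_vecs, tgt_vecs, w)
print(pairs)
```

In this toy setup the anchor words translate to themselves, while the two content words translate to each other. That is the kind of salient translation pair the bullets above describe examining in place of the raw posts.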

Authors: Ashiqur R. KhudaBukhsh, Rupak Sarkar, Mark S. Kamlet, and Tom M. Mitchell