Carnegie Mellon University

Center for Informed Democracy & Social-cybersecurity (IDeaS)

CMU's center for the study of disinformation, hate speech and extremism online

March 20, 2023

Search engine manipulation to spread pro-Kremlin propaganda

By Evan Williams

Tags: Search Engines; Propaganda; Russia; Social Media

Image generated using DALL-E Mini via craiyon.com

Background

In 2014, Vitaly Bespalov was hired as a writer by the Internet Research Agency, a troll farm that was indicted in 2018 by a US federal grand jury for US election interference. Bespalov recounts that, on an average day, he was given an article on Ukraine and asked to rewrite it 20 times, each time keeping about 70% of the text [1]. He would change words like “terrorist” to “militia” or write “national guard” instead of “Ukrainian Army.” These articles were then posted on websites that appeared to be Ukrainian but were secretly hosted in Russia [1]. The goal of this operation, according to Bespalov, was to get the articles to the top of search engine results.

While Kremlin-linked attempts to manipulate social media have received widespread attention in academia and popular media, very little work has probed the Search Engine Optimization (SEO) ecosystems of Kremlin-linked domains. This is an important area to explore, as search engine rankings can have a substantial impact on what people see and how they make decisions. A 2013 analysis of 300 million search engine clicks found that 92% were on the first page of results, and 51% of those clicks were on the first or second result [2]. Rankings also have political implications: lab studies have found that search results can shift the decisions of undecided voters by 20% or more [3]. This blog post summarizes our exploration of this ecosystem in a recent Harvard Kennedy School Misinformation Review paper of the same name [4].

Data

To search for signs of pro-Kremlin SEO manipulation, we take advantage of the Kremlin’s investment in soft-power initiatives. We explore the webgraph networks of 1) Kremlin-linked Russian think tanks that target Russian audiences, and 2) pro-Kremlin pseudo think tanks. We define pseudo think tanks as pro-Kremlin entities, targeting primarily Western audiences, that blur the line between propaganda, misinformation, and think tanks. We contrast these networks with those of 3) US conservative think tanks and 4) Western European think tanks. For a complete list of think tanks and how each was identified and chosen, see the Methods section of our paper [4]. We use Ahrefs, the largest commercial web crawler after Google, to pull the top 1,000 domains that link most frequently to each of the 31 think tanks we examine. We also pull the top 1,000 keyphrases (the search terms for which they rank most highly on Google) for each of the domains.

Findings

1) Pro-Kremlin websites are heavily amplified by domains seemingly built for generating backlinks

Figure 1. Left: top 15 think tanks by backlink volume. Right: top 15 backlinking websites.

We find that backlink volume is highly imbalanced across the networks. Global Research, a single pro-Kremlin pseudo think tank, received 22.1 million backlinks, more than all US, European, and Russian think tanks combined. The American think tanks we observed received 14.1 million backlinks, European think tanks received 4.6 million, and Russian think tanks received 1 million. While pseudo think tanks received the most backlinks, most came from very low-quality domains, i.e., domains with low PageRank scores. Figure 1 shows the top 15 think tanks by backlink volume and the top 15 backlinking domains.
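The comparison above is simple arithmetic, and a short check makes the imbalance concrete. The sketch below uses only the rounded totals quoted in this post; the dictionary labels are our own shorthand, not names from the paper.

```python
# Backlink totals in millions, as quoted in the paragraph above (rounded).
backlinks_millions = {
    "Global Research (pseudo)": 22.1,
    "US think tanks": 14.1,
    "European think tanks": 4.6,
    "Russian think tanks": 1.0,
}

# Sum every group except the single pseudo think tank.
combined_other = sum(
    total
    for name, total in backlinks_millions.items()
    if name != "Global Research (pseudo)"
)
ratio = backlinks_millions["Global Research (pseudo)"] / combined_other

print(f"US + European + Russian think tanks: {combined_other:.1f}M")
print(f"Global Research relative to that total: {ratio:.2f}x")
```

With these rounded figures, one pseudo think tank receives roughly 1.12 times the backlinks of the three other groups combined.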

Within the set of backlinking domains, Distinctionsmatter.com generated the highest volume of links to think tanks, and all of those links pointed to pseudo think tank domains. Distinctionsmatter alone linked to Global Research over 6.6 million times. This site bears many of the markers of a “link scheme” website, a website created solely to generate links for other domains. Distinctionsmatter has no “about” page or contact information for the author, there are no ads on the website, and the site generates millions of links to unreliable news domains. The site also consistently posts content in line with Kremlin geopolitical interests. We find these same patterns in all but 2 of the top 15 backlinking domains; a more detailed exploration can be found in [4].

2) Keyphrases of pseudo think tanks exhibit high internal overlap and appear to target conspiracy theorists

While many of the keyphrases are names of people, we also observed many highly specific conspiratorial keyphrases. These keyphrases may be exploiting “data voids.” Search engines assume all queries have a relevant result, but when a query is new or relates to a highly specific conspiracy for which there are no relevant results, unreliable sites can rank highly [5]. Within the pseudo think tank keyphrase group, we saw shared keyphrases like “Is Zelensky a drug addict,” “Zelensky on cocaine,” “neutron bomb in Yemen,” and “subcortical dementia Hillary Clinton.” There were some shared keyphrases between the US and pseudo think tank groups. The Hudson Institute shared “climate change money trail,” three variations of “mike Pompeo speech,” and six keyphrases suggesting former CIA director John Brennan voted communist. Between EU and pseudo think tanks, the only shared keyphrases were “great prophet 17” and “europe’s reaction to Donald trump.” The Russian think tanks publish articles primarily in Russian, so they had very little keyphrase overlap with other think tank groups.

Additionally, we find that the average Google position rankings for keyphrases in each think tank group are US (7), Europe (11), Russia (35), and pseudo think tanks (34). On average, then, US and European think tanks rank higher than pseudo think tanks despite receiving fewer backlinks. This suggests that Google may be penalizing pseudo think tank domains in spite of, or perhaps because of, their suspicious backlink activity.
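For readers who want to reproduce this kind of overlap analysis, shared keyphrases can be found with plain set intersections. The following is a minimal sketch rather than the code used in the paper: the think tank labels and the "phrase N" strings are hypothetical placeholders, and real input would be the keyphrase lists exported for each think tank.

```python
from itertools import combinations

# Hypothetical toy data: group -> think tank -> set of keyphrases it ranks for.
# The labels and phrases are placeholders, not values from the study.
keyphrases = {
    "pseudo": {
        "pseudo_tank_a": {"phrase 1", "phrase 2", "phrase 3"},
        "pseudo_tank_b": {"phrase 2", "phrase 3", "phrase 4"},
    },
    "us": {
        "us_tank_a": {"phrase 4", "phrase 5"},
        "us_tank_b": {"phrase 6"},
    },
}


def internal_overlap(tanks):
    """Keyphrases ranked for by at least two think tanks in the same group."""
    shared = set()
    for a, b in combinations(tanks.values(), 2):
        shared |= a & b
    return shared


def cross_group_overlap(groups):
    """Keyphrases that appear in the vocabularies of two different groups."""
    vocab = {name: set().union(*tanks.values()) for name, tanks in groups.items()}
    return {
        (g1, g2): vocab[g1] & vocab[g2]
        for g1, g2 in combinations(sorted(vocab), 2)
    }


within = {group: internal_overlap(tanks) for group, tanks in keyphrases.items()}
across = cross_group_overlap(keyphrases)

print(sorted(within["pseudo"]))          # ['phrase 2', 'phrase 3']
print(sorted(across[("pseudo", "us")]))  # ['phrase 4']
```

In this toy input the pseudo group has two internally shared phrases, the US group has none, and one phrase is shared across the two groups, which mirrors the pattern of high internal overlap and sparse cross-group overlap described above.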

Figure 2. Keyphrase network visualization. Grey nodes are think tanks, blue nodes are European keyphrases, teal nodes are Russian keyphrases, green nodes are US keyphrases, yellow nodes are pseudo think tank keyphrases, and red nodes are keyphrases shared across think tank groups.

3) Many pseudo think tanks are strongly amplified by the same websites

We constructed a co-amplification network to look at the pairwise amplification between each set of domains. We define co-amplification as the sum of the minimum of each tie between websites i and j. A more formal definition and details on co-amplification construction can be found in [4]. We find that many of the same backlinking domains linked heavily to the same set of websites. The sites with the highest overall co-amplification scores were Global Research (4.4M), Strategic Culture Foundation (4M), Heritage Foundation (3.8M), New Eastern Outlook (3.2M), and American Enterprise Institute (2.7M). These domains all received high volumes of links from the same sets of websites.
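To make the definition above more concrete, here is one plausible reading of it, offered as an assumption rather than the paper's exact formulation (see [4] for that): for each backlinking domain k that links to both i and j, take the smaller of its two link counts, then sum over all k, i.e. C(i, j) = sum over k of min(w_ki, w_kj). The sketch also assumes that a site's overall score is the sum of its pairwise scores. The edge list, domain labels, and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy edge list: (backlinking domain, think tank, number of links).
edges = [
    ("linker_1", "tank_a", 100),
    ("linker_1", "tank_b", 40),
    ("linker_2", "tank_a", 5),
    ("linker_2", "tank_b", 30),
    ("linker_2", "tank_c", 10),
    ("linker_3", "tank_c", 7),
]


def co_amplification(edges):
    """Pairwise scores under the assumed reading: sum over k of min(w_ki, w_kj)."""
    links = defaultdict(dict)  # backlinking domain -> {think tank: link count}
    for source, target, count in edges:
        links[source][target] = links[source].get(target, 0) + count

    scores = defaultdict(int)
    for targets in links.values():
        # Only domains linking to both i and j contribute to the pair (i, j).
        for i, j in combinations(sorted(targets), 2):
            scores[(i, j)] += min(targets[i], targets[j])
    return dict(scores)


def overall_scores(pair_scores):
    """Assumed overall score: the sum of a think tank's pairwise scores."""
    totals = defaultdict(int)
    for (i, j), score in pair_scores.items():
        totals[i] += score
        totals[j] += score
    return dict(totals)


pairs = co_amplification(edges)
totals = overall_scores(pairs)

print(pairs)   # {('tank_a', 'tank_b'): 45, ('tank_a', 'tank_c'): 5, ('tank_b', 'tank_c'): 10}
print(totals)  # {'tank_a': 50, 'tank_b': 55, 'tank_c': 15}
```

Taking the minimum means a backlinking domain only counts toward a pair to the extent that it amplifies both sites, so a domain that links to one site a million times and to the other once contributes just 1 to their shared score.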

Conclusion

We conclude that further research into this space is needed. Our analysis finds evidence that there may be attempts to manipulate search engines to drive users to pro-Kremlin pseudo think tanks. While Google appears to be penalizing Russian and pseudo think tanks, we do not have enough data to determine how widespread these manipulation behaviors are or how Google’s penalization algorithm generalizes to other contexts. We also note that even if Google perfectly penalized these domains, that would not necessarily stop traffic, as users can still traverse the web through links found on other webpages or on social media. Search engines have a monetary interest in stopping search manipulation, which makes this an area where the interests of regulators, search engine companies, and other stakeholders are aligned. More research is needed to flesh out the activity we observe and to better understand how manipulation of search engines relates to the spread of misinformation on social media.

References

[1] Popken, B., & Cobiella, K. (2017, November 16). Russian troll describes work in the infamous misinformation factory. NBC News. https://www.nbcnews.com/news/all/russian-troll-describes-work-infamous-misinformation-factory-n821486

[2] Chitika Insights (2013). The value of Google result positioning. Chitika Insights. https://research.chitika.com/wp-content/uploads/2022/02/chitikainsights-valueofgoogleresultspositioning.pdf

[3] Epstein, R., & Robertson, R. E. (2015). The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. Proceedings of the National Academy of Sciences, 112(33), E4512–E4521. https://doi.org/10.1073/pnas.1419828112

[4] Williams, E. M., & Carley, K. M. (2023). Search engine manipulation to spread pro-Kremlin propaganda. Harvard Kennedy School (HKS) Misinformation Review, 3(2), 1-14. https://misinforeview.hks.harvard.edu/article/search-engine-manipulation-to-spread-pro-kremlin-propaganda/

[5] Golebiewski, M., & Boyd, D. (2019). Data voids: Where missing data can easily be exploited. Data & Society Research Institute. https://datasociety.net/library/data-voids/