Carnegie Mellon University

Center for Informed Democracy & Social-cybersecurity (IDeaS)

CMU's center for disinformation, hate speech and extremism online

February 08, 2022

Stabilizing a Supervised Bot Detection Algorithm

By Lynnette Ng

Tags: bot detection, algorithm stability, statistical analysis

Image caption: Robots stand at laptop computers
Image credit: Canva

Recent Publication:

Reference for paper: Lynnette Hui Xian Ng, Dawn C. Robertson, Kathleen M. Carley, "Stabilizing a supervised bot detection algorithm: How much data is needed for consistent predictions?", Online Social Networks and Media, Volume 28, 2022, 100198, ISSN 2468-6964.

Direct link to paper, published Feb 8, 2022:


Social media bots are automated accounts controlled by algorithms rather than human users. Bot detection algorithms are a critical part of an information operations pipeline, typically used to identify the impact of these inauthentic accounts in an information manipulation campaign. 

Stable bot detection algorithms are thus critical to estimating the impact and effectiveness of these accounts in an information campaign. 

With an unstable algorithm, the classification of a user agent as a bot or non-bot keeps changing: the agent can be classified as a bot in one time frame and as a non-bot in another. This undermines any analysis of how bots affect an event like the US elections, because agents may well be misclassified, rendering the analysis invalid.

What is a stable bot detection algorithm?

A stable bot score is one that changes minimally across an investigation time frame, thus providing a reliable characterization of an account's, or agent's, bot classification. A stable bot detection algorithm is one that consistently characterizes an agent as a bot or non-bot.

Bot detection algorithms typically return a bot probability score for the agent, indicating how likely the agent is to be a bot. Characterization of bots typically relies on thresholding bot score values: an agent is a bot if the probability is above a certain threshold, and not a bot if it is below that threshold.
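To make the thresholding step concrete, here is a minimal sketch in Python. The function name and the example scores are hypothetical, and treating a score exactly equal to the threshold as a bot is an assumption rather than something the paper specifies.

```python
def classify_agent(bot_score: float, threshold: float) -> str:
    """Label an agent from its bot-probability score (hypothetical helper)."""
    if not 0.0 <= bot_score <= 1.0:
        raise ValueError("bot_score must lie in [0, 1]")
    # Assumption: a score exactly at the threshold is treated as a bot
    return "bot" if bot_score >= threshold else "non-bot"


# Made-up scores, for illustration only
scores = {"agent_a": 0.12, "agent_b": 0.68, "agent_c": 0.91}

for threshold in (0.50, 0.70):
    labels = {agent: classify_agent(s, threshold) for agent, s in scores.items()}
    print(threshold, labels)
```

Note that agent_b, with a score of 0.68, is labeled a bot at a threshold of 0.50 but a non-bot at 0.70, which is why the choice of threshold matters.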

We studied how to stabilize the bot detection algorithm BotHunter by answering the following two questions:

  1. How much data is required for a stable bot probability score?
  2. What threshold is recommended for a stable bot detection algorithm?

To do so, we collected data on 5000 Twitter agents with 150 days' worth of data. We ran the BotHunter service on each agent's data per day and per threshold. BotHunter provides a bot-probability score for an agent that is between 0 and 1. An agent with a score closer to 0 is likely to be a non-bot, while an agent with a score closer to 1 is likely to be a bot.

How do we determine the amount of data required? 

We determine the amount of data required from the stability of bot probability scores. That is, we analyze how bot scores change across an increasing number of days and an increasing number of tweets. By analyzing the evolution of bot scores across time and volume, we observe that the bot score measured from a single day's worth of tweets is more stable than that measured from a single tweet. This is likely because a single day's worth of data contains multiple tweets for the algorithm to work on. The ideal data size for bot score determination is at least 100 days' worth of tweets or at least 400 tweets, because beyond that the changes in bot scores are negligible.
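One way to picture "the changes in bot scores are negligible" is the sketch below. It is not the procedure from the paper: the function name, the tolerance of 0.01 and the example scores are all assumptions chosen for illustration. Given scores recomputed as more data accumulates, it finds the point after which every successive change stays within the tolerance.

```python
from typing import Optional, Sequence


def first_stable_index(scores: Sequence[float], tolerance: float = 0.01) -> Optional[int]:
    """Return the earliest index from which all later consecutive changes
    are at most `tolerance`, or None if the final change still exceeds it."""
    stable_from = None
    # Walk backwards from the end while consecutive changes stay small
    for j in range(len(scores) - 1, 0, -1):
        if abs(scores[j] - scores[j - 1]) > tolerance:
            break
        stable_from = j - 1
    return stable_from


# Made-up scores for one agent, each computed on a growing window of data
cumulative_scores = [0.30, 0.55, 0.62, 0.66, 0.665, 0.667]
print(first_stable_index(cumulative_scores))
```

For the made-up series above, the score settles from the fourth measurement (index 3) onward.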

How do we determine the recommended threshold? 

Bot detection algorithms typically rely on thresholding bot-probability score values to determine whether an agent is a bot or non-bot. Throughout the literature, many different values have been used: Zhang et al. used 0.25; Boichak et al. used 0.50; Ng and Carley used 0.70; Rauchfleisch and Kaiser used 0.76.

We set out to establish a baseline threshold value. To do so, we analyze the stability of bot classification, and by extension of the bot detection algorithm, through patterns of agents that flip bot classification. An agent flips its bot classification when it was previously classified as a bot and is currently classified as a non-bot, or vice versa.

We do this against five threshold values: [0.25, 0.30, 0.50, 0.70, 0.75]. These threshold values are gathered from the values that are commonly used in bot classification using bot probability scores in the literature. We annotate the number of times an agent flips per threshold value. We analyze the percentage of agents that flip at least once during the analysis timeframe, dividing the dataset into bots and non-bots based on their initial classification. 
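Counting flips for one agent across the five threshold values can be sketched as follows. The helper names and the daily scores are hypothetical, and the rule that a score at or above the threshold counts as a bot is an assumption.

```python
from typing import Dict, Sequence

# Threshold values examined in the text
THRESHOLDS = (0.25, 0.30, 0.50, 0.70, 0.75)


def count_flips(scores: Sequence[float], threshold: float) -> int:
    """Count changes of bot/non-bot label between consecutive time frames."""
    # Assumption: a score at or above the threshold counts as a bot
    labels = [score >= threshold for score in scores]
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def flips_per_threshold(scores: Sequence[float],
                        thresholds: Sequence[float] = THRESHOLDS) -> Dict[float, int]:
    """Map each threshold to the number of flips it produces for one agent."""
    return {t: count_flips(scores, t) for t in thresholds}


# Made-up daily scores for one agent hovering around 0.70
daily_scores = [0.72, 0.68, 0.74, 0.77, 0.71]
print(flips_per_threshold(daily_scores))
```

With these made-up scores the agent never flips at 0.25, 0.30 or 0.50, but flips twice at both 0.70 and 0.75, since its score sits close to those thresholds.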

At the 0.75 threshold value, approximately 13% of agents flip bot classification, which is consistent with other studies. Additionally, the number of agents changing classification peaks at 10 days, when 8.98% of agents change classification. After that, the proportion of agents changing classification decreases over time.

What about flipping bot classification is important? 

An agent that flips bot classification is classified as a bot in one time frame but as a non-bot in another. Most agents flip only once throughout the analysis time frame. 61.9% of agents that are originally classified as non-bots flip classification, while 71.2% of bots flip classification. This larger proportion of initial non-bots flipping indicates that when the algorithm initially classifies an agent as a bot, there is a higher probability of it staying as a bot. The algorithm is thus conservative in labeling an agent as a bot.
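The share of agents that flip at least once, split by initial classification, could be computed along these lines. This is a sketch under assumptions: the function name and the toy data are hypothetical, the first time frame is taken as the initial classification, and a score at or above the threshold counts as a bot.

```python
from typing import Dict, Sequence


def flip_rate_by_initial_class(score_series: Dict[str, Sequence[float]],
                               threshold: float) -> Dict[str, float]:
    """Percentage of agents flipping at least once, grouped by initial label."""
    counts = {"bot": [0, 0], "non-bot": [0, 0]}  # [flipped, total] per group
    for scores in score_series.values():
        if not scores:
            continue  # no data, so no initial classification
        labels = [s >= threshold for s in scores]
        initial = "bot" if labels[0] else "non-bot"
        counts[initial][1] += 1
        # An agent has flipped if any later label differs from the first one
        if any(label != labels[0] for label in labels[1:]):
            counts[initial][0] += 1
    return {group: (100.0 * flipped / total if total else 0.0)
            for group, (flipped, total) in counts.items()}


# Toy data, not taken from the study
toy_series = {
    "a": [0.80, 0.90, 0.85],  # starts as bot, never flips
    "b": [0.80, 0.60],        # starts as bot, flips
    "c": [0.20, 0.30],        # starts as non-bot, never flips
    "d": [0.40, 0.75],        # starts as non-bot, flips
    "e": [0.10, 0.10],        # starts as non-bot, never flips
}
print(flip_rate_by_initial_class(toy_series, threshold=0.70))
```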

We identify a subset of accounts that flip multiple times, which is an indicator that they are cyborg accounts: accounts that interweave characteristics of bots and non-bots. The largest number of flips observed is 19, with about 500 tweets, for 0.02% of the agents in the dataset. While this piece does not study cyborg accounts in detail, this finding opens up avenues for further in-depth work on profiling agents that could appear as both bots and non-bots to bot detectors and human users.

Any other observations?

We make further observations at the 0.70 bot classification threshold. We observe that bot agents have a higher follower count and a larger number of tweets across their entire life span than non-bot agents. However, bot agents have a lower friend count. That means they garner a huge following, but they do not reciprocate by following those accounts back.

13.6% of the original 5000 agents were suspended six months after the initial data collection. Of these suspended accounts, 60% were classified as bots and 66.6% flipped at least once. We statistically test hypotheses relating bot-probability scores, number of flips and suspension of agents, and we conclude that the bot probability score and the number of flips are related to the suspension of an agent.

Summary of parameters for the BotHunter algorithm

In applying a bot-detection algorithm, while more data is preferred for bot detection analyses, there are instances where more data cannot be obtained. In such cases, we have a couple of suggestions when using the BotHunter algorithm: 

  1. For a consistent bot probability score, a reasonable data collection size is at least 20 days of tweets or 40 tweets.
  2. In terms of bot prediction algorithm stability, a recommended threshold level is 0.70. For a consistent bot classification score, a recommended collection size is at least 10 days of tweets or 20 tweets.
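These suggestions can be encoded as a small pre-check before trusting a score or label. The sketch below is illustrative only: the class, function and field names are hypothetical and are not part of BotHunter; only the numeric values come from the two suggestions above.

```python
from dataclasses import dataclass

# Values taken from the two suggestions above
RECOMMENDED_THRESHOLD = 0.70
MIN_DAYS_FOR_SCORE, MIN_TWEETS_FOR_SCORE = 20, 40
MIN_DAYS_FOR_CLASS, MIN_TWEETS_FOR_CLASS = 10, 20


@dataclass
class AgentData:
    """Hypothetical container for what was collected about one agent."""
    days_of_tweets: int
    tweet_count: int
    bot_score: float


def assess(agent: AgentData) -> dict:
    """Label the agent and flag whether enough data backs the score and label."""
    score_reliable = (agent.days_of_tweets >= MIN_DAYS_FOR_SCORE
                      or agent.tweet_count >= MIN_TWEETS_FOR_SCORE)
    class_reliable = (agent.days_of_tweets >= MIN_DAYS_FOR_CLASS
                      or agent.tweet_count >= MIN_TWEETS_FOR_CLASS)
    # Assumption: a score exactly at the threshold is treated as a bot
    label = "bot" if agent.bot_score >= RECOMMENDED_THRESHOLD else "non-bot"
    return {"label": label,
            "score_reliable": score_reliable,
            "classification_reliable": class_reliable}


# Made-up agent: 12 days of data, 25 tweets, score 0.82
print(assess(AgentData(days_of_tweets=12, tweet_count=25, bot_score=0.82)))
```

For the made-up agent, the label is backed by enough data (at least 10 days or 20 tweets), but the score itself falls short of the 20 days or 40 tweets suggested for a consistent bot probability score.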