ANNA JACOBSON

The Askeladden Algorithm

UC Berkeley | Applied Machine Learning
Published in Towards Data Science
Anna Jacobson, Laura Pintos, and Ramiro Cadavid

Project Description

A machine learning project to develop a classification system that helps Twitter identify “fake news” tweets posted by Russian trolls at the Internet Research Agency.

Skills

Machine Learning, Natural Language Processing (NLP), Web Scraping, Data Cleansing, Written Communication, Data Visualization

Tools

Python (with scikit-learn, pandas, NumPy, and Matplotlib) and Jupyter Notebook

Motivation

In February 2018, as part of special counsel Robert Mueller’s investigation of the Russian government’s efforts to interfere in the 2016 presidential election, the United States Department of Justice charged 13 Russian nationals with illegally meddling in American political processes. The defendants worked for a well-funded “troll factory” called the Internet Research Agency (IRA), which reportedly had 400 employees, or “trolls”, working 12-hour shifts from a nondescript business center in St. Petersburg. The IRA ran a sophisticated, coordinated campaign to spread disinformation and sow discord in American politics via social media, often Twitter.

Twitter has identified and suspended thousands of these malicious accounts, deleting millions of the trolls’ tweets from public view on the platform. While news outlets have published samples, it has been difficult to understand the full scale and scope of the IRA’s efforts, as well as the details of its strategy and tactics. According to Alina Polyakova, a foreign policy fellow at the Brookings Institution, “Wiping the content doesn’t wipe out the damage caused, and it prevents us from learning about how to be better prepared for such attacks in the future.” To address this problem, and “in line with our principles of transparency and to improve public understanding of alleged foreign influence campaigns,” Twitter has now made publicly available archives of Tweets and media that it believes resulted from potentially state-backed information operations.

According to a December 2018 United States Senate Select Committee on Intelligence briefing, there were approximately 109 Twitter accounts masquerading as news organizations, including U.S. local news organizations. The 44 U.S.-related accounts had amassed 660,335 followers between them, an average of about 15,000 followers per account. Many of these accounts behaved similarly, posting links to articles and local content dozens of times per day.

The purpose of this project is to develop a machine learning algorithm to predict “fake news” troll tweets. Our algorithm is named after Askeladden, a boy in Norwegian folklore who outwits trolls.

See the complete project repository with code at The Askeladden Algorithm.