Artificial intelligence tools are now capable of identifying individuals behind anonymous social media accounts with alarming accuracy, according to new research published this week. The findings, released by AI researchers Simon Lermen and Daniel Paleka, demonstrate that large language models (LLMs) – the same technology powering chatbots like ChatGPT – can effectively “deanonymize” online users by correlating their posts with information available elsewhere on the internet.
The study, detailed in a preprint paper titled “Large-scale online deanonymization with LLMs,” reveals that LLMs can match pseudonymous profiles to real-world identities based on seemingly innocuous details. In one hypothetical example the researchers offer, subtle clues – a user mentioning struggles in school and a dog named Biscuit while referencing Dolores Park – were enough for the AI to identify the individual with a high degree of confidence. The example is invented, but the implications are far-reaching.
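To make the mechanism concrete, the sketch below shows the general shape of such an attack: an LLM is asked to judge whether an anonymous post history and a candidate public profile describe the same person. This is an illustration only, not the authors’ actual pipeline; the model name, prompt wording, and the candidate profiles (which echo the paper’s hypothetical Biscuit/Dolores Park example) are all assumptions for demonstration.

```python
# Illustrative sketch of LLM-assisted profile linkage, NOT the paper's method.
# Assumes the openai Python package and an OPENAI_API_KEY in the environment;
# model choice and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()

ANON_POSTS = """\
- "Failing two classes this semester, might have to drop out"
- "Took Biscuit for a walk around Dolores Park again"
"""

CANDIDATE_PROFILES = [
    "Profile A: SF-based student, photos tagged at Dolores Park, owns a dog named Biscuit",
    "Profile B: Seattle software engineer, posts about hiking with a cat",
]

def score_candidate(posts: str, profile: str) -> str:
    """Ask the model how likely the anonymous posts and the profile belong to the same person."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any capable chat model would do
        messages=[
            {"role": "system", "content": "You correlate identifying details across online accounts."},
            {"role": "user", "content": (
                f"Anonymous posts:\n{posts}\n\nPublic profile:\n{profile}\n\n"
                "On a scale of 0-100, how likely are these the same person? "
                "Answer with a number and one sentence of reasoning."
            )},
        ],
    )
    return resp.choices[0].message.content

for profile in CANDIDATE_PROFILES:
    print(profile.split(":")[0], "->", score_candidate(ANON_POSTS, profile))
```

Even a crude scorer like this, run across large pools of scraped profiles, shows why the researchers argue the approach scales with nothing more than model access and an internet connection.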
“We show that LLM agents can figure out who you are from your anonymous online posts,” said Lermen, an AI engineer at MATS Research. “Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision – and scales to substantial populations.”
The research highlights a fundamental shift in online privacy. Previously, sophisticated privacy attacks required significant manual effort and expertise. Now, the study indicates that malicious actors need only access publicly available language models and an internet connection to perform these attacks. This dramatically lowers the barrier to entry for those seeking to unmask anonymous users.
The potential consequences are varied and concerning. Researchers warn of increased risks of highly personalized scams, such as spear-phishing attacks, where hackers leverage gathered information to pose as trusted contacts. The study also raises alarms about the potential for governments to use AI for surveillance of dissidents and activists who rely on anonymity to express their views safely. According to the Guardian, this forces a “fundamental reassessment of what can be considered private online.”
The technology isn’t foolproof, however. Experts caution that LLMs are prone to errors and can falsely link accounts, potentially leading to wrongful accusations. “People are going to be accused of things they haven’t done,” warned Peter Bentley, a professor of computer science at UCL. The effectiveness of deanonymization depends on the consistency of information shared across different platforms. “They can only link across platforms where someone consistently shares the same bits of information in both places,” explained Prof Marti Hearst of UC Berkeley’s School of Information.
Beyond social media, Marc Juárez, a cybersecurity lecturer at the University of Edinburgh, points to the vulnerability of seemingly anonymized datasets. He argues that hospital records, admissions data, and statistical releases may not meet the increasingly stringent standards of anonymization required in the age of AI. “It’s quite alarming. I think this paper is showing that we should reconsider our practices,” Juárez said.
Lermen recommends that platforms implement measures to restrict data access, including enforcing rate limits on user data downloads, detecting automated scraping, and limiting bulk data exports. He also emphasizes the importance of individual users being more cautious about the information they share online.
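As a rough illustration of the first of those recommendations, the sketch below implements the kind of per-client sliding-window rate limit a platform might place on profile-data endpoints. The thresholds, client identifiers, and in-memory store are assumptions for demonstration; a production deployment would use shared infrastructure such as Redis.

```python
# Minimal sliding-window rate limiter sketch for profile-data requests.
# All thresholds are illustrative, not platform policy.
import time
from collections import defaultdict

RATE = 10           # allowed requests per client...
PER_SECONDS = 60.0  # ...per rolling window

_request_log: dict[str, list[float]] = defaultdict(list)

def allow_request(client_id: str) -> bool:
    """Return True if this client is under the rate limit, False otherwise."""
    now = time.monotonic()
    window = _request_log[client_id]
    # Drop timestamps that have aged out of the rolling window.
    window[:] = [t for t in window if now - t < PER_SECONDS]
    if len(window) >= RATE:
        return False  # pattern consistent with bulk scraping; deny or throttle
    window.append(now)
    return True

# Usage: gate each profile-data request on the limiter's verdict.
if not allow_request("client-123"):
    print("429 Too Many Requests")
```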
As of today, no major social media platforms have publicly announced plans to alter their data access policies in response to the research. The study’s authors are continuing to investigate the limitations of the technology and potential mitigation strategies.