AI Data Rush: How People Are Paid to Train Artificial Intelligence

Jacobus Louw, a 27-year-old resident of Cape Town, South Africa, earned $14 last year by recording videos of his daily walk to feed seagulls. The footage, capturing his feet and the surrounding pavement, was submitted to Kled AI, an application that compensates users for providing data used to train artificial intelligence models. For Louw, the earnings covered roughly half a week's worth of groceries, at a rate approximately ten times the country's minimum wage.

Louw’s experience is part of a growing trend of “gig AI training,” where individuals worldwide are monetizing their data – images, videos, audio recordings, and even private conversations – to fuel the rapidly expanding artificial intelligence industry. Thousands of miles away in Ranchi, India, Sahil Tigga, a 22-year-old student, supplements his income by allowing Silencio, an audio data crowdsourcing platform, access to his phone’s microphone to capture ambient sounds. In Chicago, 18-year-old welding apprentice Ramelio Hill earned several hundred dollars by selling recordings of personal chats to Neon Mobile, a conversational AI training platform.

This emerging data marketplace addresses a critical challenge facing AI developers: a shortage of high-quality, human-generated data. As Silicon Valley’s appetite for training material outpaces what can be readily scraped from the public internet, companies are turning to individuals for direct contributions. Apps like Kled AI and Silencio are facilitating this exchange, offering financial incentives in return for access to personal data.

However, this new gig economy is not without its drawbacks. Participants are potentially fueling an industry that could automate their own jobs, while simultaneously exposing themselves to risks such as deepfakes, identity theft, and digital exploitation. The terms of service offered by these platforms often grant broad, irrevocable licenses to user data, raising concerns about long-term control and potential misuse.

The demand for data is driven by the limitations of existing datasets. Language models like ChatGPT and Gemini require vast amounts of learning material to improve, but sources like C4, RefinedWeb, and Dolma – which account for a significant portion of high-quality data – are increasingly restricting access for generative AI training. Researchers predict that AI companies could exhaust readily available high-quality text data as early as 2026. Relying on AI-generated synthetic data as a replacement can lead to errors and model instability.

Beyond Kled AI, Silencio, and Neon Mobile, a range of platforms are entering the AI training space. Luel AI, backed by Y Combinator, offers approximately $0.15 per minute for multilingual conversations. ElevenLabs allows users to digitally clone their voice for a fee of $0.02 per minute of usage. Bouke Klein Teeselink, an economics professor at King's College London, anticipates substantial growth in this emerging category of work.

AI companies are motivated to pay for data to mitigate copyright risks associated with web scraping, and to secure the high-quality data needed to refine their models, according to Veniamin Veselovsky, an AI researcher. “Human data, for now, is the gold standard to sample from outside of the distribution of the model,” Veselovsky said.

The economic incentives are particularly strong in developing countries, where individuals may have limited alternative employment options and the opportunity to earn US currency can be highly valuable. Jacobus Louw, for example, struggled to find work due to a nervous disorder but used earnings from AI marketplaces to fund a spa training course to become a masseur. “As a South African, being paid in USD is more worth it than people think,” he said.

However, experts caution that this work is often precarious and offers limited long-term prospects. Mark Graham, a professor of internet geography at the University of Oxford and author of Feeding the Machine, warns of a “race to the bottom in wages” and a “temporary demand for human data.” He argues that workers are left vulnerable with “no protections, no transferable skills, and no safety net” when demand shifts.

Concerns about data privacy and security are likewise mounting. Ramelio Hill, the Chicago-based AI trainer, earned $200 by selling his phone calls to Neon Mobile, but became concerned when the app frequently went offline and failed to release payments. His worries were amplified when TechCrunch reported a security flaw in September that exposed user data, including phone numbers and call transcripts. Neon Mobile did not notify users about the breach, leaving Hill uncertain about how his voice data might be misused.

Data privacy researcher Jennifer King, at the Stanford Institute for Human-Centered Artificial Intelligence, highlights the lack of transparency surrounding data deployment. Users often grant broad licenses without fully understanding how their data will be used or having recourse if it is repurposed in undesirable ways.

Agreements with platforms like Kled AI and Neon Mobile typically grant companies worldwide, exclusive, irrevocable, and royalty-free licenses to utilize, sell, and create derivative works from user data. Avi Patel, founder of Kled AI, maintains that his company limits data use to AI training and research, vets businesses before selling datasets, and avoids collaborations with entities involved in pornography or potentially harmful government applications.

Legal experts, such as Enrico Bonadio, a law professor at City St George’s, University of London, point out that these agreements effectively grant platforms and their clients carte blanche to exploit user data with minimal restrictions. The potential for misuse extends to deepfakes and impersonation, even if platforms claim to anonymize data, as biometric patterns are difficult to fully obscure.

Adam Coy, an actor from New York, sold his likeness to Captions (now Mirage) in 2024 with certain restrictions, including prohibitions against political use or association with alcohol or tobacco. However, he later discovered videos online featuring his AI replica promoting unproven medical supplements, prompting feelings of embarrassment and regret. He has since refrained from participating in AI data gigs, expressing a willingness to consider them only with significantly higher compensation.
