Know where the artificial intelligence information you use comes from: Google Gemini, ChatGPT, Meta and more

By James Carter, Senior News Editor

AI Chatbots Rely Heavily on Reddit and Wikipedia, New Investigation Reveals

Ever wondered where ChatGPT, Google’s AI Overviews, and other chatbots get their information? A new study from the marketing platform Semrush finds that Reddit and Wikipedia are the sources these increasingly popular AI systems draw on most often. The finding has significant implications for how we judge the accuracy and reliability of AI-generated content.

Reddit Takes the Lead: Why Forums Are AI Goldmines

The Semrush investigation, which analyzed 150,000 AI responses from June 2025, found that 40.1% of the information originated from Reddit. The explanation is not only the sheer volume of content on the platform but also the type of content. Reddit’s forum structure fosters an exchange of experiences, technical advice, and user reviews, which is the kind of real-world material AI systems draw on heavily. In effect, the AI learns from the collective wisdom (and sometimes the collective missteps) of millions of users. This makes Reddit a central reference point when AI systems gather and synthesize information.

Wikipedia: The Collaborative Encyclopedia Remains Crucial

In second place, Wikipedia accounted for 26.3% of the sources identified. Its open, collaborative model allows constant updates and broad thematic coverage, which makes it an invaluable resource for AI systems that depend on a continuously maintained knowledge base. This isn’t a new phenomenon: a 2023 Washington Post report, produced in collaboration with the Allen Institute for AI, highlighted Wikipedia’s prominence in the datasets used to train large language models such as Google’s T5 and Meta’s LLaMA.

Beyond Reddit & Wikipedia: The Data Ecosystem of AI

While Reddit and Wikipedia dominate, the picture is more complex. The analysis also identified the importance of specialized sources like patents.google.com (a leader in text volume), Scribd’s digital library, video game forums, and technical catalogs. The common thread? High data density and collaborative use. AI isn’t just scraping random websites; it’s prioritizing repositories where information is concentrated and actively maintained. This is a key insight for anyone interested in SEO and understanding how content ranks in AI-powered search results.

AI Developers Weigh In: Transparency and Source Disclosure

Interestingly, the developers of leading AI models are beginning to address the question of source material. OpenAI’s ChatGPT acknowledges its training included “a combination of public sources, licensed texts, and material produced by human instructors.” However, ChatGPT also admits it “does not have direct access to private databases” and can’t cite specific posts from platforms like Reddit or X. Google’s AI Mode, powering AI Overviews, emphasizes its reliance on Google’s vast public index, with a focus on filtering reliable and current sources and providing direct links to original sites. This push for transparency is a crucial step in building trust in AI-generated information.

The Future of AI and Information: A Shifting Landscape

The interplay between independent analyses like Semrush’s and statements from AI developers reveals a dynamic and evolving data ecosystem. As AI continues to advance, the sources it relies on will undoubtedly shift. Understanding these sources – and their inherent biases – is critical for evaluating the information we receive from AI systems. It’s a reminder that AI isn’t a magical oracle; it’s a powerful tool built on the foundation of human-created content. Staying informed about these developments is essential in navigating the increasingly AI-driven world, and Archyde will continue to bring you the latest breaking news and insightful analysis.
