Google recently revealed a sustained effort by “commercially motivated” actors to replicate its Gemini AI chatbot through a large-scale prompting campaign. The attack involved over 100,000 prompts, many in non-English languages, aimed at extracting information to train a competing model – a tactic Google terms “model extraction.” This incident highlights the growing concern around intellectual property protection in the rapidly evolving landscape of generative artificial intelligence.
The company detailed the attempted cloning in a self-assessment of threats to its products, framing itself as both the target and the defender. While such reports are common from tech giants, this instance sheds light on the increasingly sophisticated methods being used to circumvent the substantial costs and expertise required to build advanced LLMs like Gemini. The core issue revolves around the potential for unauthorized duplication of AI capabilities, raising questions about the future of innovation and competitive advantage in the AI sector.
Google’s response underscores the challenges of safeguarding proprietary AI models. The company’s terms of service explicitly prohibit data extraction, but enforcement remains a complex issue. This isn’t the first time Google has faced accusations related to AI model training practices. In 2023, The Information reported that Google’s Bard team was accused of utilizing outputs from ChatGPT, sourced from the ShareGPT website, to enhance its own chatbot’s training data. Jacob Devlin, a senior AI researcher at Google and creator of the BERT language model, reportedly raised concerns about violations of OpenAI’s terms of service before resigning and joining OpenAI. Google denied the allegations but reportedly ceased using the data in question.
What is Model Distillation?
The technique employed in these cloning attempts is often referred to as “distillation” within the AI industry. Essentially, distillation allows developers with limited resources to leverage the knowledge embedded in larger, more complex models – like Gemini – to train smaller, more efficient models. Instead of building an LLM from scratch, which requires massive datasets and computational power, developers can use a pre-trained model as a “teacher” to guide the learning process of a “student” model. This shortcut can significantly reduce development time and costs, but also raises ethical and legal concerns when performed without authorization.
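The teacher–student loop described above can be sketched in a few lines. This is a deliberately simplified illustration, not code from any real attack: the `teacher_model` stand-in, the `StudentModel` class, and the memorization "training" are all invented for clarity, while real distillation would fine-tune a neural network on the harvested outputs.

```python
# Illustrative sketch of API-based distillation (all names hypothetical):
# query a "teacher" model at scale, log its outputs, then use the
# (prompt, output) pairs as training data for a smaller "student".

def teacher_model(prompt: str) -> dict:
    """Stand-in for a large proprietary model served over an API.
    Returns a toy probability distribution over two labels."""
    score = sum(ord(c) for c in prompt) % 100 / 100.0
    return {"positive": score, "negative": 1.0 - score}

def collect_distillation_data(prompts):
    """Step 1: harvest the teacher's outputs at scale
    (the 'extraction' step Google describes)."""
    return [(p, teacher_model(p)) for p in prompts]

class StudentModel:
    """Step 2: a small model fit to mimic the teacher's soft outputs.
    Here a trivial memorizing student; a real attack would fine-tune
    an open-weight LLM on the harvested pairs."""
    def __init__(self):
        self.memory = {}

    def train(self, dataset):
        for prompt, soft_labels in dataset:
            self.memory[prompt] = soft_labels

    def predict(self, prompt):
        # Fall back to a uniform guess for unseen prompts.
        return self.memory.get(prompt, {"positive": 0.5, "negative": 0.5})

prompts = ["great product", "terrible service", "works as expected"]
student = StudentModel()
student.train(collect_distillation_data(prompts))
```

The point of the sketch is the economics: every call to `teacher_model` is cheap for the attacker but embodies knowledge that cost the teacher's owner enormous sums to acquire, which is why campaigns of this kind can run to 100,000+ prompts.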
Google believes the actors behind the recent prompting campaign are primarily private companies and researchers seeking a competitive edge. The attacks originated from various locations worldwide, but the company has not publicly identified any specific suspects. The scale of the attack – over 100,000 prompts – suggests a coordinated and deliberate effort to bypass Gemini’s safeguards and extract valuable training data. According to Google’s Gemini website, the model is capable of processing text, code, images, audio, and video simultaneously, making it a particularly attractive target for replication.
The incident comes as Google continues to refine and release new versions of Gemini. As of February 16, 2026, the latest models include Gemini 3 Pro, released November 18, 2025, and Gemini 3 Deep Think, released February 12, 2026. Other models in the Gemini family include Flash and various iterations of the 2.5 and 1.0 versions, as detailed on Wikipedia.
The broader implications of this attempted cloning extend beyond Google. It highlights the need for robust security measures and intellectual property protections within the AI ecosystem. As generative AI becomes increasingly integrated into various aspects of life, safeguarding these technologies from unauthorized replication will be crucial for fostering innovation and maintaining trust. The incident also raises questions about the effectiveness of current terms of service and the challenges of enforcing them in a globalized digital environment.
Looking ahead, we can expect to see continued efforts to refine AI security protocols and develop new methods for detecting and preventing model extraction attacks. The ongoing cat-and-mouse game between AI developers and those seeking to exploit their technology will likely shape the future of AI innovation and deployment. The development of more sophisticated watermarking techniques and access controls could play a key role in protecting AI models from unauthorized copying.
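One family of detection methods looks at usage patterns rather than individual prompts: ordinary users repeat themselves and stay within a modest query budget, while extraction campaigns tend to be both high-volume and highly diverse. The sketch below is a hypothetical heuristic, not a description of Google's defenses; the class name and thresholds are invented for illustration.

```python
from collections import defaultdict

class ExtractionDetector:
    """Toy heuristic: flag clients whose query volume AND prompt
    diversity both exceed thresholds, a pattern more consistent with
    bulk harvesting than ordinary use. Thresholds are illustrative."""

    def __init__(self, volume_threshold=1000, diversity_threshold=0.9):
        self.volume_threshold = volume_threshold
        self.diversity_threshold = diversity_threshold
        self.queries = defaultdict(list)

    def log(self, client_id: str, prompt: str) -> None:
        """Record one prompt submitted by a client."""
        self.queries[client_id].append(prompt)

    def is_suspicious(self, client_id: str) -> bool:
        prompts = self.queries[client_id]
        if len(prompts) < self.volume_threshold:
            return False
        # Fraction of distinct prompts: near 1.0 suggests systematic
        # sweeping of the input space rather than normal usage.
        diversity = len(set(prompts)) / len(prompts)
        return diversity >= self.diversity_threshold
```

In practice such signals would be combined with others (timing, geographic spread, account clustering), since a single heuristic is easy for a determined attacker to evade.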