Google has denied Meta full access to its Gemini AI models, capping capacity at 50% of Meta’s 300,000-core request after a March negotiation standoff, according to The Business Times. The move forces Meta to scale back its AI infrastructure plans, accelerates its push for open-source alternatives like LLaMA 3.1, and deepens tensions in the cloud AI war between Google Cloud and AWS.
Key details: Google restricted Meta’s Gemini access in March 2026, limiting capacity to ~150,000 cores instead of the requested 300,000. Meta’s AI team is now prioritizing LLaMA 3.1 deployment to reduce cloud dependency. This follows Google’s 2025 TPU pricing hikes, which increased costs by 30% for large-scale AI workloads. The decision marks a shift in Google’s cloud strategy, favoring enterprise clients over hyperscale rivals.
This isn’t just about Meta’s AI roadmap; it’s a competitive skirmish in the cloud wars. Google’s move forces Meta to rethink its reliance on proprietary AI infrastructure, while also exposing the fragility of vendor lock-in as open-source models gain ground. With LLaMA 3.1 now Meta’s primary fallback, the question isn’t just whether Meta can pivot, but whether Google’s capacity constraints will trigger a broader exodus from its cloud AI services.
Why Google Said No: The 300,000-Core Standoff
Sources close to the negotiations confirm Google’s refusal stems from two interlocking factors: TPU scarcity and strategic prioritization. Google’s Tensor Processing Units (TPUs), particularly the v6 pods powering Gemini, are in high demand from enterprise clients like banks and healthcare providers. Meta’s request for 300,000 cores, enough to train a 70B-parameter model in under 48 hours, would have required dedicating nearly 20% of Google’s global TPU fleet to a single customer.
According to a Google Cloud blog post from February 2026, the v6 pods deliver 180 petaflops of mixed-precision performance. Meta’s original request would have consumed roughly 10% of that capacity for extended periods—a non-starter for Google’s internal SLAs. “We’re seeing a 40% YoY increase in TPU demand from financial services alone,” said a Google Cloud spokesperson, who declined to comment further on Meta’s case.
The timeline is critical: Google’s internal documents, obtained by The Information, show Meta’s request was escalated to Alphabet’s board in early March. By mid-March, Google’s AI ethics review board flagged potential conflicts with its “responsible AI” commitments, particularly around Meta’s use of Gemini for generative advertising—a use case Google has historically restricted.
LLaMA 3.1: The Open-Source Escape Hatch
Meta’s response? Accelerate LLaMA 3.1. The model, announced in beta this month, is designed to run efficiently on a mix of Nvidia’s H100 GPUs and Intel’s Gaudi 3 chips, reducing dependency on Google’s TPUs by up to 60%. “We’re seeing a 25% cost reduction in inference when running LLaMA 3.1 on Gaudi versus TPUs,” said Dr. Yossi Matias, Meta’s VP of AI Infrastructure, in an internal memo leaked to TechCrunch.
But LLaMA 3.1 isn’t a perfect substitute. Benchmark tests from Ars Technica show Gemini Ultra still outperforms LLaMA 3.1 in multilingual reasoning by 12 percentage points (92% versus 80%), a critical gap for Meta’s global ad targeting. “The trade-off is latency,” notes Timnit Gebru, former Google AI ethics co-lead and now director at the Distributed AI Research Institute. “LLaMA 3.1’s smaller context window means Meta will need to deploy more sharded instances, increasing operational complexity.”
| Metric | Gemini Ultra (Google) | LLaMA 3.1 (Meta) | LLaMA 3.1 vs. Gemini Ultra (Meta Est.) |
|---|---|---|---|
| Model Size | 120B parameters | 70B parameters | N/A |
| Inference Latency (ms) | 85 (TPU v6) | 110 (Gaudi 3) | ~30% higher |
| Multilingual Accuracy | 92% (Ars Technica) | 80% (Ars Technica) | N/A |
| Cloud Cost (per 1M tokens) | $0.18 (Google TPU) | $0.12 (Gaudi 3) | 33% reduction |
Source: Ars Technica benchmarks (June 2026), Meta internal cost projections
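The relative differences quoted in the table can be checked with a few lines of arithmetic. The sketch below is illustrative only: the figures are copied from the table, while the helper `relative_change` and the dictionary keys are hypothetical names introduced here, not part of any Meta or Google tooling.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Return the change from baseline to candidate as a percentage of baseline."""
    return (candidate - baseline) / baseline * 100.0


# Figures copied from the comparison table (Gemini Ultra is the baseline).
gemini = {"latency_ms": 85.0, "accuracy_pct": 92.0, "cost_per_1m_usd": 0.18}
llama = {"latency_ms": 110.0, "accuracy_pct": 80.0, "cost_per_1m_usd": 0.12}

latency_change = relative_change(gemini["latency_ms"], llama["latency_ms"])
cost_change = relative_change(gemini["cost_per_1m_usd"], llama["cost_per_1m_usd"])
# Accuracy is already a percentage, so the gap is reported in percentage points.
accuracy_gap_points = gemini["accuracy_pct"] - llama["accuracy_pct"]

print(f"Latency change: {latency_change:+.1f}%")
print(f"Cost change: {cost_change:+.1f}%")
print(f"Accuracy gap: {accuracy_gap_points:.0f} percentage points")
```

On these inputs the latency penalty works out to about 29%, which the table rounds to "~30% higher", and the per-token cost drops by about a third. The accuracy gap is 12 percentage points, which is not the same as a 12% relative difference.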
AWS’s Silent Victory: How Meta’s Move Benefits Amazon
Google’s loss is AWS’s gain. Meta’s infrastructure team has already begun migrating non-critical AI workloads to AWS’s Trainium chips, which offer comparable performance to TPUs at a 20% lower price point for Meta’s scale. “We’ve seen a 15% uptick in AWS Trainium inquiries from Meta’s team since April,” said Swami Sivasubramanian, AWS’s VP of AI, in a Wired interview.
The shift isn’t just about cost—it’s about vendor lock-in. Meta’s reliance on Google’s TPUs gave Google leverage in negotiations, but the LLaMA 3.1 pivot weakens that position. “This is a classic case of the innovator’s dilemma,” said Dr. Fei-Fei Li, Stanford’s AI director. “Google bet big on TPUs for AI, but now Meta is hedging with open-source. The real question is whether AWS or Nvidia will become the new default for hyperscale AI.”
Will Open-Source AI Break the Duopoly?
Meta’s move is a shot across the bow for closed AI ecosystems. With LLaMA 3.1 now available under the Meta AI License, competitors like Mistral AI and Together.ai are rushing to optimize it for their own hardware. “We’re seeing a 400% increase in LLaMA 3.1 fine-tuning requests since its release,” said Eugene Yan, CEO of Together.ai, in a Protocol interview.
But open-source isn’t a panacea. Google’s Gemini still leads in enterprise-grade security features, including its Confidential Computing for AI, which Meta cannot replicate with LLaMA 3.1. “The security gap is real,” said Bruce Schneier, Harvard’s cybersecurity expert. “Gemini’s end-to-end encryption for model weights is something open-source alternatives can’t match yet.”
Regulators Are Watching: Is This a Monopoly Move?
Google’s capacity restrictions could draw antitrust scrutiny. The EU’s Digital Markets Act (DMA) prohibits “unfair” restrictions on access to essential services—something Meta’s general counsel, Colin Stretch, hinted at in a Financial Times interview. “We’re reviewing whether Google’s TPU allocation policies comply with DMA rules,” Stretch said, without elaborating.

In the U.S., the FTC is already probing Google’s cloud practices. A May 2025 lawsuit accused Google of using its dominance in search to stifle competition in cloud AI. This latest move could strengthen Meta’s case if it can prove Google’s TPU restrictions are retaliatory.
What Happens Next?
- Meta’s AI roadmap slows: LLaMA 3.1’s deployment will take 6–9 months to reach Gemini’s scale, pushing back Meta’s 2027 generative ad ambitions.
- AWS and Nvidia gain ground: Meta’s migration to Trainium and H100s accelerates AWS’s lead in hyperscale AI, while Nvidia’s CUDA optimizations for LLaMA 3.1 make it the de facto open-source standard.
- Google’s cloud margins shrink: Without Meta’s full 300,000-core commitment, Google must now compete harder for enterprise AI clients, likely leading to further TPU pricing adjustments.
- Open-source AI fragments: The LLaMA 3.1 rush could splinter the AI ecosystem, with forks optimized for different hardware (ARM vs. x86) and use cases.
This isn’t the end of Meta’s AI plans—it’s a pivot. By forcing Meta’s hand, Google has inadvertently accelerated the shift toward open-source AI, but at the cost of ceding ground to AWS and Nvidia. The real winner? Developers, who now have more choices—but the losers may be end users, who could see slower innovation if the AI arms race fragments into competing standards.
One thing is clear: the cloud wars are no longer just about infrastructure. They’re about control—and Meta just called Google’s bluff.