The “Great Rotation” represents a market pivot from speculative AI software applications to the hard-asset infrastructure enabling them. By May 2026, investors are prioritizing power-efficient compute, advanced interconnect fabrics and thermal management over raw LLM parameter counts, favoring companies that control the physical AI stack.
For the last two years, the market has been intoxicated by the “scaling laws”: the belief that simply adding more compute and more data would yield emergent intelligence. But as we navigate the second quarter of 2026, we have hit the “Inference Wall.” The cost of running these models at a global scale has become an existential threat to the margins of every SaaS company that slapped an API wrapper on a frontier model.
The rotation is simple: money is moving from the “brains” (the models) to the “nervous system” and “circulatory system” (the chips, the power, and the cooling). If you are still betting on the next “AI-powered PDF reader,” you are playing a game that ended in 2024.
The Inference Wall and the Rise of Custom ASICs
The industry has transitioned from the training era to the inference era. While NVIDIA’s H100s and B200s were the gold standard for training, the operational reality of 2026 is that general-purpose GPUs are too power-hungry for massive-scale deployment. We are seeing a violent shift toward Application-Specific Integrated Circuits (ASICs).
Custom silicon, designed for specific workloads, offers a performance-per-watt ratio that general GPUs cannot touch. This is where the real growth lies. When a hyperscaler like Google or AWS moves a workload from a general GPU to a custom TPU or Inferentia chip, the latency drops and the energy bill plummets. The “Great Rotation” is essentially a bet on the engineers who can shrink the footprint of a trillion-parameter model without sacrificing its reasoning capabilities.
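To make “performance per watt” concrete, here is a minimal back-of-envelope sketch in Python. Every figure in it (throughput, power draw, electricity price) is an illustrative assumption, not a vendor benchmark:

```python
# Back-of-envelope inference economics: tokens-per-joule and the annual
# power bill of a single accelerator. All figures are illustrative
# assumptions, not vendor benchmarks.

def tokens_per_joule(tokens_per_second: float, power_watts: float) -> float:
    """Efficiency metric: tokens generated per joule consumed."""
    return tokens_per_second / power_watts

def annual_energy_cost(power_watts: float, usd_per_kwh: float = 0.10) -> float:
    """Cost of running one accelerator flat-out for a year, in USD."""
    kwh_per_year = power_watts / 1000 * 24 * 365
    return kwh_per_year * usd_per_kwh

# Hypothetical devices: a general-purpose GPU vs. a custom inference ASIC.
devices = {"GPU": (10_000, 700), "ASIC": (9_000, 300)}  # (tokens/s, watts)

for name, (tps, watts) in devices.items():
    print(f"{name}: {tokens_per_joule(tps, watts):.1f} tokens/J, "
          f"~${annual_energy_cost(watts):,.0f}/yr in electricity")
```

Under these assumed numbers, the ASIC gives up 10% of raw throughput but more than doubles tokens-per-joule, and at fleet scale that gap is the entire margin story.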
“The era of ‘brute force’ AI is over. We are no longer asking how many H100s we can cram into a cluster; we are asking how we can achieve the same tokens-per-second using 40% less power. The winners are those optimizing the data path, not just the compute.” – Marcus Thorne, Principal Systems Architect at a Tier-1 Cloud Provider.
This shift favors companies like Broadcom, which doesn’t just sell chips but designs the custom silicon architecture for the giants. Their dominance in the custom ASIC market makes them the silent landlord of the AI era. They provide the plumbing that allows LLM parameter scaling to remain economically viable.
Thermal Throttling at Scale: Why Power is the New Gold
We have reached a point where the bottleneck is no longer the chip, but the socket. Modern AI clusters are generating heat densities that make traditional air cooling look like a joke. We are seeing a mandatory migration to liquid-to-chip cooling and immersion cooling tanks.
The physics is brutal. When you push NPUs (Neural Processing Units) to their limits, thermal throttling kicks in, slashing performance to prevent the silicon from literally melting. This has created a massive opportunity for infrastructure plays like Vertiv and Eaton. They aren’t “AI companies” in the traditional sense, but they are the only reason AI companies can keep their servers running.
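A toy model shows how sharp the cliff is. The thermal limit and per-degree penalty below are illustrative assumptions, not measured NPU behavior:

```python
# Toy model of thermal throttling: sustained throughput collapses once the
# die exceeds its thermal limit and the chip down-clocks to protect itself.
# Threshold and penalty are illustrative, not measurements of any real NPU.

def sustained_throughput(peak_tps: float, die_temp_c: float,
                         throttle_at_c: float = 90.0,
                         floor_fraction: float = 0.5) -> float:
    """Tokens/s after throttling: lose 5% of peak per degree over the
    limit, down to a hard floor where the chip refuses to go lower."""
    if die_temp_c <= throttle_at_c:
        return peak_tps
    penalty = 0.05 * (die_temp_c - throttle_at_c)
    return peak_tps * max(floor_fraction, 1.0 - penalty)

for temp in (85, 92, 98, 105):
    print(f"{temp} C -> {sustained_throughput(10_000, temp):,.0f} tokens/s")
```

The point of the sketch: a few degrees of cooling headroom is not a comfort feature, it is throughput you have already paid for.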
Beyond cooling, the power grid itself is the ultimate constraint. The surge in data center demand has pushed aging grids to the brink. This is why we are seeing a sudden, aggressive pivot toward Small Modular Reactors (SMRs) and dedicated nuclear power agreements. The “Great Rotation” is moving capital into the energy sector because a chip is useless without a stable, massive flow of electrons.
The 30-Second Verdict: Infrastructure vs. Application
- The Play: Move away from “AI-enabled” software; move toward the physical layer (Power, Cooling, Interconnects).
- The Risk: Over-reliance on a single foundry (TSMC) for CoWoS (Chip on Wafer on Substrate) packaging.
- The Metric: Stop tracking “User Growth” and start tracking “Tokens per Watt.”
The Interconnect Bottleneck: Beyond the GPU
One of the most overlooked aspects of the AI stack is the “East-West” traffic: the data moving between GPUs within a cluster. As clusters grow to hundreds of thousands of chips, the bottleneck isn’t the compute speed; it’s the interconnect speed. If the data can’t move fast enough between nodes, the GPUs sit idle, wasting expensive power.
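A rough utilization model makes the point. In the sketch below, the per-step communication volume, compute time, and overlap fraction are all assumed values chosen for illustration:

```python
# Sketch: why interconnect bandwidth gates cluster utilization. Whatever
# communication cannot be hidden under compute is pure GPU idle time.
# All inputs are illustrative assumptions.

def step_utilization(compute_ms: float, bytes_exchanged: float,
                     link_gbps: float, overlap: float = 0.5) -> float:
    """Fraction of each step spent computing. `overlap` is the share of
    communication hidden underneath compute; the rest is exposed idle."""
    comm_ms = bytes_exchanged * 8 / (link_gbps * 1e9) * 1e3
    exposed_ms = comm_ms * (1.0 - overlap)
    return compute_ms / (compute_ms + exposed_ms)

for gbps in (400, 800, 1600):  # assumed per-GPU link bandwidth
    u = step_utilization(compute_ms=10.0, bytes_exchanged=2e9, link_gbps=gbps)
    print(f"{gbps} Gb/s link -> {u:.0%} GPU utilization")
```

Doubling link bandwidth in this toy model takes utilization from roughly a third to two-thirds without touching the GPUs at all, which is exactly why capital is rotating into the fabric rather than the compute.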
The war is currently being fought between proprietary fabrics like NVLink and open standards like the Ultra Ethernet Consortium (UEC). The market is rotating toward companies that can standardize this communication. Marvell, for instance, is positioning itself as the leader in optical interconnects, moving data via light rather than electricity to reduce latency and heat.
To understand the technical divide, look at the following comparison of infrastructure requirements as we move deeper into 2026:
| Metric | Training Era (2023-2024) | Inference Era (2025-2026) |
|---|---|---|
| Primary Hardware | General Purpose GPUs (H100/B200) | Custom ASICs & Specialized NPUs |
| Cooling Method | Forced Air / Hybrid | Direct-to-Chip Liquid Cooling |
| Bottleneck | VRAM Capacity / Compute Power | Power Grid / Interconnect Latency |
| Key Architecture | Dense Transformer Models | MoE (Mixture of Experts) / Quantized Models |
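To see why the table’s last row matters, consider a sketch of MoE arithmetic. The total parameter count, expert counts, and shared fraction below are assumptions chosen purely for illustration:

```python
# Why MoE dominates the inference era: each token is routed through only
# a few experts, so active compute per token is a small slice of total
# parameters. All figures are illustrative assumptions.

TOTAL_PARAMS = 1.0e12    # assumed 1T-parameter MoE model
N_EXPERTS = 16
ACTIVE_EXPERTS = 2
SHARED_FRACTION = 0.2    # attention/embeddings touched by every token

shared = TOTAL_PARAMS * SHARED_FRACTION
expert_pool = TOTAL_PARAMS - shared
active = shared + expert_pool * (ACTIVE_EXPERTS / N_EXPERTS)

print(f"Active per token: {active / 1e9:.0f}B of "
      f"{TOTAL_PARAMS / 1e9:.0f}B params ({active / TOTAL_PARAMS:.0%})")
```

Under these assumptions, a trillion-parameter model does the per-token work of a ~300B dense model, which is how “trillion-parameter reasoning” stays on the right side of the Inference Wall.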
The Geopolitical Chip War and the Open-Source Hedge
We cannot discuss the rotation without mentioning the “Chip Wars.” The reliance on TSMC’s advanced nodes (3nm and below) creates a single point of failure for the entire global AI economy. Any instability in the Taiwan Strait doesn’t just affect stock prices; it halts the physical production of intelligence.

This risk is driving a secondary rotation toward “sovereign AI”: nations building their own compute clusters and designing their own chips. Simultaneously, the open-source community, led by projects on GitHub and Hugging Face, is optimizing models to run on consumer-grade hardware. This “democratization of inference” threatens the moat of the hyperscalers.
“The strategic goal is no longer just ‘more power,’ but ‘distributed resilience.’ We are seeing a push toward edge-AI where the model lives on the device, not the cloud. This shifts the value from the data center to the SoC (System on a Chip) designer.” – Sarah Chen, Cybersecurity Analyst specializing in Hardware Root-of-Trust.
The final rotation will be the most significant: the move from the Cloud to the Edge. When the LLM is running locally on an ARM-based NPU in your laptop or phone, the demand for massive, centralized data centers decreases, but the need for highly efficient, low-power silicon increases exponentially.
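The gating question for edge deployment is blunt: do the quantized weights fit on the device? Here is a rough feasibility check; the model sizes, RAM figure, and usable-memory fraction are hypothetical:

```python
# Sketch: can a quantized model live on an edge NPU at all? A simple
# weights-only fit check; device RAM and model sizes are hypothetical,
# and KV cache / activations are ignored for brevity.

def fits_on_device(params_billions: float, bits: int, device_ram_gib: float,
                   usable_fraction: float = 0.7) -> bool:
    """True if quantized weights fit in the RAM share usable for the model."""
    weights_gib = params_billions * 1e9 * bits / 8 / 2**30
    return weights_gib <= device_ram_gib * usable_fraction

print(fits_on_device(7, 4, 12))   # 7B @ INT4 ~ 3.3 GiB  -> True
print(fits_on_device(70, 4, 12))  # 70B @ INT4 ~ 32.6 GiB -> False
```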
The Great Rotation is not a sign that AI has failed. It is a sign that AI is maturing. We are moving from the “magic trick” phase to the “industrial utility” phase. In this new era, the winners aren’t the ones who can write the best prompt; they are the ones who own the copper, the silicon, and the power plants.