The narrative around AI infrastructure is broadening beyond the well-known GPU arms race.
For the past 18 months, the story was simple: more AI means more GPUs. But a new chapter is unfolding, centered on Agentic AI. HSBC's recent analysis suggests this new wave of AI is creating a critical bottleneck not in GPU processing, but in CPU orchestration and memory capacity. This insight led them to raise their 2026 global server shipment forecast to +20% year-over-year, a figure that points to a significant shift in the market.
So, what's driving this change? It boils down to how Agentic AI works. First, these agents break complex tasks into many smaller steps involving tool use, logical branching, and data input/output. Research, such as a recent paper from 'AgentCgroup,' shows that the majority of the time an agent takes to complete a task (up to 74% of the latency) is spent on these non-GPU activities. These are precisely the jobs where CPUs and DRAM excel, which explains why they are now feeling the strain.
Second, we see this trend reflected in the actions of major tech players. Companies like OpenAI, Google, and AWS have all launched platforms to help businesses build and manage AI agents, fueling their adoption. More directly, Meta's recent partnership with NVIDIA explicitly includes large-scale deployments of standalone Grace CPUs for agent-related workloads. This is a clear signal that the world's biggest data center operators are dedicating significant resources to CPU-heavy computing.
Finally, the supply chain is telling the same story. DRAM prices have been spiking dramatically, with some categories seeing price increases of up to 95% quarter-over-quarter. This 'memory crunch' is a direct consequence of the rising demand from memory-intensive agent workflows. The market isn't just building for GPU power anymore; it's racing to secure the CPU and memory resources needed to make intelligent agents work efficiently at scale.
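The multi-step agent workflow described above can be sketched in a few lines. The example below is purely illustrative (it is not from the HSBC report or the 'AgentCgroup' paper): it models one agent task as a sequence of steps, each tagged by the resource it stresses, and computes the share of total latency spent off the GPU. All step names and durations are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    resource: str      # "gpu", "cpu", or "io"
    latency_ms: float  # hypothetical wall-clock time for this step

def non_gpu_share(steps: list[Step]) -> float:
    """Fraction of total task latency spent on non-GPU work."""
    total = sum(s.latency_ms for s in steps)
    non_gpu = sum(s.latency_ms for s in steps if s.resource != "gpu")
    return non_gpu / total

# A hypothetical agent task: plan, call a tool, branch on the result, respond.
task = [
    Step("parse_request",      "cpu",  40),
    Step("plan_with_llm",      "gpu", 220),
    Step("web_search_tool",    "io",  350),
    Step("parse_tool_output",  "cpu",  60),
    Step("branch_on_result",   "cpu",  15),
    Step("summarize_with_llm", "gpu", 180),
    Step("format_response",    "cpu",  35),
]

print(f"non-GPU share of latency: {non_gpu_share(task):.0%}")
```

Even in this toy breakdown, only two of the seven steps touch a GPU; the rest are the CPU- and memory-bound orchestration work the article describes.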
- Agentic AI: AI systems designed to autonomously perform complex, multi-step tasks by using tools, reasoning, and planning, much like a human agent would.
- Hyperscaler: A large-scale cloud service provider that operates massive data centers, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud.
- Latency: The time it takes a system to respond to a request. In AI, it's the time from when a query is sent to when the answer is received.
