The brief era of falling AI GPU rental prices appears to be over, with a clear rebound now underway.
After a period of decline in the second half of 2025, largely driven by AWS's significant price cuts, the market is heating up again. Rental rates for top-tier data center GPUs like NVIDIA's H100 are on the rise. What's particularly noteworthy is that even older-generation GPUs, such as those from the Ampere and early Hopper series, are seeing their prices climb. That runs against the usual pattern in tech hardware, where older parts lose value as newer generations arrive.
So, what's causing this widespread price surge? The answer lies in a combination of three powerful forces. First is the explosive growth in demand, whose composition is shifting structurally. While demand for AI training remains strong, the commercialization of AI services has triggered a massive need for inference, the process of using a trained model to make predictions. Hyperscalers' aggressive capital expenditures underpin this demand, with projections for AI infrastructure spending ranging from $500 billion to $700 billion in 2026.
Second, critical supply chain bottlenecks persist. The production of high-performance GPUs is constrained by the availability of high-bandwidth memory (HBM) and advanced packaging technology like CoWoS. TSMC has indicated that alternative packaging solutions won't be a viable replacement for CoWoS for the largest AI chips anytime soon, meaning supply relief isn't expected until late 2026 or even 2027.
Third, a geopolitical shift has expanded the demand pool. The U.S. government's decision to partially allow exports of high-end chips like the H200 to China has opened up a significant new market. This adds further strain on the already tight supply of HBM and packaging, reinforcing price rigidity across the board.
The fact that older GPUs are increasing in price is a crucial indicator. It suggests that the demand for inference is so vast that companies are seeking out any available compute power that is 'good enough' for their specific workloads, tightening supply across all tiers. The previous narrative that the GPU shortage was ending was based on temporary factors. The market is now showing that the underlying structural imbalance between scorching demand and constrained supply remains firmly in place.
- Inference: The process of using a trained AI model to make a prediction or decision based on new data. It's the 'live' operational phase after the initial 'training' phase.
- HBM (High-Bandwidth Memory): A type of stacked DRAM placed close to the processor in GPUs and other accelerators. It supplies the memory bandwidth needed to move the large models and datasets used in AI.
- CoWoS (Chip-on-Wafer-on-Substrate): An advanced 2.5D packaging technology developed by TSMC that allows multiple chips to be integrated side-by-side on a single interposer, crucial for building powerful AI accelerators.
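
To make the training-versus-inference distinction in the glossary concrete, here is a minimal sketch in plain Python. The function names `train` and `infer`, the tiny linear model, and the sample numbers are all illustrative assumptions, not anything drawn from a real AI service. The point is only that training is the heavier fitting step done up front, while inference is the light per-request step that is repeated every time the model is used.

```python
def train(xs, ys):
    """Training phase (hypothetical toy model): fit y = w*x + b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def infer(model, x):
    """Inference phase: apply the already-trained parameters to new input."""
    w, b = model
    return w * x + b


# Train once on made-up data that follows y = 2x + 1.
model = train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# Inference then runs for every new request; here, a single unseen input.
prediction = infer(model, 10.0)
print(prediction)
```

In a deployed service the `infer` step runs for every user request, which is why inference demand grows with usage rather than with the number of models being built.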
