A new Google AI technology that initially sent shockwaves through the memory chip market is now facing serious questions about its originality.
The chain of events began when Google Research published its TurboQuant paper on March 25th. The headline claims of a "6x" reduction in KV Cache memory and an "8x" speedup were powerful enough to trigger an immediate market reaction. By the next day, investors fearing a collapse in HBM demand sold off shares in major memory makers like Samsung, SK hynix, and Micron, wiping billions off their market value.
However, the story took a sharp turn. On March 30th, researchers from ETH Zurich publicly challenged the paper, alleging "major distortions" and claiming the paper bears strong similarities to their earlier work, "RaBitQ." For investors, the takeaway isn't about picking a side in an academic debate; it's that the dispute introduces significant uncertainty. The paper might be revised, and its adoption into real-world systems could be much slower than initially feared, blunting its immediate market impact.
This new context forces a re-evaluation of the initial panic. First, the technical claims were likely over-interpreted. The "6x" figure applies only to the KV Cache, one specific component of the memory an AI model uses. Because the KV Cache is only a slice of the total footprint, the actual system-wide memory saving works out closer to 15% (see the back-of-envelope sketch below): a meaningful improvement, but not a revolution that would crater HBM demand overnight. Often, such efficiency gains from quantization are used to run larger models or handle more users, not to buy less hardware.
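As a rough illustration of that arithmetic, the Python sketch below works through how a 6x compression of one memory slice translates into a modest total saving. The 80 GB device size and the 18% KV Cache share are illustrative assumptions chosen to land near the article's 15% estimate; they are not figures from the paper.

```python
# Back-of-envelope check: how a 6x KV Cache compression translates
# into a system-wide memory saving. The device size and the KV Cache
# share below are illustrative assumptions, not figures from the paper.

TOTAL_MEMORY_GB = 80.0   # e.g. one high-end accelerator's memory
KV_CACHE_SHARE = 0.18    # assumed fraction of memory held by the KV Cache
COMPRESSION = 6.0        # the paper's headline KV Cache reduction

kv_before = TOTAL_MEMORY_GB * KV_CACHE_SHARE
kv_after = kv_before / COMPRESSION
saving_gb = kv_before - kv_after
saving_pct = saving_gb / TOTAL_MEMORY_GB * 100

print(f"KV Cache before: {kv_before:.1f} GB, after: {kv_after:.1f} GB")
print(f"System-wide saving: {saving_gb:.1f} GB ({saving_pct:.0f}% of total)")
# -> a saving of roughly 15% of total memory, in line with the article
```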
Second, and more importantly, the market seems to have overlooked the bigger picture: a severe and ongoing memory supply shortage. Industry leaders, including the chairman of SK Group, have been warning that supply will lag demand until at least 2027, and possibly even 2030. This supply bottleneck is a much stronger force shaping the market than a single software optimization technique.
In conclusion, the initial sell-off appears to have been an overreaction to a headline figure. The academic challenge now adds a layer of doubt, and when viewed against the backdrop of persistent supply constraints, the narrative shifts from a potential demand collapse to one of continued, steady growth for the memory sector.
- HBM (High Bandwidth Memory): A type of high-performance stacked memory essential for modern AI accelerators like GPUs, used to quickly access large amounts of data.
- KV Cache (Key-Value Cache): In large language models, a memory-intensive store of the attention keys and values computed for earlier tokens, kept so the model doesn't recompute them when generating each new token. Compressing it is a key area of research.
- Quantization: A technique to reduce the memory footprint and computational cost of AI models by converting numbers from high-precision formats (like 16-bit) to lower-precision ones (like 3-bit); a minimal sketch of the idea follows below.
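To make that last entry concrete, here is a minimal sketch of symmetric uniform quantization from 16-bit floats down to 3-bit integers. The random tensor, bit width, and single per-tensor scale are illustrative assumptions; this is a generic textbook scheme, not the actual method from the TurboQuant paper or RaBitQ.

```python
import numpy as np

# Generic symmetric uniform quantization: 16-bit floats to 3-bit ints.
# A minimal sketch of the concept only; NOT TurboQuant's (or RaBitQ's)
# actual algorithm, whose details are more involved.

rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float16)  # stand-in for cached K/V values

BITS = 3
qmax = 2 ** (BITS - 1) - 1             # 3-bit signed range: -4 .. +3
scale = float(np.abs(x).max()) / qmax  # one scale factor for the whole tensor

q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)  # quantize
x_hat = q.astype(np.float16) * np.float16(scale)                   # dequantize

# Footprint drops from 16 bits/value to 3 bits/value (plus one shared
# scale), a ~5.3x raw reduction; real schemes add tricks to push higher.
ratio = 16 / BITS
err = float(np.mean(np.abs(x.astype(np.float32) - x_hat.astype(np.float32))))
print(f"compression ratio: {ratio:.1f}x, mean abs error: {err:.3f}")
```

The price of the smaller footprint is the rounding error the last line measures, which is why research in this area focuses on keeping model quality intact at ever-lower bit widths.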
