Kioxia has just revealed a groundbreaking "Super High IOPS SSD" designed to talk directly to the GPU.
This isn't just another fast drive; it's a fundamental change in how AI systems handle data. First, it tackles the "memory wall": GPUs pair enormous compute with fast but capacity-limited HBM. The SSD acts as a vast secondary memory pool for less frequently accessed data, dramatically increasing the total memory available to the GPU. Second, it reshapes the data path. Using technology like NVIDIA's GPUDirect Storage (GDS), data moves directly from the SSD into the GPU's memory, bypassing the CPU and system RAM entirely. This removes a major bottleneck for AI workloads that are otherwise constantly waiting on data.
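A rough way to see why bypassing the CPU matters is to model the two data paths. The sketch below is purely illustrative (the bandwidth figures and the two-hop copy model are assumptions for the sake of arithmetic, not measured values): the conventional path stages data through a bounce buffer in system RAM before a second copy to the GPU, while a GDS-style path DMAs it into GPU memory in one hop.

```python
# Illustrative model of the two data paths; all figures are assumed, not measured.
def transfer_time_s(size_gb: float, hops_gbps: list[float]) -> float:
    """Total time for a payload copied sequentially across each hop."""
    return sum(size_gb / bw for bw in hops_gbps)

PAYLOAD_GB = 64.0  # e.g., one shard of model weights (hypothetical size)

# Conventional path: SSD -> system RAM bounce buffer -> GPU memory (two copies).
conventional = transfer_time_s(PAYLOAD_GB, hops_gbps=[14.0, 25.0])

# GDS-style path: SSD DMAs straight into GPU memory (one copy).
direct = transfer_time_s(PAYLOAD_GB, hops_gbps=[14.0])

print(f"conventional: {conventional:.2f} s, direct: {direct:.2f} s")
```

The model ignores overlap and CPU overhead, so it understates the real-world gap; even so, eliminating the second copy alone shortens the transfer.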
The timing is driven by the explosive growth in AI. AI accelerators like NVIDIA's Blackwell and upcoming Rubin chips are incredibly powerful, but their on-package HBM, while fast, is limited to a few hundred gigabytes. As AI models grow larger, they need a much bigger memory space for tasks like holding large language model (LLM) context or searching through massive vector databases. This has created a critical need for a new storage tier—one that is much larger than HBM but still significantly faster than traditional storage.
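To see how LLM context alone can outgrow HBM, consider the KV cache, which grows linearly with context length. The back-of-the-envelope sketch below uses hypothetical model dimensions (the layer count, head count, and head size are illustrative assumptions, not any specific model):

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    # Factor of 2 for the separate K and V tensors cached at every layer.
    elems = 2 * layers * kv_heads * head_dim * seq_len * batch
    return elems * bytes_per_elem / 1e9

# Hypothetical large model: 80 layers, 8 KV heads of dim 128, fp16 values.
per_1m_ctx = kv_cache_gb(layers=80, kv_heads=8, head_dim=128,
                         seq_len=1_000_000, batch=1)
print(f"KV cache for a 1M-token context: {per_1m_ctx:.0f} GB")  # → 328 GB
```

Under these assumptions a single one-million-token context already rivals the entire HBM budget of a top-end accelerator, which is exactly the gap a flash-based tier is meant to absorb.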
This development didn't happen overnight. It's the result of a long chain of events. The journey began over a year ago when industry leaders like NVIDIA signaled the need for SSDs with up to 100 million IOPS—a massive leap from today's standards. This prompted Kioxia to formalize its roadmap. Over the following months, the technical pieces fell into place: Kioxia demonstrated prototypes and emulations of the direct-to-GPU connection, while the software ecosystem, including standards like NVMe 2.0 and tools like SPDK, matured to handle such high speeds.
Simultaneously, market forces made this innovation not just possible, but necessary. The high cost and tight supply of HBM and DDR5 memory made a flash-based alternative economically compelling. The relentless growth in AI server deployments guaranteed a hungry market. When NVIDIA announced its next-generation Rubin GPU, which is even more memory-intensive, it solidified the case for this new class of storage. Kioxia's announcement is the culmination of these technological and market pressures.
- IOPS (Input/Output Operations Per Second): A measure of storage device performance, indicating how many read and write operations it can perform per second. Higher is better.
- HBM (High Bandwidth Memory): A type of very fast, high-performance RAM stacked directly on the same package as a processor like a GPU, providing massive bandwidth but limited capacity.
- GPUDirect Storage (GDS): An NVIDIA technology that creates a direct data path between storage (like an NVMe SSD) and the GPU's memory, bypassing the CPU to reduce latency and increase bandwidth.
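The IOPS and bandwidth figures above are two views of the same quantity, linked by the transfer size. As a quick sanity check on the 100-million-IOPS target (the 512-byte block size and the ~1.5M-IOPS figure for today's high-end drives are illustrative assumptions):

```python
def iops_to_gbps(iops: float, block_bytes: int) -> float:
    """Convert an IOPS rate at a fixed block size into GB/s of throughput."""
    return iops * block_bytes / 1e9

# The 100M-IOPS target at a small 512 B random-access size.
print(iops_to_gbps(100e6, 512))  # → 51.2 (GB/s)

# Versus a roughly ~1.5M-IOPS figure for a current high-end PCIe 5.0 SSD.
print(iops_to_gbps(1.5e6, 512))
```

The point of the comparison is that the leap is in small-block random access, where today's drives deliver well under 1 GB/s at 512 B, not in sequential bandwidth.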
