At the GTC 2026 conference came an announcement that could reshape the AI hardware landscape: Groq's third-generation Language Processing Unit (LPU) will be manufactured by Samsung Foundry on its 4nm process, with shipments starting in the third quarter of 2026.
So, what exactly is an LPU? Think of it as a highly specialized processor designed for one thing: AI inference. While GPUs excel at training AI models, inference is the process of actually using those models to get answers, such as powering a chatbot. Groq's LPU is engineered to do this incredibly fast. Its secret weapon is on-die SRAM instead of the more common HBM (High Bandwidth Memory). This design allows for extremely low latency, meaning near-instantaneous responses.
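The latency argument above can be made concrete with a rough, memory-bound model of autoregressive decoding: generating each token requires streaming roughly all model weights through the processor, so time per token is dominated by memory bandwidth. All figures in this sketch (model size, bandwidth numbers) are illustrative assumptions for the sake of the arithmetic, not Groq or Samsung specifications, and it ignores that in practice weights are spread across many chips:

```python
# Back-of-envelope sketch: why memory bandwidth dominates inference latency.
# All numbers below are illustrative assumptions, not vendor specifications.

def time_per_token_ms(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Autoregressive decoding is roughly memory-bound: each generated
    token requires reading (approximately) all model weights once."""
    return model_bytes / bandwidth_bytes_per_s * 1000

MODEL_BYTES = 70e9   # assumed: a 70B-parameter model with 8-bit weights
HBM_BW      = 3e12   # assumed: ~3 TB/s, ballpark for a modern HBM subsystem
SRAM_BW     = 80e12  # assumed: ~80 TB/s aggregate on-die SRAM bandwidth

print(f"HBM:  {time_per_token_ms(MODEL_BYTES, HBM_BW):.1f} ms/token")
print(f"SRAM: {time_per_token_ms(MODEL_BYTES, SRAM_BW):.2f} ms/token")
```

Under these assumed numbers, the on-die design is more than an order of magnitude faster per token, which is the intuition behind the low-latency claim, even though the real picture (multi-chip scaling, compute limits, batching) is more complicated.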
This architectural choice is a brilliant strategic move. For the past couple of years, the entire AI industry has been constrained by a severe bottleneck in the supply of HBM and of the advanced packaging technologies, such as CoWoS, needed to assemble it. This shortage matters for Groq in two ways. First, it has created intense demand for alternative solutions. Second, because its chip doesn't rely on HBM, Groq can sidestep the bottleneck entirely, potentially delivering powerful inference chips to customers faster and more reliably.
Furthermore, Samsung's role as the manufacturer is highly significant. For a long time, Taiwan's TSMC has been the undisputed leader in advanced chip manufacturing. Groq's decision to partner with Samsung, a collaboration that officially began back in 2023, is a major win for the Korean tech giant. It validates Samsung's efforts to compete at the highest level, especially as it has been offering aggressive pricing to attract major clients like Groq.
Finally, this isn't just Groq's story; it's also about NVIDIA. In a February 2026 earnings call, NVIDIA revealed it had licensed Groq's low-latency technology. This move suggests NVIDIA is evolving its strategy. Instead of relying solely on its own GPUs, it's building a more diverse ecosystem. The GTC announcement confirms this vision of a hybrid "GPU+LPU" stack, where each processor is used for the task it's best suited for. This event, therefore, is more than a new product launch; it's a signal of diversifying supply chains, intensifying foundry competition, and a more sophisticated future for AI computing.
A few key terms:
- LPU (Language Processing Unit): A specialized processor designed specifically for running AI language models very quickly (inference).
- SRAM (Static Random-Access Memory): A type of very fast memory located directly on the chip. It's faster than external memory like HBM but typically available in smaller capacities.
- Foundry: A company that manufactures semiconductor chips for other companies that design them, like Samsung making chips for Groq.
