Blog | AI21

Mar 25, 2026

Stride and prejudice: How a 32-bit overflow corrupted a CUDA kernel (and stayed hidden for weeks)

TL;DR While training our Jamba 3B model with GRPO, we hit a mysterious logprob mismatch between rollout and training. The…

Mar 17, 2026

Mind the gap: What separates demo agents from production systems

Mar 10, 2026

Where enterprise AI deployments actually get stuck

Feb 26, 2026

Modular intelligence: a human-like model for agent orchestration

Feb 11, 2026

Reducing LLM training waste with model-agnostic padding minimization

Feb 5, 2026

Go big or go OOM: the art of scaling vLLM

Jan 29, 2026

One token to corrupt them all: a vLLM debugging tale

Jan 29, 2026

Chunk size is query-dependent: a simple multi-scale approach to RAG retrieval

Jan 22, 2026

When sleeping in saves you money: dynamic data snoozing for efficient online RL

Jan 22, 2026

Closing the parsing gap: reaching SOTA RTL parsing by leveraging LTR capabilities

Jan 15, 2026

Boring isn’t easy

Jan 8, 2026

Introducing Jamba2: The open source model family for enterprise reliability and efficiency

Jan 8, 2026

How to scale agentic evaluation: lessons from 200,000 SWE-bench runs
