Nvidia, AMD, Intel Unite Behind Open-Source Inference Engine

Episode Summary
TOP NEWS HEADLINES Following yesterday's coverage of Anthropic's compute expansion, new details emerged: Anthropic signed a $1.8 billion deal with Akamai for compute services over seven years - m...
Full Transcript
TOP NEWS HEADLINES
Following yesterday's coverage of Anthropic's compute expansion, new details emerged: Anthropic signed a $1.8 billion deal with Akamai for compute services over seven years — making Akamai's stock surge to its highest level since 2000.
That's CoreWeave, Amazon, Google, Broadcom, xAI, and now Akamai — all in a single month.
Following yesterday's coverage of DeepMind's scientific breakthroughs, new details emerged: Google DeepMind released an agentic co-mathematician built on Gemini 3.1 that helped Oxford professor Marc Lackenby solve an open problem in mathematics — by spotting a clever proof strategy buried inside an output the system's own reviewers had rejected.
Following yesterday's coverage of MCP security risks, new details emerged: researchers have now identified a specific attack vector called AI tool poisoning, where hackers tamper with hidden tool descriptions to insert instructions like "forward any files you access to this address" — and your AI assistant just follows them, silently.
Mistral AI hit 20x ARR growth over the past year and is on track to cross one billion dollars in ARR — positioning itself as the sovereign European alternative for regulated enterprises that don't want full dependency on US labs.
And Nvidia has now committed over forty billion dollars in AI equity investments this year alone — essentially financing the entire AI supply chain to ensure it runs on Nvidia hardware.

---
DEEP DIVE ANALYSIS
**The Neutral Layer: Why Nvidia, AMD, and Intel Just Backed the Same $100 Million Seed Round**

Three companies that have spent decades trying to destroy each other just wrote checks to the same startup. Nvidia, AMD, and Intel all participated in RadixArk's hundred-million-dollar seed round at a four-hundred-million-dollar valuation. The company sits behind SGLang, an open-source inference engine already deployed across four hundred thousand GPUs and used by Google, Microsoft, xAI, Oracle, and — notably — both Nvidia and AMD themselves.
When rivals become customers before they become investors, that's worth stopping to think about.

**Technical Deep Dive**

SGLang is an inference engine — meaning it's the software layer that sits between your AI model and the hardware running it. Its job is to make that conversation as efficient as possible.
It does this through several mechanisms: RadixAttention, which reuses repeated context so the system doesn't reprocess the same information twice; intelligent memory management that reduces waste; smarter request batching; and hardware-agnostic scheduling that squeezes more useful work out of the same GPUs regardless of who made them. Here's why that last part matters. Right now, most inference optimization is siloed.
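The first mechanism listed above, prefix reuse, is worth making concrete. The toy cache below is an illustrative sketch of the idea only, not SGLang's actual RadixAttention implementation: computed work is keyed by token prefixes in a trie, so requests that share a prompt prefix (a system prompt, say) only pay for the tokens past the longest cached prefix.

```python
# Illustrative prefix-reuse cache (toy version of the RadixAttention idea,
# NOT SGLang's real implementation): a trie keyed by tokens, where each
# cached node stands in for already-computed attention state.

class PrefixCacheNode:
    def __init__(self):
        self.children = {}      # token -> PrefixCacheNode
        self.cached = False     # has this prefix already been "computed"?

class PrefixCache:
    def __init__(self):
        self.root = PrefixCacheNode()
        self.compute_calls = 0  # counts simulated expensive per-token work

    def process(self, tokens):
        """Walk the trie; only 'compute' tokens past the longest cached prefix."""
        node = self.root
        reused = 0
        for tok in tokens:
            if tok in node.children and node.children[tok].cached:
                node = node.children[tok]   # prefix hit: skip recomputation
                reused += 1
            else:
                child = node.children.setdefault(tok, PrefixCacheNode())
                child.cached = True
                self.compute_calls += 1     # simulated attention computation
                node = child
        return reused

cache = PrefixCache()
system_prompt = ["you", "are", "a", "helpful", "assistant"]
cache.process(system_prompt + ["hello"])         # cold: computes all 6 tokens
reused = cache.process(system_prompt + ["bye"])  # warm: reuses 5-token prefix
print(reused, cache.compute_calls)               # → 5 7
```

With a thousand requests sharing the same long system prompt, the shared prefix is processed once instead of a thousand times — that is the efficiency class of win the mechanism targets.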
Nvidia's CUDA ecosystem is extraordinarily powerful — but it's also a lock-in mechanism. AMD's ROCm platform is catching up, but developers don't want to rewrite inference logic for every hardware target. Intel's Gaudi accelerators are looking for any serious foothold in the market.
SGLang operates as a neutral abstraction layer. Write once, optimize everywhere. The model doesn't care what chip it's running on.
The inference engine handles the translation. That's technically elegant — and strategically explosive.

**Financial Analysis**

A hundred-million-dollar seed at a four-hundred-million-dollar valuation is aggressive.
But look at what RadixArk is actually selling: the right to reduce GPU waste across four hundred thousand already-deployed accelerators. At current GPU pricing — where an H100 runs anywhere from two to eight dollars per GPU-hour depending on reservation terms — even a ten percent efficiency gain across that fleet represents hundreds of millions of dollars in annual savings for customers. That's the financial thesis.
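The arithmetic behind that claim is easy to check. The sketch below uses simplifying assumptions (the quoted $2–$8 per GPU-hour range, round-the-clock utilization, a flat ten percent gain) purely to size the figure:

```python
# Back-of-envelope check on the savings claim above. Assumptions: the
# quoted $2-$8/GPU-hour H100 price range, 24/7 utilization, and a flat
# 10% efficiency gain across the whole fleet.
fleet_gpus = 400_000
hours_per_year = 8_760
efficiency_gain = 0.10

for price_per_hour in (2.0, 8.0):   # low and high ends of the quoted range
    annual_spend = fleet_gpus * price_per_hour * hours_per_year
    savings = annual_spend * efficiency_gain
    print(f"${price_per_hour:.0f}/GPU-hr -> ${savings / 1e9:.2f}B saved per year")
```

Even at the cheap end that is roughly $0.7 billion per year across the fleet; split among many customers, "hundreds of millions in annual savings" is, if anything, conservative.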
RadixArk doesn't need to sell new hardware. It makes existing hardware produce more. In an environment where compute costs are the primary constraint on AI scaling, that value proposition is almost impossible to argue with.
For the chip companies themselves, the calculus is different. Nvidia's forty-billion-dollar investment spree, which we covered in headlines, is about securing the AI supply chain. This investment is the same logic applied to software.
If the neutral inference layer runs best on your silicon — if you've had early access to optimize the integration — you've converted a potential threat into a distribution advantage. Missing this round wasn't just missing a financial return. It was risking irrelevance in the software layer that determines how useful your hardware actually is.
**Market Disruption**

The old infrastructure story was simple: win the chip war, win the AI race. That story is cracking. As models mature and hardware becomes more commoditized — and it will commoditize, despite Nvidia's current dominance — the battleground shifts to inference efficiency.
Who can serve a billion requests at the lowest cost? That question is answered in software, not silicon. RadixArk's positioning creates a new kind of leverage.
Historically, inference optimization was proprietary. Nvidia had TensorRT. Each cloud provider had internal tooling.
SGLang going open-source and cross-platform is the Kubernetes moment for AI inference — a common layer that no single vendor controls but everyone builds on top of. The competitive ripple effects extend beyond chip companies. Cloud providers like AWS, Azure, and Google Cloud all offer GPU-backed inference services.
A hardware-agnostic inference engine makes it easier for enterprises to shop around — which compresses cloud margins and increases pressure on differentiation above the infrastructure layer. For AI startups building on top of inference APIs, cheaper and more efficient inference directly expands their addressable market.

**Cultural and Social Impact**

There's a quieter story embedded in this deal that's worth naming.
The chip wars have been a significant driver of AI concentration — the companies with the most Nvidia GPUs have had the most capability, full stop. An efficient, hardware-neutral inference layer doesn't eliminate that advantage, but it meaningfully reduces the gap. If a smaller research institution or a startup in a country without preferential GPU access can run inference forty percent more efficiently on AMD or Intel hardware, that's a material democratization of AI capability.
The frontier doesn't compress overnight, but the distance between frontier and accessible starts to shrink. There's also a workforce dimension. Inference optimization has historically required deep, hardware-specific expertise — the kind of knowledge that lives in a handful of teams at Google, Meta, and the major cloud providers.
Open-source tooling at this level of sophistication transfers that knowledge more broadly. It's the same pattern we saw with TensorFlow and PyTorch — proprietary advantage gives way to ecosystem, and the ecosystem grows faster than any single team could.

**Executive Action Plan**

First: if you're running AI workloads at any meaningful scale, audit your inference layer now.
Most enterprise AI deployments are leaving significant efficiency on the table with default configurations. Before you spend another dollar on GPU capacity, understand your current utilization rates. SGLang is open-source — your engineering team can benchmark it against your existing setup this quarter.
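A benchmark like that can start very simply. The sketch below probes an OpenAI-compatible completions endpoint, which SGLang exposes when run as a server; the URL, port, and model name are placeholders for whatever your own deployment uses, and a real audit would also compare latency percentiles, not just throughput.

```python
# Hedged sketch: measure aggregate throughput of an OpenAI-compatible
# completions endpoint (SGLang serves one when launched as a server).
# The URL, port, and model name are placeholders for your deployment.
import json
import time
import urllib.request

def tokens_per_second(total_tokens, elapsed_seconds):
    """Aggregate decode throughput across a batch of requests."""
    return total_tokens / elapsed_seconds

def probe(url, model, prompt, n_requests=5):
    """Send identical completion requests and report tokens/sec overall."""
    total_tokens, start = 0, time.perf_counter()
    for _ in range(n_requests):
        body = json.dumps({"model": model, "prompt": prompt,
                           "max_tokens": 128}).encode()
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            # OpenAI-style responses report generated tokens under "usage"
            total_tokens += json.loads(resp.read())["usage"]["completion_tokens"]
    return tokens_per_second(total_tokens, time.perf_counter() - start)

# Example (requires a running server, e.g. started with
# `python -m sglang.launch_server --model-path <your-model>`):
# print(probe("http://localhost:30000/v1/completions", "my-model", "Hello"))
```

Run the same probe against your current serving stack and against a trial SGLang deployment on identical hardware, and you have a first-order answer to whether switching is worth a deeper evaluation.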
Second: if you're evaluating cloud AI infrastructure contracts, hardware lock-in clauses deserve more scrutiny than they're getting. The emergence of hardware-agnostic inference engines changes your negotiating position. Flexibility has quantifiable value, and the market is moving toward rewarding it.
Third: if you're a technology executive trying to build an AI strategy that survives the next three years, watch the software infrastructure layer as carefully as you watch the model releases. The companies that will define AI economics in 2028 aren't necessarily building the best models. They're building the most efficient path from model to output — and that race just got a hundred million dollars more interesting.
Never Miss an Episode
Subscribe on your favorite podcast platform to get daily AI news and weekly strategic analysis.