Yeah, I’m tired of DeepSeek articles too.
This is not another review, and I’ll keep it brief. What I’ve been thinking about is how the deluge of commentary revealed more about the goals of the authors than about how DeepSeek actually works and what it means for the future of AI.
First, we heard the biggest VCs and AI startups “talking their book” by trying to make the narrative all about US vs. China. For example, when Marc Andreessen of a16z calls this a Sputnik moment, he’s asking the Trump administration to help his AI portfolio companies. Dario at Anthropic jumped in to champion chip export controls, which was, I thought, a little embarrassing because he ignored DeepSeek’s innovations in RL/GRPO for the reasoning model, as well as Multi-Head Latent Attention in the base model. And of course, Sam Altman and Masa chose this moment to announce a doubled OpenAI valuation of $300B and tie it to the Stargate initiative. OpenAI and Anthropic make great products, but they do not currently have a great business model or a justification for their valuations.
The primary way this is about China is *not* a recent change, and it should not come as a surprise — the DeepSeek R1 paper included masterful techniques for improving the efficiency of computation (actually highlighted in the DeepSeek-V3 base model paper in December, which didn’t get much US press at the time). The Chinese tech sector has excelled for generations at driving efficiency gains when implementing pre-existing technology innovations; hence, globalization. Those arguing that the innovative parts of DeepSeek’s paper are a *result* of constraints from chip export controls would prefer more free trade, but from a timing perspective, that take has been debunked.
Next, and related to the first point above, came endless social media posts from industry pundits about whether DeepSeek tried to mislead the world about their costs, whether the press got it wrong, or something in between. This mostly revealed that plenty of intelligent people still get confused about the difference between capex and opex. If you go to the primary sources (the DeepSeek papers and interviews), it’s pretty clear that the ~$6M claim covered only the final training run, and that they also had significant capex. Although the exact amount is unknown, many seem to agree it’s much less than the spend of top AI startups in BOTH China and the US. Keep in mind, DeepSeek gave it away; you can already get versions of R1 running on US servers across the AI landscape, from HuggingFace to Perplexity.
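For concreteness, here is a back-of-the-envelope sketch of the capex/opex distinction in Python. The GPU-hour total and the $2/GPU-hour rental rate are the figures the DeepSeek-V3 report uses to arrive at its headline training cost, and the 2,048-GPU cluster size is also from that paper; the per-GPU hardware price is purely a hypothetical placeholder for illustration.

```python
# Back-of-the-envelope: the "$6M" figure is an opex-style rental cost for the
# final training run, not the total spent on hardware (capex).
# Values marked "reported" are from the DeepSeek-V3 technical report;
# values marked "hypothetical" are placeholders for illustration only.

H800_GPU_HOURS = 2_788_000      # reported: total GPU-hours for the final V3 training run
RENTAL_RATE_USD = 2.00          # reported: assumed $ per GPU-hour rental price

final_run_opex = H800_GPU_HOURS * RENTAL_RATE_USD
print(f"Final training run, rental-rate basis: ${final_run_opex / 1e6:.2f}M")  # ~$5.58M

CLUSTER_GPUS = 2_048            # reported: V3 was trained on a 2048-GPU H800 cluster
PRICE_PER_GPU_USD = 30_000      # hypothetical: per-GPU hardware cost

cluster_capex = CLUSTER_GPUS * PRICE_PER_GPU_USD
print(f"Illustrative cluster capex: ${cluster_capex / 1e6:.1f}M")  # an order of magnitude larger
```

Whatever the true hardware bill, the two numbers measure different things, which is why the “did they lie about $6M” debate mostly talked past itself.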
While DeepSeek may have timed their PR for maximum US impact, they are outsiders within the Chinese ecosystem and don’t approach hiring or business the same way as the AI giants there, which are more deeply interwoven with the government. DeepSeek is aiming at basic AGI R&D, not selling an API, so whatever they claim about costs, it is not for the purpose of wooing investors.
And last, there was the reaction to the NVIDIA stock market sell-off — chip sector bulls would talk about nothing but the Jevons paradox for a solid week. There is clearly a shift of compute from base-model training toward RL and inference in order to get results of a quality comparable to OpenAI’s best models. Regardless of the Jevons paradox, chip usage should keep scaling to support high-end business and government use cases. The more interesting question, not addressed head-on in most articles I saw, is whether this gives a boost to NVIDIA’s competitors that make specialized inference-optimized chips (e.g., Groq, Cerebras).
As for what I think of DeepSeek’s work, the takeaway for me is open source beating closed source, and RL + inference compute driving a significant modification to training’s “bitter lesson”. I also hope to see more articles linking to interviews with DeepSeek CEO Liang Wenfeng, who is now undoubtedly driving industry change.
These excellent articles echo my POV:
Stratechery — DeepSeek FAQ (note: Ben says the success of RL here is an affirmation of the bitter lesson; it is, but it’s not the sales/business pitch that OpenAI and Anthropic have been making for years. Most investors understand the bitter lesson as advice to scale transformer training hardware capex rather than to spend time testing new algorithms and architectures.)
Forbes — Open Source is the Biggest Winner (along with Meta/LeCun)