Thoughts on Opus and Mythos
Anthropic launched Opus 4.7 with what it is calling Adaptive Thinking. We are putting it through its paces at NonBioS, but our initial finding is that we might just sit this one out - Opus 4.7 does not seem to represent a meaningful improvement over Opus 4.6 for our use cases. We plan to continue using Opus 4.6 for now - our latest model, nonbios-1.143, still relies on Opus 4.6 in its harness, albeit with upgrades to the surrounding infrastructure.
Here is the gotcha that Opus 4.7 gets wrong, which I confirmed myself:
Me: I want to wash my car. The car wash is 100 meters away - should I walk or drive?
Opus 4.7: Walk. 100 meters is about a one-minute stroll — by the time you’ve started the engine and backed out, you’d basically be there on foot.
Opus 4.6 got it right, btw.
With Opus 4.7, Anthropic’s strategy seems to reprise a debate that was central in late 2024. OpenAI’s o1 demonstrated what appeared to be a scaling law for inference-time compute, raising the prospect of AI performance being improved not just by training larger models, but by allocating more computational resources during the inference step itself - letting a model “think longer” about difficult problems. For a period, this generated genuine excitement.
The benchmark stamps soon followed, but real-world tests - at NonBioS and elsewhere - soon closed that debate: scaling inference-time compute could lift performance on specific tasks, but claims that a 70 bn parameter model would outperform a 200 bn parameter model were far-fetched. The broadly accepted picture now is more nuanced: smaller models combined with advanced inference algorithms can offer competitive cost-performance trade-offs, but this holds primarily within specific problem types and not as a general substitute for a larger, more capable base model.
The scaling laws for model size broadly continue to hold. Larger models tend to demonstrate more general intelligence by a significant margin.
What Anthropic appears to be doing with Opus 4.7, in my assessment, is something adjacent to this older playbook. You see, about a week back, users at NonBioS started complaining that nonbios-1.142 was showing degraded performance. nonbios-1.142 uses Opus 4.6 heavily in its harness. Wider internet reports confirmed our suspicion - Anthropic had quietly degraded Opus 4.6.
Our working hypothesis is that Adaptive Thinking in Opus 4.7 is intended to compensate for a model that may have been adjusted (perhaps using quantization-adjacent techniques) for cost efficiency, by having it reason more extensively on complex tasks.
In other news, Anthropic announced Mythos as a frontier model, but withheld it from general release on the grounds that its offensive cyber capabilities were too dangerous. On the benchmarks, Mythos appears to be a substantially more capable model than Opus. This broadly checks out - a larger, more capable model tends to demonstrate correspondingly better general intelligence - but it will also be considerably more expensive to serve. Whether the restricted rollout is primarily a safety decision or a cost decision is something only Anthropic knows.
The more consequential question, in my view, is a geopolitical one. India has emerged as Anthropic’s second-largest consumer market globally. At the same time, Mythos - Anthropic’s most capable model - is being shared selectively within the US national security ecosystem. There are reports about Anthropic pushing back against using AI for autonomous weapons, but the practical upshot is that the US national security apparatus has some access to Mythos in its restricted form, while major commercial partners like India do not.
As awareness grows that Anthropic's most powerful model is being made available to US defence agencies while being withheld from allied-but-non-US markets, it could prompt difficult questions. Governments in such markets may begin to ask whether they should allow market access to a technology whose frontier capabilities are effectively reserved for American national security purposes - especially in a competitive landscape where OpenAI is actively courting the same markets.
What makes this situation geopolitically charged is that AI is not like previous general-purpose technologies in its relationship to military power. Mobile phones, the internet, even GPS all proliferated globally with relatively symmetric access. Their military applications were real, but derivative - they improved communication, logistics, coordination. AI is fundamentally suited for deployment in warfare: it will increasingly be the primary intelligence layer ingesting data, generating options, and compressing decision cycles from hours to seconds - potentially a structural shift in what determines military effectiveness.