Mira Murati bets against the autonomous agent

Thinking Machines released its first model on Monday, arguing that the AI bottleneck is bandwidth, not autonomy, and timing the release to a round that would value the lab at $50 billion

MIRA MURATI introduced GPT-4o, OpenAI's omnimodal flagship, at a launch event in San Francisco on May 13, 2024. The real-time voice and translation demos — Italian to English on the fly, the model interrupting itself when prompted, the "her" comparisons everyone in the room had already made before Murati got to them — defined the consumer voice-AI moment. Four months later Murati stepped down as the company's chief technology officer, and in February 2025 she founded Thinking Machines Lab. On Monday, the lab released TML-Interaction-Small: a 276-billion-parameter mixture-of-experts model with 12 billion active parameters, trained from scratch to do real-time multimodal interaction in a way that, in the lab's telling, the existing industry has been getting wrong. A "research preview," the disclosure called it. Modest, the model was not.

In the April system card for Claude Mythos Preview, the most consequential model release of the cybersecurity year, Anthropic had made a candid concession about how its own models worked best. "Autonomous, long-running agent harnesses better elicited the model's coding capabilities" than interactive use, the company wrote, where "some users perceived [our model] as too slow and did not realize as much value." Inside the AI labs, the line read as a vindication of the agent-first push: humans were the bottleneck, and the play was to get them out of the loop. Thinking Machines, on Monday, read it the other way and built a model around the opposite conclusion. The bottleneck, in Murati's reading, is bandwidth, not autonomy, and pushing humans out of the loop is the wrong response to the constraint.

Today's models, Murati's lab argues, experience reality "in a single thread," waiting for the user to finish, generating, and then freezing again, blind to what is happening on the other side of the connection. The frontier labs have responded to that constraint by shrinking the user's role rather than the model's: send the model away with a task, come back when it returns. Most of the recent product energy at the frontier has gone in that direction — Anthropic's Claude Code, OpenAI's Operator, the deep-research products that every major lab now ships — built to minimize the human's role in the loop. Real work, Murati argues, wants a collaborator and not a contractor, and the interface, not the model, has been doing the contorting.
