Zhengxu Yu

AI Researcher, Huawei London Research Centre (ex-Alibaba)

Email: yuzxfred AT gmail.com

© 2026

The Infrastructure for Autonomous AI Companies

TL;DR

There is a new kind of company shape emerging. It does not feel like a robot CEO. It feels like a ticket queue that learned to hire temporary workers. In human companies, scaling means hiring. In AI companies, scaling means routing: sending each task through the right agents, tools, permissions, budgets, evals, and human approvals without losing the plot.

Recent AI agent progress made me start thinking about what comes next. My answer is the AI organization.

A single agent has a ceiling. The context is finite. The tool list gets too long. Skills start to overlap. At some point, making one agent more capable also makes it more complicated to steer.

The next step is not to keep stuffing more context, tools, and skills into one universal agent. It is to split the work across smaller agents with narrower roles, then build the layer that routes tasks between them.

Take a normal founder question: “Is this idea worth building?”

That is not one task. First you have to clarify the user, the pain, the wedge, the market, the obvious objections, and what would make the idea false. Then someone should check whether people already search for this, complain about it, or pay for bad alternatives. Then the vague idea has to become a narrow first version. Then someone has to estimate whether that version is a weekend prototype or a three-month build. Then someone has to ask who the first ten users would be and how to reach them. Finally, someone has to make the founder decision: kill it, park it, or define the smallest test that would make the next decision less stupid.

A single agent can try to hold all of that in its head, but that is exactly where the context gets muddy. The cleaner version is to route the idea through agents with different scopes, keep the intermediate artifacts, and only bring the human back for the real decision.
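A minimal sketch of that routing, assuming each scoped agent is just a function that reads the earlier artifacts and returns its own. The stage names and the `run_pipeline` and `founder_decision` helpers are hypothetical illustrations, and the lambda bodies stand in for real agent calls.

```python
from typing import Callable

# Hypothetical stage signature: takes the idea plus earlier artifacts, returns its own artifact.
Stage = Callable[[str, dict[str, str]], str]

def run_pipeline(idea: str, stages: list[tuple[str, Stage]]) -> dict[str, str]:
    """Run each scoped stage in order and keep every intermediate artifact."""
    artifacts: dict[str, str] = {}
    for name, stage in stages:
        # Pass a copy so a stage can read earlier artifacts but not rewrite them.
        artifacts[name] = stage(idea, dict(artifacts))
    return artifacts

# Placeholder stages; real ones would call narrower agents.
STAGES: list[tuple[str, Stage]] = [
    ("clarify", lambda idea, a: f"user, pain, and wedge for: {idea}"),
    ("demand", lambda idea, a: "evidence of searches, complaints, paid alternatives"),
    ("scope", lambda idea, a: f"narrow first version based on: {a['clarify']}"),
    ("estimate", lambda idea, a: "weekend prototype or three-month build"),
    ("first_users", lambda idea, a: "who the first ten users are and how to reach them"),
]

def founder_decision(artifacts: dict[str, str], choice: str) -> dict[str, object]:
    """The human makes the final call; the artifacts travel with the decision."""
    if choice not in {"kill", "park", "test"}:
        raise ValueError(f"unknown decision: {choice}")
    return {"decision": choice, "evidence": artifacts}
```

The human only appears in `founder_decision`. Everything before it is routing, and every stage leaves an artifact behind that can be inspected later.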

That is the moment the “AI company” idea stops sounding like a sci-fi org chart and starts sounding like infrastructure.

MCP and A2A matter because they make this less imaginary. MCP lets agents reach tools and data. A2A lets agents discover, talk to, and delegate to other agents. So the interesting question is no longer just “can the model do the task?” It becomes:

which agent should do which part, with which tool, under which permission, paid from which budget, checked by which eval, and escalated to which human?

In human companies, scaling means hiring. In AI companies, scaling means routing.

The Data

The data does not prove that autonomous AI companies already exist. It says something more interesting: agents are entering workflows faster than companies are learning how to control them.

Signal | Data | Meaning
Adoption | McKinsey 2025: 23% scaling agentic AI in at least one function; 39% experimenting. | Agents are moving into real operations.
Distribution | Gartner: up to 40% of enterprise apps may include task-specific AI agents by end of 2026, up from less than 5% in 2025. | Agents become a software distribution pattern.
Experimentation | Deloitte: 25% of companies using gen AI expected to launch agentic pilots in 2025; 50% by 2027. | The pilot curve is steep.
Governance | Deloitte 2026: only about one in five companies has mature governance for autonomous agents. | Control is behind adoption.
Security | IBM 2025: 13% reported breaches of AI models or apps; 97% of those lacked proper AI access controls. | Unmanaged agency becomes security risk.

The ingredients are arriving in the wrong order. First agents show up inside products. Then pilots spread. Then access-control problems appear. Then everyone realizes the hard part was not “can the agent do the task?” It was “who let it do that, with what authority, and how do we know it was right?”

Software development already has a small version of this. Stack Overflow's 2025 Developer Survey found that 84% of developers use or plan to use AI tools and 51% of professional developers use them daily, while sentiment and trust remain fragile. People use the tools anyway. They just stop trusting the output by default.

That is the whole thing: agency is getting cheap; accountability is not.

The Company Is Not the Agent

The tempting mistake is to think the agent is the interesting unit.

It isn’t. The agent is just labor. The company is the system around the labor.

An agent can complete a task. A company has to remember what happened, allocate risk, decide who is allowed to spend money, know which facts are official, evaluate the output, and explain itself later.

If every AI worker is just a prompt plus tools, the system cannot answer basic company questions:

  • Who is acting?
  • What can they access?
  • What are they allowed to spend?
  • Which memory is official?
  • How is output judged?
  • Who approves risky actions?
  • What record exists after the action?

This is where the infrastructure gets boring, which is usually a sign that it is real. Identity. Permissions. Memory. Budgets. Evals. Audit logs. Human approval paths.

Without that layer, “autonomous company” mostly means giving a model too many tools and hoping nothing expensive happens.

Hiring Becomes Routing

This is the cleanest way I can state the shift:

in a human company, you scale by adding people;

in an AI company, you scale by adding routable capabilities.

Suppose a customer asks for a custom enterprise integration. A naive agent system gives the request to a sales agent and hopes for the best. The more company-like version routes it:

  • sales qualifies the opportunity
  • engineering checks feasibility
  • finance estimates margin
  • legal flags contractual risk
  • a reviewer checks policy compliance
  • a human approves only if risk or value crosses a threshold
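The route above can be sketched as data plus one threshold rule. This is an illustration under assumptions: the capability names, the `Request` fields, and the default thresholds are all hypothetical, not taken from any real system.

```python
from dataclasses import dataclass

@dataclass
class Request:
    description: str
    deal_value: float   # estimated contract value in dollars (hypothetical field)
    risk_score: float   # 0.0 routine to 1.0 high risk (hypothetical scale)

# Hypothetical ordered route: each step names a capability, not a person.
ROUTE = [
    "sales.qualify",
    "engineering.feasibility",
    "finance.margin",
    "legal.contract_risk",
    "review.policy",
]

def plan_route(req: Request, value_threshold: float = 100_000.0,
               risk_threshold: float = 0.7) -> list[str]:
    """Return the capabilities to invoke; add a human only past a threshold."""
    steps = list(ROUTE)
    if req.deal_value >= value_threshold or req.risk_score >= risk_threshold:
        steps.append("human.approve")
    return steps
```

A small, low-risk request never reaches a human. A large or risky one gets `human.approve` appended as the last step.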

The question quietly changes from “who do we employ?” to “which capability should receive this task under these constraints?”

This is the part that feels new. A team can exist for one task, get a budget, use a few tools, produce an artifact, get evaluated, leave a log, and dissolve.
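One way to picture that lifecycle is a context manager: the team exists only inside the `with` block. This is a sketch, not a real framework. `Team`, `ephemeral_team`, and the `evaluate` and `archive` parameters are hypothetical names.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional

@dataclass
class Team:
    task: str
    budget: float
    tools: tuple
    log: list = field(default_factory=list)
    artifact: Optional[str] = None
    active: bool = True

@contextmanager
def ephemeral_team(task: str, budget: float, tools: list,
                   evaluate: Callable[[str], bool], archive: list) -> Iterator[Team]:
    """Form a team for one task, evaluate its artifact, archive its log, dissolve it."""
    team = Team(task, budget, tuple(tools))
    team.log.append(f"formed for {task!r} with budget {budget}")
    try:
        yield team
        # Reached only if the work inside the block finished without raising.
        passed = team.artifact is not None and evaluate(team.artifact)
        team.log.append(f"evaluated: {'pass' if passed else 'fail'}")
    finally:
        # The team always dissolves, and its log always outlives it.
        team.active = False
        team.log.append("dissolved")
        archive.extend(team.log)
```

Usage looks like `with ephemeral_team("draft quote", 5.0, ["crm"], check, archive) as team: team.artifact = ...`. After the block, the only thing left is the archived log.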

The org chart becomes executable. Or maybe the org chart disappears and what remains is a routing graph.

The Infrastructure Stack

The stack is not conceptually hard. It is just the stuff you suddenly need when work can act on its own.

Identity. Every agent needs a persistent role, scope, skill profile, work history, and permission boundary. Without identity, there is no responsibility.
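A minimal sketch of what such an identity record might hold, assuming an immutable record plus a separate work history. The field names and the `WorkHistory` class are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentIdentity:
    """Immutable identity: changing an agent's role means issuing a new record."""
    agent_id: str
    role: str
    scope: frozenset            # areas the agent works in
    skills: frozenset           # skill profile
    allowed_actions: frozenset  # permission boundary

@dataclass
class WorkHistory:
    """Every recorded action is attributable to exactly one identity."""
    entries: dict = field(default_factory=dict)

    def record(self, who: AgentIdentity, action: str) -> None:
        if action not in who.allowed_actions:
            raise PermissionError(f"{who.agent_id} may not perform {action!r}")
        self.entries.setdefault(who.agent_id, []).append(action)
```

The point of the frozen dataclass is that responsibility needs something stable to attach to. An agent that can edit its own boundary does not really have one.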

Memory. The organization cannot live in the context window. It needs personal memory, team memory, and institutional memory, with ownership, correction, and approval.
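As a rough sketch of tiered memory, assume three tiers and one rule: nothing becomes institutional without a named approver. `Tier`, `Fact`, and `MemoryStore` are hypothetical names, and the in-memory dict stands in for whatever store a real system would use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Tier(Enum):
    PERSONAL = "personal"
    TEAM = "team"
    INSTITUTIONAL = "institutional"

@dataclass
class Fact:
    key: str
    value: str
    owner: str                       # who wrote it
    tier: Tier
    approved_by: Optional[str] = None

class MemoryStore:
    def __init__(self) -> None:
        self._facts: dict = {}

    def write(self, fact: Fact, approver: Optional[str] = None) -> None:
        """Store a fact; institutional facts require an approver."""
        if fact.tier is Tier.INSTITUTIONAL and approver is None:
            raise PermissionError("institutional memory needs an approver")
        fact.approved_by = approver
        # A correction is a new write to the same (tier, key) slot.
        self._facts[(fact.tier, fact.key)] = fact

    def official(self, key: str) -> Optional[str]:
        """Only institutional memory counts as the official answer."""
        fact = self._facts.get((Tier.INSTITUTIONAL, key))
        return fact.value if fact else None
```

A team can believe the refund window is 14 days, but `official("refund_window")` returns nothing until someone with authority approves a value.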

Permissions. Autonomy means action inside boundaries. Tool access is too coarse. Permissions must cover data sensitivity, action type, dollar amount, reversibility, external visibility, and approval thresholds.
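To show why tool access alone is too coarse, here is a sketch of a permission check that looks at the action itself. The `Action` and `Policy` fields, the sensitivity scale, and the three-way result are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str                 # e.g. "refund", "email", "deploy"
    data_sensitivity: int     # 0 public to 3 restricted (hypothetical scale)
    amount_usd: float
    reversible: bool
    externally_visible: bool

@dataclass(frozen=True)
class Policy:
    allowed_kinds: frozenset
    max_sensitivity: int
    auto_approve_usd: float   # at or below this, the agent may act alone
    hard_limit_usd: float     # above this, the action is always denied

def decide(policy: Policy, action: Action) -> str:
    """Return 'deny', 'needs_approval', or 'allow'."""
    if action.kind not in policy.allowed_kinds:
        return "deny"
    if action.data_sensitivity > policy.max_sensitivity:
        return "deny"
    if action.amount_usd > policy.hard_limit_usd:
        return "deny"
    if action.amount_usd > policy.auto_approve_usd:
        return "needs_approval"
    # Irreversible actions that outsiders can see get a second pair of eyes.
    if not action.reversible and action.externally_visible:
        return "needs_approval"
    return "allow"
```

The same tool, a refund API say, can come back as allow, needs_approval, or deny depending on amount and reversibility. That is the difference between tool access and permission.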

Budgets. Digital labor still has economics. Compute, tool spend, model tier, and human review time all need allocation. The AI company needs a P&L for intelligence.
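A toy version of that P&L, assuming four cost categories taken from the paragraph above and one spending cap per team. `BudgetLedger` and its methods are hypothetical, and how "value delivered" gets measured is left open.

```python
from collections import defaultdict

class BudgetLedger:
    # Cost categories from the text; the identifiers themselves are illustrative.
    CATEGORIES = ("compute", "tools", "model_tier", "human_review")

    def __init__(self, allocations: dict) -> None:
        self.allocations = dict(allocations)   # team -> total dollars allowed
        self.spent = defaultdict(lambda: defaultdict(float))

    def total_spent(self, team: str) -> float:
        return sum(self.spent[team].values())

    def charge(self, team: str, category: str, amount: float) -> None:
        """Record spend, refusing anything that would exceed the allocation."""
        if category not in self.CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        if self.total_spent(team) + amount > self.allocations.get(team, 0.0):
            raise RuntimeError(f"{team} would exceed its budget")
        self.spent[team][category] += amount

    def pnl(self, team: str, value_delivered: float) -> float:
        """Value delivered minus everything the team spent."""
        return value_delivered - self.total_spent(team)
```

Human review time sits in the same ledger as compute. If it is not priced, the system will quietly spend it.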

Evaluation. The manager becomes a test harness. Code has tests. Support has satisfaction and escalation rates. Marketing has conversion. Strategy has delayed outcomes. Agents are useful only where output can be judged.
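The last sentence can be made concrete: a task type is delegable only if an evaluator is registered for it. A sketch under assumptions follows. The registry, the metric names, and the thresholds are hypothetical.

```python
from typing import Callable

# Hypothetical registry: task type -> function that judges an output record.
EVALUATORS: dict[str, Callable[[dict], bool]] = {
    "code": lambda out: out["tests_failed"] == 0,
    "support": lambda out: out["csat"] >= 4.0 and not out["escalated"],
    "marketing": lambda out: out["conversion_rate"] >= out["baseline_rate"],
}

def can_delegate(task_type: str) -> bool:
    """An agent may own a task type only if its output can be judged."""
    return task_type in EVALUATORS

def judge(task_type: str, output: dict) -> bool:
    if not can_delegate(task_type):
        raise LookupError(f"no evaluator for {task_type!r}; keep a human in the loop")
    return EVALUATORS[task_type](output)
```

Strategy is deliberately missing from the registry. Its outcomes arrive late, so in this sketch it stays with a human until someone can write the evaluator.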

Audit. Every meaningful action needs a ledger: agent, context, tool call, change, cost, approval, and outcome. The audit log is not just compliance. It is how the organization learns.
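A sketch of such a ledger, using the seven fields listed above. The hash chain linking each entry to the previous one is my own addition for tamper evidence, not something the argument requires, and `AuditEntry` and `AuditLog` are hypothetical names.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class AuditEntry:
    agent: str
    context: str
    tool_call: str
    change: str
    cost_usd: float
    approval: str
    outcome: str
    prev_hash: str   # assumption: chain entries so edits are detectable

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

class AuditLog:
    def __init__(self) -> None:
        self._entries: list = []

    def append(self, **fields) -> AuditEntry:
        prev = self._entries[-1].digest() if self._entries else "genesis"
        entry = AuditEntry(prev_hash=prev, **fields)
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Check that every entry still points at the digest of the one before it."""
        prev = "genesis"
        for entry in self._entries:
            if entry.prev_hash != prev:
                return False
            prev = entry.digest()
        return True

    def failures(self) -> list:
        """The learning side of the log: what went wrong, and who did it."""
        return [e for e in self._entries if e.outcome != "ok"]
```

`verify` is the compliance half. `failures` is the learning half: the same records, read for a different purpose.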

Governance. I do not want to manage a hundred chat windows. I want to set mission, risk appetite, approval thresholds, capital allocation, and escalation policy. NIST’s AI Risk Management Framework talks about Govern, Map, Measure, and Manage. For autonomous companies, those become runtime primitives.

Integration. The protocol layer is improving. Anthropic’s Model Context Protocol gives agents a standard way to connect to tools and data. Google’s Agent2Agent protocol goes sideways: agents discovering each other’s capabilities, delegating tasks, exchanging context, and collaborating across frameworks or vendors. MCP is agent-to-tool. A2A is agent-to-agent. Both are directly relevant to this routing view of the company.

But connection is not control. A2A can help agents talk to each other; it does not decide whether they should. The real question is not “can this agent call that tool or delegate to that agent?” It is “under which identity, budget, permission, evaluation, and audit context is that action allowed?”
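To make the distinction concrete, here is a sketch of a gate that sits in front of any tool call or delegation. It is not part of MCP or A2A. `ControlContext` and `authorize` are hypothetical, and a real deployment would back each check with a proper service.

```python
from dataclasses import dataclass, field

@dataclass
class ControlContext:
    identities: set           # known agent ids
    permissions: dict         # agent id -> set of allowed targets
    budgets: dict             # agent id -> remaining dollars
    evaluated_targets: set    # targets whose results can be judged
    audit: list = field(default_factory=list)

def authorize(ctx: ControlContext, agent: str, target: str, cost: float) -> bool:
    """Decide whether a reachable target may actually be used, and record why."""
    checks = {
        "identity": agent in ctx.identities,
        "permission": target in ctx.permissions.get(agent, set()),
        "budget": ctx.budgets.get(agent, 0.0) >= cost,
        "evaluation": target in ctx.evaluated_targets,
    }
    allowed = all(checks.values())
    if allowed:
        ctx.budgets[agent] -= cost
    # Denials are logged too; a refused action is still a meaningful event.
    ctx.audit.append({"agent": agent, "target": target, "cost": cost,
                      "checks": checks, "allowed": allowed})
    return allowed
```

The protocol answers whether the call is possible. `authorize` answers whether it is allowed, and it leaves a record either way.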

The Funny Part

The funny part is that after all this, it still looks like a company.

You can replace humans with agents and somehow end up reinventing roles, permissions, budgets, review, compliance, escalation, and performance tracking.

Maybe that is disappointing. I find it useful. It means the future autonomous AI company is not a swarm of agents doing vibes in the cloud. It is more like:

company = labor + routing + memory + permission + budget + evaluation + accountability.

The agent is the worker. The routing layer is the company.
