physical-ai · agents · mcp · architecture

"Physical AI Needs a Typed World Model, Not a Vector DB"

May 14, 2026 · AimDB Team · 6 min read

I pasted a URL into Claude Desktop, asked "what's the temperature in Munich right now?" and got a live answer streamed from a sensor running on a microcontroller. No RAG. No vector DB. No tool-calling glue I had to write. The agent was reading the world directly.

That's the moment I realized the conversation about agent infrastructure has been pointed at the wrong question.

We've spent two years optimizing the retrieval path: better embeddings, better chunking, better re-ranking. That work is real and it matters for the problem it solves. But an agent that is going to act on the physical world doesn't primarily need to retrieve things it heard about once. It needs to know what is, right now, in the system it's about to touch.

That is a different shape of data and it needs different infrastructure.

Vector DBs and world models aren't in opposition

Before going further, let me head off the comment I'd write under this post if I were reading it cold:

"Vector DBs and world models aren't competing. You need both."

Yes. You do.

Vector databases are the right tool for unstructured retrieval over a corpus that does not change in real time: documents the agent has read, conversations it has had, knowledge it must look up. Pinecone, pgvector, Qdrant and LanceDB solve that problem well.

A world model is a different thing. It is the live, typed, current state of the system the agent is operating in: temperatures, setpoints, joint positions, queue depths, error conditions, who-owns-what. It has three properties that vector DBs don't try to provide:

  • Current. Not "what was true when we last indexed." What is true now.
  • Typed. Not a chunk of text the agent has to parse. A Temperature { celsius: 22.4 }.
  • Bidirectional. The agent doesn't only read; it acts. Writes flow back into the same substrate.

You wouldn't store a robot's joint positions in a vector DB. You also wouldn't store the company handbook in a real-time graph of typed records. The two are orthogonal. The mistake is using one because the other doesn't exist yet.

What an embodied agent actually needs

Strip everything down. An agent that operates in the physical world has three jobs:

  1. Observe. Know the current state of the system.
  2. Decide. Reason about what to do.
  3. Act. Make a change to the system.

LLMs do (2). The plumbing for (1) and (3) is what's missing. Today, the typical solution is a stack of bespoke tool implementations: write a Python wrapper around an MQTT client, another around a Modbus library, another around a REST API, and expose each as a tool the agent can call. That works for one robot, one factory, one demo. It collapses the moment you have ten heterogeneous systems and one agent that needs a coherent view across them.

What you want is a substrate where:

  • The same data contract spans the device, the edge, the cloud and the agent's view of the world.
  • Reading current state is one mechanism, not N integrations.
  • Writing back is the same mechanism in reverse.
  • The agent doesn't need to know whether the temperature came from a microcontroller, an edge gateway or a cloud service. It's just Temperature.
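One way to picture such a substrate is a keyed store in which reads and writes go through the same typed interface. The sketch below is illustrative only: `World`, `read` and `write` are hypothetical names, not AimDB's API, and a plain `HashMap` stands in for whatever real-time buffers a real system would use.

```rust
use std::any::Any;
use std::collections::HashMap;

/// The shared data contract: the same type on every tier.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Temperature {
    pub celsius: f32,
}

/// Hypothetical in-memory world model: one typed slot per key.
#[derive(Default)]
pub struct World {
    slots: HashMap<String, Box<dyn Any>>,
}

impl World {
    /// Writing is the read mechanism in reverse: same key, same type.
    pub fn write<T: Any>(&mut self, key: &str, value: T) {
        self.slots.insert(key.to_string(), Box::new(value));
    }

    /// Returns the current value, or `None` if the key is unknown
    /// or holds a different record type.
    pub fn read<T: Any + Clone>(&self, key: &str) -> Option<T> {
        self.slots.get(key)?.downcast_ref::<T>().cloned()
    }
}
```

A caller asks for `world.read::<Temperature>("climate/zone-a/reading")` and gets a typed value back. Nothing in the call reveals whether the value originated on a microcontroller, an edge gateway or a cloud service.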

One Rust struct, MCU to LLM

AimDB's claim, plain: a Rust type is the contract and the contract spans the entire stack.

┌───────────────────────────────────────────────────────────────────────────────┐
│                           Temperature Contract                                │
├───────────────────┬───────────────────┬───────────────────┬───────────────────┤
│  MCU (Embassy)    │  Edge (Tokio)     │  Cloud (Tokio)    │  Browser (WASM)   │
│  no_std + alloc   │  std              │  Kubernetes       │  wasm32           │
│  Cortex-M4        │  Linux / RPi      │  Full featured    │  Single-threaded  │
└───────────────────┴───────────────────┴───────────────────┴───────────────────┘
                                       │
                                       ▼
                         ┌───────────────────────────┐
                         │   Agent (over MCP)        │
                         │   "what's the temp?"      │
                         │   "set zone-a to 22°C"    │
                         └───────────────────────────┘

A single struct Temperature { celsius: f32 } compiles unchanged for a Cortex-M4 running Embassy on a rooftop, the Tokio service ingesting it on a Raspberry Pi, the Kubernetes pod aggregating it across thirty buildings and the WASM module running in the operator's browser tab. There is no schema registry. There is no codegen. There is no translation layer.
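To make the claim concrete, here is a minimal sketch of what such a contract could look like. It touches only `core`, so nothing in it should stand in the way of a `no_std` build. The `fahrenheit` helper and the `Display` impl are illustrative additions, not part of AimDB.

```rust
use core::fmt;

/// The contract: plain data, no heap allocation, no OS dependencies.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Temperature {
    pub celsius: f32,
}

impl Temperature {
    /// Illustrative helper: pure arithmetic, identical on every target.
    pub fn fahrenheit(&self) -> f32 {
        self.celsius * 9.0 / 5.0 + 32.0
    }
}

impl fmt::Display for Temperature {
    /// Formats the reading with one decimal place.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:.1} °C", self.celsius)
    }
}
```

Because the type carries no platform-specific fields or dependencies, the same source file can be shared by the firmware crate, the edge service and the browser module.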

And then the same record graph is exposed to an MCP-compatible LLM. The agent doesn't see "an MQTT topic" or "a Modbus register" or "a REST endpoint." It sees Temperature, the type, with its current value and its lineage, independent of what wire format moved it.

That's what makes the demo possible.

The demo, walked through

The MCP server ships with AimDB. Point any MCP-compatible client (Claude Desktop, VS Code with Copilot, or anything else that speaks MCP) at a running instance and ask: "What's the temperature in Munich right now?"

The agent gets a typed, current reading from a real sensor on a rooftop. The live demo at aimdb.dev keeps it running, no install required.

This is not magic. It is what falls out when the data layer of the system is uniform from sensor to LLM. The MCP tool surface is generated from the record graph itself: every record type the system knows about becomes a queryable concept, every key becomes an addressable instance.
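The idea of deriving the tool surface from the record graph can be sketched as follows. Everything here is a simplification under stated assumptions: `RecordGraph`, `Concept` and their methods are hypothetical names, and the real AimDB MCP server may describe its tools quite differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical agent-facing concept: one record type and its instances.
#[derive(Debug, PartialEq)]
pub struct Concept {
    pub type_name: &'static str,
    pub instances: Vec<String>,
}

/// Hypothetical record graph: key -> record type name.
#[derive(Default)]
pub struct RecordGraph {
    records: BTreeMap<String, &'static str>,
}

impl RecordGraph {
    pub fn register(&mut self, key: &str, type_name: &'static str) {
        self.records.insert(key.to_string(), type_name);
    }

    /// Groups keys by record type: every type becomes a queryable
    /// concept, every key an addressable instance of it.
    pub fn concepts(&self) -> Vec<Concept> {
        let mut by_type: BTreeMap<&'static str, Vec<String>> = BTreeMap::new();
        for (key, type_name) in &self.records {
            by_type.entry(*type_name).or_default().push(key.clone());
        }
        by_type
            .into_iter()
            .map(|(type_name, instances)| Concept { type_name, instances })
            .collect()
    }
}
```

The point of the sketch is that nothing is written by hand per record: registering a new key is enough for it to appear in the surface the agent sees.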

Read, write, observe

Three things the world-model substrate gives the agent that a retrieval-only stack doesn't.

Read: current, not retrieved

The agent reads the current value of Temperature at key climate/zone-a/reading. Not an embedding of a chunk of text describing the temperature an hour ago. The actual current value. There is no staleness window introduced by a re-indexing job. The buffer underneath the record is a real-time primitive and the agent's read goes against that.
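A minimal way to picture "current, not retrieved" is a latest-value buffer: the producer overwrites a single slot, and every read returns whatever was written last. The sketch below is a stand-in, not AimDB's buffer implementation. `Latest`, `publish` and `read` are hypothetical names, and a `Mutex` takes the place of whatever real-time primitive the real system uses.

```rust
use std::sync::{Arc, Mutex};

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Temperature {
    pub celsius: f32,
}

/// Hypothetical single-slot buffer: holds only the newest value.
#[derive(Clone)]
pub struct Latest<T> {
    slot: Arc<Mutex<Option<(u64, T)>>>,
}

impl<T: Clone> Latest<T> {
    pub fn new() -> Self {
        Latest { slot: Arc::new(Mutex::new(None)) }
    }

    /// Overwrites the slot and returns the new sequence number.
    pub fn publish(&self, value: T) -> u64 {
        let mut slot = self.slot.lock().unwrap();
        let seq = match &*slot {
            Some((prev, _)) => prev + 1,
            None => 1,
        };
        *slot = Some((seq, value));
        seq
    }

    /// Returns the newest value with its sequence number, if any.
    pub fn read(&self) -> Option<(u64, T)> {
        self.slot.lock().unwrap().clone()
    }
}
```

There is no index to rebuild between a write and the next read, which is the property the paragraph above relies on.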

Write: the agent acts via the same substrate

Reading is half the loop. The other half is the agent producing records that downstream consumers act on. AimDB models multi-writer arbitration as a graph: each writer (an operator, an autotuner, the LLM) owns its own request stream and a single arbiter consumes all of them and emits the applied result. The conflict becomes a transform, not a merge inside a buffer.

This means the LLM is just one more writer in a system that already had multiple writers. There is no special "agent path" through the architecture. The agent's setpoint requests flow through the same arbiter as the human operators' requests, governed by the same logic, observable the same way.
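The arbitration pattern can be sketched with standard-library channels. This is a toy model under stated assumptions, not AimDB's implementation: the `Writer` variants, the `SetpointRequest` type and the "highest-priority writer wins" rule are all hypothetical choices made for illustration.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical writers. Declaration order doubles as priority:
/// later variants win.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Writer {
    Autotuner,
    Llm,
    Operator,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SetpointRequest {
    pub writer: Writer,
    pub celsius: f32,
}

/// Each writer owns its own request stream (here: its own channel).
pub fn request_stream() -> (Sender<SetpointRequest>, Receiver<SetpointRequest>) {
    channel()
}

/// The arbiter drains every stream and emits a single applied setpoint.
/// Hypothetical rule: keep each writer's newest pending request, then
/// let the highest-priority writer win.
pub fn arbitrate(streams: &[Receiver<SetpointRequest>]) -> Option<SetpointRequest> {
    streams
        .iter()
        .filter_map(|rx| rx.try_iter().last())
        .max_by_key(|req| req.writer)
}
```

The LLM's channel is created and consumed exactly like the operator's. The only place the writers are treated differently is the arbitration rule itself, which is ordinary code that can be read, tested and changed.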

The mechanics are covered in Record Ownership: Which Side Is Right?, the deeper dive on this point.

Observe: agent actions are auditable

Every record in AimDB can implement Observable, and when it does, every produce, consume and transform emits metrics automatically. The agent's writes are visible in the same dashboards as a human operator's. The same alerts fire. The same audit log captures them. There is no separate "AI activity log" because there is no separate path.
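Here is a sketch of the idea, assuming nothing about the shape of AimDB's actual Observable trait. The wrapper below counts every produce and consume on a record, so all writers, human or LLM, land in the same counters. `Observed`, `Metrics` and the method names are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-record counters.
#[derive(Default)]
pub struct Metrics {
    produced: AtomicU64,
    consumed: AtomicU64,
}

/// Hypothetical wrapper: every access to the record updates the metrics.
pub struct Observed<T> {
    value: Option<T>,
    metrics: Metrics,
}

impl<T: Clone> Observed<T> {
    pub fn new() -> Self {
        Observed { value: None, metrics: Metrics::default() }
    }

    /// Stores a new value and counts the write.
    pub fn produce(&mut self, value: T) {
        self.metrics.produced.fetch_add(1, Ordering::Relaxed);
        self.value = Some(value);
    }

    /// Returns the current value and counts the read.
    pub fn consume(&self) -> Option<T> {
        self.metrics.consumed.fetch_add(1, Ordering::Relaxed);
        self.value.clone()
    }

    /// (produced, consumed) counts, regardless of who the caller was.
    pub fn counts(&self) -> (u64, u64) {
        (
            self.metrics.produced.load(Ordering::Relaxed),
            self.metrics.consumed.load(Ordering::Relaxed),
        )
    }
}
```

Because the counting lives in the record's access path rather than in the caller, an agent cannot write without being counted.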

This matters for the same reason it matters in any system that gives an autonomous component the ability to act: the people responsible for the system need to see what it did. Auditable agency, not a black-box tool call.

What this isn't

Three honest framings, because the comments will ask:

  • Not robotics middleware. AimDB is not ROS. There is no kinematics, no transform tree, no motion planner, no hard real-time scheduler. The data substrate itself can carry a 1 kHz control loop, but only if you keep that loop in-process. The moment it crosses a process or network boundary, you're paying for transport and that's on you to design around.
  • Not a vector store. Don't put your documents in AimDB. Use a vector DB for that. The two complement each other.
  • Not OLTP. The hot path is in-memory and real-time, designed for current state. Optional persistence backends (SQLite today, others pluggable) give you bounded record history with a retention window. Useful, but not the same as ACID transactions over years of business data. If that's what you need, reach for Postgres.

What it is: a typed real-time data substrate that happens to be uniquely well-positioned as a world model for physical AI agents, because the same contract that already spans your devices and services can extend to the agent without a new integration layer.

Where this goes

Sensor on a rooftop, MQTT, edge, MCP, LLM, your screen. One Rust struct, end to end. The live demo is the shortest path to seeing it.

If this lands, two posts go deeper on the substrate. Record Ownership: Which Side Is Right? takes on the question that comes up the moment multiple actors want to influence the same outcome. A follow-up coming in a few weeks takes on transports: how the same typed contract crosses MQTT, KNX, WebSocket and whatever else lives in your stack, and how to plug in your own when the protocol you need isn't on that list.

Physical AI is becoming the dominant frontier in agent work. The infrastructure conversation is still catching up. The retrieval problem is largely solved; the world-model problem is wide open.

This is one shape of the answer.


Spell checks in this post were created with the help of an LLM.
