world gives AI agents a structured interface to observe and act on system state.
The motivation is simple: agents that manage real systems — diagnosing why a service is down, checking what's using disk, restarting a container — need to interact with the OS. Today they do this by generating shell commands and parsing terminal output. This is fragile, unscoped, and impossible to constrain safely.
world treats the system as a partially observable environment. Agents observe structured state, act through a finite set of declared verbs, and await conditions instead of polling. Every action declares what it mutates. A compiled-in capability ceiling limits what any given binary can do, regardless of what the agent asks for.
This project grew out of Noah, an AI IT department for small businesses, where the agent needs to observe and manage machines on behalf of non-technical users.
The system is partially observable — agents cannot see everything, only what domains expose. Each domain declares a schema (spec) describing its observations, actions, and what each action mutates. The agent builds a world model from structured observations, changes state through declared verbs (act), and waits for conditions (await) instead of polling.
An agent with shell access can do anything — that's the problem. world is designed around three constraints:
- Structured observations. world observe network --json returns a schema, not terminal output. The agent never has to parse ifconfig or netstat. Every domain returns the same shape: {details: {...}}.
- Declared mutations. Every action says what observation paths it modifies (mutates: ["network.interfaces"]). This is a fact about the action, not a policy judgment.
- Structural safety. The binary has a compiled-in capability ceiling: a set of observation schema paths it is allowed to mutate. An agent given a binary compiled with CEILING: &["network.*"] literally cannot kill processes or uninstall packages. No flag overrides this. To change it, recompile.
The combination means you can hand an agent a world binary and reason about what it can and cannot do, which is not possible with bash.
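As a rough illustration of how declared mutations and the ceiling fit together, here is a minimal Python sketch. It is not the project's implementation (the ceiling is compiled into the binary, and the real matching rules are not described here). The function names and the use of shell-style glob matching for patterns like network.* are assumptions.

```python
from fnmatch import fnmatchcase

# Hypothetical ceiling, mirroring the CEILING: &["network.*"] example above.
CEILING = ["network.*"]


def within_ceiling(mutates, ceiling=CEILING):
    """Return True only if every mutated path matches some ceiling pattern.

    Assumption: patterns are shell-style globs, so "network.*" covers
    "network.interfaces". An action that mutates nothing always passes.
    """
    return all(
        any(fnmatchcase(path, pattern) for pattern in ceiling)
        for path in mutates
    )


def check_action(action, ceiling=CEILING):
    """Raise PermissionError if the action mutates anything outside the ceiling."""
    if not within_ceiling(action["mutates"], ceiling):
        raise PermissionError(
            f"action mutates paths outside the ceiling: {action['mutates']}"
        )
    return action
```

With this ceiling, an action declaring mutates: ["network.interfaces"] passes, while one declaring mutates: ["process.processes"] is rejected before any handler runs.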
A domain is a slice of the world that can be observed and acted on. Built-in domains cover macOS system state — processes, networks, containers, services, disks, printers, logs. External plugins extend this to package managers (brew, pip, npm) and anything else with state and actions.
Every domain declares its schema — what can be observed, what actions exist, what each action mutates. Agents use this for discovery instead of guessing.
world spec           # all domains
world spec network   # one domain
# What's using CPU?
world observe process top_cpu --limit 5
# Kill the offender
world act process 5678 kill
# Confirm it's gone
world await process 5678 stopped
observe reads structured state. act changes it through a declared verb. await blocks until a condition holds, using OS-native events where available (kqueue for process exit) and falling back to exponential backoff polling.
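The polling fallback can be sketched in a few lines of Python. This is only an illustration of exponential backoff with a timeout; the function name, the default delays, and the injectable clock and sleep parameters are assumptions, not part of world.

```python
import time


def await_condition(check, timeout, initial_delay=0.1, factor=2.0,
                    max_delay=5.0, clock=time.monotonic, sleep=time.sleep):
    """Poll check() with exponentially growing delays.

    Returns True as soon as check() is truthy, or False once `timeout`
    seconds have passed. `clock` and `sleep` are injectable for testing.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        # Grow the delay, capped so long waits still poll regularly.
        delay = min(delay * factor, max_delay)
```

A caller would pass a check such as "is pid 5678 gone?" and treat a False return as a timeout.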
How does the agent know to await stopped after kill? The spec tells it. Actions that produce async effects declare what confirms them:
{ "verbs": ["kill"], "mutates": ["process.processes"], "resolves": "stopped" }
resolves means this action's effect is asynchronous: await the named condition to confirm it landed. An action without resolves is synchronous, and the exit code is the answer.
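One way an agent might use resolves is to turn an action entry from the spec into the sequence of commands to run. The sketch below is hypothetical: the plan function and its default timeout are illustrative, and only the action entry itself comes from the spec example above.

```python
import json

# The action entry from the spec example above.
KILL = json.loads(
    '{ "verbs": ["kill"], "mutates": ["process.processes"], "resolves": "stopped" }'
)


def plan(domain, target, verb, action_spec, timeout=10):
    """Return the world invocations needed to run `verb` and confirm its effect.

    If the action declares "resolves", an await on that condition is appended;
    otherwise the act command's exit code is the whole answer.
    """
    if verb not in action_spec["verbs"]:
        raise ValueError(f"{verb!r} is not declared by this action")
    commands = [["world", "act", domain, target, verb]]
    condition = action_spec.get("resolves")
    if condition is not None:
        commands.append(
            ["world", "await", domain, target, condition, "--timeout", str(timeout)]
        )
    return commands
```

For the kill entry, plan("process", "5678", "kill", KILL) yields an act command followed by an await on stopped, matching the three-step example above.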
Most domains are ambient — processes, disks, and networks always have state to observe. Some domains are different: they start empty and must be populated by an action before observation is meaningful. A browser has no page until you open one. An SSH connection has no host until you connect.
A domain declares this with "session": true in its spec. The agent sees schema-conforming null observations (all fields null, arrays empty) and knows from the spec that an action like open will populate them. No special state values, no separate lifecycle protocol — just the same observe/act/await loop, where the initial observation happens to be empty.
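An agent can detect an unpopulated session without any special state value. The helper below is a small sketch under the assumption that "empty" means every top-level field is null or an empty collection; the function name is illustrative.

```python
def is_unpopulated(observation):
    """True when every top-level field is None or an empty list/dict.

    Assumption: this is how a schema-conforming null observation looks,
    as in the browser example (all fields null, arrays empty).
    """
    return all(
        value is None or value == [] or value == {}
        for value in observation.values()
    )
```

When this returns True for a domain whose spec has "session": true, the agent knows to run a populating action such as open before observing again.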
world observe browser
# → { "url": null, "title": null, "elements": [], "snapshot": null }
world act browser open url=https://example.com
# → { "url": "https://example.com", "title": "Example", "elements": [...], ... }
world act browser close
# → { "url": null, "title": null, "elements": [], "snapshot": null }
world await browser loaded --timeout 10
# blocks until a page is loaded
When an observation contains an array of items, the first field in each item is the target, the handle the agent uses to act on it. This is convention, not configuration.
processes: [{ pid, name, cpu, ... }] → world act process 5678 kill
interfaces: [{ name, up, addresses, ... }] → world act network en0 disable
elements: [{ ref, role, name, ... }] → world act browser e2 click
lights: [{ id, name, state, ... }] → world act home living_room_light enable
Handlers expose a clean, actable identifier as the first field. Implementation details (like HomeAssistant entity IDs or Docker SHA hashes) never leak — the handler maps internally.
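Because the target is always the first field, an agent can derive an act command from any observed item without per-domain knowledge. The following Python sketch assumes items arrive as JSON objects whose key order is preserved (json.loads keeps it); the helper names and the sample item values are illustrative.

```python
import json


def target_of(item):
    """Return the value of the item's first field, which by convention is its target."""
    return next(iter(item.values()))


def act_command(domain, item, verb):
    """Build the argv for acting on an observed item."""
    return ["world", "act", domain, str(target_of(item)), verb]


# Illustrative item shaped like an entry of processes: [{ pid, name, cpu, ... }].
process = json.loads('{"pid": 5678, "name": "node", "cpu": 93.1}')
kill = act_command("process", process, "kill")
```

Here kill is the same invocation as world act process 5678 kill.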
A single observation is a snapshot. For quantities like CPU%, one snapshot is nearly useless. sample takes repeated observations and reduces them statistically:
world sample process top_cpu --limit 5 --count 5 --interval 2s
Fields that vary become {mean, min, max, delta, rate_per_sec}. Constant fields stay as scalars.
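The reduction can be sketched as follows. The exact definitions world uses are not stated here, so this sketch assumes delta is the last value minus the first, rate_per_sec is delta divided by the elapsed time between the first and last sample, and varying fields are numeric. The function name is illustrative.

```python
from statistics import fmean


def reduce_samples(samples, interval_s):
    """Reduce flat observation dicts taken `interval_s` seconds apart.

    Constant fields stay scalars; varying (numeric) fields become
    {mean, min, max, delta, rate_per_sec}.
    """
    elapsed = interval_s * (len(samples) - 1)
    reduced = {}
    for key in samples[0]:
        values = [sample[key] for sample in samples]
        if all(value == values[0] for value in values):
            reduced[key] = values[0]  # constant field stays a scalar
            continue
        delta = values[-1] - values[0]  # assumption: last minus first
        reduced[key] = {
            "mean": fmean(values),
            "min": min(values),
            "max": max(values),
            "delta": delta,
            # assumption: rate over the span from first to last sample
            "rate_per_sec": delta / elapsed if elapsed > 0 else None,
        }
    return reduced
```

Three samples of one process taken 2 seconds apart, with cpu rising from 10.0 to 30.0, would keep pid as a scalar and turn cpu into a summary object.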
| Domain | Default observation | Verbs | Await conditions |
|---|---|---|---|
| process | Top 20 by CPU | kill, remove, set | running, stopped, port_free |
| network | Interfaces + DNS + VPNs + connectivity | reset, enable, disable, remove, restart | host_reachable, dns_resolves, internet_reachable, port_open |
| container | Running containers | enable, disable, restart, remove, add, clear | running, stopped, healthy, image_exists, volume_exists |
| service | Running non-Apple services | restart, enable, disable, set | healthy, stopped |
| disk | Mounts + space usage | clear, reset, add, remove | writable, mounted, unmounted |
| brew | Installed packages | add, remove, reset, set | installed, uninstalled |
| pip | Installed packages + virtualenv | add, remove, set | installed |
| npm | Project packages (or global) | add, remove, set | installed |
| printer | Printers + status | clear, restart, set, reset | prints |
| log | Recent errors | (read-only) | |
| browser (session) | Page URL + accessibility tree | open, close, click, fill, select, hover, scroll, press, eval | loaded, title_contains |
| ssh (session) | Remote host info + disk usage | open, close, exec | connected |
| home (session) | Lights, climate, sensors, locks, covers | open, close, enable, disable, set, lock, unlock | connected |
Package managers are separate domains (brew, pip, npm) rather than a single "package" abstraction, because they have different scopes (system, virtualenv, node_modules) and the handler should use the runtime it observes.
world COMMAND DOMAIN [TARGET] [PREDICATE] [OPTIONS]
world observe DOMAIN [TARGET] [--limit N] [--since T]
world act DOMAIN [TARGET] VERB [ARGS...] [--dry-run]
world await DOMAIN [TARGET] CONDITION [--timeout N]
world sample DOMAIN [TARGET] [--count N] [--interval T] [--limit N]
world spec [DOMAIN]
Every command follows the same shape: domain, then target, then what to do. TARGET is optional for targetless actions (e.g., world act browser open https://example.com, world await network internet_reachable).
Output is JSON when piped and human-readable on a TTY. Use --json or --pretty to force a format, and -q to return only the exit code.
A plugin is a directory with three files:
plugins/npm/
spec.json # observations + actions + mutates
dispatch.json # (target, verb) → handler mapping
handler.js # reads JSON from stdin, writes JSON to stdout
The handler can be in any language (.py → python3, .js → node, .sh → sh, or a bare executable). The protocol is one JSON object in, one JSON object out.
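A Python handler following the one-object-in, one-object-out protocol might look like the sketch below. The request fields ("verb", "target") and the response contents are hypothetical, since the real message schema is not shown here; only the JSON-over-stdin/stdout protocol and the {details: {...}} shape come from the text above.

```python
import json
import sys


def handle(request):
    """Map one request object to one response object.

    Hypothetical request shape: {"verb": ..., "target": ...}, where a
    missing verb means "observe".
    """
    verb = request.get("verb")
    if verb is None:
        # Observation: return state in the common {details: {...}} shape.
        return {"details": {"packages": []}}
    # Action: echo what was requested; a real handler would do the work here.
    return {"details": {"ok": True, "verb": verb, "target": request.get("target")}}


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read exactly one JSON object, write exactly one JSON object."""
    request = json.load(stdin)
    json.dump(handle(request), stdout)
    stdout.write("\n")

# A real handler.py would finish by calling main().
```

Keeping handle separate from main makes the mapping testable without spawning a process.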
Session domains (like browser) follow the same plugin structure. Add "session": true to spec.json. The handler returns null/empty observations when the session is inactive, and actions like open/close manage the lifecycle. The browser plugin delegates to agent-browser, which manages browser state via a background daemon.
cargo build --release
cargo test
MIT