world

world gives AI agents a structured interface to observe and act on system state.

The motivation is simple: agents that manage real systems — diagnosing why a service is down, checking what's using disk, restarting a container — need to interact with the OS. Today they do this by generating shell commands and parsing terminal output. This is fragile, unscoped, and impossible to constrain safely.

world treats the system as a partially observable environment. Agents observe structured state, act through a finite set of declared verbs, and await conditions instead of polling. Every action declares what it mutates. A compiled-in capability ceiling limits what any given binary can do, regardless of what the agent asks for.

This project grew out of Noah, an AI IT department for small businesses, where the agent needs to observe and manage machines on behalf of non-technical users.

Partially observable means agents cannot see everything, only what domains expose. Each domain declares a schema (spec) describing its observations, its actions, and what each action mutates. The agent builds its world model from those structured observations, changes state only through declared verbs (act), and waits for conditions (await) instead of polling.

Why not just shell commands?

An agent with shell access can do anything — that's the problem. world is designed around three constraints:

Structured observations. world observe network --json returns structured data that follows the domain's schema, not terminal output. The agent never has to parse ifconfig or netstat. Every domain returns the same shape: {details: {...}}.
Declared mutations. Every action says what observation paths it modifies (mutates: ["network.interfaces"]). This is a fact about the action, not a policy judgment.
Structural safety. The binary has a compiled-in capability ceiling — a set of observation schema paths it is allowed to mutate. An agent given a binary compiled with CEILING: &["network.*"] literally cannot kill processes or uninstall packages. No flag overrides this. To change it, recompile.

The combination means you can hand an agent a world binary and reason about what it can and cannot do, which is not possible with bash.
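As a rough illustration of the ceiling rule, the sketch below matches an action's declared mutates paths against a list of glob patterns. The names CEILING and within_ceiling are hypothetical, and the real check is compiled into the binary rather than written in Python; this only shows the shape of the rule.

```python
from fnmatch import fnmatchcase

# Hypothetical ceiling, mirroring the CEILING: &["network.*"] example above.
CEILING = ["network.*"]


def within_ceiling(mutates, ceiling=CEILING):
    """Return True only if every mutated path matches some ceiling pattern.

    An action that mutates nothing (an empty list) is always allowed.
    """
    return all(
        any(fnmatchcase(path, pattern) for pattern in ceiling)
        for path in mutates
    )


print(within_ceiling(["network.interfaces"]))  # allowed by network.*
print(within_ceiling(["process.processes"]))   # outside the ceiling
```

Because the decision depends only on declared paths, it can be made before any handler runs.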

Concepts

Domains

A domain is a slice of the world that can be observed and acted on. Built-in domains cover macOS system state — processes, networks, containers, services, disks, printers, logs. External plugins extend this to package managers (brew, pip, npm) and anything else with state and actions.

spec

Every domain declares its schema — what can be observed, what actions exist, what each action mutates. Agents use this for discovery instead of guessing.

world spec              # all domains
world spec network      # one domain

observe → act → await

# What's using CPU?
world observe process top_cpu --limit 5

# Kill the offender
world act process 5678 kill

# Confirm it's gone
world await process 5678 stopped

observe reads structured state. act changes it through a declared verb. await blocks until a condition holds, using OS-native events where available (kqueue for process exit) and falling back to exponential backoff polling.
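The polling fallback can be pictured as a loop that re-checks a condition with a growing delay until a deadline passes. This is a minimal sketch, not world's implementation: await_condition, its parameters, and the default delays are all assumptions chosen for illustration.

```python
import time


def await_condition(check, timeout=10.0, initial=0.05, factor=2.0,
                    max_delay=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll check() with exponential backoff until it holds or time runs out.

    Returns True if the condition held before the deadline, False otherwise.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    delay = initial
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        # Grow the delay, but cap it so checks do not become too sparse.
        delay = min(delay * factor, max_delay)
```

Capping the delay keeps the worst-case latency between the condition becoming true and the caller noticing it bounded by max_delay.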

How does the agent know to await stopped after kill? The spec tells it. Actions that produce async effects declare what confirms them:

{ "verbs": ["kill"], "mutates": ["process.processes"], "resolves": "stopped" }

resolves means: this action's effect is async, so await the named condition to confirm it landed. An action without resolves is synchronous: its exit code is the answer.
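An agent could turn that rule into code along these lines. The action entry is the one shown above; the helper name follow_up and the idea of returning an argv list are assumptions made for this sketch.

```python
import json

# The action entry from the spec excerpt above.
KILL = json.loads(
    '{ "verbs": ["kill"], "mutates": ["process.processes"], "resolves": "stopped" }'
)


def follow_up(domain, target, action):
    """Return the await command that confirms an async action, or None.

    None means the action declared no "resolves", so it is synchronous
    and its exit code is the answer.
    """
    condition = action.get("resolves")
    if condition is None:
        return None
    return ["world", "await", domain, str(target), condition]


print(follow_up("process", 5678, KILL))
```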

Session domains

Most domains are ambient — processes, disks, and networks always have state to observe. Some domains are different: they start empty and must be populated by an action before observation is meaningful. A browser has no page until you open one. An SSH connection has no host until you connect.

A domain declares this with "session": true in its spec. The agent sees schema-conforming null observations (all fields null, arrays empty) and knows from the spec that an action like open will populate them. No special state values, no separate lifecycle protocol — just the same observe/act/await loop, where the initial observation happens to be empty.

world observe browser
# → { "url": null, "title": null, "elements": [], "snapshot": null }

world act browser open url=https://example.com
# → { "url": "https://example.com", "title": "Example", "elements": [...], ... }

world act browser close
# → { "url": null, "title": null, "elements": [], "snapshot": null }

world await browser loaded --timeout 10
# blocks until a page is loaded
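The empty observation can be derived mechanically from a schema. The sketch below assumes a simplified, hypothetical schema format (field name mapped to a type name); the real spec.json layout may differ.

```python
# Hypothetical, simplified schema for the browser domain.
BROWSER_SCHEMA = {
    "url": "string",
    "title": "string",
    "elements": "array",
    "snapshot": "string",
}


def empty_observation(schema):
    """Build a schema-conforming observation for an inactive session:
    arrays are empty, every other field is null (None in Python)."""
    return {field: [] if kind == "array" else None
            for field, kind in schema.items()}


def is_active(observation):
    """A session counts as active once any field has been populated."""
    return any(value not in (None, []) for value in observation.values())


print(empty_observation(BROWSER_SCHEMA))
```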

Targets

When an observation contains an array of items, the first field in each item is the target — the handle the agent uses to act on it. This is convention, not configuration.

processes:  [{ pid, name, cpu, ... }]       → world act process 5678 kill
interfaces: [{ name, up, addresses, ... }]  → world act network en0 disable
elements:   [{ ref, role, name, ... }]      → world act browser e2 click
lights:     [{ id, name, state, ... }]      → world act home living_room_light enable

Handlers expose a clean, actable identifier as the first field. Implementation details (like HomeAssistant entity IDs or Docker SHA hashes) never leak — the handler maps internally.
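On the agent side, the first-field convention needs no configuration to apply. A minimal sketch, assuming observations arrive as JSON objects whose key order is preserved (Python's json module keeps it); the helper name targets and the sample values are illustrative.

```python
import json


def targets(items):
    """Return the target handle of each item: the value of its first field."""
    return [next(iter(item.values())) for item in items if item]


# Illustrative data, not real output.
processes = json.loads(
    '[{"pid": 5678, "name": "node", "cpu": 93.1},'
    ' {"pid": 42, "name": "launchd", "cpu": 0.2}]'
)
print(targets(processes))
```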

sample

A single observation is a snapshot. For quantities like CPU%, one snapshot is nearly useless. sample takes repeated observations and reduces them statistically:

world sample process top_cpu --limit 5 --count 5 --interval 2s

Fields that vary become {mean, min, max, delta, rate_per_sec}. Constant fields stay as scalars.
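The reduction for a single field might look like the sketch below. The exact definitions are assumptions: here delta is last minus first, and rate_per_sec divides delta by the time between the first and last sample.

```python
from statistics import fmean


def reduce_field(values, interval_s):
    """Reduce one field's samples, taken interval_s seconds apart.

    Constant fields stay as a scalar; varying fields become a summary dict.
    """
    if not values:
        raise ValueError("need at least one sample")
    first = values[0]
    if all(v == first for v in values):
        return first
    delta = values[-1] - first                 # assumed: last minus first
    elapsed = interval_s * (len(values) - 1)   # time from first to last sample
    return {
        "mean": fmean(values),
        "min": min(values),
        "max": max(values),
        "delta": delta,
        "rate_per_sec": delta / elapsed,
    }


print(reduce_field([10, 20, 30], interval_s=2))
print(reduce_field([7, 7, 7], interval_s=2))
```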

Domains

| Domain | Default observation | Verbs | Await conditions |
| --- | --- | --- | --- |
| process | Top 20 by CPU | kill, remove, set | running, stopped, port_free |
| network | Interfaces + DNS + VPNs + connectivity | reset, enable, disable, remove, restart | host_reachable, dns_resolves, internet_reachable, port_open |
| container | Running containers | enable, disable, restart, remove, add, clear | running, stopped, healthy, image_exists, volume_exists |
| service | Running non-Apple services | restart, enable, disable, set | healthy, stopped |
| disk | Mounts + space usage | clear, reset, add, remove | writable, mounted, unmounted |
| brew | Installed packages | add, remove, reset, set | installed, uninstalled |
| pip | Installed packages + virtualenv | add, remove, set | installed |
| npm | Project packages (or global) | add, remove, set | installed |
| printer | Printers + status | clear, restart, set, reset | prints |
| log | Recent errors | (read-only) | |
| browser (session) | Page URL + accessibility tree | open, close, click, fill, select, hover, scroll, press, eval | loaded, title_contains |
| ssh (session) | Remote host info + disk usage | open, close, exec | connected |
| home (session) | Lights, climate, sensors, locks, covers | open, close, enable, disable, set, lock, unlock | connected |

Package managers are separate domains (brew, pip, npm) rather than a single "package" abstraction, because they have different scopes (system, virtualenv, node_modules) and the handler should use the runtime it observes.

CLI

world COMMAND DOMAIN [TARGET] [PREDICATE] [OPTIONS]

world observe DOMAIN [TARGET] [--limit N] [--since T]
world act     DOMAIN [TARGET] VERB [ARGS...] [--dry-run]
world await   DOMAIN [TARGET] CONDITION [--timeout N]
world sample  DOMAIN [TARGET] [--count N] [--interval T] [--limit N]
world spec    [DOMAIN]

Every command follows the same shape: domain, then target, then what to do. TARGET is optional for targetless actions (e.g., world act browser open https://example.com, world await network internet_reachable).

Output is JSON when piped and human-readable on a TTY. Use --json or --pretty to force a format, and -q to suppress output and return only the exit code.
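A caller can wrap this grammar in a few lines. The sketch below assumes the world binary is on PATH; build_argv and run_json are hypothetical helper names, and only the --json flag mentioned above is used.

```python
import json
import subprocess


def build_argv(command, domain, target=None, *rest, binary="world"):
    """Assemble COMMAND DOMAIN [TARGET] [...] in the order the CLI expects."""
    argv = [binary, command, domain]
    if target is not None:
        argv.append(str(target))
    argv.extend(str(part) for part in rest)
    return argv


def run_json(argv, runner=subprocess.run):
    """Run a command with --json and return (exit_code, parsed_output).

    parsed_output is None when the command printed nothing.
    runner is injectable so the wrapper can be exercised without the binary.
    """
    proc = runner(argv + ["--json"], capture_output=True, text=True)
    data = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode, data


print(build_argv("act", "process", 5678, "kill"))
print(build_argv("await", "network", None, "internet_reachable"))
```

Passing an argv list instead of a shell string keeps targets and arguments from being reinterpreted by a shell.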

Extending

A plugin is a directory with three files:

plugins/npm/
  spec.json      # observations + actions + mutates
  dispatch.json  # (target, verb) → handler mapping
  handler.js     # reads JSON from stdin, writes JSON to stdout

The handler can be in any language (.py → python3, .js → node, .sh → sh, or a bare executable). The protocol is one JSON object in, one JSON object out.
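A Python handler following the one-object-in, one-object-out protocol could be as small as the sketch below. The request fields ("command", "verb", "target") and the "ok" and "error" response keys are assumptions made for illustration; only the {details: {...}} observation shape comes from the description above.

```python
import json
import sys

# Hypothetical in-memory state standing in for a real package list.
INSTALLED = {"left-pad": "1.3.0"}


def handle(request):
    """Map one request object to one response object."""
    command = request.get("command")
    if command == "observe":
        packages = [{"name": name, "version": version}
                    for name, version in sorted(INSTALLED.items())]
        return {"details": {"packages": packages}}
    if command == "act" and request.get("verb") == "remove":
        removed = INSTALLED.pop(request.get("target"), None) is not None
        return {"ok": removed}
    return {"error": "unsupported request"}


def main(stdin, stdout):
    # Read exactly one JSON object, write exactly one JSON object.
    request = json.load(stdin)
    json.dump(handle(request), stdout)
    stdout.write("\n")
```

A real handler script would finish by calling main(sys.stdin, sys.stdout). Note that name comes first in each package item, so it serves as the target.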

Session domains (like browser) follow the same plugin structure. Add "session": true to spec.json. The handler returns null/empty observations when the session is inactive, and actions like open/close manage the lifecycle. The browser plugin delegates to agent-browser, which manages browser state via a background daemon.

Building

cargo build --release
cargo test

License

MIT
