rlm-cli

CLI for Recursive Language Models — based on the RLM paper.

Instead of dumping a huge context into a single LLM call, RLM lets the model write Python code to process it — slicing, chunking, running sub-queries on pieces, and building up an answer across multiple iterations.

What's New in v0.5.0

Ollama support — use any locally-installed model (llama3, mistral, qwen, etc.) with zero API key setup
Mixed-model mode — sub_model in config lets you use a cheap/fast model for sub-queries and a powerful model for the root loop (mirrors the paper's GPT-5 + GPT-5-mini setup)
Paper-aligned system prompt — per-iteration budget awareness, sub-query strategy guidance, parallel async patterns from arXiv:2512.24601
Session-based trajectories — runs grouped into ~/.rlm/sessions/<session-id>/ instead of a flat directory
Refreshed terminal UI — Electric Amber RGB palette, two-column welcome panel with version in border, silent operation (no noise between queries)
Depth enforcement — max_depth now actually enforced (was tracked but never checked)

Install

npm install -g rlm-cli

Requires Node.js >= 20 and Python 3.

Run rlm to start. First launch will prompt for a provider + API key (saved to ~/.rlm/credentials).

Supported Providers

Provider	Env Variable	Default Model
Anthropic	`ANTHROPIC_API_KEY`	`claude-sonnet-4-6`
OpenAI	`OPENAI_API_KEY`	`gpt-4o`
Google	`GEMINI_API_KEY`	`gemini-2.5-flash`
OpenRouter	`OPENROUTER_API_KEY`	`auto`
Ollama	(no key needed)	any installed model

Ollama (local models)

If Ollama is running, rlm-cli auto-detects it at startup — no config needed.

ollama pull llama3.1:8b
rlm
# → /model llama3.1:8b   or   /provider → choose Ollama

Set a custom daemon URL with OLLAMA_BASE_URL=http://....

Keys are loaded from (highest priority wins):

Shell environment variables
.env file in project root
~/.rlm/credentials

From Source

git clone https://github.com/viplismism/rlm-cli.git
cd rlm-cli
npm install
npm run build
npm link

Usage

Interactive Terminal

rlm

Persistent session with a two-column welcome panel showing your model, provider, context, and quick-ref slash commands. Everything auto-saves to a session folder.

Slash commands:

Command	What it does
`/file <path>`	Load file, directory, or glob as context
`/url <url>`	Fetch URL as context
`/paste`	Multi-line paste mode
`@file <query>`	Load file + run query in one step
`/model [id\|#]`	List or switch model (shows Ollama models too)
`/provider`	Switch provider (includes Ollama if running)
`/key`	Update an API key
`/trajectories`	Browse saved sessions
`/context`	Show loaded context info
`/clear`	Clear screen
`/help`	Full command reference
`/quit`	Exit

Tips:

Just type a question — no context needed for general queries
Paste a URL directly to fetch it as context
Paste 4+ lines of text to set it as context
Ctrl+C stops a running query, Ctrl+C twice exits

Single-Shot Mode

rlm run --file large-file.txt "List all classes and their methods"
rlm run --url https://example.com/data.txt "Summarize this"
cat data.txt | rlm run --stdin "Count the errors"
rlm run --model gpt-4o --file code.py "Find bugs"

Answer goes to stdout, progress to stderr — pipe-friendly.

Trajectory Viewer

rlm viewer

Browse saved runs in a TUI. Navigate iterations, inspect code and output at each step, drill into sub-queries. Sessions are saved to ~/.rlm/sessions/.

Benchmarks

Compare direct LLM vs RLM on the same query from standard long-context datasets.

Benchmark	Dataset	What it tests
`oolong`	Oolong Synth	Synthetic long-context: timeline ordering, user tracking, counting
`longbench`	LongBench NarrativeQA	Reading comprehension over long narratives

rlm benchmark oolong          # default: index 4743
rlm benchmark longbench       # default: index 182
rlm benchmark oolong --idx 10

Python dependencies are auto-installed into a .venv on first run.

How It Works

Your full context is loaded into a persistent Python REPL as a context variable
The LLM gets metadata about the context (size, preview) plus your query
It writes Python code that can slice context, call llm_query(chunk, instruction) for sub-questions, and call FINAL(answer) when done
Code runs, output is captured and fed back for the next iteration
Loop continues until FINAL() is called or max iterations are reached

For large documents, the model chunks the text and runs parallel sub-queries with async_llm_query() + asyncio.gather(), then aggregates the results.

Configuration

Create rlm_config.yaml in your working directory:

max_iterations: 20       # Max iterations before giving up (1-100)
max_depth: 1             # Max recursive sub-agent depth — paper uses 1
max_sub_queries: 50      # Max total sub-queries (1-500)
truncate_len: 5000       # Truncate REPL output beyond this (500-50000)
metadata_preview_lines: 20

# Use a cheaper/faster model for sub-queries (paper: GPT-5-mini for sub-calls)
# sub_model: gpt-4o-mini
# sub_model: claude-haiku-3-5
# sub_model: llama3.1:8b        # Ollama model for free sub-queries!

The sub_model option is the key cost-saving trick from the paper — a fast cheap model handles the chunking work while the root model synthesizes the final answer.

Project Structure

src/
  main.ts          CLI entry point and command router
  interactive.ts   Interactive terminal REPL
  rlm.ts           Core RLM loop (Algorithm 1 from paper)
  repl.ts          Python REPL subprocess manager
  runtime.py       Python runtime (FINAL, llm_query, async_llm_query)
  cli.ts           Single-shot CLI mode
  viewer.ts        Trajectory viewer TUI
  colors.ts        Terminal color palette (Electric Amber RGB)
  ollama.ts        Ollama local model integration
  config.ts        Config loader
  env.ts           Environment variable loader
benchmarks/
  oolong_synth.ts
  longbench_narrativeqa.ts
  requirements.txt
bin/
  rlm.mjs          Global CLI shim

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
.github/agents		.github/agents
benchmarks		benchmarks
bin		bin
site		site
src		src
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
demo.png		demo.png
package-lock.json		package-lock.json
package.json		package.json
rlm_config.yaml		rlm_config.yaml
tsconfig.json		tsconfig.json
tsconfig.tsbuildinfo		tsconfig.tsbuildinfo

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rlm-cli

What's New in v0.5.0

Install

Supported Providers

Ollama (local models)

From Source

Usage

Interactive Terminal

Single-Shot Mode

Trajectory Viewer

Benchmarks

How It Works

Configuration

Project Structure

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

rlm-cli

What's New in v0.5.0

Install

Supported Providers

Ollama (local models)

From Source

Usage

Interactive Terminal

Single-Shot Mode

Trajectory Viewer

Benchmarks

How It Works

Configuration

Project Structure

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages