cartog

Map your codebase. Navigate by graph, not grep.

cartog gives your AI coding agent a pre-computed code graph — symbols, calls, imports, inheritance — so it queries structure in 1-2 calls instead of 6+. Everything runs locally: no API calls, no cloud, no data leaves your machine.

Why cartog

| | grep/cat workflow | cartog |
|---|---|---|
| Tokens per query | ~1,700 | ~280 (83% fewer) |
| Recall (completeness) | 78% | 97% |
| Query latency | multi-step | 8-450 µs |
| Privacy | n/a | 100% local, no remote calls |
| Transitive analysis | impossible | impact --depth 3 traces callers-of-callers |

Where cartog shines most: tracing call chains (88% token reduction, 35% grep recall vs 100% cartog), finding callers (95% reduction), and type references (93% reduction).

Measured across 13 scenarios, 5 languages (full benchmark suite).

What you get immediately

  • Single binary, self-contained: cargo install cartog and you're done. No Docker, no config.
  • 100% offline: tree-sitter parsing + SQLite storage + ONNX embeddings. Your code never leaves your machine, ever.
  • Optional LSP precision: auto-detects language servers on PATH to boost edge resolution from ~25% to ~42-81%. Works without them, better with them.
  • Smart search routing: keyword search (sub-ms, symbol names) and semantic search (natural language queries) work together. Run both in parallel when unsure.
  • Live index: cartog watch auto re-indexes on file changes. Your agent always queries fresh data.
  • MCP server: cartog serve exposes 12 tools over stdio. Plug into Claude Code, Cursor, Windsurf, Zed, or any MCP-compatible agent.

Quick Start

cargo install cartog
cd your-project
cartog index .               # build the graph (~95ms for 4k LOC, incremental)
cartog search validate       # find symbols by name (sub-millisecond)
cartog refs validate_token   # who calls/imports/references this?
cartog impact validate_token # what breaks if I change this?

Add semantic search (optional, still fully local)

cartog rag setup             # download embedding + re-ranker models (~1.2GB, one-time)
cartog rag index .           # embed all symbols into sqlite-vec
cartog rag search "authentication token validation"   # natural language queries

Models are downloaded once to ~/.cache/cartog/models/ and run locally via ONNX Runtime. No API keys, no network calls at query time.

Install

From crates.io

cargo install cartog                    # core (heuristic resolution only)
cargo install cartog --features lsp     # + LSP-based resolution (recommended)

The lsp feature adds ~50KB to the binary. It auto-detects language servers on PATH (rust-analyzer, pyright, typescript-language-server, gopls, ruby-lsp, solargraph) and uses them to resolve edges that heuristic matching can't. No extra config needed — if a server is on PATH, it's used automatically.

Pre-built binaries

Download from GitHub Releases:

# macOS (Apple Silicon)
curl -L https://github.com/jrollin/cartog/releases/latest/download/cartog-aarch64-apple-darwin.tar.gz | tar xz
sudo mv cartog /usr/local/bin/

# macOS (Intel)
curl -L https://github.com/jrollin/cartog/releases/latest/download/cartog-x86_64-apple-darwin.tar.gz | tar xz
sudo mv cartog /usr/local/bin/

# Linux (x86_64)
curl -L https://github.com/jrollin/cartog/releases/latest/download/cartog-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv cartog /usr/local/bin/

# Linux (ARM64)
curl -L https://github.com/jrollin/cartog/releases/latest/download/cartog-aarch64-unknown-linux-gnu.tar.gz | tar xz
sudo mv cartog /usr/local/bin/

# Windows (x86_64) — download .zip from releases page

Configuration

The database path is resolved automatically — no config needed for standard use:

  1. --db flag / CARTOG_DB env var: explicit override (highest priority)
  2. .cartog.toml at the git root: project-specific config
  3. Auto git-root detection: DB placed at the root of the git repository
  4. cwd fallback: .cartog.db in the current directory

# Override database location
cartog --db /tmp/myproject.db index .
CARTOG_DB=~/.local/share/cartog/proj.db cartog search foo

# --db is global — applies to all subcommands
cartog --db /tmp/x.db stats

.cartog.toml (optional, place at project root):

[database]
path = "~/.local/share/cartog/myproject.db"

Useful when indexing from a parent directory across multiple projects, or when storing the DB outside the repo. See docs/usage.md for details.

Search: Keyword, Semantic, or Both

cartog offers two search modes that complement each other:

| Query type | Command | Speed | Best for |
|---|---|---|---|
| Symbol name / partial name | cartog search parse | sub-ms | You know the name: validate_token, AuthService |
| Natural language / concept | cartog rag search "error handling" | ~150-500ms | You know the behavior, not the name |
| Broad keyword, unsure | Run both in parallel | sub-ms + ~300ms | auth, config: catch names + semantics |

Narrowing pattern: cartog search parse returns 30 hits? Narrow with cartog rag search "parse JSON response body" to pinpoint the right ones.

# Direct keyword search — fast, exact
cartog search validate_token
cartog search parse --kind function --limit 10

# Semantic search — natural language, conceptual
cartog rag search "database connection pooling"
cartog rag search "error handling" --kind function

# Both in parallel when unsure
cartog search auth & cartog rag search "authentication and authorization"

Commands

# Index
cartog index .                              # Build the graph (with LSP if available)
cartog index . --no-lsp                     # Fast heuristic-only (~1-4s)
cartog index . --force                      # Re-index all files

# Search
cartog search validate                      # Find symbols by partial name
cartog search validate --kind function      # Filter by kind
cartog rag search "token validation"        # Semantic search (natural language)

# Navigate
cartog outline src/auth/tokens.py           # File structure without reading it
cartog refs validate_token                  # Who references this? (calls, imports, inherits, types)
cartog refs validate_token --kind calls     # Filter: only call sites
cartog callees authenticate                 # What does this call?
cartog impact SessionManager --depth 3      # What breaks if I change this?
cartog hierarchy BaseService                # Inheritance tree
cartog deps src/routes/auth.py              # File-level imports
cartog stats                                # Index summary

# Watch (auto re-index on file changes)
cartog watch .                              # Watch for changes, re-index automatically
cartog watch . --rag                        # Also re-embed symbols (deferred)

# MCP Server
cartog serve                                # MCP server over stdio (12 tools)
cartog serve --watch                        # With background file watcher
cartog serve --watch --rag                  # Watcher + deferred RAG embedding

All commands support --json for structured output.

Example outputs

outline

$ cartog outline auth/tokens.py
from datetime import datetime, timedelta  L3
from typing import Optional  L4
import hashlib  L5
class TokenError  L11-14
class ExpiredTokenError  L17-20
function generate_token(user: User, expires_in: int = 3600) -> str  L23-27
function validate_token(token: str) -> Optional[User]  L30-44
function lookup_session(token: str) -> Optional[Session]  L47-49
function refresh_token(old_token: str) -> str  L52-56
function revoke_token(token: str) -> bool  L59-65

search

$ cartog search validate
function  validate_token    auth/tokens.py:30
function  validate_session  auth/tokens.py:68
function  validate_user     services/user.py:12

Results ranked: exact match > prefix > substring. Case-insensitive.
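
To make the ranking rule concrete, here is a minimal Python sketch of "exact match > prefix > substring, case-insensitive". It only illustrates the ordering; rank_matches is a hypothetical name, and how cartog breaks ties inside a tier is not specified here (this sketch keeps input order).

```python
def rank_matches(query, names):
    """Order symbol names: exact match first, then prefix, then substring."""
    q = query.lower()

    def tier(name):
        n = name.lower()
        if n == q:
            return 0  # exact match
        if n.startswith(q):
            return 1  # prefix match
        return 2      # substring match

    # Keep only names that contain the query at all (case-insensitive).
    hits = [name for name in names if q in name.lower()]
    # sorted() is stable, so names in the same tier keep their input order.
    return sorted(hits, key=tier)


print(rank_matches("validate", ["revalidate", "Validate", "validate_token", "parse"]))
# prints ['Validate', 'validate_token', 'revalidate']
```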

impact

$ cartog impact validate_token --depth 3
  calls  get_current_user  auth/service.py:40
  calls  refresh_token  auth/tokens.py:54
    calls  impersonate  auth/service.py:52
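
Conceptually, the output above is a depth-limited walk over the reverse call graph: direct callers first, then callers of those callers. The Python sketch below shows that idea on a hand-written graph matching the example. The dictionary layout and the impact function are assumptions for illustration; cartog's own traversal runs against its SQLite index and may differ in ordering and detail.

```python
from collections import deque


def impact(callers, symbol, max_depth=3):
    """Breadth-first walk over a reverse call graph.

    `callers` maps a symbol to the symbols that call it.
    Returns (caller, depth) pairs, with depth 1 meaning a direct caller.
    """
    seen = {symbol}
    queue = deque([(symbol, 0)])
    result = []
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for caller in callers.get(current, []):
            if caller not in seen:  # guards against cycles and duplicates
                seen.add(caller)
                result.append((caller, depth + 1))
                queue.append((caller, depth + 1))
    return result


# Hand-written reverse call graph mirroring the example output.
callers = {
    "validate_token": ["get_current_user", "refresh_token"],
    "refresh_token": ["impersonate"],
}

print(impact(callers, "validate_token", max_depth=3))
# prints [('get_current_user', 1), ('refresh_token', 1), ('impersonate', 2)]
```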

refs

$ cartog refs UserService
imports  ./service  routes/auth.py:3
calls    login  routes/auth.py:15
inherits AdminService  auth/service.py:47
references  process  routes/auth.py:22

How It Works

graph LR
    A["Source files<br/>(py, ts, rs, go, rb, java)"] -->|tree-sitter| B["Symbols + Edges"]
    B -->|write| C[".cartog.db<br/>(SQLite)"]
    C -->|query| D["search / refs / impact<br/>outline / callees / hierarchy"]
    C -->|embed locally| E["ONNX embeddings<br/>(sqlite-vec)"]
    E -->|query| F["rag search<br/>(FTS5 + vector KNN + reranker)"]
  1. Index — walks your project, parses each file with tree-sitter, extracts symbols (functions, classes, methods, imports, variables) and edges (calls, imports, inherits, raises, type references)
  2. Store — writes everything to a local .cartog.db SQLite file
  3. Resolve (heuristic) — links edges by name with scope-aware matching (same file > import path > same directory > unique project match)
  4. Resolve (LSP, optional) — for edges the heuristic couldn't resolve, sends textDocument/definition to language servers for compiler-grade precision. Results persist in the DB.
  5. Embed (optional) — generates vector embeddings locally with ONNX Runtime (BAAI/bge-small-en-v1.5), stored in sqlite-vec
  6. Query — instant lookups against the pre-computed graph, hybrid FTS5 + vector search with RRF merge and cross-encoder re-ranking
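
The "RRF merge" in step 6 refers to reciprocal rank fusion, a standard way to combine ranked lists from different retrievers. The sketch below shows the general technique in Python; it is not taken from cartog's source. The constant k=60 is the value commonly used in the literature and is an assumption here, as are the sample result lists.

```python
def rrf_merge(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical result lists from a keyword (FTS5) query and a vector KNN query.
fts_hits = ["validate_token", "refresh_token", "revoke_token"]
vector_hits = ["lookup_session", "validate_token", "generate_token"]

print(rrf_merge([fts_hits, vector_hits]))
# prints ['validate_token', 'lookup_session', 'refresh_token', 'generate_token', 'revoke_token']
```

A symbol that appears in both lists (validate_token here) accumulates two contributions and rises to the top even though it is not first in the vector results. In a pipeline like the one described, the fused list would then be passed to the cross-encoder for re-ranking.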

Re-indexing is incremental: git diff + SHA-256 skips unchanged files, and Merkle-tree diffing within changed files updates only modified symbols. cartog watch automates this on file changes.
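
The file-level part of this can be sketched as a digest comparison. The Python below covers only the SHA-256 check, leaving out the git diff pre-filter and the per-symbol Merkle-tree diffing. The function names and the in-memory digest map are hypothetical; cartog keeps its state in the SQLite database.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Hex SHA-256 of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(paths, stored_digests):
    """Return (changed paths, updated digest map).

    A file counts as changed when it has no stored digest or its
    current digest differs from the stored one.
    """
    changed = []
    current = {}
    for path in paths:
        key = str(path)
        digest = file_digest(path)
        current[key] = digest
        if stored_digests.get(key) != digest:
            changed.append(key)
    return changed, current
```

On a second run with the digest map returned by the first, only files whose contents changed would be handed back for re-parsing.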

Everything runs on your machine. No API keys. No cloud endpoints. No telemetry. Your code stays local.

MCP Server

cartog runs as an MCP server, exposing 12 tools (10 core + 2 RAG) over stdio.

# Claude Code
claude mcp add cartog -- cartog serve

# With live re-indexing
claude mcp add cartog -- cartog serve --watch --rag

# Cursor — add to .cursor/mcp.json
# Windsurf — add to ~/.codeium/windsurf/mcp_config.json
# OpenCode — add to .opencode.json
# Zed — add to ~/.config/zed/settings.json

Common config (JSON):

{
  "mcpServers": {
    "cartog": {
      "command": "cartog",
      "args": ["serve", "--watch", "--rag"]
    }
  }
}

See Usage — MCP Server for per-client installation details.

Agent Skill

Install cartog as an Agent Skill for Claude Code, Cursor, Copilot, and other compatible agents:

npx skills add jrollin/cartog

Or install manually:

cp -r skills/cartog ~/.claude/skills/

The skill teaches your AI agent when and how to use cartog — including search routing (rag search as default, structural search for refs/callees/impact), refactoring workflows, and when to fall back to grep. See Agent Skill for details.

Privacy

cartog is designed for air-gapped and privacy-conscious environments:

  • Parsing: tree-sitter runs in-process, no external calls
  • Storage: SQLite file in your project directory (.cartog.db)
  • Embeddings: ONNX Runtime inference, models cached locally (~/.cache/cartog/models/)
  • Re-ranking: cross-encoder runs locally via ONNX, no API
  • MCP server: communicates over stdio only, no network sockets
  • No telemetry, no analytics, no phone-home of any kind

Your code never leaves your machine. Not during indexing, not during search, not ever.

Supported Languages

| Language | Extensions | Symbols | Edges |
|---|---|---|---|
| Python | .py, .pyi | functions, classes, methods, imports, variables | calls, imports, inherits, raises, type refs |
| TypeScript | .ts, .tsx | functions, classes, methods, imports, variables | calls, imports, inherits, type refs, new |
| JavaScript | .js, .jsx, .mjs, .cjs | functions, classes, methods, imports, variables | calls, imports, inherits, new |
| Rust | .rs | functions, structs, traits, impls, imports | calls, imports, inherits (trait impl), type refs |
| Go | .go | functions, structs, interfaces, imports | calls, imports, type refs |
| Ruby | .rb | functions, classes, modules, imports | calls, imports, inherits, raises, rescue types |
| Java | .java | classes, interfaces, enums, methods, imports, variables | calls, imports, inherits, raises, type refs, new |

Performance

Indexing: 69 files / 4k LOC in 95ms (Python fixture, release build). Incremental re-index skips unchanged files.

Query latency (criterion benchmarks on the same fixture):

| Query type | Latency |
|---|---|
| outline | 8-14 µs |
| hierarchy | 8-9 µs |
| deps | 25 µs |
| stats | 32 µs |
| search | 81-102 µs |
| callees | 177-180 µs |
| refs | 258-471 µs |
| impact (depth 3) | 2.7-17 ms |

Edge Resolution: Heuristic vs LSP

cartog uses a two-tier resolution strategy. The heuristic pass runs instantly; LSP is optional and adds precision.

| Project type | Language | Heuristic only | With LSP | Time (LSP) |
|---|---|---|---|---|
| TS microservice (230 files) | TypeScript | 37% | 81% | 13s |
| Vue.js SPA (739 files) | Vue/TS/JS | 31% | 72% | 25s |
| Rust CLI (358 files) | Rust | 25% | 44% | 72s |

Remaining unresolved edges are mostly calls to external libraries (std, node_modules, crates) — definitions outside the project boundary.

When to use LSP: before a major refactoring, when refs or impact seem incomplete. When to skip (--no-lsp): day-to-day exploration, post-change verification, watch mode.
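
For intuition about what the heuristic tier does, the scope order listed under How It Works (same file > import path > same directory > unique project match) can be sketched as a cascade of checks. This Python is a simplified illustration, not cartog's resolver: resolve_edge and its data layout are hypothetical, and picking the first candidate within a tier is an assumption.

```python
from pathlib import PurePosixPath


def resolve_edge(name, from_file, imported_files, definitions):
    """Pick the file that most plausibly defines `name`, or None.

    `definitions` maps a symbol name to the files that define it.
    `imported_files` is the set of files imported by `from_file`.
    """
    candidates = definitions.get(name, [])

    # 1. Same file as the reference.
    if from_file in candidates:
        return from_file

    # 2. A file reachable through the referencing file's imports.
    imported = [f for f in candidates if f in imported_files]
    if imported:
        return imported[0]

    # 3. A file in the same directory.
    here = PurePosixPath(from_file).parent
    same_dir = [f for f in candidates if PurePosixPath(f).parent == here]
    if same_dir:
        return same_dir[0]

    # 4. Accept a project-wide match only when it is unambiguous.
    if len(candidates) == 1:
        return candidates[0]

    return None  # unresolved: the kind of edge an LSP pass would target
```

Edges that fall through every tier, such as a name defined in several unrelated directories, stay unresolved in this sketch, which matches the role the document gives to the optional LSP pass.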

Design Trade-offs

  • Two-tier resolution — fast heuristic pass (~1s) for daily use, optional LSP for precision refactoring. Results persist in SQLite — pay the LSP cost once.
  • Self-contained — single binary, all dependencies compiled in. LSP is opt-in via language servers already on your PATH.
  • Incremental — git diff + SHA256 per file, Merkle-tree diff per symbol. Stable IDs survive line movements.
  • Local-first — embedding models run via ONNX Runtime on your CPU. Slower than API calls, but your code stays private.

Documentation

License

MIT
