Distill the Universe. Expand the Atom. Mathematically Verified within the Canonical Semantic Boundary.
SUM is a Gödel-State Knowledge Engine — a system that encodes semantic content as prime-factored integers and performs semantic operations (merge, entailment, paradox detection, graph traversal) using pure number theory, avoiding corpus-scale scanning.
Built on the formalisms of the Semantic Prime Number Theorem and Gauge-Theoretic CRUD, SUM replaces probabilistic vector databases with deterministic integer arithmetic for core knowledge operations. It provides mechanically verifiable round-trip conservation within the canonical semantic representation, cross-runtime state verification, signed knowledge transport, and decentralized P2P syncing.
Complexity note: operations described below as O(1) are O(1) in axiom count — they do not scan the corpus. They scale with integer bit length (sub-quadratic via GMP). See
docs/PROOF_BOUNDARY.mdfor precise complexity analysis.
┌────────────────────────────────────────────────────────────────────────────┐
│ Quantum Knowledge OS (SUM) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────────────┐ │
│ │ Quantum UI │ │ Quantum API │ │ Autonomous Daemons │ │
│ │ quantum.html │ │ /state /sync │ │ • Subconscious Crystallizer │ │
│ │ WASM Engine │ │ /ingest /zk │ │ • P2P Holographic Mesh │ │
│ │ vis-network │ │ /sync/state │ │ • Epistemic Arbiter │ │
│ │ Telemetry HUD│ │ /discoveries │ │ • Automated Scientist Daemon │ │
│ └──────┬───────┘ └──────┬───────┘ └──────────────┬───────────────────┘ │
│ │ │ │ │
│ ┌──────┴─────────────────┴──────────────────────────┴─────────────────┐ │
│ │ GlobalKnowledgeOS Singleton │ │
│ ├────────────────────────────────────────────────────────────────────┤ │
│ │ GodelStateAlgebra │ Epistemic Engine │ Multiverse │ │
│ │ • SHA-256 Primes │ • Paradox Collapse │ • Branching │ │
│ │ • LCM Merge / GCD Sync │ • ZK Semantic Proof │ • Merging │ │
│ │ • GraphRAG │ • Causal Cascades │ • Time Travel │ │
│ │ • Zig C-ABI Fast Path │ • Deterministic │ │ │
│ │ (Strangler Fig) │ Arbiter (SHA-256) │ │ │
│ ├─────────────────────────┴─────────────────────┴────────────────────┤ │
│ │ Extraction Validator (Structural Gate → Algebra) │ │
│ ├────────────────────────────────────────────────────────────────────┤ │
│ │ Akashic Ledger (SQLite) │ │
│ │ Event-sourced • Crash-safe • Merkle Hash-Chain • Replayable │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────────────┘
Every irreducible fact ("Alice age 30") is assigned a unique, deterministic prime number via SHA-256 hashing. The entire knowledge state of the universe is a single massive integer — the product of all active primes.
| Semantic Operation | Mathematical Equivalent | Complexity |
|---|---|---|
| Branch / Fork | Integer Copy: Branch_B = Branch_A |
O(n) copy where n = bit length |
| Merge States | Least Common Multiple: math.lcm(A, B) |
O(n²) via GCD on n-bit integers |
| Verify a Fact | Modulo Entailment: Global_State % Prime == 0 |
O(n) modulo on n-bit integer |
| Delete a Fact | Integer Division: Global_State // Prime |
O(n) division on n-bit integer |
| Update a Fact | lcm(State // Old_Prime, New_Prime) |
O(n²) via GCD |
| GraphRAG Traversal | math.gcd(Global, Node_Integer) |
O(n²) via GCD |
Complexity note: All complexities are in integer bit length (n), which grows with each axiom. Operations avoid corpus-scale document scanning. At 10,000 axioms the state integer is ~640K bits; GCD/LCM complete in milliseconds on modern hardware. See
docs/PROOF_BOUNDARY.mdfor empirical benchmarks.
LLM-generated text is mathematically verified against the Gödel state. If the modulo check fails, math.gcd isolates the exact hallucinated primes. The LLM is forced into a self-correcting loop with strict negative constraints until mathematical fidelity is achieved.
Maps the absolute certainty of discrete Gödel Primes to the fuzzy continuous space of Vector Embeddings. Semantic search (cosine_similarity) is mathematically filtered by State % Prime == 0, meaning deleted facts vanish instantly from search results.
Horizon III — Universal Vector Alignment: Supports optional O(1) affine transformation matrices (W*, b*) enabling heterogeneous P2P nodes (Llama, Qwen, Mistral, etc.) to perfectly align their latent geometries into a single Canonical Geometry before discrete prime extraction.
An append-only SQLite event log (MINT, MUL, DIV, SYNC, DEDUCED). Provides crash recovery — the RAM-based Gödel BigInt can be perfectly reconstructed by replaying the mathematical trace. A SHA-256 Merkle hash-chain makes every event tamper-evident: modification, deletion, or injection of events is detectable on boot.
Clients hold a local BigInt state. Syncing over the network requires sending one integer. The server uses math.gcd to identify exactly what the client needs to add or delete. Single-integer network delta, zero JSON diffing.
- Zoom Out: Replace a cluster of 100 micro-primes with 1 Macro-Prime, storing the cluster product as provenance.
- Zoom In: Divide out the Macro-Prime and multiply the provenance back in to decompress the cluster (single division + multiplication).
Standard RAG requires O(N) vector scans. Quantum GraphRAG maintains a "Node Integer" for every entity. Querying a node's multi-hop topological neighborhood is math.gcd(Global_State, Node_Integer) — no corpus scan required. A background daemon autonomously crystallizes dense clusters while the system sleeps.
When mutually exclusive facts are ingested ("Alice lives in NY" vs "London"), the system detects Level 3 Curvature (a topological paradox) and enters Superposition. An LLM Arbiter acts as a logical judge, mathematically collapsing the wave function to a single verified truth, streamed live via Server-Sent Events (SSE). A DeterministicArbiter (SHA-256 lexicographic ordering) provides a zero-dependency fallback requiring no LLM.
Because the state is just an integer, branching a timeline is a simple integer copy. We support Semantic Smart Contracts (Causal Triggers): learning one fact automatically cascades through the integer, deducing logical consequences until equilibrium is reached.
- Time Travel: Rebuild the Akashic Ledger to any historical tick into a parallel branch.
- ZK Semantic Proofs: Prove you know a fact using salted SHA-256 hashes of the quotient, without revealing your state integer.
- Decentralized P2P: A background gossip daemon constantly syncs Gödel Integers with peer nodes, achieving a planetary hive-mind.
The GodelStateAlgebra core is being progressively migrated to Zig via the Strangler Fig Pattern. A compiled libsum_core shared library (.dylib/.so/.dll) exports C-ABI functions for deterministic prime derivation (SHA-256 → Miller-Rabin) at nanosecond speed. Python's ctypes loads the binary transparently — if present, the engine runs at bare-metal speed; if not, it falls back to sympy seamlessly with zero impact on correctness or CI.
| Use Case | Key Capability | Why SUM |
|---|---|---|
| Personal Knowledge Base | /ingest + /ask + epistemic loop |
Answers are mathematically verified against your corpus — State % Prime ≠ 0 catches hallucinations. No vector DB can prove groundedness. |
| Decentralized Knowledge Sync | /sync/state + P2P mesh |
Sync two nodes by sending one integer. GCD computes the exact delta. No JSON diffing, no conflict heuristics. |
| Tamper-Evident Knowledge Transport | canonical_codec + verify.js |
Export signed bundles (HMAC + Ed25519). An independent Node.js verifier confirms integrity — different runtime, same math. |
| Zero-Knowledge Fact Verification | /zk/prove |
Prove you know a fact without revealing your knowledge state. Salted SHA-256 of the quotient — algebraically sufficient. |
| Contradiction Detection & Resolution | Gauge orchestrator + arbiter | Conflicting facts are algebraically detectable (both primes in state). The arbiter collapses to a single truth, streamed live via SSE. |
| Temporal Audit Trail | Akashic Ledger + /time-travel |
Every operation is event-sourced. Rebuild state at any historical tick. Branch timelines, merge via LCM. Full provenance per axiom. |
| LLM-Free Math Ingestion | /ingest/math + Babel harvester |
Raw (S, P, O) triplets → primes. Zero LLM calls, zero API cost. Works entirely offline. |
| Autonomous Deduction | Automated Scientist + causal triggers | Background daemon discovers entailed-but-unminted axioms via transitive closure. Learning one fact cascades logical consequences. |
| Multi-LLM Knowledge Fusion | Vector Bridge + affine alignment | Nodes using different LLMs (Llama, Qwen, Mistral) share a canonical prime layer. The Gödel integer is model-agnostic consensus. |
| Semantic Compression | Fractal crystallization | Replace 100 micro-primes with 1 macro-prime. Decompress on demand. Lossless — single division + multiplication. |
The common thread: Knowledge state is a single integer. Verification is deterministic, sync is minimal, proofs are algebraic, contradictions are structural. None of this is true for vector databases.
SUM has completed its genesis and hardening phases. The road ahead expands into three frontiers:
- Horizon I: Project EXOCORTEX (Continuous Human Digitization): Wiring desktop and browser companion clients to stream daily human life directly into the
/ingestAPI, compressing personal reality into a single, ever-growing Gödel Integer. - Horizon II: The Babel Protocol (Planetary-Scale Truth): Deploying thousands of headless
SUMnodes to ingest Wikipedia, ArXiv, and global news—forcing the Epistemic Arbiter to collapse historical and scientific paradoxes into a Single Master Integer (Planetary Truth). - Horizon III: Bare-Metal Supremacy (Zig → WASM → Quantum): The Strangler Fig migration progressively replaces Python's math engine with Zig's zero-cost C-ABI. Deterministic primes, LCM/GCD, mod, divisibility, and batch minting are all live in Zig. First-class WebAssembly compilation enables the math core to run natively inside a browser tab via
sum_wasm.js. The Automated Scientist daemon continuously sweeps the global state for logically entailed but unminted axioms, performing autonomous deduction via transitive closure. Future quantum-resistant considerations: if large-scale quantum computers become practical, the prime factoring assumption underlying Gödel integers would need to be revisited with lattice-based or hash-based alternatives.
git clone https://github.com/OtotaO/SUM.git
cd SUM
pip install -r requirements-prod.txt
# Run the verification suite
python -m pytest Tests/ -v
# Run the 21-check Fortress gate
python scripts/verify_fortress.py --json
# Launch the Quantum UI & OS
export OPENAI_API_KEY="sk-..."
uvicorn quantum_main:app --reload --port 8000Open http://localhost:8000 to access the Dashboard of Truth (Vis-Network Graph, BigInt State Mirror, Live Telemetry).
brew install zig # or your platform's package manager
cd core-zig && zig build -Doptimize=ReleaseFast && cd ..
# Engine will print: ⚡ BARE-METAL ZIG CORE ENGAGED ⚡All endpoints are mounted under /api/v1/quantum/. Every endpoint accepts an optional ?branch=main parameter.
Deprecation note: The
/api/v1/quantum/prefix is legacy naming retained for backward compatibility; a neutral alias may be introduced in a future major release.
| Method | Endpoint | Description |
|---|---|---|
GET |
/state |
Returns the current Gödel integer |
POST |
/sync |
Network delta sync via GCD |
POST |
/ingest |
Tomes → Tags (Text to Math pipeline) |
POST |
/ingest/math |
Math-only ingestion (no LLM required) |
POST |
/extrapolate |
Tags → Tomes (Verified generation) |
POST |
/query |
GraphRAG retrieval |
POST |
/search |
Fuzzy semantic search via Vector Bridge |
GET |
/telemetry |
SSE Stream of internal monologues & wave collapses |
POST |
/branch |
Semantic Branching (fork state) |
POST |
/merge |
Semantic Merge via LCM |
POST |
/time-travel |
Rebuilds the universe at a historical tick |
POST |
/zk/prove |
Generate a Zero-Knowledge Semantic Proof |
POST |
/peers |
Add a node to the P2P Holographic Mesh |
POST |
/sync/state |
O(1) LCM merge from browser/peer |
GET |
/discoveries |
Machine-deduced knowledge |
POST |
/ask |
Natural-language knowledge retrieval |
GET |
/provenance/{axiom_key} |
Axiom provenance chain query |
SUM/
├── internal/
│ ├── algorithms/
│ │ ├── semantic_arithmetic.py # GodelStateAlgebra, SPNT, Fractal Zoom
│ │ ├── syntactic_sieve.py # DeterministicSieve — spaCy NLP → (S,P,O) triplets
│ │ ├── zk_semantics.py # Zero-Knowledge Entailment Proofs
│ │ ├── predicate_canon.py # PredicateCanonicalizer — synonym normalization
│ │ └── causal_discovery.py # Topological inference via transitive closure
│ ├── ensemble/
│ │ ├── epistemic_loop.py # Hallucination-proof generation loop
│ │ ├── vector_bridge.py # Continuous↔Discrete bridge + Affine Alignment
│ │ ├── autonomous_agent.py # Subconscious Crystallizer daemon
│ │ ├── epistemic_arbiter.py # Wave Function Collapse (LLM + Deterministic)
│ │ ├── gauge_orchestrator.py # L1/L2/L3 Curvature Commutativity Hierarchy
│ │ ├── causal_triggers.py # Semantic Smart Contracts (deductive cascades)
│ │ ├── ouroboros.py # OuroborosVerifier — round-trip conservation proof
│ │ ├── tome_generator.py # AutoregressiveTomeGenerator — canonical tomes
│ │ ├── confidence_calibrator.py # Multi-signal confidence scoring
│ │ ├── mass_semantic_engine.py # Mass semantic operations engine
│ │ ├── semantic_dedup.py # Predicate-aware deduplication
│ │ ├── live_llm_adapter.py # OpenAI structured outputs adapter
│ │ ├── extraction_validator.py # Structural gate: rejects malformed triplets
│ │ └── automated_scientist.py # Autonomous deduction daemon (15s cycles)
│ └── infrastructure/
│ ├── akashic_ledger.py # Event-sourced crash recovery, Time Travel, Merkle Chain
│ ├── p2p_mesh.py # Decentralized Gossip Protocol
│ ├── canonical_codec.py # Signed bundle transport (HMAC + Ed25519)
│ ├── key_manager.py # Ed25519 keypair lifecycle management
│ ├── rate_limiter.py # Sliding window per-IP rate limiter
│ ├── resource_guards.py # Payload-size guards (fail-closed, HTTP 413)
│ ├── scheme_registry.py # Prime scheme versioning & compatibility
│ ├── state_encoding.py # Hex/decimal state encoding utilities
│ ├── zig_bridge.py # Horizon III: Zig C-ABI FFI bridge (ctypes)
│ └── telemetry.py # @trace_zig_ffi observability decorator
├── core-zig/
│ ├── build.zig # Dual targets: native + WASM
│ └── src/
│ └── main.zig # Bare-metal primes, LCM, GCD, batch mint (C-ABI)
├── api/
│ └── quantum_router.py # FastAPI routing and GlobalKnowledgeOS
├── quantum_main.py # ASGI entrypoint (FastAPI + lifespan boot)
├── standalone_verifier/
│ └── verify.js # Independent Node.js semantic witness
├── scripts/
│ ├── sum_cli.py # CLI tool: ingest, ask, export, diff, status, provenance
│ ├── verify_fortress.py # 21-check CI verification gate
│ ├── launch_swarm.sh # Local P2P swarm launcher
│ ├── babel_harvester.py # RSS→math ingestion for Babel Protocol
│ └── ignite_mesh.py # P2P mesh ignition script
├── docs/
│ ├── CANONICAL_ABI_SPEC.md # Normative protocol specification
│ ├── PROOF_BOUNDARY.md # What is proven vs aspirational
│ ├── THREAT_MODEL.md # Security analysis and attack surfaces
│ └── COMPATIBILITY_POLICY.md # Version semantics and guarantees
├── static/
│ ├── quantum.html # Quantum UI (Sovereign Edge, Telemetry HUD)
│ └── js/
│ ├── godel_client.js # Browser-side Gödel sync client
│ └── sum_wasm.js # WASM BigInt API (offline math engine)
├── experiments.tsv # Autoresearch experiment ledger
└── Tests/
├── fixtures/ # Frozen golden reference vectors
├── benchmarks/ # Golden corpus & scoring harness
└── test_*.py # Verification Tests
Current test count: Run
python -m pytest Tests/ -qfor the live count. The fortress gate ispython scripts/verify_fortress.py --json.
─── Core Hardening ───
ZK Semantic Proofs .................... 16 ✓ (round-trip, tamper, non-linkability, stress)
Akashic Ledger Replay ................. 8 ✓ (crash recovery, time-travel, DIV)
Causal Cascade Verification ........... 6 ✓ (multi-hop, cycle termination, idempotency)
Gauge Orchestrator .................... 10 ✓ (L1/L2/L3 detection, merge, arbitration)
Extraction Adversarial ................ 15 ✓ (HTML/SQL injection, Unicode, 10K-word stress)
Deterministic Arbiter ................. 7 ✓ (SHA-256 lexicographic, no LLM dependency)
Rate Limiter .......................... 8 ✓ (sliding window, per-IP, burst protection)
Cross-Instance & Stability ............ 24 ✓ (collision, tome ordering, timestamp, version)
─── Architectural Hardening ───
Phase 0 — Durability Contract ......... 8 ✓ (crash recovery, branch isolation, boot)
Phase 0.1 — Durability Integrity ...... 6 ✓ (branch rebuild, import materialization, gossip)
Phase 19A — Extraction Validator ...... 25 ✓ (structural gate, canonicalization, dedup)
Phase 19C — Merkle Hash-Chain ......... 16 ✓ (tamper, deletion, injection detection)
Phase 19D — Active Prime Index ........ 30 ✓ (lifecycle, queries, consistency, coherence invariant)
─── Phase Tests ───
Phase 17b — BigInt Zig C-ABI .......... 22 ✓ (LCM, GCD, mod, divisibility, consistency)
Phase 17 — Horizon III ................ 15 ✓ (affine alignment, Zig FFI, Strangler Fig)
Phase 16 — Independent Witness ........ 21 ✓ (cross-runtime verification, frozen vectors)
Phase 15 — Canonical Semantic ABI ..... 22 ✓ (versioning, bundles, JWT, multi-hop)
Phase 14 — Ouroboros Round-Trip ....... 16 ✓ (encode/decode conservation)
Phase 13 — JWT Multi-Tenancy .......... 12 ✓ (token gen, isolation, branch safety)
Ed25519 Attestation ................... 11 ✓ (dual-sig, tamper, compat, key mgmt)
Witness Matrix Hardening .............. 7 ✓ (frozen vectors, cross-runtime)
Property & Adversarial Tests .......... 46 ✓ (algebra invariants, bundle hardening)
Phase 10 — Chronos & Mesh ............. 17 ✓
Phase 9 — Multiverse & Causal Triggers 21 ✓
Phase 8 — Wave Function Collapse ...... 8 ✓
Phase 7 — GraphRAG + Daemon ........... 6 ✓
Phase 6 — Fractal Crystallization ..... 5 ✓
Phase 4 — Akashic Ledger .............. 3 ✓
Phase 3 — Temporal CRUD + Vectors ..... 12 ✓
Phase 2 — Epistemic Feedback Loop ..... 7 ✓
Phase 1 — SPNT + Gödel Algebra ........ 15 ✓
Phase 21 — Knowledge Retrieval ........ 15 ✓ (/ask endpoint, predicate canonicalization)
Phase 22 — Provenance + Confidence .... 14 ✓ (source tracking, confidence×recency weighting)
Phase 23 — CLI Tool ................... 16 ✓ (ingest, ask, export, diff, status, provenance)
Phase 24 — Confidence Calibration ..... 23 ✓ (source-type, redundancy, contradiction penalty)
Phase 25 — Semantic Deduplication ..... 22 ✓ (predicate synonyms, Jaccard+Levenshtein, API dedup)
Stage 1 — Dual-Format Transport ....... 18 ✓ (hex companion fields, parse_state, P2P hex)
Stage 2 — Scheme Versioning ........... 27 ✓ (scheme registry, protocol enforcement, hex cross-check)
Stage 3A — 128-Bit Shadow ............. 28 ✓ (v2 reference vectors, Python↔Node parity, BPSW, collision policy)
Stage 4 — Evidence Enrichment ......... 12 ✓ (hedging detection, certainty calibration, annotated triplets)
Stage 5 — Resource Guards ............. 17 ✓ (payload limits, HTTP 413, operator-readable errors)
Stage 6-7 — Perfection Criteria ....... 13 ✓ (v1 stable, v2 shadow, fail-closed, evidence, limits, vectors)
Final Integration — Operational ....... 20 ✓ (guards in handlers, evidence in /ingest, Node v2 witness, env-flag activation)
Honest status notes:
- Zig v2 parity: ✅ confirmed — Zig 0.15.2,
zig build testpasses, Python↔Zig v2 primes match on frozen vectors- v2 activation: gated behind
SUM_PRIME_SCHEME=sha256_128_v2env var; default is v1- Extraction gating (19A): structural validator rejects malformed/duplicate triplets before algebra ingestion
- Golden benchmark (19B): 50 annotated documents, 100 gold-standard triplets, 7 adversarial categories
- Merkle chain (19C): SHA-256 hash-chain on event log; 3-tier enforcement policy (
warn/degraded/strictviaSUM_MERKLE_POLICY)- Active Prime Index (19D): O(1) active-prime lookup per branch; coherence-validated across 13 mutation paths. Hot-path callers not yet migrated — adoption is profiling-driven
- Evidence enrichment: affects
/ingest(LLM path) and/ingest/math(direct path); other ingestion surfaces not yet covered- Linguistic certainty: document-level (coarse-grained) in
/ingest;/ingest/mathdefaults to 1.0 (definitional)
| Vector | Status |
|---|---|
| Bundle tampering | ✅ HMAC-SHA256 + Ed25519 |
| State/tome mismatch | ✅ Witness verification |
| Version mismatch | ✅ Version gate |
| Malformed bundles | ✅ Field validation |
| Public authenticity | ✅ Ed25519 (self-asserted) |
| Key compromise | ✅ Rotation + archive |
| Adversarial extraction | |
| Collision replay | ✅ 1000-axiom cross-instance test |
| Contradiction governance | ✅ DeterministicArbiter (SHA-256) |
| Resource exhaustion | ✅ Bundle limits + rate limiter |
| Ledger tampering | ✅ Merkle hash-chain (19C) |
OPENAI_API_KEY=sk-... # Required for LLM features (ingest/extrapolate/search/arbitrate)
# Math-only CRUD, GraphRAG, and P2P Sync work without it- Fork it
- Create your feature branch (
git checkout -b feature/holographic-expansion) - Run the test suite (
python -m pytest Tests/ -v) - Run the fortress gate (
python scripts/verify_fortress.py --json) - Commit your changes
- Push and open a Pull Request
Apache 2.0 — Built for the future of Man and Machine.
SUM — Canonical Semantic Compression. Mechanically Verified.