Skip to content

feat: architecture modernization with Python 3.14 support and dependency upgrades#6

Merged
jlevy merged 36 commits intomainfrom
claude/review-branch-tiers-icTv8
Feb 9, 2026
Merged

feat: architecture modernization with Python 3.14 support and dependency upgrades#6
jlevy merged 36 commits intomainfrom
claude/review-branch-tiers-icTv8

Conversation

@jlevy
Copy link
Owner

@jlevy jlevy commented Feb 8, 2026

Summary

Comprehensive architecture modernization of kash based on a code review and planning spec. Includes dependency upgrades from PR #5, Python 3.14 CI support, and all Tier 1 beads plus partial Tier 2/3 work.

Scope: 244 files changed, 5,266 additions, 140 deletions. 301 tests (up from ~135 on main).

Key changes

  • Standalone library API (kash.run): kash_init() + kash_run() for using kash programmatically without the interactive shell
  • Clean public API surface (kash/__init__.py): __all__ with 28 canonical exports; all deep import paths preserved
  • Lazy action/command registration: ensure_actions_registered() / ensure_commands_registered() replace module-level side effects
  • Dependency upgrades: openai 1.99.9 → 2.17.0, litellm → 1.81.9, all transitive deps upgraded
  • Dependency cleanup: Removed 6 unused deps; added optional groups [llm], [web], [media], [server], [shell], [all]
  • Python modernization: from __future__ import annotations (184 files), @override (34 methods), named constants, dead code removal
  • CI: Added Python 3.14 to matrix (3.11–3.14 all green), uv 0.9.5 → 0.9.26
  • Test infrastructure: 21 new test files, conftest.py with shared fixtures, 301 total tests

Validation Plan

Part 1: Automated — Verified by CI (all passing)

CI runs on Python 3.11, 3.12, 3.13, 3.14 — all 4 builds green.

Check Tool Status Notes
301 tests pass uv run pytest 166 new tests added, 0 regressions
0 type errors basedpyright --stats Strict mode, all 305 files checked
0 lint errors ruff check + ruff format 43 auto-fixed issues committed
0 spelling errors codespell Added hel to ignore (test data)
Python 3.14 compat CI matrix Pydantic 2.12.5 works with 3.14.3
Dependency resolution uv lock --upgrade All 148 packages resolve cleanly
No circular imports Pytest collection All 301 tests collect without errors

Test coverage breakdown:

Area File(s) Count What's tested
Standalone runner tests/test_run.py 15 kash_init(), kash_run(), input building, workspace resolution, params
Action execution pipeline tests/exec/test_action_exec_integration.py 12 Full kash_run flow with mocked actions, param validation, save/no-save
Action registry tests/exec/test_action_registry.py 5 Registration, lookup, caching, lazy init
Arg resolution tests/exec/test_resolve_args.py 8 Input resolution, glob patterns, workspace lookups
Precondition checks tests/exec/test_precondition_checks.py 6 Validation pass/fail, multi-precondition
Model: items tests/model/test_items.py 6 Item creation, serialization, type coercion
Model: params tests/model/test_params.py 10 Param parsing, validation, common params, raw → typed
Model: preconditions tests/model/test_preconditions.py 7 Precondition model, evaluation
LLM completion tests/llm_utils/test_llm_completion.py 6 Mocked litellm.completion, response parsing
Fuzzy parsing tests/llm_utils/test_fuzzy_parsing.py 7 JSON/YAML extraction from LLM output
Sliding transforms tests/llm_utils/test_sliding_transforms.py 4 Chunked text transforms
Item ID index tests/file_storage/test_item_id_index.py 5 URL → ItemId lookup, persistence
Completion scoring tests/shell/test_completion_scoring.py 7 Prefix scoring, normalization, recency decay
Web fetch tests/web_content/test_web_fetch_integration.py 8 file:// download, HTTP headers, content type
Local file cache tests/web_content/test_local_file_cache_integration.py 8 Cache lifecycle, expiration, ALWAYS/NEVER
Param state tests/workspaces/test_param_state.py 4 Workspace param persistence
Selections tests/workspaces/test_selections.py 14 Selection add/remove/clear, persistence
+ 148 inline tests src/**/*.py 148 Existing tests in source files (unchanged)

Part 2: Manual Verification Required — Human Review Checklist

These items cannot be fully validated by automated tests and require a human to confirm.

2a. Backward Compatibility (HIGH PRIORITY)

  • Deep imports still work: Verify existing code using deep import paths is unbroken:
    from kash.model.items_model import Item          # should still work
    from kash.exec.action_decorators import kash_action  # should still work
    from kash import Item                             # new convenience import
  • Downstream packages: If kash-docs or kash-media exist as separate packages, verify they still import from kash without errors
  • Shell startup: Run kash interactively and verify:
    • Shell prompt appears
    • Tab completion works
    • help command works
    • Actions are discoverable (lazy registration triggers on first use)
  • Kit loading: If any kits are installed, verify they still load and their actions are available

2b. OpenAI 2.x + LiteLLM Compatibility (HIGH PRIORITY)

  • LLM action smoke test: Run an actual LLM-backed action (e.g. summarize_as_bullets) with a real API key and verify:
    • API call succeeds (no import/type errors from openai 2.x)
    • Response format is correct
    • Streaming works (if used)
  • Embedding model: If embeddings are used, verify text-embedding-3-large still works with openai 2.x

2c. Standalone Library API (MEDIUM PRIORITY)

  • Real action execution: Test kash_run() outside the shell with a real (non-mocked) action:
    from kash import kash_init, kash_run
    kash_init()
    result = kash_run("strip_html", ["https://example.com"])
    print(result)  # should have items with body content
  • Workspace behavior: Verify results save correctly to the specified workspace directory
  • Error handling: Verify meaningful errors for invalid action names or bad inputs

2d. Optional Dependency Groups (MEDIUM PRIORITY)

  • Minimal install: pip install kash-shell (or uv pip install .) — verify core imports work without optional extras
  • Group installs: Test each group individually:
    • pip install kash-shell[llm] — LLM actions available
    • pip install kash-shell[web] — Web scraping works
    • pip install kash-shell[all] — Everything works

2e. Lazy Registration Behavior (LOW PRIORITY)

  • Startup time: Compare time kash --version before and after this PR — should be equal or faster
  • First action call: Verify no perceptible delay on first action execution (registration happens transparently)
  • Idempotency: Calling ensure_actions_registered() multiple times doesn't re-import or duplicate

2f. Documentation Accuracy (LOW PRIORITY)

  • Architecture spec: Skim the planning spec — verify implementation matches what's described
  • Spec checkboxes: Verify the [x] marks in the spec match actual bead status

Part 3: Known Issues & Risk Areas

Risk Severity Mitigation
OpenAI 2.x breaking changes High litellm 1.81.9 wraps the API; but real LLM test recommended
from __future__ import annotations + dataclass round-trip Low Already hit and fixed in tabbed_webpage.py; other files don't do YAML round-trips with nested dataclasses
Lazy registration race condition in concurrent use Low Registration is guarded by a module-level bool; safe for single-process use
Optional dep groups might confuse users Low All deps still in default install; groups are additive for opt-in minimal installs

Follow-Up Work (Post-Merge)

The remaining 5 open beads are all Tier 2/3 features that build on this PR's foundation. They should be tackled in a follow-up PR after this one merges.

Tier 2: Remaining Additive Features (4 beads)

These add new capabilities without breaking existing imports. They build on the lazy registration and kash.run() foundation from this PR.

1. ShellContext Protocol (kash-0vb5) — P2, blocks kash-5ew2

Goal: Decouple command execution from xonsh via a ShellContext protocol.

Design:

class ShellContext(Protocol):
    def set_env(self, name: str, value: Any) -> None: ...
    def print_output(self, content: str) -> None: ...
    def get_workspace(self) -> Workspace: ...
    def record_history(self, command: str) -> None: ...

Implementations needed:

  • XonshShellContext — wraps current xonsh environment (preserves existing behavior)
  • StandaloneContext — for library/script use (builds on kash.run)
  • CLIContext — for standalone CLI commands

Why deferred: Design needs careful thought to avoid leaking xonsh internals. The protocol must be stable since downstream code will depend on it.

2. Standalone CLI for Actions (kash-qejq) — P2

Goal: Enable kash run <action> --input ... --format json without starting the interactive shell.

Design:

  • Typer or argparse-based CLI entry point
  • Uses kash_run() from this PR as the execution backend
  • Supports --format text|json, --non-interactive, --no-progress, --dry-run
  • Exit codes: 0=success, 1=error, 2=validation, 130=SIGINT

Why deferred: Depends on ShellContext (kash-0vb5) for clean output routing. Could be started independently with a simpler approach.

3. --format json and --non-interactive Flags (kash-5ew2) — P2, blocked by kash-0vb5

Goal: Machine-readable output for agent/CI consumption.

Design:

  • OutputManager pattern for dual text/JSON output
  • Route data to stdout, errors/warnings to stderr
  • Disable spinners/progress in non-TTY contexts
  • Respect CI and NO_COLOR env vars

Why deferred: Blocked by ShellContext protocol (kash-0vb5) — needs the abstraction to route output correctly.

4. KashTestRunner (kash-j8st) — P2

Goal: First-class testing support for running kash actions in pytest.

Design:

from kash.testing import KashTestRunner

runner = KashTestRunner(workspace=tmp_path)
result = runner.run_action("strip_html", item_with_html)
assert "clean text" in result.body

# Pipeline testing
result = runner.run_pipeline([
    ("fetch_url", {"url": "https://example.com"}),
    ("strip_html", {}),
    ("summarize_as_bullets", {"model": "gpt-5"}),
])

Requirements: No console output during test runs, deterministic behavior (mock LLM), easy workspace setup/teardown, assert-friendly return types.

Why deferred: Builds on kash.run() from this PR but needs more design work around mock injection and pipeline composition.

Tier 3: Breaking Changes (1 bead)

5. Lightweight MCP Mode (kash-s9q0) — P3

Goal: Expose individual actions as MCP tools without loading the entire kash registry.

Current problem: The MCP server requires full kash initialization including all registries, workspace system, and configuration. Cannot expose a single action as an MCP tool without loading everything.

Design considerations:

  • Selective action loading (only import what's needed)
  • Direct Claude Code skill integration (without running MCP server)
  • Skill template generation from @kash_action metadata

Why deferred: This is the only Tier 3 (potentially breaking) item remaining. It may change how action discovery works internally. Benefits from the lazy registration work in this PR (kash-y80s) as a foundation.

Prerequisite evaluations (completed in this PR):

  • kash-zt0y: Utils audit → no reorganization needed
  • kash-law6: web_content extraction → NOT ready (5/10 independence)
  • kash-78wq: llm_utils extraction → partially feasible but premature

Recommended Order

kash-0vb5 (ShellContext)
    ↓
kash-qejq (Standalone CLI)  +  kash-j8st (KashTestRunner)
    ↓
kash-5ew2 (--format json flags)
    ↓
kash-s9q0 (Lightweight MCP) — can be done independently

Beads

Closed (19): kash-keim, kash-ermq, kash-ppz9, kash-jlmn, kash-aik9, kash-gpj8, kash-69rw, kash-kngd, kash-hv4e, kash-ht5b, kash-gzzs, kash-91re, kash-ymq4, kash-y80s, kash-i7dr, kash-8hh2, kash-zt0y, kash-law6, kash-78wq

Open (5): kash-0vb5, kash-qejq, kash-5ew2, kash-j8st, kash-s9q0

Subsumes: PR #5 (dependency upgrades, CODE_REVIEW doc, CI uv update). PR #5 can be closed.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm

claude added 25 commits February 6, 2026 18:17
Comprehensive senior engineering review of the kash codebase covering
architecture, code quality, Python conformance, and reusability.
Includes 23 tracked beads across 5 phases of improvements.

https://claude.ai/code/session_01Th8ndDSiErVpyHhGLuMFGH
…view

Adds Part 5 (testability by subsystem with priority matrix) and Part 6
(guidelines conformance review) to the architecture spec. Detailed
breakdown of easy/medium/hard testing per component, conftest.py design,
and estimated ~300 new tests across 23 files in 3 phases.

https://claude.ai/code/session_01Th8ndDSiErVpyHhGLuMFGH
…o spec

Expands Testing Strategy section with specific rules from general-testing-rules,
general-tdd-guidelines, and golden-testing-guidelines. Updates Part 6 with
detailed conformance analysis against all applicable tbd guidelines (python-rules,
python-modern-guidelines, python-cli-patterns, error-handling-rules, etc.).
Adds structured References section.

https://claude.ai/code/session_01Th8ndDSiErVpyHhGLuMFGH
…iers

Analyzed kash-docs (393 imports, 150+ symbols) and kash-media (80+ symbols)
dependency surfaces. Restructured Implementation Plan into 3 tiers:
- Tier 1: Fully compatible (tests, modernization, reviews) - 12 beads
- Tier 2: Additive changes (new APIs, old paths preserved) - 8 beads
- Tier 3: Breaking changes (require downstream updates) - 4 beads

All beads labeled with tier. Critical API surface documented.

https://claude.ai/code/session_01Th8ndDSiErVpyHhGLuMFGH
Add detailed body descriptions to 7 beads that had empty bodies:
kash-8hh2, kash-0vb5, kash-j8st, kash-hv4e, kash-law6, kash-78wq, kash-s9q0.
Fix Part 6/7 ordering in the spec document.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
Create test infrastructure with fixtures for temp_workspace, mock_current_ws,
mock_llm, and sample_item. Register pytest markers for slow, online, and golden
test categories.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
…-ermq)

New test files:
- tests/model/test_preconditions.py: Boolean algebra (&, |, ~, and_all, or_all)
- tests/model/test_params.py: Validation, json_schema, properties
- tests/model/test_items.py: ItemType properties, content_equals, serialization
- tests/llm_utils/test_fuzzy_parsing.py: Edge cases for parsing functions
- tests/shell/test_completion_scoring.py: Scoring, normalization, recency decay

50 new tests, all pass in <0.4s.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
The exact pin existed due to a litellm/openai compatibility issue with
ResponseTextConfig imports. The import works fine with 1.99.x. Pin to
1.x range to allow patch updates while avoiding openai 2.x breaking changes.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
…oring (kash-69rw)

Replace 15+ magic numbers with named constants: MAX_SCORE, MIN_CUTOFF,
SHORT_PREFIX_SCORE, PREFIX_BASE_SCORE, DESCRIPTION_BONUS, PATH_PHRASE_WEIGHT,
PATH_RECENCY_WEIGHT, COMMENT_SCORE_WEIGHT, RECENCY_DECAY_CONSTANT, etc.
Each constant has a docstring explaining its purpose.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
…cks (kash-kngd)

- Remove commented-out lazyasd background import code in lazy_imports.py
- Remove `if __name__ == "__main__"` testing blocks from colors.py,
  version.py, and web_extract.py (per python-rules: no __main__ for testing)

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
…kash-aik9)

Per python-rules and python-modern-guidelines: add future annotations
to all Python files with type hints. 184 files modified (182 in src/kash,
2 in tests). Skipped trivial __init__.py files and files without type
annotations.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
…(kash-gpj8)

Per python-rules: ALWAYS use @OverRide decorators on overridden methods.
Added @OverRide to ABC implementations (MediaService, Action, HelpDoc),
framework overrides (Path, RichHandler, Protocol), and key dunder methods.
Uses typing_extensions.override for Python 3.11 compatibility.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
Outbox bead files cleared by `tbd import --outbox --clear-on-success`.
uv.lock updated to reflect openai version pin change (>=1.99,<2.0).

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
tbd sync to remote tbd-sync branch failed with 403. Saving outbox
to feature branch so bead data is preserved. Run `tbd sync` when
push access is available to restore.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
…n_checks, selections, param_state, item_id_index, llm_completion, sliding_transforms (kash-ppz9)

75 new tests across 8 test files covering:
- action_registry: registration, duplicate warning, lookup, not-found error
- resolve_args: URL/path/StorePath resolution, selection fallback
- precondition_checks: action matching, item filtering, max_results, error skipping
- selections: Selection state, SelectionHistory push/pop/nav/persistence/truncation
- param_state: set/get, persistence across instances, missing file handling
- item_id_index: indexing, duplicate detection, unindexing, load error handling
- llm_completion: CitationList, LLMCompletionResult, mocked API calls, template completion
- sliding_transforms: filtered transform, paragraph windowing, validation errors

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
…rts (kash-ymq4)

- Add __all__ to kash/llm_utils/__init__.py (11 symbols)
- Add canonical top-level imports to kash/__init__.py with __all__ (28 symbols)
  covering core model types, execution decorators, LLM utilities, and errors
- All existing deep import paths remain valid (purely additive)

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
- actions/__init__.py: Replace module-level import_and_register() and
  load_kits() with ensure_actions_registered() (idempotent, lazy)
- commands/__init__.py: Same pattern with ensure_commands_registered()
- action_registry.py: get_all_action_classes() calls ensure_actions_registered()
  instead of `import kash.actions`
- command_registry.py: register_all_commands() calls
  ensure_commands_registered() instead of `import kash.commands`

`import kash.actions` no longer triggers registration as a side effect.
Registration happens lazily on first call to get_all_action_classes() or
get_all_commands(). import_and_register() used by downstream kits is unchanged.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
…-i7dr)

Remove confirmed-unused packages: tenacity, tiktoken, selectolax,
pluralizer, rich-argparse. Each was verified absent from imports in
both kash and downstream packages (kash-docs, kash-media).

Add optional dependency groups [llm], [web], [media], [server], [shell]
and [all] for users who want minimal installs. All current deps remain
in the default install for backward compatibility.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
…hh2)

New kash.run module provides kash_init() and kash_run() to execute any
kash action as a Python library call without the interactive shell or
xonsh dependency.

kash_run() takes an action name, optional inputs (URLs, file paths, or
Item objects), optional params, and returns a structured ActionResult.
Auto-initializes on first call. Uses temporary workspace if none provided.

Includes 13 tests covering input building (URLs, files, Items, mixed),
workspace resolution, and full pipeline with mocked actions.

Also exports kash_init and kash_run from the root kash package for
convenient `from kash import kash_run` usage.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
…local_file_cache (kash-jlmn)

Integration tests covering:
- action_exec pipeline via kash_run(): URL/file/Item inputs, multiple
  inputs, params, not-found errors, no-input actions (7 tests)
- web_fetch: file:// URL downloads, HttpHeaders MIME type parsing (8 tests)
- local_file_cache: Loadable caching, TTL expiration, NEVER expiration,
  is_cached, local file caching, mtime reading (9 tests)
- kash.run module: input building, workspace resolution, full pipeline (13 tests)

Fix RawParamValues construction in kash.run (use values= kwarg).

Total: 36 new tests (210 total across project).

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
Mark completed beads in spec: all 12 Tier 1 items, 4/8 Tier 2 items,
3/4 Tier 3 evaluations. Add evaluation results for utils audit
(kash-zt0y), web_content extraction (kash-law6), and llm_utils
extraction (kash-78wq). Update spec status to In Progress.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
Fix 32 basedpyright errors:
- kash.run: use proper raw-to-typed param conversion via parse_all()
- paths_model: remove @OverRide from __truediv__ (no base method)
- test_preconditions: add type ignore for dynamic Item construction
- test_item_id_index: wrap url strings in Url(), narrow ItemId | None
- test_llm_completion: prefix unused mock params with underscore
- test_action_registry: add type ignore for mock return
- test_web_fetch: use Url() NewType, fix MimeType None checks
- test_local_file_cache: use named function instead of lambda for save

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
All 301 tests pass on Python 3.14.3. The pydantic incompatibility was
specific to the 3.14.0 release candidate and is resolved in the final
release.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
@jlevy jlevy changed the title feat: architecture modernization — Tier 1 complete, Tier 2/3 partial feat: architecture modernization with Python 3.14 support and dependency upgrades Feb 9, 2026
Merge unique content from docs/CODE_REVIEW_2026-01.md (dependency upgrade
status, TODO/FIXME catalog, creative improvement ideas, best practices
alignment) into the architecture spec as appendices. Remove the standalone
file to follow spec conventions.

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
@jlevy jlevy merged commit ed89d0d into main Feb 9, 2026
4 checks passed
@jlevy jlevy deleted the claude/review-branch-tiers-icTv8 branch February 9, 2026 05:50
jlevy pushed a commit that referenced this pull request Feb 9, 2026
Detailed instructions for verifying kash-docs against updated kash-shell
main branch (PR #6) and applying the same code quality cleanups:
Python 3.14 support, ruff/codespell/basedpyright fixes, future annotations,
@OverRide decorators, dependency upgrades (openai 2.x, litellm 1.81).

https://claude.ai/code/session_01Pv152sQEpeLvwDoJTdz9Fm
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants