Skip to content

fix: resolve datastore sync path mismatch for remote datastores#977

Merged
stack72 merged 1 commit intomainfrom
fix/datastore-sync-path-mismatch
Mar 31, 2026
Merged

fix: resolve datastore sync path mismatch for remote datastores#977
stack72 merged 1 commit intomainfrom
fix/datastore-sync-path-mismatch

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented Mar 31, 2026

Summary

Fixes a bug where data written during workflow execution with S3 (or other
remote) datastores was never pushed to the remote, making it invisible to
other machines pulling from the same datastore.

Root Cause

For custom datastores with a local cache (e.g. @swamp/s3-datastore), there
are two path values in CustomDatastoreConfig:

Field S3 provider resolves to Used by
datastorePath {repoDir}/.swamp DefaultDatastorePathResolver — base for all data reads/writes
cachePath ~/.swamp/repos/unknown S3CacheSyncService — directory scanned for push, target for pull

These are different directories. The consequence:

  1. CI writes data to {repoDir}/.swamp/data/... (via the path resolver
    using datastorePath)
  2. Push scans ~/.swamp/repos/unknown/ (via the sync service using
    cachePath) — finds no files — pushes nothing to S3
  3. Local pull downloads from S3 to ~/.swamp/repos/unknown/ — but the
    path resolver reads from {repoDir}/.swamp/ — data is invisible

The S3 provider's resolveCachePath() comment even says "return a sensible
default that core will override"
, but the ?? fallback in config resolution
never triggers because the provider returns a string (not undefined).

The Fix

DefaultDatastorePathResolver now uses cachePath ?? datastorePath as
the base path for custom datastores. This ensures data reads/writes go to
the same directory the sync service pushes from and pulls to. When no
cachePath is configured, it falls back to the original datastorePath
behavior.

Additionally, DatastoreSyncService.pullChanged()/pushChanged() now
return Promise<number | void> instead of Promise<void>, and
createDatastoreSyncDeps passes through real file counts instead of
hardcoding zero. This fixes misleading output like "Pulled 0 files" and
"already up to date" even when files were actually transferred.

Files Changed

  • src/infrastructure/persistence/default_datastore_path_resolver.ts
    Use cachePath ?? datastorePath for remote datastores
  • src/infrastructure/persistence/default_datastore_path_resolver_test.ts
    Add tests for cachePath preference and fallback
  • src/domain/datastore/datastore_sync_service.ts — Return
    Promise<number | void> from sync methods
  • src/libswamp/datastores/sync.ts — Pass through real counts from sync
    service instead of hardcoding 0
  • packages/testing/datastore_types.ts — Match updated interface

Note: The @swamp/s3-datastore extension also needs a matching update
to return actual file counts from pullChanged()/pushChanged(). That
change lives in swamp-extensions/datastore/s3/ (separate repo).

Test Plan

  • All 7734 unit tests pass (2 pre-existing date-sensitive failures in
    update_notification_service_test.ts, unrelated)
  • New tests verify cachePath is preferred over datastorePath for custom
    datastores, and that datastorePath is used as fallback when no
    cachePath exists
  • Manual verification: built binary, ran swamp datastore sync, confirmed
    data from S3 is now visible via swamp data search
  • Full round-trip (CI push → local pull) requires deploying the fixed binary
    to CI and running a workflow

DefaultDatastorePathResolver used datastorePath as the base for all data
reads/writes, while S3CacheSyncService used cachePath for sync operations.
For S3 datastores these resolved to different directories, so pushChanged()
never found files to upload and pullChanged() downloaded to a location the
application never read from.

Also update DatastoreSyncService to return file counts instead of void,
and pass real counts through createDatastoreSyncDeps instead of hardcoding
zero.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
stack72 added a commit to systeminit/swamp-extensions that referenced this pull request Mar 31, 2026
Update `S3CacheSyncService` to return actual file counts from `pullChanged()`
and `pushChanged()` instead of `Promise<void>`. This feeds accurate numbers
through to the CLI output — previously `swamp datastore sync --pull` always
displayed "Pulled 0 files" and the coordinator always logged "already up to
date" regardless of how many files were actually transferred.

Companion to systeminit/swamp#977 which updated the `DatastoreSyncService`
interface to accept `Promise<number | void>`.

- **`extensions/datastores/_lib/s3_cache_sync.ts`** — Return pulled/pushed
counts from `pullChanged()` and `pushChanged()`
- **`extensions/datastores/_lib/interfaces.ts`** — Match updated interface
(`Promise<number | void>`)

- Type-checks pass (`deno check`)
- No behavioral change to sync logic — only the return values are new
Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI UX Review

Blocking

None.

Suggestions

None.

Verdict

PASS — This PR is a pure bug fix with no new flags, commands, or help text. The only UX-visible change is that swamp datastore sync now correctly reports the actual number of files pulled/pushed (e.g. Pushed 3 files) instead of always showing Pushed 0 files / Pulled 0 files. Both log-mode and JSON-mode output were already structured correctly in src/presentation/renderers/datastore_sync.ts — the renderer just wasn't receiving real counts before this fix.

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

None found.

Medium

  1. src/libswamp/datastores/sync.ts:103-105fullSync sequential ordering may hide errors. In fullSync, if pullChanged() succeeds but pushChanged() throws, the pull results are lost and the caller sees only the push error. The errors array is always returned empty. This isn't new to this PR (the sequential pattern existed before), but the PR now surfaces real counts, making it more visible that a partial-success scenario silently discards pull results. Consider wrapping each call in try/catch and populating the errors array, or at minimum documenting that fullSync is not atomic.

  2. src/infrastructure/persistence/default_datastore_path_resolver.ts:59cachePath could be an empty string. cachePath is typed as string | undefined. The ?? operator only triggers on null/undefined, not on "". If a provider ever resolves cachePath to an empty string, datastoreBasePath would become "", and all join() calls would produce relative paths (e.g., "data/foo" instead of "/absolute/data/foo"). This is defensive — I don't see a current provider that does this — but a || datastoreConfig.datastorePath would be more robust than ??.

Low

  1. src/libswamp/datastores/sync.ts:97typeof count === "number" guard. The typeof count === "number" ? count : 0 pattern is correct for the number | void union, but if a provider implementation were to return NaN (which typeof classifies as "number"), the caller would propagate NaN as a file count. Extremely unlikely in practice.

  2. Test at line 150 in default_datastore_path_resolver_test.ts — The new "prefers cachePath" test doesn't verify resolvePath for a non-datastore subdir (to confirm localPath still uses repoDir/.swamp and not cachePath). The existing test at line 24 covers this for filesystem configs, but not for custom configs with a divergent cachePath. Theoretical gap only — the localPath method doesn't reference datastoreBasePath.

Verdict

PASS — The core fix is correct and well-targeted. The cachePath ?? datastorePath change properly aligns the path resolver with the sync service's expectations. The return type widening is backwards-compatible (existing callers already handle void). Tests cover the key cases.

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Suggestions

  1. DatastoreSyncService return type: Promise<number | void> creates caller-side ambiguity requiring typeof guards. If this interface were internal-only, Promise<number> (returning 0 for no-op) would be cleaner. However, since this is a contract with external extension authors who may already return void, the union is a reasonable backward-compatibility choice — no change needed now, but worth revisiting if the extension API gets a major version bump.

  2. Pre-existing test name mismatch (line 45–57 in the test file): The test "custom datastorePath uses datastorePath" now actually exercises a case where cachePath === datastorePath, so it tests cachePath preference rather than datastorePath fallback. The name was accurate before this PR but is now slightly misleading. Not blocking since the new dedicated tests cover both cases clearly.

LGTM — minimal, well-targeted fix with good test coverage. The path mismatch root cause is clearly explained and the cachePath ?? datastorePath approach is correct.

@stack72 stack72 merged commit fedded3 into main Mar 31, 2026
10 checks passed
@stack72 stack72 deleted the fix/datastore-sync-path-mismatch branch March 31, 2026 01:29
stack72 added a commit to systeminit/swamp-extensions that referenced this pull request Mar 31, 2026
Update `S3CacheSyncService` to return actual file counts from `pullChanged()`
and `pushChanged()` instead of `Promise<void>`. This feeds accurate numbers
through to the CLI output — previously `swamp datastore sync --pull` always
displayed "Pulled 0 files" and the coordinator always logged "already up to
date" regardless of how many files were actually transferred.

Companion to systeminit/swamp#977 which updated the `DatastoreSyncService`
interface to accept `Promise<number | void>`.

- **`extensions/datastores/_lib/s3_cache_sync.ts`** — Return pulled/pushed
counts from `pullChanged()` and `pushChanged()`
- **`extensions/datastores/_lib/interfaces.ts`** — Match updated interface
(`Promise<number | void>`)

- Type-checks pass (`deno check`)
- No behavioral change to sync logic — only the return values are new
stack72 added a commit to systeminit/swamp-extensions that referenced this pull request Mar 31, 2026
Update `S3CacheSyncService` to return actual file counts from `pullChanged()`
and `pushChanged()` instead of `Promise<void>`. This feeds accurate numbers
through to the CLI output — previously `swamp datastore sync --pull` always
displayed "Pulled 0 files" and the coordinator always logged "already up to
date" regardless of how many files were actually transferred.

Companion to systeminit/swamp#977 which updated the `DatastoreSyncService`
interface to accept `Promise<number | void>`.

- **`extensions/datastores/_lib/s3_cache_sync.ts`** — Return pulled/pushed
counts from `pullChanged()` and `pushChanged()`
- **`extensions/datastores/_lib/interfaces.ts`** — Match updated interface
(`Promise<number | void>`)

- Type-checks pass (`deno check`)
- No behavioral change to sync logic — only the return values are new
stack72 added a commit to systeminit/swamp-extensions that referenced this pull request Mar 31, 2026
Update `S3CacheSyncService` to return actual file counts from `pullChanged()`
and `pushChanged()` instead of `Promise<void>`. This feeds accurate numbers
through to the CLI output — previously `swamp datastore sync --pull` always
displayed "Pulled 0 files" and the coordinator always logged "already up to
date" regardless of how many files were actually transferred.

Companion to systeminit/swamp#977 which updated the `DatastoreSyncService`
interface to accept `Promise<number | void>`.

- **`extensions/datastores/_lib/s3_cache_sync.ts`** — Return pulled/pushed
counts from `pullChanged()` and `pushChanged()`
- **`extensions/datastores/_lib/interfaces.ts`** — Match updated interface
(`Promise<number | void>`)

- Type-checks pass (`deno check`)
- No behavioral change to sync logic — only the return values are new
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant