
refactor(inkless:consume): only remote fetches to run on data executor#466

Merged
AnatolyPopov merged 11 commits into main from jeqo/only-fetch-on-work-thread on Dec 30, 2025

Conversation

@jeqo
Contributor

@jeqo jeqo commented Dec 23, 2025

Refactor fetch planning by changing how fetch operations are scheduled on the data executor. The planner now assigns the data executor only to remote fetch operations and lets the calling thread run cache.get(), avoiding the scenario where cache calls are blocked because all data executor threads are in use.

By doing this, the CacheFetchJob component became redundant, as most of its logic already lives in the planning, including the scheduling decision.

Changes:

  • Delete CacheFetchJob
  • Extend ObjectCache to allow passing the executor to run the load function
  • Refactor the FetchPlanner to use the new approach
  • Update documentation on FetchPlanner and Reader to clarify how stages are run
  • Move CacheFetchJobTest to FetchPlannerTest to keep and improve coverage
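The scheduling split described above can be sketched roughly as follows. This is a minimal stdlib illustration, not the actual FetchPlanner API: lookupLocally, fetchRemote, and plan are hypothetical stand-ins.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: cache lookups stay on the calling thread; only cache misses
// occupy a data-executor thread for the remote fetch.
public class FetchSchedulingSketch {

    // Hypothetical stand-in for cache.get(): returns null on a miss.
    static String lookupLocally(String key) {
        return key.equals("cached") ? "hit:" + key : null;
    }

    // Hypothetical stand-in for a remote object-storage fetch.
    static String fetchRemote(String key) {
        return "remote:" + key;
    }

    static CompletableFuture<String> plan(String key, Executor dataExecutor) {
        final String cached = lookupLocally(key); // runs on the calling thread
        if (cached != null) {
            return CompletableFuture.completedFuture(cached);
        }
        // Only the miss path is scheduled on the data executor.
        return CompletableFuture.supplyAsync(() -> fetchRemote(key), dataExecutor);
    }

    public static void main(String[] args) throws Exception {
        final ExecutorService dataExecutor = Executors.newFixedThreadPool(2);
        System.out.println(plan("cached", dataExecutor).get());
        System.out.println(plan("other", dataExecutor).get());
        dataExecutor.shutdown();
    }
}
```

With this split, a burst of remote fetches can saturate the data executor without stalling callers whose data is already cached.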

Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the fetch planning architecture to optimize thread pool usage by executing only remote fetches on the data executor, while cache lookups run on the calling thread. The key changes eliminate the CacheFetchJob component and extend the ObjectCache interface to accept an executor parameter for async operations.

Key Changes:

  • Deleted CacheFetchJob and moved its logic into FetchPlanner, which now directly interacts with the cache and schedules remote fetches
  • Modified ObjectCache API to return CompletableFuture and accept an executor parameter for controlling where load operations run
  • Migrated tests from CacheFetchJobTest to FetchPlannerTest with expanded coverage for cache hit/miss scenarios

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Summary per file:

  • FetchPlannerTest.java: Migrated and expanded tests from CacheFetchJobTest, adding comprehensive tests for concurrent requests, cache hits/misses, and failure scenarios
  • CacheFetchJobTest.java: Deleted; tests migrated to FetchPlannerTest
  • NullCache.java: Updated to implement the new async computeIfAbsent API that returns CompletableFuture and accepts an executor parameter
  • Reader.java: Added clarifying comments about execution stages and which thread pools handle different operations
  • FetchPlanner.java: Major refactor: removed the CacheFetchJob dependency, integrated cache operations directly, added ObjectFetchRequest and MergedBatchRequest records for internal planning
  • CacheFetchJob.java: Deleted; functionality merged into FetchPlanner
  • ObjectCache.java: Changed the computeIfAbsent signature to return CompletableFuture and accept an Executor parameter for controlling async load execution
  • MemoryCache.java: Deleted; test-only implementation no longer needed
  • CaffeineCache.java: Updated to implement the new async computeIfAbsent API using Caffeine's async cache with a custom executor


@jeqo jeqo force-pushed the jeqo/only-fetch-on-work-thread branch from 6f06aff to d4c4bc4 Compare December 23, 2025 11:07
@jeqo jeqo requested a review from Copilot December 23, 2025 11:07
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.



@jeqo jeqo force-pushed the jeqo/only-fetch-on-work-thread branch 2 times, most recently from c64c4aa to 31b9aa9 Compare December 23, 2025 13:48
@jeqo jeqo requested a review from Copilot December 23, 2025 13:50
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.



@jeqo jeqo force-pushed the jeqo/only-fetch-on-work-thread branch from 31b9aa9 to f6be2a5 Compare December 23, 2025 15:16
@jeqo jeqo force-pushed the jeqo/refactor-fetch-completer-schedule branch from 6cf25a1 to e862b11 Compare December 29, 2025 12:38
@jeqo jeqo force-pushed the jeqo/only-fetch-on-work-thread branch from f6be2a5 to 94ed0bf Compare December 29, 2025 13:30
@jeqo jeqo requested a review from Copilot December 29, 2025 14:53
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.



@jeqo jeqo force-pushed the jeqo/only-fetch-on-work-thread branch from 2713196 to 3285a0e Compare December 29, 2025 15:10
Base automatically changed from jeqo/refactor-fetch-completer-schedule to main December 29, 2025 15:27
@jeqo jeqo force-pushed the jeqo/only-fetch-on-work-thread branch from 3285a0e to 3dd60d4 Compare December 29, 2025 15:29
@jeqo jeqo marked this pull request as ready for review December 29, 2025 15:30
@jeqo jeqo requested a review from AnatolyPopov December 29, 2025 15:30
Comment on lines +555 to +556
// Should have only 1 future because the cache deduplicates same-key requests
assertThat(futures).hasSize(1);
Contributor


This will happen not because of the cache key but because of groupingBy with BatchInfo.objectKey. Because of that, only a single FetchRequest will ever be created. That might still be fine, since the idea is to test deduplication, but the comment seems misleading.

Contributor Author


Good point, updating comment.
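For context, the grouping behavior discussed above can be sketched as follows. BatchInfo here is a simplified stand-in for the planner's type, used only to show that groupingBy yields one merged request per object key.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch: grouping batches by BatchInfo.objectKey yields a single
// merged request per object key, regardless of how many batches share that key.
public class GroupingSketch {

    // Simplified stand-in for the planner's batch metadata.
    record BatchInfo(String objectKey, long offset) {}

    static Map<String, List<BatchInfo>> groupByObjectKey(List<BatchInfo> batches) {
        return batches.stream().collect(Collectors.groupingBy(BatchInfo::objectKey));
    }

    public static void main(String[] args) {
        final List<BatchInfo> batches = List.of(
                new BatchInfo("obj-a", 0),
                new BatchInfo("obj-a", 1024),
                new BatchInfo("obj-b", 0));
        // Two distinct object keys -> two merged requests, not three.
        System.out.println(groupByObjectKey(batches).size());
    }
}
```

So when all test batches share one object key, a single request falls out of the grouping step itself, before any cache deduplication comes into play.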

metrics::fetchFileFinished,
metrics::cacheEntrySize
timestamp,
totalBytes
Contributor


If totalBytes is going to be used for rate limiting, a sum will be calculated from these values further down the line, right? But here every request gets the same totalBytes value; should it be byteRange.size() instead? Or could you clarify what I am missing?

Contributor Author


I'm removing the ts/bytes from this PR, as it mixes in concerns from a follow-up PR. I've refactored the changes in the first fixup commit to simplify this.

Comment on lines +284 to +315
// Execute: Trigger fetch operations
final List<CompletableFuture<FileExtent>> futures = planner.get();

// Verify: Should have two futures
assertThat(futures).hasSize(2);

// Wait for all to complete
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).get();

// Verify both were fetched
verify(fetcher).fetch(eq(OBJECT_KEY_A), any(ByteRange.class));
verify(fetcher).fetch(eq(OBJECT_KEY_B), any(ByteRange.class));

// Verify correct data for each
final List<FileExtent> results = futures.stream()
.map(f -> {
try {
return f.get();
} catch (Exception e) {
throw new RuntimeException(e);
}
})
.collect(Collectors.toList());

assertThat(results).hasSize(2);

// Verify the actual data content matches expected values
// Note: We cannot rely on ordering since the futures complete asynchronously,
// so we use containsExactlyInAnyOrder to verify both byte arrays are present.
assertThat(results)
.extracting(FileExtent::data)
.containsExactlyInAnyOrder(dataA, dataB);
Contributor


Could this be just a one-liner plus verification?

Suggested change
assertThat(planner.get()).map(CompletableFuture::get).map(FileExtent::data)
.containsExactlyInAnyOrder(dataA, dataB);
verify(fetcher).fetch(eq(OBJECT_KEY_A), any(ByteRange.class));
verify(fetcher).fetch(eq(OBJECT_KEY_B), any(ByteRange.class));

// Use the provided executor instead of Caffeine's default executor.
// This allows us to control which thread pool handles the fetch and blocks there,
// while Caffeine's internal threads remain unblocked, so cache operations can continue to be served.
return CompletableFuture.supplyAsync(() -> load.apply(k), loadExecutor);
Contributor


If the future completes exceptionally, will the result be cached? AFAIU there will be no retry on the next attempt, and the failed future will stay cached until evicted.

Contributor Author


Good catch! It looks like this is the case in Caffeine. I'm updating the logic to handle eviction in case of failure.
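The fix the author describes, invalidating a failed future so the next request retries instead of observing a cached failure, can be sketched with a plain ConcurrentHashMap stand-in. This is not the actual CaffeineCache implementation; the names computeIfAbsent and loadExecutor mirror the PR's ObjectCache changes, but the map-based cache is illustrative only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Sketch of the "invalidate on failure" pattern: a failed load is removed from
// the cache immediately, so the next request for the same key retries the load.
public class AsyncCacheSketch {
    private final ConcurrentHashMap<String, CompletableFuture<String>> cache =
            new ConcurrentHashMap<>();

    public CompletableFuture<String> computeIfAbsent(String key,
                                                     Function<String, String> load,
                                                     Executor loadExecutor) {
        final CompletableFuture<String> future = cache.computeIfAbsent(key,
                k -> CompletableFuture.supplyAsync(() -> load.apply(k), loadExecutor));
        // Evict failed loads so subsequent requests retry instead of seeing a
        // cached exceptional future until eviction.
        return future.whenComplete((value, throwable) -> {
            if (throwable != null) {
                cache.remove(key, future);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        final AsyncCacheSketch cache = new AsyncCacheSketch();
        final ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            cache.computeIfAbsent("a", k -> { throw new RuntimeException("boom"); }, pool).get();
        } catch (ExecutionException expected) {
            // The failed entry was invalidated by whenComplete.
        }
        // Retry succeeds because the failed future is no longer cached.
        System.out.println(cache.computeIfAbsent("a", k -> "loaded", pool).get());
        pool.shutdown();
    }
}
```

Note that the PR's actual change (per the later diff excerpt) uses Caffeine's synchronous view to invalidate the key; the remove-on-failure shape is the same.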

import io.aiven.inkless.generated.CacheKey;
import io.aiven.inkless.generated.FileExtent;

public interface ObjectCache extends Cache<CacheKey, FileExtent>, Closeable {
Contributor


Side question out of curiosity: why are we extending this Cache interface at all? Are we relying on it somewhere?

Contributor Author


Not fully sure, but there may be historic reasons: the internal caching API was adopted when the cache was first implemented with Infinispan. Could be reconsidered in a follow-up PR.

Contributor


Thanks for answering. The reason for the question is that this leads to some races in the CaffeineCache class, where we use a check-then-act pattern, e.g. getIfPresent, check for null, then do something. In these situations eviction can be triggered between the check and the act, which is what we ran into in tiered storage. But because of these Kafka Cache semantics we have to implement it this way. So it would be interesting to see if we can get rid of this interface, eliminating the race at the same time.
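A minimal sketch of the check-then-act race mentioned above, contrasted with an atomic computeIfAbsent. This is illustrative only, not the CaffeineCache code:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a separate get + null check + put leaves a window in which eviction
// or a concurrent writer can interleave; a single computeIfAbsent runs the
// mapping function atomically for that key, closing the window.
public class CheckThenActSketch {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    // Racy: another thread may remove or insert between the check and the put.
    String racyGet(String key) {
        String value = cache.get(key);       // check
        if (value == null) {                 // <-- eviction/insert can land here
            value = "loaded:" + key;
            cache.put(key, value);           // act
        }
        return value;
    }

    // Atomic: check and load happen as one operation for the key.
    String atomicGet(String key) {
        return cache.computeIfAbsent(key, k -> "loaded:" + k);
    }

    public static void main(String[] args) {
        final CheckThenActSketch cache = new CheckThenActSketch();
        System.out.println(cache.atomicGet("x"));
    }
}
```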


final FetchPlanner planner = new FetchPlanner(
time, OBJECT_KEY_CREATOR, keyAlignmentStrategy,
caffeineCache, fetcher, dataExecutor, coordinates, metrics
Contributor


Do I understand correctly that the supposedly concurrent requests will be submitted to a single-threaded dataExecutor? What am I missing here, and how can they be concurrent then?

Contributor Author


Good point. I'm clarifying this by renaming the test and updating the docs to reflect that it tests the async behavior, while using a single-threaded pool for deterministic results.

Contributor


Unrelated to the PR, but a bit strange: the doc says -1 means disabled, so I wonder how it becomes 180.

jeqo added 6 commits December 30, 2025 12:20
Refactor fetch planning by changing how fetch operations are scheduled
on the data executor. It only assigns the data executor for remote fetch
operations and let the calling thread to run the cache.get(), avoiding
the scenario where calls to cache are blocked by all data executor
threads being used.

By doing this, the CacheFetchJob component becomes irrelevant as most of
the logic is already on the planning, even the scheduling decision.

Changes:
- Delete CacheFetchJob
- Extend ObjectCache to allow passing the executor to run the load
function
- Refactor the FetchPlanner to use the new approach
- Update documentation on FetchPlanner and Reader to clarify how stages
are ran
- Move CacheFetchJobTest to FetchPlannerTest to keep and improve
coverage
…executor

Remove the temporal merged as this PR does not use bytes or timestamp.
Let that for the following PRs to define
…executor

Apply suggestion on simplified test
…executor

Improve error handling on cache async get
…executor

Further refactoring: Use same constructor as existing methods
@jeqo jeqo force-pushed the jeqo/only-fetch-on-work-thread branch from 39afa9d to f3a73f7 Compare December 30, 2025 10:20
@jeqo jeqo requested a review from AnatolyPopov December 30, 2025 10:20
// While Caffeine has built-in failed future cleanup, it happens asynchronously.
// Explicit invalidation ensures immediate removal for faster retry on subsequent requests.
if (throwable != null) {
cache.synchronous().invalidate(key);
Contributor


nit: Should it be k instead of key? Using k, I think we guarantee that even if the key is somehow mutated, we still invalidate the right entry.

// Caffeine's AsyncCache.get() provides atomic cache population per key.
// When multiple threads concurrently request the same uncached key, the mapping function
// is invoked only once, and all waiting threads receive the same CompletableFuture.
// This guarantees that the load function is called exactly once per key, preventing duplicate
Contributor


Nit: "exactly once" is not entirely correct now, since we retry on failure.

@jeqo jeqo requested a review from AnatolyPopov December 30, 2025 15:06
Contributor

@AnatolyPopov AnatolyPopov left a comment


Thanks for addressing all the comments, @jeqo! LGTM!

@AnatolyPopov AnatolyPopov merged commit 05ff0ac into main Dec 30, 2025
4 checks passed
@AnatolyPopov AnatolyPopov deleted the jeqo/only-fetch-on-work-thread branch December 30, 2025 23:13
jeqo added a commit that referenced this pull request Dec 31, 2025
#466)

* refactor(inkless:consume): only remote fetches to run on data executor

Refactor fetch planning by changing how fetch operations are scheduled
on the data executor. It only assigns the data executor for remote fetch
operations and let the calling thread to run the cache.get(), avoiding
the scenario where calls to cache are blocked by all data executor
threads being used.

By doing this, the CacheFetchJob component becomes irrelevant as most of
the logic is already on the planning, even the scheduling decision.

Changes:
- Delete CacheFetchJob
- Extend ObjectCache to allow passing the executor to run the load
function
- Refactor the FetchPlanner to use the new approach
- Update documentation on FetchPlanner and Reader to clarify how stages
are ran
- Move CacheFetchJobTest to FetchPlannerTest to keep and improve
coverage

* fixup! refactor(inkless:consume): only remote fetches to run on data executor

Remove the temporal merged as this PR does not use bytes or timestamp.
Let that for the following PRs to define

* fixup! refactor(inkless:consume): only remote fetches to run on data executor

Apply suggestion on simplified test

* fixup! refactor(inkless:consume): only remote fetches to run on data executor

Improve error handling on cache async get

* fixup! refactor(inkless:consume): only remote fetches to run on data executor

Improve comment as suggested

* fixup! refactor(inkless:consume): only remote fetches to run on data executor

Further refactoring: Use same constructor as existing methods

* fixup! refactor(inkless:consume): only remote fetches to run on data executor

* fixup! refactor(inkless:consume): only remote fetches to run on data executor

* fixup! refactor(inkless:consume): only remote fetches to run on data executor

* fixup! refactor(inkless:consume): only remote fetches to run on data executor

* fixup! refactor(inkless:consume): only remote fetches to run on data executor
jeqo added a commit that referenced this pull request Jan 5, 2026
#466)

* refactor(inkless:consume): only remote fetches to run on data executor

jeqo added a commit that referenced this pull request Jan 9, 2026
#466)

* refactor(inkless:consume): only remote fetches to run on data executor
