Skip to content

feat(metadata:diskless): implement managed replicas for diskless topics#492

Merged
giuseppelillo merged 5 commits intomainfrom
jeqo/pod-2001-diskless-managed-replica
Mar 10, 2026
Merged

feat(metadata:diskless): implement managed replicas for diskless topics#492
giuseppelillo merged 5 commits intomainfrom
jeqo/pod-2001-diskless-managed-replica

Conversation

@jeqo
Copy link
Contributor

@jeqo jeqo commented Jan 22, 2026

Implements Phase 1 of Diskless Managed Replicas (see #478 docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md).

When diskless.managed.rf.enable=true, new diskless topics are created with RF = rack_count (one replica per rack) using standard KRaft replica placement, instead of legacy RF=1.

Changes

  • Add diskless.managed.rf.enable server config (default: false)
  • Compute RF from rack cardinality at topic creation
  • Use standard replicaPlacer.place() for rack-aware assignment
  • Allow manual replica assignments for operational flexibility
  • Comprehensive unit tests for managed and unmanaged modes

Known Limitations (Phase 1)

This feature is opt-in only via diskless.managed.rf.enable=true.
Existing clusters and topics are unaffected unless explicitly enabled.

When enabled:

  • Partition Reassignment: Not yet supported for managed replica topics. Reassignment will hang waiting for replica sync. Will be fixed in follow-up PR.

  • Transformer: Not updated - uses legacy routing (Phase 2). KRaft metadata shows RF=3, but clients are still routed to any alive broker. Doesn't affect correctness for DISKLESS_ONLY topics.

  • Add Partitions: Works correctly - inherits RF and uses rack-aware placement.

  • Observability metrics: Deferred to Phase 2.

Safe to merge because:

  • Feature flag is false by default
  • Only affects new topic creation when explicitly enabled
  • Existing diskless topics (RF=1) continue to work unchanged
  • No impact on classic (non-diskless) topics

Follow-up PRs Sequence

  1. This PR (Phase 1): Topic creation with managed replicas
  2. Phase 2 PR (jeqo/pod-2001-unlock-diskless-reassignment currently has):
  • Transformer changes for mode-aware routing
  • Immediate partition reassignment fix (critical!)
  • Phase 3 test for add partitions

Testing

  • Tests for: no-racks, with-racks, invalid input, internal topics, broker fencing, unregister scenarios

Configuration

Config Default Description
diskless.managed.rf.enable false When true, new diskless topics get RF=rack_count

@jeqo jeqo force-pushed the jeqo/pod-2001-diskless-managed-replica branch from 259d178 to d64e450 Compare February 5, 2026 14:54
@jeqo jeqo requested a review from Copilot February 9, 2026 09:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements Phase 1 of “Diskless Managed Replicas”: when enabled via a new server config, newly-created diskless topics use standard KRaft replica placement with replication factor derived from rack cardinality (instead of legacy RF=1), and manual replica assignments are allowed for diskless topics in managed mode.

Changes:

  • Add diskless.managed.rf.enable server config and plumb it through broker/controller config wiring.
  • Update ReplicationControlManager.createTopic to compute diskless RF from rack cardinality and use replicaPlacer.place() in managed mode; allow manual assignments in managed mode.
  • Extend unit tests to cover managed vs unmanaged behavior across no-rack / with-rack / invalid input / fencing & unregister scenarios.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
server-common/src/main/java/org/apache/kafka/server/config/ServerConfigs.java Introduces new diskless.managed.rf.enable config definition and documentation.
metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java Adds managed-replica enable flag, rack-cardinality RF computation, and managed-mode placement path in topic creation.
metadata/src/main/java/org/apache/kafka/controller/QuorumController.java Wires the managed-replica flag from controller builder into ReplicationControlManager.
core/src/main/scala/kafka/server/KafkaConfig.scala Exposes disklessManagedReplicasEnabled from server properties.
core/src/main/scala/kafka/server/ControllerServer.scala Passes the new config into the QuorumController.Builder.
metadata/src/test/java/org/apache/kafka/controller/ReplicationControlManagerTest.java Adds/updates tests and test context plumbing for managed vs unmanaged diskless behavior.
checkstyle/suppressions.xml Adds a MethodLength suppression for ReplicationControlManager.java.
Comments suppressed due to low confidence (1)

metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java:792

  • The INVALID_REPLICATION_FACTOR message hard-codes the “default value (1)”, but when diskless managed replicas are enabled the effective RF is derived from rack cardinality (can be > 1). Update the error message (and/or the preceding comment) to describe the actual allowed inputs (1 or -1) and what -1/managed mode means, without implying the default is always 1.
            // Diskless RF: only -1 (system-computed from rack count) or 1 (backward compat) allowed.
            // Explicit RF > 1 rejected: users shouldn't need to know rack topology.
            if (Math.abs(topic.replicationFactor()) != 1) {
                return new ApiError(Errors.INVALID_REPLICATION_FACTOR,
                    "Replication factor for diskless topics must be 1 or -1 to use the default value (1).");
            }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jeqo jeqo force-pushed the jeqo/pod-2001-diskless-managed-replica branch from d64e450 to 3e7a42b Compare February 9, 2026 16:37
@jeqo jeqo requested a review from Copilot February 9, 2026 16:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@jeqo jeqo requested a review from Copilot February 9, 2026 16:38
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@jeqo jeqo requested a review from Copilot February 9, 2026 16:59
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jeqo jeqo force-pushed the jeqo/pod-2001-diskless-managed-replica branch from 3e7a42b to 6fba0c9 Compare February 9, 2026 18:02
@jeqo jeqo requested a review from Copilot February 9, 2026 18:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jeqo jeqo force-pushed the jeqo/pod-2001-diskless-managed-replica branch from 6fba0c9 to 1f3c7e1 Compare February 9, 2026 19:06
@jeqo jeqo requested a review from Copilot February 9, 2026 19:06
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jeqo jeqo force-pushed the jeqo/pod-2001-diskless-managed-replica branch from 1f3c7e1 to 6f01be4 Compare February 9, 2026 23:23
@jeqo jeqo requested a review from Copilot February 9, 2026 23:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jeqo jeqo force-pushed the jeqo/pod-2001-diskless-managed-replica branch from 6f01be4 to 82dbca5 Compare February 10, 2026 00:16
@jeqo jeqo requested a review from Copilot February 10, 2026 00:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jeqo jeqo force-pushed the jeqo/pod-2001-diskless-managed-replica branch from 82dbca5 to 5f41dde Compare February 10, 2026 00:46
@jeqo jeqo requested a review from Copilot February 10, 2026 00:46
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jeqo jeqo marked this pull request as ready for review February 10, 2026 00:59
jeqo added 2 commits March 3, 2026 14:04
… unregister scenarios

- Add _noRacks and _withRacks test variants for consistent coverage
- Fix tests that assumed broker 0 was always the leader
- Get actual leader from partition registration before fencing/unregistering
- Use dynamic assertions based on actual partition state
- Improve assertion error messages for clarity
Add diskless.managed.rf.enable config (default: false) to control whether
diskless topics use managed replicas with RF=rack_count or legacy RF=1.

This config only affects topic creation. When enabled, new diskless topics
will be created with one replica per rack using standard KRaft placement.

Part of Phase 1: Diskless Managed Replicas
(See #478 docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

# Conflicts:
#	core/src/main/scala/kafka/server/ControllerServer.scala
#	core/src/main/scala/kafka/server/KafkaConfig.scala
#	metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
#	server-common/src/main/java/org/apache/kafka/server/config/ServerConfigs.java
@jeqo jeqo force-pushed the jeqo/pod-2001-diskless-managed-replica branch 2 times, most recently from 18e49b9 to 20c315d Compare March 4, 2026 11:15
When diskless.managed.rf.enable=true, new diskless topics are created with
RF=rack_count using standard KRaft replica placement instead of legacy RF=1.

Changes:
- Compute RF from rack cardinality via rackCardinality()
- Use standard replicaPlacer.place() for rack-aware assignment
- Allow manual replica assignments when managed replicas enabled
- Add checkstyle suppression for extended createTopic method

Phase 1 limitations:
- Add Partitions inherits RF from existing partitions (Phase 3)
- Transformer not updated, uses legacy routing (Phase 2)
- Integration tests deferred to Phase 2
(See #478 docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

� Conflicts:
�	metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
�	metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java
�	metadata/src/test/java/org/apache/kafka/controller/ReplicationControlManagerTest.java
@jeqo jeqo force-pushed the jeqo/pod-2001-diskless-managed-replica branch from 20c315d to 9a44b38 Compare March 4, 2026 14:37
@jeqo jeqo requested a review from giuseppelillo March 5, 2026 10:55
// Throws BrokerNotAvailableException or InvalidReplicationFactorException on failure,
// which are caught by the caller and converted to ApiError.
short disklessReplicationFactor = disklessEnabled && isDisklessManagedReplicasEnabled ? rackCardinality() : 1;
short replicationFactor = disklessEnabled ? disklessReplicationFactor : classicReplicationFactor;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking a lot about this. Overall I agree that from a service provider point of view it's a good feature to choose replication factor based on available zones or regions automatically. On the other hand Kafka users don't really expect that the replication factor will be chosen for them and they may not use rack awareness at all. So this assumes that diskless partitions will be used on cloud. Should we implement this feature in Aiven Core instead?
If this is an Aiven specific feature it might be OK, but perhaps we could consider moving this to Aiven Core in that case too to avoid modifications in core Kafka.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an alternative you can use manual assignment and specify the RF that you prefer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, on a third thought, we might be fine as long as the user can opt to use manual assignment. Then maybe it's enough to log that we auto-selected the replication factor and brokers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair question. The main goal was not to hide but to keep things backward compatible with how diskless topics are currently created with RF=1 (default -1 was turned to 1), and having this mechanism to define a proper default for diskless.

On a second thought, I wonder if we can remove this "magic RF" and leave the user-defined RF. I'm looking into the implications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a couple of fixup commits to remove the calculated RF, and rely on user-provided value. It relies a bit more on how the cluster is configured, but as long as default replication factor = racks, it achieves the same result when RF=-1.

I think we can go with this approach to simplify things and avoid diverging further from classic kafka. PTAL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, it's true that without being careful you can end up with diskless topics not spread among all available AZs (like if you don't edit default.replication.factor and use RF=-1), but it's better to have maximum flexibility.

giuseppelillo
giuseppelillo previously approved these changes Mar 9, 2026
@giuseppelillo giuseppelillo merged commit 09ba4d1 into main Mar 10, 2026
8 checks passed
@giuseppelillo giuseppelillo deleted the jeqo/pod-2001-diskless-managed-replica branch March 10, 2026 10:52
AnatolyPopov pushed a commit that referenced this pull request Mar 23, 2026
…cs (#492)

* test(metadata:diskless): improve test coverage for broker fencing and unregister scenarios

- Add _noRacks and _withRacks test variants for consistent coverage
- Fix tests that assumed broker 0 was always the leader
- Get actual leader from partition registration before fencing/unregistering
- Use dynamic assertions based on actual partition state
- Improve assertion error messages for clarity

* feat(controller:diskless): add server config for managed replicas

Add diskless.managed.rf.enable config (default: false) to control whether
diskless topics use managed replicas with RF=rack_count or legacy RF=1.

This config only affects topic creation. When enabled, new diskless topics
will be created with one replica per rack using standard KRaft placement.

Part of Phase 1: Diskless Managed Replicas
(See #478 docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

# Conflicts:
#	core/src/main/scala/kafka/server/ControllerServer.scala
#	core/src/main/scala/kafka/server/KafkaConfig.scala
#	metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
#	server-common/src/main/java/org/apache/kafka/server/config/ServerConfigs.java

* feat(metadata:diskless): implement managed replicas for diskless topics

When diskless.managed.rf.enable=true, new diskless topics are created with
RF=rack_count using standard KRaft replica placement instead of legacy RF=1.

Changes:
- Compute RF from rack cardinality via rackCardinality()
- Use standard replicaPlacer.place() for rack-aware assignment
- Allow manual replica assignments when managed replicas enabled
- Add checkstyle suppression for extended createTopic method

Phase 1 limitations:
- Add Partitions inherits RF from existing partitions (Phase 3)
- Transformer not updated, uses legacy routing (Phase 2)
- Integration tests deferred to Phase 2
(See #478 docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

� Conflicts:
�	metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
�	metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java
�	metadata/src/test/java/org/apache/kafka/controller/ReplicationControlManagerTest.java

* fixup! feat(controller:diskless): add server config for managed replicas

* fixup! feat(metadata:diskless): implement managed replicas for diskless topics

(cherry picked from commit 09ba4d1)
jeqo added a commit that referenced this pull request Mar 23, 2026
…cs (#492)

* test(metadata:diskless): improve test coverage for broker fencing and unregister scenarios

- Add _noRacks and _withRacks test variants for consistent coverage
- Fix tests that assumed broker 0 was always the leader
- Get actual leader from partition registration before fencing/unregistering
- Use dynamic assertions based on actual partition state
- Improve assertion error messages for clarity

* feat(controller:diskless): add server config for managed replicas

Add diskless.managed.rf.enable config (default: false) to control whether
diskless topics use managed replicas with RF=rack_count or legacy RF=1.

This config only affects topic creation. When enabled, new diskless topics
will be created with one replica per rack using standard KRaft placement.

Part of Phase 1: Diskless Managed Replicas
(See #478 docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

# Conflicts:
#	core/src/main/scala/kafka/server/ControllerServer.scala
#	core/src/main/scala/kafka/server/KafkaConfig.scala
#	metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
#	server-common/src/main/java/org/apache/kafka/server/config/ServerConfigs.java

* feat(metadata:diskless): implement managed replicas for diskless topics

When diskless.managed.rf.enable=true, new diskless topics are created with
RF=rack_count using standard KRaft replica placement instead of legacy RF=1.

Changes:
- Compute RF from rack cardinality via rackCardinality()
- Use standard replicaPlacer.place() for rack-aware assignment
- Allow manual replica assignments when managed replicas enabled
- Add checkstyle suppression for extended createTopic method

Phase 1 limitations:
- Add Partitions inherits RF from existing partitions (Phase 3)
- Transformer not updated, uses legacy routing (Phase 2)
- Integration tests deferred to Phase 2
(See #478 docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

� Conflicts:
�	metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
�	metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java
�	metadata/src/test/java/org/apache/kafka/controller/ReplicationControlManagerTest.java

* fixup! feat(controller:diskless): add server config for managed replicas

* fixup! feat(metadata:diskless): implement managed replicas for diskless topics
jeqo added a commit that referenced this pull request Mar 23, 2026
…cs (#492)

* test(metadata:diskless): improve test coverage for broker fencing and unregister scenarios

- Add _noRacks and _withRacks test variants for consistent coverage
- Fix tests that assumed broker 0 was always the leader
- Get actual leader from partition registration before fencing/unregistering
- Use dynamic assertions based on actual partition state
- Improve assertion error messages for clarity

* feat(controller:diskless): add server config for managed replicas

Add diskless.managed.rf.enable config (default: false) to control whether
diskless topics use managed replicas with RF=rack_count or legacy RF=1.

This config only affects topic creation. When enabled, new diskless topics
will be created with one replica per rack using standard KRaft placement.

Part of Phase 1: Diskless Managed Replicas
(See #478 docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

# Conflicts:
#	core/src/main/scala/kafka/server/ControllerServer.scala
#	core/src/main/scala/kafka/server/KafkaConfig.scala
#	metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
#	server-common/src/main/java/org/apache/kafka/server/config/ServerConfigs.java

* feat(metadata:diskless): implement managed replicas for diskless topics

When diskless.managed.rf.enable=true, new diskless topics are created with
RF=rack_count using standard KRaft replica placement instead of legacy RF=1.

Changes:
- Compute RF from rack cardinality via rackCardinality()
- Use standard replicaPlacer.place() for rack-aware assignment
- Allow manual replica assignments when managed replicas enabled
- Add checkstyle suppression for extended createTopic method

Phase 1 limitations:
- Add Partitions inherits RF from existing partitions (Phase 3)
- Transformer not updated, uses legacy routing (Phase 2)
- Integration tests deferred to Phase 2
(See #478 docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

� Conflicts:
�	metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
�	metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java
�	metadata/src/test/java/org/apache/kafka/controller/ReplicationControlManagerTest.java

* fixup! feat(controller:diskless): add server config for managed replicas

* fixup! feat(metadata:diskless): implement managed replicas for diskless topics

(cherry picked from commit 09ba4d1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants