feat(metadata:diskless): implement managed replicas for diskless topics #492
Conversation
Pull request overview
Implements Phase 1 of “Diskless Managed Replicas”: when enabled via a new server config, newly-created diskless topics use standard KRaft replica placement with replication factor derived from rack cardinality (instead of legacy RF=1), and manual replica assignments are allowed for diskless topics in managed mode.
Changes:
- Add `diskless.managed.rf.enable` server config and plumb it through broker/controller config wiring.
- Update `ReplicationControlManager.createTopic` to compute diskless RF from rack cardinality and use `replicaPlacer.place()` in managed mode; allow manual replica assignments for diskless topics in managed mode.
- Extend unit tests to cover managed vs. unmanaged behavior across no-rack / with-rack / invalid-input / fencing & unregister scenarios.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| server-common/src/main/java/org/apache/kafka/server/config/ServerConfigs.java | Introduces the new `diskless.managed.rf.enable` config definition and documentation. |
| metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java | Adds the managed-replica enable flag, rack-cardinality RF computation, and the managed-mode placement path in topic creation. |
| metadata/src/main/java/org/apache/kafka/controller/QuorumController.java | Wires the managed-replica flag from the controller builder into `ReplicationControlManager`. |
| core/src/main/scala/kafka/server/KafkaConfig.scala | Exposes `disklessManagedReplicasEnabled` from server properties. |
| core/src/main/scala/kafka/server/ControllerServer.scala | Passes the new config into `QuorumController.Builder`. |
| metadata/src/test/java/org/apache/kafka/controller/ReplicationControlManagerTest.java | Adds/updates tests and test-context plumbing for managed vs. unmanaged diskless behavior. |
| checkstyle/suppressions.xml | Adds a `MethodLength` suppression for `ReplicationControlManager.java`. |
Comments suppressed due to low confidence (1)
metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java:792
- The INVALID_REPLICATION_FACTOR message hard-codes the “default value (1)”, but when diskless managed replicas are enabled the effective RF is derived from rack cardinality (can be > 1). Update the error message (and/or the preceding comment) to describe the actual allowed inputs (1 or -1) and what -1/managed mode means, without implying the default is always 1.
```java
// Diskless RF: only -1 (system-computed from rack count) or 1 (backward compat) allowed.
// Explicit RF > 1 rejected: users shouldn't need to know rack topology.
if (Math.abs(topic.replicationFactor()) != 1) {
    return new ApiError(Errors.INVALID_REPLICATION_FACTOR,
        "Replication factor for diskless topics must be 1 or -1 to use the default value (1).");
}
```
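Following the reviewer's suggestion, a hedged sketch of how this check could phrase its error without implying the default is always 1 (the extracted helper and the exact wording are illustrative, not the PR's actual code):

```java
public class DisklessRfValidation {
    /**
     * Validates the user-supplied RF for a diskless topic.
     * Returns an error message for invalid input, or null if the RF is acceptable.
     */
    static String validateDisklessReplicationFactor(short replicationFactor) {
        // Only 1 (legacy behavior) or -1 (let the system decide) are accepted;
        // explicit RF > 1 is rejected so users need not know the rack topology.
        if (replicationFactor != 1 && replicationFactor != -1) {
            return "Replication factor for diskless topics must be 1 (legacy behavior) "
                 + "or -1 (system-chosen: one replica per rack when managed replicas "
                 + "are enabled, otherwise 1).";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validateDisklessReplicationFactor((short) 3));  // rejected
        System.out.println(validateDisklessReplicationFactor((short) -1)); // prints null
    }
}
```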
…unregister scenarios

- Add _noRacks and _withRacks test variants for consistent coverage
- Fix tests that assumed broker 0 was always the leader
- Get actual leader from partition registration before fencing/unregistering
- Use dynamic assertions based on actual partition state
- Improve assertion error messages for clarity

Add diskless.managed.rf.enable config (default: false) to control whether diskless topics use managed replicas with RF=rack_count or legacy RF=1. This config only affects topic creation. When enabled, new diskless topics will be created with one replica per rack using standard KRaft placement.

Part of Phase 1: Diskless Managed Replicas (see #478, docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

# Conflicts:
#   core/src/main/scala/kafka/server/ControllerServer.scala
#   core/src/main/scala/kafka/server/KafkaConfig.scala
#   metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
#   server-common/src/main/java/org/apache/kafka/server/config/ServerConfigs.java
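Enabling the feature described in this commit would look roughly like the following broker/controller configuration. The `diskless.managed.rf.enable` key is from this PR; the rack value is an illustrative placeholder:

```properties
# Opt in to managed replicas for newly created diskless topics (default: false).
diskless.managed.rf.enable=true

# Rack-cardinality RF only helps if brokers declare racks; illustrative value:
broker.rack=us-east-1a
```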
When diskless.managed.rf.enable=true, new diskless topics are created with RF=rack_count using standard KRaft replica placement instead of legacy RF=1.

Changes:
- Compute RF from rack cardinality via rackCardinality()
- Use standard replicaPlacer.place() for rack-aware assignment
- Allow manual replica assignments when managed replicas enabled
- Add checkstyle suppression for extended createTopic method

Phase 1 limitations:
- Add Partitions inherits RF from existing partitions (Phase 3)
- Transformer not updated, uses legacy routing (Phase 2)
- Integration tests deferred to Phase 2

(See #478, docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

# Conflicts:
#   metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
#   metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java
#   metadata/src/test/java/org/apache/kafka/controller/ReplicationControlManagerTest.java
```java
// Throws BrokerNotAvailableException or InvalidReplicationFactorException on failure,
// which are caught by the caller and converted to ApiError.
short disklessReplicationFactor = disklessEnabled && isDisklessManagedReplicasEnabled ? rackCardinality() : 1;
short replicationFactor = disklessEnabled ? disklessReplicationFactor : classicReplicationFactor;
```
I have been thinking a lot about this. Overall I agree that, from a service provider's point of view, automatically choosing the replication factor based on the available zones or regions is a good feature. On the other hand, Kafka users don't really expect the replication factor to be chosen for them, and they may not use rack awareness at all. So this assumes that diskless partitions will be used in the cloud. Should we implement this feature in Aiven Core instead?
If this is an Aiven-specific feature it might be OK, but perhaps we could consider moving it to Aiven Core in that case too, to avoid modifications in core Kafka.
As an alternative you can use manual assignment and specify the RF that you prefer.
Yeah, on third thought, we might be fine as long as the user can opt for manual assignment. Then maybe it's enough to log that we auto-selected the replication factor and brokers.
Fair question. The main goal was not to hide anything but to stay backward compatible with how diskless topics are currently created with RF=1 (the default -1 was turned into 1), and to have this mechanism define a proper default for diskless.
On second thought, I wonder if we can remove this "magic RF" and keep the user-defined RF. I'm looking into the implications.
I've added a couple of fixup commits to remove the calculated RF and rely on the user-provided value. This depends a bit more on how the cluster is configured, but as long as the default replication factor equals the number of racks, it achieves the same result when RF=-1.
I think we can go with this approach to simplify things and avoid diverging further from classic Kafka. PTAL.
Sounds good. It's true that, without being careful, you can end up with diskless topics not spread across all available AZs (e.g. if you don't edit default.replication.factor and use RF=-1), but it's better to have maximum flexibility.
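Under the approach agreed above, spreading diskless topics across AZs with RF=-1 hinges on the cluster default. A sketch of the relevant broker configuration, assuming a three-rack cluster (the value 3 is illustrative):

```properties
# With the calculated-RF logic removed, RF=-1 falls back to the cluster default.
# Keeping the default equal to the number of racks/AZs (assumed 3 here) preserves
# the one-replica-per-rack outcome for diskless topics created with RF=-1.
default.replication.factor=3
```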
…cs (#492)

* test(metadata:diskless): improve test coverage for broker fencing and unregister scenarios

  - Add _noRacks and _withRacks test variants for consistent coverage
  - Fix tests that assumed broker 0 was always the leader
  - Get actual leader from partition registration before fencing/unregistering
  - Use dynamic assertions based on actual partition state
  - Improve assertion error messages for clarity

* feat(controller:diskless): add server config for managed replicas

  Add diskless.managed.rf.enable config (default: false) to control whether diskless topics use managed replicas with RF=rack_count or legacy RF=1. This config only affects topic creation. When enabled, new diskless topics will be created with one replica per rack using standard KRaft placement.

  Part of Phase 1: Diskless Managed Replicas (see #478, docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

* feat(metadata:diskless): implement managed replicas for diskless topics

  When diskless.managed.rf.enable=true, new diskless topics are created with RF=rack_count using standard KRaft replica placement instead of legacy RF=1.

  Changes:
  - Compute RF from rack cardinality via rackCardinality()
  - Use standard replicaPlacer.place() for rack-aware assignment
  - Allow manual replica assignments when managed replicas enabled
  - Add checkstyle suppression for extended createTopic method

  Phase 1 limitations:
  - Add Partitions inherits RF from existing partitions (Phase 3)
  - Transformer not updated, uses legacy routing (Phase 2)
  - Integration tests deferred to Phase 2

  (See #478, docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md)

* fixup! feat(controller:diskless): add server config for managed replicas

* fixup! feat(metadata:diskless): implement managed replicas for diskless topics

(cherry picked from commit 09ba4d1)
Implements Phase 1 of Diskless Managed Replicas (see #478 docs/inkless/ts-unification/DISKLESS_MANAGED_RF.md).
When `diskless.managed.rf.enable=true`, new diskless topics are created with RF = rack_count (one replica per rack) using standard KRaft replica placement, instead of legacy RF=1.

Changes

- Add `diskless.managed.rf.enable` server config (default: false)
- Use `replicaPlacer.place()` for rack-aware assignment

Known Limitations (Phase 1)

This feature is opt-in only via `diskless.managed.rf.enable=true`. Existing clusters and topics are unaffected unless explicitly enabled.
When enabled:

- Partition Reassignment: not yet supported for managed-replica topics. Reassignment will hang waiting for replica sync. Will be fixed in a follow-up PR.
- Transformer: not updated; uses legacy routing (Phase 2). KRaft metadata shows RF=3, but clients are still routed to any alive broker. Doesn't affect correctness for DISKLESS_ONLY topics.
- Add Partitions: works correctly; inherits RF and uses rack-aware placement.
- Observability metrics: deferred to Phase 2.

Safe to merge because:

- `false` by default

Follow-up PRs Sequence

Testing

Configuration

| Config | Default |
|---|---|
| `diskless.managed.rf.enable` | `false` |