HDDS-11898. Design document for leader side execution #10503

Open
spacemonkd wants to merge 2 commits into
apache:master from
spacemonkd:HDDS-11898

Conversation

@spacemonkd

Contributor

What changes were proposed in this pull request?

HDDS-11898. Design document for Leader Side Execution framework

Please describe your PR in detail:

  • This PR adds the design document for the Leader Side Execution framework

Based on the proposal by @kerneltime and earlier work by @sumitagrawl on PR #7583

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11898

How was this patch tested?

N/A

@spacemonkd spacemonkd self-assigned this Jun 12, 2026
@spacemonkd spacemonkd added the design and om-pre-ratis-execution (PRs related to https://issues.apache.org/jira/browse/HDDS-11897) labels Jun 12, 2026
@spacemonkd spacemonkd marked this pull request as ready for review June 14, 2026 06:05
@ivandika3

ivandika3 commented Jun 15, 2026

Contributor

Please research how other production Raft-based / Paxos-based systems (e.g. CockroachDB, TiKV (TiDB), YugabyteDB, ScyllaDB, Cassandra, RheaKV) separate transaction processing from state machine replication, since these systems should have mature concurrency models while remaining performant. It would also be good to review research papers and other literature on this. These systems and the literature can then inform how we evolve the OM architecture.

Leader-side execution requires a more robust (and possibly more complex) concurrency model, similar to RDBMS concurrency control, since most write transactions will no longer be serialized by the single-threaded Raft applier and Ratis will be used only to replicate DB mutations. My current understanding is that we need to implement more RDBMS-style transaction logic such as transaction commit, rollback, etc. However, I'm not sure whether the community has prior experience with this (e.g. implementing RDBMS MVCC).
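
To make the concern concrete, here is a minimal Python sketch of the split described above. It is illustrative only and is not Ozone or Ratis code: `Leader`, `Replica`, `create_key`, and the in-memory `log` list are hypothetical names, and a real implementation would wait for a Raft majority acknowledgement and handle rollback on failure. The point is that request validation runs concurrently on the leader under a per-key lock, while replicas only apply precomputed mutations in log order.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica: applies replicated DB mutations, no request logic."""
    db: dict = field(default_factory=dict)

    def apply(self, mutations):
        # Each mutation is a (key, value) pair; a value of None means delete.
        for key, value in mutations:
            if value is None:
                self.db.pop(key, None)
            else:
                self.db[key] = value


class Leader:
    """Hypothetical leader: executes requests, then replicates only the result."""

    def __init__(self, followers):
        self.store = Replica()           # the leader's own copy of the DB
        self.followers = followers
        self.log = []                    # stand-in for the replicated Raft log
        self._key_locks = {}
        self._guard = threading.Lock()   # protects the lock table itself
        self._log_lock = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def create_key(self, key, value):
        # Requests on different keys may run in parallel; requests on the
        # same key are serialized by the per-key lock, not by the applier.
        with self._lock_for(key):
            if key in self.store.db:     # validation happens on the leader
                return False
            self._replicate([(key, value)])
            return True

    def _replicate(self, mutations):
        # Assumption: appending and applying in one critical section stands
        # in for "commit to the Raft log, then apply on every replica".
        with self._log_lock:
            self.log.append(mutations)
            self.store.apply(mutations)
            for follower in self.followers:
                follower.apply(mutations)
```

In this sketch the correctness burden moves from the applier to the locking scheme: anything the lock does not cover (multi-key operations, failed replication after validation) is exactly where commit and rollback logic would be needed.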

Also, @kerneltime and @sumitagrawl previously worked on this. Have you discussed the current status and blockers with them? What is the difference between this and their plan? I'm a bit confused about the status of this feature.

My main point is that we should convince ourselves that this is the correct decision and flesh out the design much further (e.g. more detailed implementation plans, and how to run concurrency tests to ensure that consistency is not affected). Changing the OM transaction processing framework is a very heavy and time-consuming effort, so it's better to get it right the first time than to rework the OM transaction processing framework again later. Personally, I have been wanting to study RDBMS concurrency control by working through https://15445.courses.cs.cmu.edu/fall2025/assignments.html so that I have the right fundamentals to evaluate the design.

@ivandika3 ivandika3 requested a review from amaliujia June 15, 2026 06:29
@spacemonkd

Contributor Author

Thanks @ivandika3 for the detailed insights.
I agree that we should do it once and do it right. I'm going to study how other Raft/Paxos-based systems implement this. Probably Claude can help me extract their high-level design.

As for this design vs. the earlier design from @sumitagrawl and @kerneltime: it isn't different. The older PRs were closed due to inactivity, and I am planning to pick up the same work and dedicate time to researching and implementing these changes. I'll also be connecting with them to get their inputs and ideas.

Thank you again for the detailed inputs, I really appreciate it.
