feat: 2pc logic for resharding by yogesh1801 · Pull Request #1088 · pgdogdev/pgdog

yogesh1801 · 2026-06-18T12:29:22Z

Use two-phase commit for COPY commands when resharding. This allows for a clean rollback on error.
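The two-phase flow described above can be sketched as follows. This is a minimal illustration, not pgdog's code: the `Shard` trait, `two_phase_commit`, and the `RecordingShard` test double are hypothetical, and the per-shard name reuses the `{txn}_{shard}` format quoted later in the review. Every shard first runs `PREPARE TRANSACTION`. Only if all shards prepare does the coordinator issue `COMMIT PREPARED`; otherwise it issues `ROLLBACK PREPARED` on the shards that did prepare. PostgreSQL already cancels a transaction whose `PREPARE TRANSACTION` fails, so that shard needs no extra rollback.

```rust
/// Hypothetical stand-in for a shard connection that can run one SQL statement.
trait Shard {
    fn execute(&mut self, sql: &str) -> Result<(), String>;
}

/// Per-shard transaction name, mirroring the `{txn}_{shard}` format from the PR.
fn gid(txn: &str, shard: usize) -> String {
    format!("{txn}_{shard}")
}

/// Phase 1 on every shard, then commit all (success) or roll back the prepared ones.
fn two_phase_commit<S: Shard>(shards: &mut [S], txn: &str) -> Result<(), String> {
    let mut prepared = Vec::new();
    let mut failure = None;

    // Phase 1: PREPARE TRANSACTION; stop at the first shard that fails.
    for (i, shard) in shards.iter_mut().enumerate() {
        match shard.execute(&format!("PREPARE TRANSACTION '{}'", gid(txn, i))) {
            Ok(()) => prepared.push(i),
            Err(e) => {
                failure = Some(e);
                break;
            }
        }
    }

    // Phase 2. A real coordinator would retry a failed COMMIT/ROLLBACK PREPARED
    // later instead of returning early as this sketch does.
    let finish = if failure.is_none() { "COMMIT PREPARED" } else { "ROLLBACK PREPARED" };
    for i in prepared {
        shards[i].execute(&format!("{finish} '{}'", gid(txn, i)))?;
    }
    failure.map_or(Ok(()), Err)
}

/// Hypothetical test double: records statements, fails those starting with `fail_on`.
struct RecordingShard {
    log: Vec<String>,
    fail_on: Option<&'static str>,
}

impl Shard for RecordingShard {
    fn execute(&mut self, sql: &str) -> Result<(), String> {
        if let Some(prefix) = self.fail_on {
            if sql.starts_with(prefix) {
                return Err(format!("failed: {sql}"));
            }
        }
        self.log.push(sql.to_string());
        Ok(())
    }
}
```

If the second shard fails to prepare, the first shard receives `ROLLBACK PREPARED`, which is the clean rollback the PR aims for.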

CLAassistant · 2026-06-18T12:29:38Z

All committers have signed the CLA.

codecov · 2026-06-18T14:38:24Z

Codecov Report

❌ Patch coverage is 32.18391% with 59 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...src/backend/replication/logical/subscriber/copy.rs | 8.88% | 41 Missing ⚠️ |
| ...src/frontend/client/query_engine/two_pc/manager.rs | 12.50% | 7 Missing ⚠️ |
| pgdog/src/backend/replication/logical/error.rs | 0.00% | 6 Missing ⚠️ |
| ...end/replication/logical/publisher/parallel_sync.rs | 0.00% | 5 Missing ⚠️ |


meskill · 2026-06-18T14:58:29Z

Could you please uncomment the tests in integration/copy_data/run.sh and make sure 2pc is enabled for them? We had issues with these tests, and I'd like to check whether 2pc fixes them.

yogesh1801 · 2026-06-18T16:05:01Z

@meskill these tests pass for me even if two_pc flag is disabled

meskill · 2026-06-18T17:18:59Z

> @meskill these tests pass for me even if two_pc flag is disabled

Yeah, they are just flaky on CI sometimes. Don't worry about it; I'll just be checking the CI status.

levkk · 2026-06-18T17:56:58Z

@meskill are we ok to merge or do we need anything else?

meskill · 2026-06-19T10:41:38Z

+                TwoPcPhase::Phase1 => format!("PREPARE TRANSACTION '{txn}_{shard}'"),
+                TwoPcPhase::Phase2 => format!("COMMIT PREPARED '{txn}_{shard}'"),


This should work right now, since the same logic is used for the transaction name as in the binding:

pgdog/pgdog/src/backend/pool/connection/binding.rs

Line 377 in 280798a

let shard_name = format!("{}_{}", name, shard);

But it could diverge if something changes on one side. That's especially important since the rollback itself is issued from the binding.

Can we share some of the query and transaction-name logic between here and binding.rs? That'll prevent possible issues.

@meskill I have added statement.rs for this.
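The idea of sharing the naming logic can be sketched as one small module that both the COPY path and the binding call. The contents of statement.rs are not shown in this thread, so `shard_transaction_name` and `TwoPcStatement` below are hypothetical names; the sketch only shows how a single function can own the `{name}_{shard}` format so the two sides cannot drift apart.

```rust
/// Hypothetical single source of truth for per-shard 2PC transaction names.
pub fn shard_transaction_name(txn: &str, shard: usize) -> String {
    // Same `{name}_{shard}` format used in binding.rs and the COPY path.
    format!("{txn}_{shard}")
}

/// Hypothetical enum covering every 2PC statement either side needs to issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TwoPcStatement {
    Prepare,
    CommitPrepared,
    RollbackPrepared,
}

impl TwoPcStatement {
    /// Renders the SQL for one shard, so callers never format the name themselves.
    pub fn sql(self, txn: &str, shard: usize) -> String {
        // Double any single quotes so the name is a valid SQL string literal.
        let name = shard_transaction_name(txn, shard).replace('\'', "''");
        match self {
            TwoPcStatement::Prepare => format!("PREPARE TRANSACTION '{name}'"),
            TwoPcStatement::CommitPrepared => format!("COMMIT PREPARED '{name}'"),
            TwoPcStatement::RollbackPrepared => format!("ROLLBACK PREPARED '{name}'"),
        }
    }
}
```

With the binding's rollback path and the COPY path both calling `sql`, a change to the naming scheme lands in exactly one place.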

meskill · 2026-06-19T11:37:28Z

+            let query = match phase {
+                TwoPcPhase::Phase1 => format!("PREPARE TRANSACTION '{txn}_{shard}'"),
+                TwoPcPhase::Phase2 => format!("COMMIT PREPARED '{txn}_{shard}'"),
+                _ => unreachable!(),


There could be some issues due to the nature of CLI calls, i.e. when we run pgdog data-sync instead of an admin command on the running instance:

Manager::monitor should be running anyway, but we don't wait for its shutdown in the CLI path. That means if we fail, prepared transactions could be left on some shards, and they won't be cleaned up.

The WAL is also not enabled in the CLI path, but on a running instance it will be. We can get different behavior depending on exactly how the copy command was run.

We should probably unify this: make sure we wait for shutdown and enable the WAL for the CLI path. But that also means we need to think about how old transactions should be recovered.

Should I take this up in another PR, after thinking about crash recovery?

We can call this waiter on shutdown: https://github.com/pgdogdev/pgdog/blob/main/pgdog/src/frontend/client/query_engine/two_pc/manager.rs#L357
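One way to picture the fix for the CLI path: the process keeps a handle to the background cleanup task and waits for it before exiting, so queued rollbacks are not dropped. This is a std-only sketch with hypothetical names (`Monitor`, `CleanupJob`, `enqueue`, `shutdown`); pgdog's actual manager is async, and its real shutdown waiter is the one linked above.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical cleanup request: roll back one prepared transaction.
pub struct CleanupJob {
    pub gid: String,
}

/// Hypothetical stand-in for the 2PC manager's background monitor.
pub struct Monitor {
    tx: Option<Sender<CleanupJob>>,
    handle: Option<JoinHandle<Vec<String>>>,
}

impl Monitor {
    pub fn start() -> Self {
        let (tx, rx) = mpsc::channel::<CleanupJob>();
        let handle = thread::spawn(move || {
            let mut done = Vec::new();
            // Runs until every Sender is dropped and the queue is empty.
            for job in rx {
                // A real monitor would send ROLLBACK PREPARED to the shard here.
                done.push(format!("ROLLBACK PREPARED '{}'", job.gid));
            }
            done
        });
        Monitor { tx: Some(tx), handle: Some(handle) }
    }

    pub fn enqueue(&self, gid: &str) {
        if let Some(tx) = &self.tx {
            // send fails only if the monitor thread has already exited (e.g. panicked).
            let _ = tx.send(CleanupJob { gid: gid.to_string() });
        }
    }

    /// Closes the queue and blocks until every queued job has run.
    pub fn shutdown(mut self) -> Vec<String> {
        drop(self.tx.take());
        self.handle
            .take()
            .map(|h| h.join().expect("monitor thread panicked"))
            .unwrap_or_default()
    }
}
```

If the CLI path returned from `main` without calling `shutdown`, the process would exit, the spawned thread would be terminated, and any jobs still in the channel would never run. That is the leftover-prepared-transaction problem described above.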

meskill · 2026-06-19T11:41:04Z

+        }
+
+        for result in join_all(futures).await {
+            result?;


I'm not sure if it's really a problem, but the commit/rollback for prepared transactions happens inside the Manager::monitor handler, which runs asynchronously to this code. That means we may start a new attempt before the manager logic fires, and the cleanup could happen in the middle of the new copy attempt.

@levkk what do you think? Should we care about it?

Maybe we can add a waiter to make sure the transaction is rolled back / committed?

That makes sense. Should we add some code to wait for the transaction to be rolled back?

Yeah, something like this, but it should probably be done when starting the new attempt rather than only after the error, so we don't wait twice: first until the transaction is rolled back, and then for the sleep timeout between retries. We could wait for the manager's queue to drain before starting to push data.
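The "wait for the manager's queue to drain before pushing data" idea can be sketched with a pending counter guarded by a `Mutex` and a `Condvar`. `CleanupTracker`, `begin`, `finish`, and `wait_idle` are hypothetical names for illustration, not pgdog APIs; since the real manager is async, an actual implementation would more likely use an async primitive than a blocking `Condvar`.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical tracker for 2PC commits/rollbacks the manager has not finished yet.
#[derive(Default)]
pub struct CleanupTracker {
    pending: Mutex<usize>,
    idle: Condvar,
}

impl CleanupTracker {
    /// Called when a commit/rollback is queued for the manager.
    pub fn begin(&self) {
        *self.pending.lock().unwrap() += 1;
    }

    /// Called by the manager once that commit/rollback has completed.
    pub fn finish(&self) {
        let mut pending = self.pending.lock().unwrap();
        *pending = pending.saturating_sub(1);
        if *pending == 0 {
            self.idle.notify_all();
        }
    }

    /// Called at the start of each COPY attempt: blocks until nothing is pending.
    /// Returns false if `timeout` elapses while cleanups are still outstanding.
    pub fn wait_idle(&self, timeout: Duration) -> bool {
        let guard = self.pending.lock().unwrap();
        let (_guard, result) = self
            .idle
            .wait_timeout_while(guard, timeout, |pending| *pending > 0)
            .unwrap();
        !result.timed_out()
    }
}
```

Calling `wait_idle` at the start of each attempt, rather than right after the error, lets the cleanup overlap with the retry back-off, which avoids the double wait mentioned above.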

@meskill @levkk I have added a v1 for this problem; could you review it?

Taking a look!

feat: 2pc logic for resharding

3090989

levkk approved these changes Jun 18, 2026


levkk requested a review from meskill June 18, 2026 14:29

meskill requested changes Jun 19, 2026


yogesh1801 added 3 commits June 21, 2026 17:55

fix: comment for rollback

e6add87

feat: transaction naming moved to another file

d8e60e0

feat: add cleanup wait before retry

0b20dcb

4 participants