Made initial table synchronization parallel. #198
base: REL2_x_STABLE
Conversation
Initial table synchronization is now done in parallel by multiple sync workers. Several further improvements were made, as explained below.

- If no worker slots are available to launch a bgworker, the process just throws a WARNING and retries after some time, rather than throwing an ERROR and exiting.
- Up to max_sync_workers_per_subscription sync workers are created per subscription, and that many tables are synced in parallel.
- During initial table data copying, the old code created an extra backend connection to the target database, put that backend into COPY FROM STDIN mode, and had the bgworker route CopyData messages from the publisher's "COPY TO" output to that backend. This extra backend connection code has been removed; the sync worker now writes directly into the target database. The sync worker also runs in replication mode, so FK constraint triggers are not fired, which allows each table to be synced in any order.
- The subscription init code now just gets the list of tables in the subscribed replication_set and records each table's metadata in INIT state. The apply process then spawns sync workers for each non-READY table. In contrast, the old subscription init process got the list of tables and copied them sequentially.
- In the previous code, a sync worker in CATCHUP state did not exit even after it had replayed up to the required LSN; it exited only after receiving at least one more logical change (WAL) from the publisher, because the exit code lived only in the handle_commit() function. The same code (slightly modified) from handle_commit() is now also run from process_syncing_tables(), which is called periodically, so the sync worker exits as soon as it has caught up with the apply process.
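For orientation, the per-table lifecycle implied by the description above can be pictured as a small state machine. This is only an illustrative sketch; the names below are hypothetical and not pglogical's actual enum values.

```c
/*
 * Illustrative only: the per-table sync states implied by the description
 * above.  Names are hypothetical stand-ins, not pglogical's exact codes.
 */
typedef enum TableSyncState
{
	TABLE_SYNC_INIT,			/* recorded by subscription init, not copied yet */
	TABLE_SYNC_COPYING,			/* a sync worker is copying the table data */
	TABLE_SYNC_CATCHUP,			/* copy done, replaying changes up to the apply LSN */
	TABLE_SYNC_READY			/* caught up; the normal apply worker handles it */
} TableSyncState;
```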
Thanks very much for reaching out. Sync is astonishingly difficult to get correct and race-free even when running one table at a time. Parallel sync has mostly not been implemented for time reasons, because it can increase the risk of deadlocks, and because we've been unsure how to most effectively choose a level of parallelism that improves performance without overloading the system.

There's also another issue that I don't think can be solved sensibly without core postgres changes: managing session replication origins. Right now, a replication origin combines two quite distinct jobs. It keeps track of the remote LSN that corresponds to the current local committed position, so we don't replay a remote change twice or miss one. But it also gets recorded alongside the commit timestamp to identify the node the transaction was replicated from. There's no way to set one differently from the other in a race-free, crash-safe manner.

When you made the change "Instead, the sync worker directly writes into the underlying target database", you broke pglogical's defense against that issue. The replication origin IDs for the sync commits will now differ from commits made during normal streaming. That breaks conflict detection, including the conflict detection we do when sync overlaps concurrent streamed changes. Sync gets dramatically harder when you're dealing with concurrent write activity on the tables being synced.

I don't think I'll be able to merge the work directly. In addition to the replication origin issues, pglogical3 (not yet open source) has been heavily restructured and the sync process has changed a lot. We don't expect to release a pglogical 2.3 in which we could add a major feature like parallel sync on the pglogical2 back-branch. I realise that's frustrating when you can't get your hands on the current development trunk, and I don't have any answers about when it will be open source.
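To make the "two jobs" point concrete, here is a rough sketch of how a worker binds its session to a replication origin using the in-core API from replication/origin.h (pre-PG16 signatures). It is not pglogical's code; setup_session_origin() and its parameters are invented for illustration.

```c
#include "postgres.h"
#include "replication/origin.h"
#include "utils/timestamp.h"

/*
 * Sketch: binding an apply/sync session to a replication origin.  The one
 * RepOriginId does both jobs described above: it tracks the confirmed
 * remote LSN (so a change is never replayed twice or skipped), and it is
 * stamped into every commit record alongside the commit timestamp, which
 * conflict detection later reads to identify the source node.  There is
 * no supported way to set one without the other.
 */
static void
setup_session_origin(const char *origin_name,
					 XLogRecPtr remote_commit_lsn,
					 TimestampTz remote_commit_time)
{
	RepOriginId originid;

	originid = replorigin_by_name((char *) origin_name, false /* missing_ok */ );
	replorigin_session_setup(originid);

	/* Every commit from now on is attributed to this origin ... */
	replorigin_session_origin = originid;
	/* ... and also advances the tracked remote position and timestamp. */
	replorigin_session_origin_lsn = remote_commit_lsn;
	replorigin_session_origin_timestamp = remote_commit_time;
}
```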
The branch was force-pushed from 7fe5912 to 3e8585a.
Hm, I got an email notification that you commented, but nothing shows up here. To be honest I don't remember where the deadlocks can occur, only that they can in a simple resync plus other writes under load. In any case, the replication origin issue is a showstopper for this approach; I cannot, and will not, merge this change. At the moment I'm trying to work on a patch for core postgres that'd make it possible, though. I'm glad you've patched it to meet your needs.
Made initial table synchronization parallel.
(at pglogical_apply.c:1802 in process_syncing_tables())
If no worker slots are available to launch a bgworker, the process just throws a WARNING and retries after some time, rather than throwing an ERROR and exiting.
(at pglogical_worker.c:142 and line 207 in pglogical_worker_register())
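A minimal sketch of this "warn and retry" behaviour, using the core bgworker API. The helper name is hypothetical and stands in for the relevant part of pglogical_worker_register(); only the control flow is the point here.

```c
#include "postgres.h"
#include "postmaster/bgworker.h"

/*
 * Sketch only: if no background worker slots are free, emit a WARNING and
 * report failure so the caller can retry later, instead of ERRORing out of
 * the apply process.
 */
static bool
try_register_sync_worker(BackgroundWorker *bgw)
{
	BackgroundWorkerHandle *handle;

	if (!RegisterDynamicBackgroundWorker(bgw, &handle))
	{
		ereport(WARNING,
				(errmsg("out of background worker slots"),
				 errhint("You might need to increase max_worker_processes; "
						 "registration will be retried later.")));
		return false;			/* caller retries on its next cycle */
	}

	return true;
}
```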
Up to max_sync_workers_per_subscription sync workers will be created per subscription, and that many tables will be synced in parallel.
(at pglogical_worker.c:124 in pglogical_worker_register())
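The cap itself amounts to a simple check before launching another worker. This is a sketch under assumptions: the GUC default shown is illustrative, and count_active_sync_workers() is a hypothetical helper standing in for a scan of the shared worker array under its lock.

```c
#include "postgres.h"

/* GUC added by this patch; the default shown here is illustrative only. */
int			max_sync_workers_per_subscription = 2;

/* Hypothetical: count sync workers currently running for this subscription. */
extern int	count_active_sync_workers(Oid subid);

/* Launch another sync worker only while the per-subscription cap allows it. */
static bool
can_launch_sync_worker(Oid subid)
{
	return count_active_sync_workers(subid) < max_sync_workers_per_subscription;
}
```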
During initial table data copying, the old code created an extra backend connection to the target database, put that backend into COPY FROM STDIN mode, and had the bgworker route CopyData messages from the publisher's "COPY TO" output to that backend. This extra backend connection code has been removed; the sync worker now writes directly into the target database. The sync worker also runs in replication mode, so FK constraint triggers are not fired, which allows each table to be synced in any order.
(at pglogical_sync.c:516 in copy_table_data())
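A minimal sketch of the replication-mode part of this, assuming the sync worker simply sets session_replication_role in its own backend. That is the standard mechanism by which ordinary and RI (foreign-key) triggers are suppressed during apply; whether pglogical uses exactly this call here is an assumption.

```c
#include "postgres.h"
#include "utils/guc.h"

/*
 * Sketch: because the sync worker now applies the copied rows through its
 * own backend, it can put that backend into replication mode.  With
 * session_replication_role = "replica", ordinary triggers -- including the
 * RI triggers backing foreign keys -- do not fire, so tables can be copied
 * in any order without FK violations.
 */
static void
enter_replication_mode(void)
{
	SetConfigOption("session_replication_role", "replica",
					PGC_SUSET, PGC_S_OVERRIDE);
}
```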
Now the subscription init code just gets the list of tables in the subscribed replication_set, adds each table's metadata to the local catalog, and marks it as being in INIT state. The apply process then spawns sync workers for each non-READY table. In contrast, the old subscription init process got the list of tables and copied them sequentially.
(at pglogical_sync.c:622 in copy_replication_sets_data() and pglogical_apply.c:1792 in process_syncing_tables())
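A sketch of the apply-side loop this describes. All helper names and the per-table struct are hypothetical stand-ins for pglogical's catalog access around pglogical_apply.c:1792 (can_launch_sync_worker() is the sketch shown earlier); the point is that init only records tables, and this periodic loop hands every non-READY table to a sync worker as the caps allow.

```c
#include "postgres.h"
#include "nodes/pg_list.h"

/* Hypothetical per-table entry; the real struct also carries schema/name. */
typedef struct SyncTableEntry
{
	Oid			relid;
	char		status;			/* 'i' = INIT ... 'r' = READY (illustrative codes) */
} SyncTableEntry;

extern List *get_sync_table_entries(Oid subid);						/* hypothetical */
extern bool can_launch_sync_worker(Oid subid);						/* see earlier sketch */
extern bool try_register_sync_worker_for_table(Oid subid, Oid relid);	/* hypothetical */

/* Run periodically by the apply worker. */
static void
launch_pending_sync_workers(Oid subid)
{
	List	   *tables = get_sync_table_entries(subid);
	ListCell   *lc;

	foreach(lc, tables)
	{
		SyncTableEntry *entry = (SyncTableEntry *) lfirst(lc);

		if (entry->status == 'r')
			continue;			/* READY: the apply worker already streams it */

		if (!can_launch_sync_worker(subid))
			break;				/* per-subscription cap reached; retry next cycle */

		/* May still fail with a WARNING if no bgworker slot is free. */
		(void) try_register_sync_worker_for_table(subid, entry->relid);
	}
}
```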
In the previous code, a sync worker in CATCHUP state did not exit even after it had replayed up to the required LSN; it exited only after receiving at least one more logical change (WAL) from the publisher, because the exit code lived only in the handle_commit() function. The same code (slightly modified) from handle_commit() is now also run from process_syncing_tables(), which is called periodically, so the sync worker exits as soon as it has caught up with the apply process.
(at pglogical_apply.c:1808 in process_syncing_tables())
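The check itself is small; the change is about where it runs. A sketch with illustrative names (mark_table_sync_done() is hypothetical): previously the test ran only from handle_commit(), so a caught-up sync worker lingered until at least one more change arrived; run from the periodic process_syncing_tables() call, it lets the worker exit as soon as it reaches the catch-up target.

```c
#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/ipc.h"

/* Hypothetical: record in the sync catalog that catch-up is complete. */
extern void mark_table_sync_done(XLogRecPtr lsn);

/*
 * Sketch of the exit check now also invoked from the periodic
 * process_syncing_tables() call.
 */
static void
maybe_finish_catchup(XLogRecPtr replayed_lsn, XLogRecPtr catchup_target_lsn)
{
	if (replayed_lsn >= catchup_target_lsn)
	{
		mark_table_sync_done(replayed_lsn);
		proc_exit(0);			/* hand the table over to the main apply worker */
	}
}
```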