Made initial table synchronization parallel. #198
base: REL2_x_STABLE
Conversation
Initial table synchronization is now done in parallel by multiple sync workers. Several further improvements were made, as explained below.

- If no worker slots are available to launch a bgworker, the process just throws a WARNING and retries after some time, rather than throwing an ERROR and exiting.
- Up to max_sync_workers_per_subscription sync workers are created per subscription, and that many tables are synced in parallel.
- During initial table data copying, the old code created an extra backend connection to the target database, put that backend into COPY FROM STDIN mode, and had the bgworker route CopyData messages from the publisher's "COPY TO" output to that backend. This extra backend connection code has been removed; the sync worker now writes directly into the target database. The sync worker also runs in replication mode, so FK constraint triggers are not fired, which allows each table to be synced in any order.
- The subscription init code now just gets the list of tables in the subscribed replication_set and records each table's metadata in INIT state. The apply process then spawns sync workers for each non-READY table. In contrast, the old subscription init process got the list of tables and copied them sequentially.
- In the previous code, a sync worker in CATCHUP state did not exit even after it had replayed up to the required LSN; it exited only after receiving at least one more logical change (WAL) from the publisher, because the exit code lived only in the handle_commit() function. The same code (slightly modified) from handle_commit() is now also run from process_syncing_tables(), which is called periodically, so the sync worker exits as soon as it has caught up with the apply process.
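For orientation, the per-table lifecycle implied by the description above can be pictured as a small state machine. This is only an illustrative sketch; the names below are hypothetical and not pglogical's actual enum values.

```c
/*
 * Illustrative only: the per-table sync states implied by the description
 * above.  Names are hypothetical stand-ins, not pglogical's exact codes.
 */
typedef enum TableSyncState
{
	TABLE_SYNC_INIT,			/* recorded by subscription init, not copied yet */
	TABLE_SYNC_COPYING,			/* a sync worker is copying the table data */
	TABLE_SYNC_CATCHUP,			/* copy done, replaying changes up to the apply LSN */
	TABLE_SYNC_READY			/* caught up; the normal apply worker handles it */
} TableSyncState;
```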
Thanks very much for reaching out. Sync is astonishingly difficult to get correct and race-free even when running one table at a time. Parallel sync has mostly not been implemented for time reasons, because it can increase the risk of deadlocks, and because we've been unsure how to most effectively choose a level of parallelism that improves performance without overloading the system.

There's also another issue that I don't think can be solved sensibly without core postgres changes: managing session replication origins. Right now, a replication origin combines two quite distinct jobs. It keeps track of the remote LSN that corresponds to the current local committed position, so we don't replay a remote change twice or miss one. But it also gets recorded alongside the commit timestamp to identify the node the transaction was replicated from. There's no way to set one differently from the other in a race-free, crash-safe manner.

When you made the change "Instead, the sync worker directly writes into the underlying target database", you broke pglogical's defense against that issue. The replication origin IDs for the sync commits will now differ from commits made during normal streaming. That breaks conflict detection, including the conflict detection we do when sync overlaps concurrent streamed changes. Sync gets dramatically harder when you're dealing with concurrent write activity on the tables being synced.

I don't think I'll be able to merge the work directly. In addition to the replication origin issues, pglogical3 (not yet open source) has been heavily restructured and the sync process has changed a lot. We don't expect to release a pglogical 2.3 in which we could add a major feature like parallel sync on the pglogical2 back-branch. I realise that's frustrating when you can't get your hands on the current development trunk, and I don't have any answers about when it will be open source.
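To make the "two jobs" point concrete, here is a rough sketch of how a worker binds its session to a replication origin using the in-core API from replication/origin.h (pre-PG16 signatures). It is not pglogical's code; setup_session_origin() and its parameters are invented for illustration.

```c
#include "postgres.h"
#include "replication/origin.h"
#include "utils/timestamp.h"

/*
 * Sketch: binding an apply/sync session to a replication origin.  The one
 * RepOriginId does both jobs described above: it tracks the confirmed
 * remote LSN (so a change is never replayed twice or skipped), and it is
 * stamped into every commit record alongside the commit timestamp, which
 * conflict detection later reads to identify the source node.  There is
 * no supported way to set one without the other.
 */
static void
setup_session_origin(const char *origin_name,
					 XLogRecPtr remote_commit_lsn,
					 TimestampTz remote_commit_time)
{
	RepOriginId originid;

	originid = replorigin_by_name((char *) origin_name, false /* missing_ok */ );
	replorigin_session_setup(originid);

	/* Every commit from now on is attributed to this origin ... */
	replorigin_session_origin = originid;
	/* ... and also advances the tracked remote position and timestamp. */
	replorigin_session_origin_lsn = remote_commit_lsn;
	replorigin_session_origin_timestamp = remote_commit_time;
}
```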
The branch was force-pushed from 7fe5912 to 3e8585a.
Hm, I got an email notification that you commented, but nothing shows up here. To be honest I don't remember where the deadlocks can occur, only that they can in a simple resync plus other writes under load. In any case, the replication origin issue is a showstopper for this approach; I cannot, and will not, merge this change. At the moment I'm trying to work on a patch for core postgres that'd make it possible, though. I'm glad you've patched it to meet your needs.
Made initial table synchronization parallel.
(at pglogical_apply.c:1802 in process_syncing_tables())
If no worker slots are available to launch a bgworker, the process just throws a WARNING and retries after some time, rather than throwing an ERROR and exiting.
(at pglogical_worker.c:142 and line 207 in pglogical_worker_register())
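A minimal sketch of this "warn and retry" behaviour, using the core bgworker API. The helper name is hypothetical and stands in for the relevant part of pglogical_worker_register(); only the control flow is the point here.

```c
#include "postgres.h"
#include "postmaster/bgworker.h"

/*
 * Sketch only: if no background worker slots are free, emit a WARNING and
 * report failure so the caller can retry later, instead of ERRORing out of
 * the apply process.
 */
static bool
try_register_sync_worker(BackgroundWorker *bgw)
{
	BackgroundWorkerHandle *handle;

	if (!RegisterDynamicBackgroundWorker(bgw, &handle))
	{
		ereport(WARNING,
				(errmsg("out of background worker slots"),
				 errhint("You might need to increase max_worker_processes; "
						 "registration will be retried later.")));
		return false;			/* caller retries on its next cycle */
	}

	return true;
}
```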
Up to max_sync_workers_per_subscription sync workers will be created per subscription, and that many tables will be synced in parallel.
(at pglogical_worker.c:124 in pglogical_worker_register())
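The cap itself amounts to a simple check before launching another worker. This is a sketch under assumptions: the GUC default shown is illustrative, and count_active_sync_workers() is a hypothetical helper standing in for a scan of the shared worker array under its lock.

```c
#include "postgres.h"

/* GUC added by this patch; the default shown here is illustrative only. */
int			max_sync_workers_per_subscription = 2;

/* Hypothetical: count sync workers currently running for this subscription. */
extern int	count_active_sync_workers(Oid subid);

/* Launch another sync worker only while the per-subscription cap allows it. */
static bool
can_launch_sync_worker(Oid subid)
{
	return count_active_sync_workers(subid) < max_sync_workers_per_subscription;
}
```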
During initial table data copying, the old code created an extra backend connection to the target database, put that backend into COPY FROM STDIN mode, and had the bgworker route CopyData messages from the publisher's "COPY TO" output to that backend. This extra backend connection code has been removed; the sync worker now writes directly into the target database. The sync worker also runs in replication mode, so FK constraint triggers are not fired, which allows each table to be synced in any order.
(at pglogical_sync.c:516 in copy_table_data())
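A minimal sketch of the replication-mode part of this, assuming the sync worker simply sets session_replication_role in its own backend. That is the standard mechanism by which ordinary and RI (foreign-key) triggers are suppressed during apply; whether pglogical uses exactly this call here is an assumption.

```c
#include "postgres.h"
#include "utils/guc.h"

/*
 * Sketch: because the sync worker now applies the copied rows through its
 * own backend, it can put that backend into replication mode.  With
 * session_replication_role = "replica", ordinary triggers -- including the
 * RI triggers backing foreign keys -- do not fire, so tables can be copied
 * in any order without FK violations.
 */
static void
enter_replication_mode(void)
{
	SetConfigOption("session_replication_role", "replica",
					PGC_SUSET, PGC_S_OVERRIDE);
}
```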
Now the subscription init code just gets the list of tables in the subscribed replication_set, adds each table's metadata to the local catalog, and marks it as being in INIT state. The apply process then spawns sync workers for each non-READY table. In contrast, the old subscription init process got the list of tables and copied them sequentially.
(at pglogical_sync.c:622 in copy_replication_sets_data() and pglogical_apply.c:1792 in process_syncing_tables())
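A sketch of the apply-side loop this describes. All helper names and the per-table struct are hypothetical stand-ins for pglogical's catalog access around pglogical_apply.c:1792 (can_launch_sync_worker() is the sketch shown earlier); the point is that init only records tables, and this periodic loop hands every non-READY table to a sync worker as the caps allow.

```c
#include "postgres.h"
#include "nodes/pg_list.h"

/* Hypothetical per-table entry; the real struct also carries schema/name. */
typedef struct SyncTableEntry
{
	Oid			relid;
	char		status;			/* 'i' = INIT ... 'r' = READY (illustrative codes) */
} SyncTableEntry;

extern List *get_sync_table_entries(Oid subid);						/* hypothetical */
extern bool can_launch_sync_worker(Oid subid);						/* see earlier sketch */
extern bool try_register_sync_worker_for_table(Oid subid, Oid relid);	/* hypothetical */

/* Run periodically by the apply worker. */
static void
launch_pending_sync_workers(Oid subid)
{
	List	   *tables = get_sync_table_entries(subid);
	ListCell   *lc;

	foreach(lc, tables)
	{
		SyncTableEntry *entry = (SyncTableEntry *) lfirst(lc);

		if (entry->status == 'r')
			continue;			/* READY: the apply worker already streams it */

		if (!can_launch_sync_worker(subid))
			break;				/* per-subscription cap reached; retry next cycle */

		/* May still fail with a WARNING if no bgworker slot is free. */
		(void) try_register_sync_worker_for_table(subid, entry->relid);
	}
}
```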
In the previous code, a sync worker in CATCHUP state did not exit even after it had replayed up to the required LSN; it exited only after receiving at least one more logical change (WAL) from the publisher, because the exit code lived only in the handle_commit() function. The same code (slightly modified) from handle_commit() is now also run from process_syncing_tables(), which is called periodically, so the sync worker exits as soon as it has caught up with the apply process.
(at pglogical_apply.c:1808 in process_syncing_tables())
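The check itself is small; the change is about where it runs. A sketch with illustrative names (mark_table_sync_done() is hypothetical): previously the test ran only from handle_commit(), so a caught-up sync worker lingered until at least one more change arrived; run from the periodic process_syncing_tables() call, it lets the worker exit as soon as it reaches the catch-up target.

```c
#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/ipc.h"

/* Hypothetical: record in the sync catalog that catch-up is complete. */
extern void mark_table_sync_done(XLogRecPtr lsn);

/*
 * Sketch of the exit check now also invoked from the periodic
 * process_syncing_tables() call.
 */
static void
maybe_finish_catchup(XLogRecPtr replayed_lsn, XLogRecPtr catchup_target_lsn)
{
	if (replayed_lsn >= catchup_target_lsn)
	{
		mark_table_sync_done(replayed_lsn);
		proc_exit(0);			/* hand the table over to the main apply worker */
	}
}
```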