Bug: Race Condition in WorkflowSweeper Leading to Inconsistent Workflow States #213

rq-dbrady · 2024-07-18T15:02:30Z

Describe the bug
We have identified a race condition in the WorkflowSweeper class, which causes workflows to be in inconsistent states across different threads. This issue is critical as it affects the reliability and correctness of workflow execution and completion checks.

Details
Conductor version: 3.17
Persistence implementation: Postgres, OpenSearch
Queue implementation: RedisCluster
Lock: Redis

Steps to Reproduce:
Deploy the application with at least 30 replicas in a Kubernetes environment.
Configure an aggressive sweeper interval of about 25 ms and a high sweeper thread count.
Use a Redis cluster, with the Redis lock enabled for workflow execution.
Execute workflows at a rate of approximately 75 to 90 workflows per second.
Monitor workflow states and watch for inconsistencies.

Observed Behavior
Workflows are fetched from executionDaoFacade before acquiring a lock.
The verifyAndRepair method mutates the workflow state without proper synchronization.
The workflow lock is released before the workflow is removed from the queue.
Together, these conditions create a window of roughly 50 to 100 µs in which a workflow can be in two different states at once on different threads.
Workflow listeners or completion checks may fail as a result, with workflows erroneously marked as "Running" even after their completion has been triggered.
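
One interleaving consistent with the behavior listed above can be replayed deterministically. This is a hypothetical model, not the actual WorkflowSweeper code: the `StaleReadSketch` class, the in-memory `store`, and the assumption that the sweeper persists the copy it fetched before locking are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the reported ordering; not Conductor's actual code.
class StaleReadSketch {
    // Stand-in for the persistence layer: workflow id -> status.
    static final Map<String, String> store = new HashMap<>();

    // Replays the three steps in the order the two threads would interleave.
    static String replay() {
        store.put("wf-1", "RUNNING");

        // Sweeper thread A: fetches its copy BEFORE acquiring the lock.
        String snapshotA = store.get("wf-1");

        // Worker thread B: takes the lock, completes the workflow, releases the lock.
        store.put("wf-1", "COMPLETED");

        // Sweeper thread A: now acquires the lock and persists its stale copy
        // (assumption: the repair step writes back what was fetched earlier).
        store.put("wf-1", snapshotA);

        return store.get("wf-1");
    }
}
```

In this model the workflow ends up "RUNNING" even though thread B completed it, because thread A's read happened outside the lock.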

Expected Behavior
Workflows should maintain consistent states across all threads.
Proper locking should be enforced to prevent state mutations without synchronization.
Workflow locks should only be released after the workflow is securely removed from the queue.
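
The ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the fix that was merged: `SweepSketch`, `submit`, `sweep`, and the in-memory maps are hypothetical, and a local `ReentrantLock` stands in for the Redis lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the sweeper; none of these names are Conductor's real API.
class SweepSketch {
    // In-memory substitutes for the persistence store, the sweep queue, and the lock service.
    final Map<String, String> store = new ConcurrentHashMap<>();
    final Set<String> queue = ConcurrentHashMap.newKeySet();
    final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    final List<String> trace = new ArrayList<>();

    void submit(String workflowId) {
        store.put(workflowId, "RUNNING");
        queue.add(workflowId);
    }

    boolean sweep(String workflowId) {
        ReentrantLock lock = locks.computeIfAbsent(workflowId, id -> new ReentrantLock());
        if (!lock.tryLock()) {
            return false; // another thread holds this workflow; leave it for a later sweep
        }
        try {
            record("lock");
            String status = store.get(workflowId); // fetch only after the lock is held
            record("fetch");
            if ("RUNNING".equals(status)) {
                store.put(workflowId, "COMPLETED"); // stand-in for the repair/decide step
            }
            record("mutate");
            queue.remove(workflowId); // dequeue while still holding the lock
            record("dequeue");
            return true;
        } finally {
            record("unlock");
            lock.unlock(); // release last, after the queue entry is gone
        }
    }

    private synchronized void record(String step) {
        trace.add(step);
    }
}
```

The `trace` list exists only to make the order of operations visible. A distributed lock also needs lease expiry and failure handling, which this sketch leaves out.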

v1r3n · 2024-07-26T18:19:58Z

Hi @rq-dbrady, we are investigating.

rq-dbrady mentioned this issue Jul 18, 2024

Fix Issue: Ensure proper locking in WorkflowSweeper to prevent race conditions #214

Merged

6 tasks

v1r3n self-assigned this Jul 26, 2024
