RKushnir edited this page Apr 21, 2021 · 4 revisions

Retrying failed work items comes in two forms:

  1. Retry a task and keep the scope within the task (in memory)
  2. Retry a task and keep the scope outside (in queue)

Note: this article was written in response to https://github.com/jondot/sneakers/issues/129#issuecomment-101525527

In Memory

The usage scenario is simple: you called an external API, the network was glitchy, and the call failed. Most of the time, immediately retrying the exact same call should work.

Ideally, the whole retry logic:

  • When to retry
  • How long to wait between retries
  • How many times to retry

should not be any of your concern. This is a retry policy, which your HTTP client (in the API example) should manage transparently.
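If your client does not offer a retry policy, the three decisions listed above can be sketched in a few lines of plain Ruby. This is a minimal illustration, not part of Sneakers: `with_retries` and its parameters (`max_attempts`, `base_delay`, `errors`, `sleeper`) are hypothetical names, and the exponential backoff is just one possible waiting strategy.

```ruby
# Hypothetical helper (not a Sneakers API): runs the block and retries it
# in memory when one of the listed errors is raised.
#   errors       - when to retry (which exceptions count as transient)
#   base_delay   - how long to wait; doubled after every failed attempt
#   max_attempts - how many times to try in total before giving up
#   sleeper      - injectable so the waiting can be stubbed out in tests
def with_retries(max_attempts: 3, base_delay: 0.1, errors: [StandardError],
                 sleeper: Kernel.method(:sleep))
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *errors
    # Out of attempts: re-raise the last error to the caller.
    raise if attempt >= max_attempts
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Usage: a stand-in for a flaky API call that fails twice, then succeeds.
result = with_retries(max_attempts: 3, base_delay: 0.01, errors: [IOError]) do |attempt|
  raise IOError, "network glitch" if attempt < 3
  "response body"
end
```

The whole retry stays inside one call to `work`, so the queue never sees the intermediate failures. That is only reasonable when the total wait is short.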

See more here:

In Queue

Some exceptional cases are really exceptional. Some tasks, once failed, will keep failing until a long time has passed (an external API just throttled you) or until someone intervenes manually (a database has failed in a spectacular way), and so on.

These cases cannot be solved with in-memory retries. More often than not, if you tried, you would end up with a ton of processes and threads churning your systems endlessly, hopelessly trying to make up for failed jobs.

This is where Sneakers' retry module comes in. It uses RabbitMQ's dead-letter exchange, which is built specifically for failed jobs. Historically, other queuing solutions have had to "invent" this time and time again.
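To make the in-queue idea concrete: when RabbitMQ dead-letters a message, it records the event in the message's `x-death` header, an array of entries that each carry the originating `queue`, a `reason`, and a `count`. A retry handler can read that header to decide whether a message deserves another round. The sketch below shows that decision in plain Ruby. It is not Sneakers' actual implementation. `failure_count`, `next_step`, and the queue names `downloads` and `downloads-retry` are hypothetical, and it assumes the work queue dead-letters into a retry queue that eventually routes messages back.

```ruby
# Hypothetical helper: how many times has this message already been
# dead-lettered from the given queue, according to its "x-death" header?
def failure_count(headers, queue)
  deaths = (headers || {}).fetch("x-death", [])
  deaths.select { |death| death["queue"] == queue }
        .sum { |death| death.fetch("count", 1).to_i }
end

# Hypothetical decision: reject the message so the broker dead-letters it
# for a later retry, or give up and park it in an error queue.
def next_step(headers, queue:, max_retries: 5)
  failure_count(headers, queue) < max_retries ? :retry_later : :send_to_error_queue
end

# Usage: headers as they might look after two rejections from "downloads".
headers = {
  "x-death" => [
    { "queue" => "downloads",       "reason" => "rejected", "count" => 2 },
    { "queue" => "downloads-retry", "reason" => "expired",  "count" => 2 }
  ]
}
decision = next_step(headers, queue: "downloads", max_retries: 5)
```

Because the retry count travels with the message and the waiting happens in the broker, no worker thread is tied up while a failed job waits for its next attempt.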

See more: