Error-handling #269

arnauorriols · 2022-07-08T14:52:03Z

Conceptual Design

Error handling is too often an afterthought in the design of a library. Rust provides a magnificent syntax to remove the noise of error handling from the happy path. However, alongside all its benefits, this syntax also promotes the mindset that errors can be dealt with later. In general that is good for productivity, but it carries the danger of delaying that moment until it is too late for a conscious and holistic design.

Designing the error handling of a library is much more difficult than we usually give it credit for. As can be seen in various examples of libraries similar to Streams [1] [2], one of the primary issues is the mindset with which the errors of the library are designed: from a bottom-up perspective. They answer the question of “how can I wrap this low-level error so that I can forward it downstream”, strongly incentivized by the question-mark syntax.
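To make the bottom-up pattern concrete, here is a minimal sketch of what it tends to look like. `BottomUpError` and `decode_counter` are hypothetical names made up for illustration; they are not taken from Streams or from the libraries referenced above.

```rust
use std::num::ParseIntError;
use std::str::Utf8Error;

/// Hypothetical bottom-up error: one variant per low-level cause,
/// mirroring the implementation rather than the caller's needs.
#[derive(Debug)]
pub enum BottomUpError {
    Utf8(Utf8Error),
    Parse(ParseIntError),
}

impl From<Utf8Error> for BottomUpError {
    fn from(e: Utf8Error) -> Self {
        BottomUpError::Utf8(e)
    }
}

impl From<ParseIntError> for BottomUpError {
    fn from(e: ParseIntError) -> Self {
        BottomUpError::Parse(e)
    }
}

/// Hypothetical operation: decode a decimal counter from raw bytes.
pub fn decode_counter(bytes: &[u8]) -> Result<u64, BottomUpError> {
    // Each `?` converts the low-level error through the `From` impls above.
    let text = std::str::from_utf8(bytes)?;
    let value = text.trim().parse::<u64>()?;
    Ok(value)
}
```

The happy path stays clean, but the variants only say where the failure originated. They do not tell the caller what to do about it.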

In design, we know that there are no hard truths. Design by inertia is dangerous, and one should always feel compelled to reevaluate past assumptions and question their applicability in a specific context. Let’s first revisit the foundations of a good error-handling design, and how we want to apply them to Streams.

The purpose of errors

When designing a library, and specifically its error handling, the first premise to keep in mind is that errors are the artefact that a user of the library receives when something does not go as planned. Errors have 2 main purposes:

log errors for tracing
handle errors for business logic

For tracing, the most important quality of errors is that they are expressive and specific. There's nothing more annoying than a log that says "an error occurred during fetching the message. Try again later". It is equally annoying to have 10 lines of logs explaining "an error occurred during fetching: Error in data layer: [...] warning, something has failed: [404] Error 404: document not found". We need to strive for error messages that are as concrete and as condensed as possible, always remembering that they are supposed to be read by a human at some point, and she is supposed to intuitively know how to act on them. As for the structure of the error, the tracing purpose does not need any in particular; we could be returning strings with the error and be done with it.

Structure becomes important when we consider errors as a mechanism for application engineers to implement business logic around them. And this is the main question that the structure should try to answer: “what business logic will the application engineer want to trigger in each particular error case?”

In a network communication protocol like Streams, we can classify the different errors by the action that the user should take upon encountering them:

Retry the operation again without changing anything
Correct the data being sent and retry the operation
Correct the environment and retry the operation
Desist from executing the operation

Regardless of how the Error enum is structured, when an operation returns an error, the user will match against it in an attempt to decide which of these 4 actions to take. Therefore, it is of utmost importance that the enum projects this classification as directly as possible.
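As a rough sketch of what this looks like from the application side: if the error exposes the classification directly, a generic retry helper only has to ask which action applies. All names here (`Action`, `OperationError`, `run_with_retries`) are hypothetical and only illustrate the idea; none of them exist in Streams.

```rust
/// The four reactions listed above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    RetryUnchanged,
    FixDataAndRetry,
    FixEnvironmentAndRetry,
    GiveUp,
}

/// Hypothetical, simplified error with one variant per action.
#[derive(Debug)]
pub enum OperationError {
    Transient(String),
    InvalidData(String),
    MissingSetup(String),
    Fatal(String),
}

impl OperationError {
    /// Maps each variant to the action the caller is expected to take.
    pub fn action(&self) -> Action {
        match self {
            OperationError::Transient(_) => Action::RetryUnchanged,
            OperationError::InvalidData(_) => Action::FixDataAndRetry,
            OperationError::MissingSetup(_) => Action::FixEnvironmentAndRetry,
            OperationError::Fatal(_) => Action::GiveUp,
        }
    }
}

/// Runs `op` at least once and repeats it only while the error asks for
/// an unchanged retry, up to `max_attempts` calls in total.
pub fn run_with_retries<T>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, OperationError>,
) -> Result<T, OperationError> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if err.action() == Action::RetryUnchanged && attempt < max_attempts => {
                attempt += 1;
            }
            // Every other action needs a decision from the caller.
            Err(err) => return Err(err),
        }
    }
}
```

Errors that call for any other action are returned on the first occurrence, so the application never retries something that cannot succeed unchanged.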

Error model

In Streams the operations that can fail are:

Send a message
Fetch a message
Create a stream
Connect to a stream
Create a branch
Change the permissions of a branch

For each of these operations, the following model tries to project the 4 different actions described above:

operation \ action	Retry the operation again without changing anything	Correct the data being sent and retry the operation	Correct the environment and retry the operation	Desist from executing the operation
Send a message	transient network failure	payload too big	User has not created or received the announcement message of the stream	User has readonly permission over this topic; user does not have an identity
Fetch a message	transient network failure	NA	User has not created or received the announcement message of the stream; spongos state of linked message not in store	message not found; user is not allowed to read from this topic; message data cannot be unwrapped; message signature is not valid; message data is not compatible with this version of streams
Create a stream	transient network failure	Stream with this topic already exists (should this be upserting instead?)	?	User does not have an identity
Create a branch	transient network failure	Branch with this topic already exists (should this be upserting instead?)	User has not created or received the announcement message of the stream	User does not have an identity; user has readonly permission over the parent topic
Change the permissions of a branch	transient network failure	Unknown PSK	User has not created or received the announcement message of the stream	User does not have an identity; user is not admin of the branch

Enum

With the previous model in mind, the proposed Error enum skeleton could look something like this (variant fields to be validated and possibly extended during implementation):

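#[derive(Debug)]
enum Error {
    // The trait path is spelled out in full: inside this enum, a bare
    // `Error` would refer to the enum itself, not to `std::error::Error`.
    NetworkFailure(String, Box<dyn std::error::Error>),
    DataError(String),
    SetupError(String),
    PermissionError(String),
    MessageNotFound(String),
    FatalError(String),
}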
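To serve the tracing purpose as well, the skeleton would need `Display` and `std::error::Error` implementations. The following is one possible sketch, not a settled design: the message prefixes are assumptions, and the trait is imported as `StdError` only to avoid clashing with the enum's own name.

```rust
use std::error::Error as StdError;
use std::fmt;

#[derive(Debug)]
pub enum Error {
    NetworkFailure(String, Box<dyn StdError>),
    DataError(String),
    SetupError(String),
    PermissionError(String),
    MessageNotFound(String),
    FatalError(String),
}

impl fmt::Display for Error {
    // One short line per error; the prefixes are illustrative only.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::NetworkFailure(context, _) => write!(f, "network failure: {context}"),
            Error::DataError(context) => write!(f, "data error: {context}"),
            Error::SetupError(context) => write!(f, "setup error: {context}"),
            Error::PermissionError(context) => write!(f, "permission error: {context}"),
            Error::MessageNotFound(context) => write!(f, "message not found: {context}"),
            Error::FatalError(context) => write!(f, "fatal error: {context}"),
        }
    }
}

impl StdError for Error {
    // Only the network variant carries an underlying cause.
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            // Reborrow the boxed cause as a plain trait object reference.
            Error::NetworkFailure(_, cause) => Some(&**cause),
            _ => None,
        }
    }
}
```

Keeping the low-level cause behind `source()` instead of appending it to the message keeps each log line condensed, while tracing tools can still walk the chain when needed.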


arnauorriols · 2022-07-11T11:13:25Z

Self-note: make sure the classification is granular enough to be able to store data within each Error variant that can be relevant for the error-handling logic
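A small sketch of what such data could enable, assuming a hypothetical `PayloadTooBig` variant with `size` and `max` fields (neither the variant nor the fields are part of the proposal above). With the limit available in the error, the application can correct the data instead of only logging the failure:

```rust
/// Hypothetical granular variant carrying the data needed to react.
#[derive(Debug, PartialEq, Eq)]
pub enum DataError {
    PayloadTooBig { size: usize, max: usize },
}

/// Hypothetical validation step; `max` is assumed to be non-zero.
pub fn check_payload(payload: &[u8], max: usize) -> Result<(), DataError> {
    if payload.len() > max {
        Err(DataError::PayloadTooBig { size: payload.len(), max })
    } else {
        Ok(())
    }
}

/// Uses the limit stored in the error to split the payload into
/// chunks that fit, instead of giving up.
pub fn split_to_fit(payload: &[u8], max: usize) -> Vec<&[u8]> {
    match check_payload(payload, max) {
        Ok(()) => vec![payload],
        Err(DataError::PayloadTooBig { max, .. }) => payload.chunks(max).collect(),
    }
}
```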

kwek20 assigned arnauorriols Jul 28, 2022

arnauorriols removed their assignment May 30, 2023
