Nov 19, 2023 / LANGUAGES, DESIGN, ERRORS HANDLING

A retrospective on Error Management: where do we go from here?

Error management is a fact of life in software development: errors are often inevitable, and their causes include an incorrect or incomplete understanding of the requirements, or a lack of knowledge of the tools and components used during development.

Let’s take a short trip through the evolution of error management and the concepts around it, looking at why it is difficult and why, after the era of Exceptions, programming languages are moving in new directions.

Error management philosophy through the ages

In languages such as C and C++, error handling was primarily based on return codes: specific values (sentinel values) indicate success or failure. This approach requires the caller to check the return code, which can easily be ignored, and the control flow becomes convoluted because of the many checks scattered throughout the code base.

The notion of exception was introduced to address these challenges. The idea was to transfer the responsibility of error handling from the caller to a designated exception handler. When an exceptional situation arises, the runtime system throws an exception, which must be caught and handled by appropriate code. The introduction of preallocated exception handlers, as in Java, made it possible to catch and process Exceptions even when the system runs out of memory.

In recently defined programming languages (e.g., Rust, Go, Zig), we return to the dichotomy between exceptions and errors, and a distinction is drawn between recoverable and unrecoverable errors. Unrecoverable errors make the program crash. Recoverable errors go back to being values (dedicated types rather than primitive ones) that are handled in the normal code path. There are no exceptions to throw, the compiler forces or at least encourages handling, and there are no longer dedicated scopes for error-handling code. These values are supported by dedicated constructs that are, in some cases, strongly inspired by functional programming.
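As a minimal sketch of this idea in Rust, a recoverable error can be modelled as an ordinary value that appears in the function signature. The names parse_age, describe, and ParseAgeError are hypothetical and exist only for this illustration.

```rust
use std::fmt;

/// Hypothetical error type for this sketch: a recoverable failure is a plain value.
#[derive(Debug, PartialEq)]
enum ParseAgeError {
    NotANumber,
    OutOfRange(u32),
}

impl fmt::Display for ParseAgeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseAgeError::NotANumber => write!(f, "input is not a number"),
            ParseAgeError::OutOfRange(n) => write!(f, "age {} is out of range", n),
        }
    }
}

/// Recoverable: the possible failure is part of the return type.
fn parse_age(input: &str) -> Result<u32, ParseAgeError> {
    let n = input
        .trim()
        .parse::<u32>()
        .map_err(|_| ParseAgeError::NotANumber)?;
    if n > 150 {
        // The upper bound is an arbitrary assumption for the example.
        return Err(ParseAgeError::OutOfRange(n));
    }
    Ok(n)
}

/// The caller handles both outcomes in ordinary code, without a try/catch scope.
fn describe(input: &str) -> String {
    match parse_age(input) {
        Ok(age) => format!("age is {}", age),
        Err(e) => format!("invalid input: {}", e),
    }
}
```

Calling describe with a valid number takes the Ok branch, while any other input flows through the Err branch as a normal value; nothing is thrown and nothing unwinds.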

The transition to the Cloud has also altered the vision of error management. From the concept of robustness, we have increasingly migrated to considerations linked to the resilience of a system. Furthermore, the usefulness of stack traces has been partly questioned: more distributed code, together with the emphasis on Observability, calls for simpler and more timely information than the habit of examining deep stack traces can provide.

Why aren’t Exceptions enough?

When we think about using exceptions, we usually have several problems to address:

If generated by the runtime, we must catch them to avoid the collapse of the system
Some exceptions become part of the signature of a method or function, which also increases the level of coupling to consider
By its nature, the handling of an exception breaks the flow of execution. Without the right design, the specialized handler increases cognitive overload or lowers the readability of the code base.

We can open a parenthesis on the widespread adoption of Runtime exceptions over other types, especially in general-purpose frameworks. It may be a misstep justified by the typical design of the frameworks themselves: when operating with decoration or proxying patterns, Runtime exceptions seem a natural choice, even if they demand precise knowledge of the framework and so increase the cognitive load.

New languages address the need to reduce the exceptions thrown directly by the runtime as part of a continuous effort to limit low-level error conditions linked to memory usage, concurrency, or I/O. We have safer languages, but also the promise of evolution for languages such as C++.

But is it all the fault of the Exceptions?

What is our role, as software creators, with regard to Errors?

An intriguing sentence comes from Michael Feathers, who observes that errors are just conditions we refuse to take seriously. So, what percentage of our effort is dedicated to studying error conditions and error management?

To understand what is happening, we should consider that error management is conditioned by macroscopic aspects that we often don’t think about holistically, such as:

The analysis of requirements and their level of volatility
How we design for software modularity (decoupling and cohesion)
Performance and the impact of error management on it
The development experience, and therefore the readability of the code base, but also the awareness of the context in which it is being produced

Another distinction we often overlook is whether error management is seen locally or at the edges of a system. Software design continuously alternates between complexity seen locally (a class, for example) and complexity seen at the system level or between different systems.

Exceptions are for exceptional cases

When should exceptions be used? The classical answer is: in the presence of exceptional cases! But what does that mean? The sentence is cryptic from a practical point of view, and applying it when it is our turn to decide becomes rather complicated.

If we consider Modularity, we should have exceptions strictly linked to the dysfunctions that classes can present within a single module. At the same time, if we assume the collaboration between multiple modules, the errors we want to define must be part of something shared between those modules. We need to study the cases in which the stability of an ecosystem is disturbed by invalid states and express them accordingly.

If we consider Performance, we should strive to limit the number of exceptions thrown: creating and processing Exceptions can be expensive, especially if they occur frequently. Modelling every error situation as an Exception ignores the performance aspects and increases the cognitive load, which is why the advice not to use exceptions for control flow is so often invoked.

Another hint comes from Bertrand Meyer, who advises using exceptions when you cannot know whether a call will succeed or fail. This vision justifies adopting design-by-contract, where we consider preconditions, invariants, and postconditions, and where exceptions arise from their violation. Software whose purpose is to implement a protocol, for example, lends itself to design-by-contract; the idea of a formal contract between parties, as explained by Joe Armstrong, can be influential in defining error conditions in distributed systems.

It is worth noting that the adoption of formal methods, such as TLA+, to model error conditions is increasing, particularly for concurrency in large-scale systems.

A criterion that we should use, but often forget, is to understand the usefulness and objectives of the system: why are we creating this software? For example, data entry validation can be treated differently from the application of business rules: in the first case, we can avoid exceptions; in the second, we have specific exceptions related to the entities of that domain, and we need to reason about when to stop error management.
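To illustrate the data entry case, here is a small Rust sketch in which invalid input is expected and therefore reported as a list of values rather than by interrupting the flow. The SignUp form, its fields, and the FieldError variants are assumptions invented for this example.

```rust
/// Hypothetical sign-up form; the field names are assumptions for this sketch.
struct SignUp<'a> {
    username: &'a str,
    age: &'a str,
}

#[derive(Debug, PartialEq)]
enum FieldError {
    EmptyUsername,
    AgeNotANumber,
}

/// Data entry validation: every problem is collected and returned as a value,
/// instead of stopping at the first one.
fn validate(form: &SignUp) -> Result<(String, u8), Vec<FieldError>> {
    let mut errors = Vec::new();

    let username = form.username.trim();
    if username.is_empty() {
        errors.push(FieldError::EmptyUsername);
    }

    let age = match form.age.trim().parse::<u8>() {
        Ok(n) => Some(n),
        Err(_) => {
            errors.push(FieldError::AgeNotANumber);
            None
        }
    };

    match age {
        // Success only when the age parsed and no other error was recorded.
        Some(age) if errors.is_empty() => Ok((username.to_string(), age)),
        _ => Err(errors),
    }
}
```

The caller receives either the cleaned data or the complete list of problems, which is usually what a form needs in order to report all mistakes at once.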

Start to migrate: what are good practices for Exceptions?

What we have discussed holds in general as well as for exceptions, but we can draw some specific suggestions from the book A Philosophy of Software Design by John Ousterhout. Summarizing what the text proposes, we can distinguish the following cases:

Mask exceptions: encapsulate the management of exceptions where they occur so that there is no propagation and so that any corrective action takes place promptly
Aggregate exceptions: in contrast to the previous point, group the actions that can raise similar exceptions so that a whole block of instructions is managed by a single handler
Avoid exceptions: replace them with return values and with explicit management of the returned values
Eliminate exceptional cases: avoid conditional logic for special cases, i.e., for values at the edges of a domain or introduced to make up for a sudden requirement. These can lead to unexpected errors and increase the code base’s cognitive load.
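Two of these cases can be sketched in a few lines of Rust. The functions slice_clamped and timeout_secs, the "timeout" key, and the default of 30 seconds are all hypothetical, chosen only to show the shape of the idea.

```rust
use std::collections::HashMap;

/// Eliminating the exceptional case: out-of-range bounds are clamped to the
/// slice, so there is no error condition left for the caller to handle.
fn slice_clamped(items: &[i32], start: usize, end: usize) -> &[i32] {
    let end = end.min(items.len());
    let start = start.min(end);
    &items[start..end]
}

/// Masking: a missing or malformed setting is handled where it occurs by
/// falling back to a default, so the failure never propagates.
fn timeout_secs(settings: &HashMap<String, String>) -> u64 {
    settings
        .get("timeout")
        .and_then(|value| value.parse::<u64>().ok())
        .unwrap_or(30) // assumed default for this sketch
}
```

In both functions the signature carries no error at all: the first redefines the operation so that every input is valid, the second absorbs the failure locally. Whether a silent default is acceptable depends on the domain.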

Coordinates to migrate from Exceptions

The last two points reflect what we see as a move away from exceptions.

In this context, Michael Feathers indicates the need to extend our domain to avoid the use of exceptions: this means reworking the problem to bypass or limit the presence of errors as much as possible, thanks to hints like Tell, Don’t Ask. As an example, the author reformulates the suggestion from a data point of view, noting that "asking for data" can fail, while providing it when we have it cannot!

Source.withData(ID, data->data.someOperation(...))

We can see that error handling is hidden and has a limited scope due to how we operate on the data.
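A rough Rust rendering of the same idea follows. The Source struct, its in-memory records map, and the with_data method are hypothetical stand-ins for the one-liner above, not an existing API.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory data source used only for this sketch.
struct Source {
    records: HashMap<u32, String>,
}

impl Source {
    /// "Tell, don't ask": the caller hands over the operation, and it runs
    /// only when the record exists. The missing-data case stays inside Source.
    fn with_data<F: FnOnce(&str)>(&self, id: u32, operation: F) {
        if let Some(data) = self.records.get(&id) {
            operation(data.as_str());
        }
    }
}
```

The caller never receives a value that might be absent, so there is nothing to check and no error to propagate; the trade-off is that the caller also does not learn whether the operation ran.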

It becomes essential again to have a way to express the error condition without taking control away from the Developer. For this purpose, it is reasonable to rely on the Type System and support for Generics.

Avoiding null both as a return value and as a sentinel value, adopting patterns such as the Null Object, containers such as Optionals, Default Values or functions, and encapsulating communications in an event-based approach: these are all starting points worth revisiting.
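As a small Rust sketch of the container and default-value approaches, absence can be expressed with Option instead of null. The User struct and the helper functions are hypothetical names introduced for this example.

```rust
/// Hypothetical user record: the nickname may legitimately be absent.
struct User {
    nickname: Option<String>,
}

/// Default value: absence is resolved at the point of use, with no null check.
fn display_name(user: &User) -> &str {
    user.nickname.as_deref().unwrap_or("anonymous")
}

/// Optional container: "not found" is part of the return type, not a sentinel.
fn find_user<'a>(users: &'a [User], nickname: &str) -> Option<&'a User> {
    users
        .iter()
        .find(|user| user.nickname.as_deref() == Some(nickname))
}
```

Because the return type of find_user is an Option, the compiler does not let the caller use the result as a User without first deciding what to do when it is missing.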

Constructs from Functional Programming have been embraced in programming languages (Optional, Either, Try Monad, Railway Oriented Programming, Structural Pattern Matching) where the importance of expressions is highlighted compared to statements.
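Railway Oriented Programming, for instance, can be sketched in Rust by chaining functions that each return a Result: the first failure switches the computation onto the error track and the remaining steps are skipped. The order-handling functions, the stock limit, and the unit price below are invented for the illustration.

```rust
#[derive(Debug, PartialEq)]
enum OrderError {
    InvalidQuantity,
    OutOfStock,
}

/// Step 1: parse the raw input; zero and non-numeric values are rejected.
fn parse_quantity(raw: &str) -> Result<u32, OrderError> {
    raw.trim()
        .parse::<u32>()
        .ok()
        .filter(|quantity| *quantity > 0)
        .ok_or(OrderError::InvalidQuantity)
}

/// Step 2: check availability (the stock level is an assumption of the sketch).
fn reserve_stock(quantity: u32) -> Result<u32, OrderError> {
    const AVAILABLE: u32 = 10;
    if quantity <= AVAILABLE {
        Ok(quantity)
    } else {
        Err(OrderError::OutOfStock)
    }
}

/// Step 3: a step that cannot fail (the unit price is also an assumption).
fn total_price(quantity: u32) -> u32 {
    quantity * 5
}

/// The pipeline is a single expression: and_then chains fallible steps,
/// map applies an infallible one, and any Err short-circuits the rest.
fn place_order(raw: &str) -> Result<u32, OrderError> {
    parse_quantity(raw).and_then(reserve_stock).map(total_price)
}
```

The whole flow reads as one expression rather than a sequence of statements with checks in between, which is the point the functional constructs are making.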

We can see many of these concepts at work in the Rust language.

The Result type is not only a sum type (a tagged union): it also supports a series of methods that allow error management to be expressed as a Fluent API, converting the outcome into an Option or transforming it with mapping functions. An interesting aspect is that the compiler enforces handling of this type: a Result that is left unused produces a compilation warning.

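use std::io;

let mut line = String::new(); // buffer that receives the input
io::stdin().read_line(&mut line)
           .ok() // Result -> Option: Some(bytes_read) if a line was read
           .expect("Failed to read a line!"); // None: we crash with a message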

However, to avoid having too much code, Rust’s syntax offers specialized operators (such as ?) to condense error handling. It is also possible to use structural pattern matching to handle unions, or closures to specialize or nest the processing.

use std::fs::File;

let greeting_file_result = File::open("hello.txt");

let greeting_file = match greeting_file_result {
    Ok(file) => file,
    Err(error) => panic!("Problem opening the file: {:?}", error),
};
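The match above handles the error by hand. As a complementary sketch, Rust’s ? operator condenses the "return early on Err" pattern into a single character. The function sum_fields is a hypothetical example, not taken from the Rust documentation.

```rust
use std::num::ParseIntError;

/// Sums comma-separated integers. On the first field that fails to parse,
/// the ? operator returns that error to the caller immediately.
fn sum_fields(line: &str) -> Result<i64, ParseIntError> {
    let mut total = 0;
    for field in line.split(',') {
        // Equivalent to a match that returns Err(e) early or unwraps Ok(n).
        total += field.trim().parse::<i64>()?;
    }
    Ok(total)
}
```

The happy path stays linear and readable, while the propagation is still explicit in the code and visible in the return type.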

Compiler support is also an essential ingredient in Rust, both for error propagation mechanisms and for producing optimized code, for example by including or omitting stack traces.

No Exception at all?

In everyday life, there may be situations where interrupting the flow of execution can be helpful. The Rust documentation offers interesting guidance on the use of the panic! macro (which makes the program crash):

it is acceptable in defining code examples, prototype code, and tests
it is acceptable when your code can lead to a "bad state". From a design-by-contract perspective, an invalid state occurs when some assumptions, guarantees, contracts, or invariants have been broken: invalid, contradictory, or missing values are passed to the code.
it is acceptable for security reasons due to the inappropriate state of the program.
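The "bad state" case can be sketched as follows: a constructor that states its contract and panics when the caller breaks it, next to a recoverable variant for untrusted input. The Percentage type and its methods are hypothetical and serve only this illustration.

```rust
/// Hypothetical type whose invariant is: the wrapped value is at most 100.
#[derive(Debug)]
struct Percentage(u8);

impl Percentage {
    /// Recoverable path, suited to untrusted input: the failure is a value.
    fn try_new(value: u8) -> Option<Percentage> {
        if value <= 100 {
            Some(Percentage(value))
        } else {
            None
        }
    }

    /// Contract: the caller guarantees value <= 100. A violation is a bug in
    /// the calling code, so the program panics instead of continuing in a
    /// bad state.
    fn new(value: u8) -> Percentage {
        assert!(value <= 100, "contract violated: {} is not a percentage", value);
        Percentage(value)
    }

    fn value(&self) -> u8 {
        self.0
    }
}
```

Offering both constructors lets the edges of the system use the recoverable form, while internal code that already guarantees the invariant can rely on the panicking one.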

Conclusion

Error management requires the same attention and care that we devote to business logic. It is a complex task, but many reference points and conceptual tools exist. Leaving aside trends and partisan discussions, it is better to follow how the design of error management is evolving and to adopt the approaches that one’s language supports or allows, paying attention to the limits of one’s domain and directing the effort toward the actual usefulness of one’s own software.
