This post includes affiliate links; I may receive compensation if you purchase the book from the different links provided in this post.
This review is about Event Streams in Action by Alexander Dean and Valentin Crettaz from Manning.
Between 2009 and 2013, I published ten book reviews on this blog. And since then, nothing. Reading a book is a huge commitment, not to mention writing the review.
During the lockdown, Manning approached me for a "partnership opportunity". In general, I turn down such offers. But I have bought and read books from Manning in the past: they range from above-average to good reads.
I proposed to amend the deal like this: Manning sends a book of my choice for free and I write an honest review. The publisher has a chance to approve it or to discard it. If it’s approved, then I don’t change a single comma.
- 11 chapters, $26.99
- As the name implies, the book is about Event Streams
Here’s a quick summary of each chapter:
- Explores the concept behind Event Streams and their unbounded nature
- Describes the properties of a unified log: unique, append-only, distributed, and ordered
- Introduces Apache Kafka
- Introduces Amazon Kinesis
- Describes stateful stream processing and illustrates it with an Apache Samza use-case
- What happens when code tries to process data it was not meant to? Presents schemas and describes Apache Avro in detail
- Archives events from Kafka with Pinterest Secor
- Defines "railway-oriented programming", an approach to elegantly handle failures during processing by modeling them as events
- Defines the difference between events and commands
- Introduces event streams in analytics. Describes analytics-on-read, i.e., dumping all events directly into a datastore
- Describes the other approach, analytics-on-write, where events are processed before being stored
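To make the "railway-oriented programming" idea from the list above more concrete, here is a minimal sketch in Python. It is my own illustration, not code from the book: the names `Success`, `Failure`, `bind`, `validate`, and `enrich` are all made up for the example. Each step returns either a success or a failure, and a failure is itself an event that keeps the original payload, so it could be published to a separate stream instead of crashing the pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Success:
    value: dict


@dataclass(frozen=True)
class Failure:
    # The failure is modeled as an event: it carries a reason and the
    # original payload, so it can be stored or replayed later.
    reason: str
    original: dict


Result = Union[Success, Failure]
Step = Callable[[dict], Result]


def bind(result: Result, step: Step) -> Result:
    """Run the step on the success track; stay on the failure track otherwise."""
    if isinstance(result, Failure):
        return result
    return step(result.value)


def validate(event: dict) -> Result:
    # Hypothetical rule: every event must carry a user_id.
    if "user_id" not in event:
        return Failure("missing user_id", event)
    return Success(event)


def enrich(event: dict) -> Result:
    # Hypothetical enrichment step that always succeeds.
    return Success({**event, "country": "unknown"})


def process(event: dict) -> Result:
    result: Result = Success(event)
    for step in (validate, enrich):
        result = bind(result, step)
    return result


good = process({"user_id": 42})
bad = process({"page": "/home"})
```

In this sketch, `good` ends up on the success track with the enriched event, while `bad` is a `Failure` that skipped `enrich` and still holds the original payload.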
Pros and cons
On the plus side, I liked the following items:
- Schemas: the reasons why you should use them - and more importantly, the different approaches to implement them
- Archiving events: once you have processed events, what if you need to archive them in long-term storage? What are the tools available?
- Error handling: in stream processing, you should handle errors differently than in traditional applications. A pipeline runs indefinitely. Hence, you need to differentiate between recoverable errors and non-recoverable errors. The section enumerates the different options.
- Commands and events: some events in the pipeline describe state and others describe actions to execute
- The book has a lot of illustrations, which helps understanding a lot!
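As an illustration of the error-handling point above, here is a small Python sketch of one way to tell recoverable errors from non-recoverable ones in a long-running pipeline. Again, this is my own example under simple assumptions, not the book's code: `RecoverableError`, `NonRecoverableError`, `run_pipeline`, and the in-memory `dead_letters` list are hypothetical names. Recoverable errors are retried a few times, and non-recoverable ones are set aside immediately so the pipeline keeps running.

```python
class RecoverableError(Exception):
    """Transient problem, e.g. a timeout: worth retrying."""


class NonRecoverableError(Exception):
    """Permanent problem, e.g. a malformed event: retrying will not help."""


def run_pipeline(events, handler, max_attempts=3):
    """Process every event and return (processed, dead_letters)."""
    processed, dead_letters = [], []
    for event in events:
        for attempt in range(1, max_attempts + 1):
            try:
                processed.append(handler(event))
                break
            except RecoverableError as err:
                # Retry until the attempts are exhausted, then set the event aside.
                if attempt == max_attempts:
                    dead_letters.append((event, f"gave up: {err}"))
            except NonRecoverableError as err:
                # No point in retrying: set the event aside right away.
                dead_letters.append((event, str(err)))
                break
    return processed, dead_letters


calls = {"count": 0}


def flaky_handler(event):
    if event == "malformed":
        raise NonRecoverableError("cannot parse event")
    if event == "flaky":
        calls["count"] += 1
        if calls["count"] < 2:
            raise RecoverableError("timeout")
    return event.upper()


processed, dead_letters = run_pipeline(["ok", "flaky", "malformed"], flaky_handler)
```

Here the `"flaky"` event succeeds on its second attempt, while the `"malformed"` event goes straight to `dead_letters` without stopping the loop.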
On the flip side:
- The book tries to cover a lot of different concepts, approaches, techniques, and tools around the world of stream processing. The goal is commendable but it’s hard to fit everything into a single book
- The book showcases different languages in the code samples: Java, Scala, and Python. While different tools require different languages, both Java and Scala produce bytecode and run on the JVM. I believe sticking to a subset of them would have made the samples easier to understand
- The book uses two different fictional companies for use-cases and switches between them throughout the book
- The use-cases make use of a lot of different tools: Apache Kafka, Amazon Kinesis, Apache Samza, Apache Hadoop YARN, Apache Avro, Pinterest Secor, Apache Spark, Amazon Redshift, Amazon DynamoDB, AWS Lambda, … Because of that, the explanations about each of them are limited
Event Streams in Action offers a lot of interesting content. For beginners, it will provide fundamental knowledge, but they will probably be at a loss getting all the finer points from the use-cases. Experienced practitioners will benefit from the code examples. Yet, the book tries to cover too much and could be improved by separating the two personas and expanding the content for each of them.