Custom collectors in Java 8

Among the many features available in Java 8, streams seem to be one of the biggest game changers regarding the way to write Java code. Usage is quite straightforward: the stream is created from a collection (or from a static method of a utility class), it’s processed using one or many of the available stream methods, and then collected back into a collection or an object. One generally uses one of the static methods that the Collectors utility class offers:

  • Collectors.toList()
  • Collectors.toSet()
  • Collectors.toMap()
  • etc.

Sometimes, however, there’s a need for more. The goal of this post is to describe how to achieve that.

The Collector interface

Every one of the above static methods returns a Collector. But what is a Collector? The following is a simplified diagram:

[Figure: class diagram]

The interfaces a Collector depends on, as described in the JavaDocs:

  • Supplier: Represents a supplier of results. There is no requirement that a new or distinct result be returned each time the supplier is invoked.
  • BiConsumer: Represents an operation that accepts two input arguments and returns no result. Unlike most other functional interfaces, BiConsumer is expected to operate via side-effects.
  • Function: Represents a function that accepts one argument and produces a result.
  • BinaryOperator: Represents an operation upon two operands of the same type, producing a result of the same type as the operands. This is a specialization of BiFunction for the case where the operands and the result are all of the same type.

The documentation of each dependent interface doesn’t tell much, apart from the obvious. Looking at the Collector documentation yields a little more:

A Collector is specified by four functions that work together to accumulate entries into a mutable result container, and optionally perform a final transform on the result. They are:

  • creation of a new result container (supplier())
  • incorporating a new data element into a result container (accumulator())
  • combining two result containers into one (combiner())
  • performing an optional final transform on the container (finisher())
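To make the four functions concrete, here is a minimal sketch that wires them together with Collector.of(). The class name JoiningSketch and the concatenating() method are hypothetical names chosen for illustration; the StringBuilder container is just one possible choice of mutable result container.

```java
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

class JoiningSketch {

    // Hypothetical collector: concatenates strings using a StringBuilder as the container.
    static Collector<String, StringBuilder, String> concatenating() {
        // supplier(): creates a new, empty result container
        Supplier<StringBuilder> supplier = () -> new StringBuilder();
        // accumulator(): incorporates one element into a container
        BiConsumer<StringBuilder, String> accumulator = (builder, element) -> builder.append(element);
        // combiner(): merges two containers into one
        BinaryOperator<StringBuilder> combiner = (left, right) -> left.append(right);
        // finisher(): final transform from the container to the result type
        Function<StringBuilder, String> finisher = builder -> builder.toString();
        return Collector.of(supplier, accumulator, combiner, finisher);
    }

    public static void main(String[] args) {
        String joined = Stream.of("a", "b", "c").collect(concatenating());
        System.out.println(joined); // prints "abc"
    }
}
```

Each local variable is typed with the functional interface it implements, so the mapping to the four bullet points above stays visible.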

The Stream.collect() method

The real insight comes from the Stream.collect() method documentation:

Performs a mutable reduction operation on the elements of this stream. A mutable reduction is one in which the reduced value is a mutable result container, such as an ArrayList, and elements are incorporated by updating the state of the result rather than by replacing the result. This produces a result equivalent to:

R result = supplier.get();
for (T element : this stream)
     accumulator.accept(result, element);
return result;

Note that the combiner() function does not appear in this loop: it is only needed to merge partial results, as happens with parallel streams. For simplicity, it will be set aside for the rest of this post.
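As an illustration of that equivalence, the following sketch spells out the loop by hand and then performs the same reduction through the three-argument Stream.collect(supplier, accumulator, combiner) overload. The class and method names are made up for this example, and an ArrayList is assumed as the result container.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

class CollectEquivalenceSketch {

    // The loop from the JavaDoc, specialised to a list of strings.
    static List<String> collectByHand(List<String> source) {
        List<String> result = new ArrayList<>();  // supplier.get()
        for (String element : source) {
            result.add(element);                  // accumulator.accept(result, element)
        }
        return result;
    }

    // The same mutable reduction expressed through Stream.collect().
    static List<String> collectWithStream(List<String> source) {
        Supplier<List<String>> supplier = ArrayList::new;
        BiConsumer<List<String>, String> accumulator = List::add;
        // Only called when partial results must be merged, e.g. in a parallel stream.
        BiConsumer<List<String>, List<String>> combiner = List::addAll;
        return source.stream().collect(supplier, accumulator, combiner);
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("x", "y", "z");
        System.out.println(collectByHand(source).equals(collectWithStream(source))); // prints "true"
    }
}
```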

Examples

Let’s walk through some examples that demonstrate how to develop custom collectors.

Single-value example

To start, let’s compute the size of a collection using a collector. Though not very useful, it’s a good introduction. Here are the requirements for each function (the combiner being set aside):

  1. Since the end result should be an integer, the supplier should probably also return some kind of integer. The problem is that neither int nor Integer is mutable, and mutability is required for the next step. A good candidate type would be MutableInt from Apache Commons Lang.
  2. The accumulator should only increment the MutableInt, whatever the element in the collection is.
  3. Finally, the finisher just returns the int value wrapped by the MutableInt.

Source is available on GitHub.
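For readers who want something to run right away, here is a minimal sketch of the three steps above. It is not the linked source: to avoid an external dependency, it assumes a small hand-written Counter class as a stand-in for MutableInt, and the names SizeCollectorSketch and sizeCollector() are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collector;

class SizeCollectorSketch {

    // Hypothetical stand-in for MutableInt, so the sketch needs no external library.
    static final class Counter {
        private int value;
        void increment() { value++; }
        void add(int amount) { value += amount; }
        int intValue() { return value; }
    }

    static <T> Collector<T, Counter, Integer> sizeCollector() {
        return Collector.of(
                Counter::new,                               // supplier: a fresh mutable counter
                (counter, element) -> counter.increment(),  // accumulator: ignore the element, add one
                (left, right) -> {                          // combiner: only needed for parallel streams
                    left.add(right.intValue());
                    return left;
                },
                Counter::intValue);                         // finisher: unwrap the int value
    }

    public static void main(String[] args) {
        int size = Arrays.asList("one", "two", "three").stream()
                .collect(SizeCollectorSketch.<String>sizeCollector());
        System.out.println(size); // prints "3"
    }
}
```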

Grouping example

The second example is more useful. From a collection of strings, let’s create an Apache Commons Collections multi-valued map:

  • The key should be a char
  • The corresponding values should be the strings that start with this char

The requirements for each function are:

  1. The supplier is pretty straightforward: it returns a MultiValuedMap instance
  2. The accumulator just calls the put method from the multi-valued map, using the above "specs"
  3. The finisher returns the map itself

Source is available on GitHub.
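The same idea can be sketched with the JDK alone. The version below is an approximation, not the linked source: it assumes a plain Map<Character, List<String>> in place of the multi-valued map, uses the hypothetical names FirstCharGroupingSketch and byFirstChar(), and expects every input string to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;

class FirstCharGroupingSketch {

    // Assumption: a Map of lists stands in for the multi-valued map.
    static Collector<String, Map<Character, List<String>>, Map<Character, List<String>>> byFirstChar() {
        return Collector.of(
                HashMap::new,                                   // supplier: an empty map
                (map, word) -> map                              // accumulator: file the word under its first char
                        .computeIfAbsent(word.charAt(0), key -> new ArrayList<>())
                        .add(word),
                (left, right) -> {                              // combiner: only needed for parallel streams
                    right.forEach((key, words) ->
                            left.computeIfAbsent(key, k -> new ArrayList<>()).addAll(words));
                    return left;
                },
                map -> map);                                    // finisher: return the map itself
    }

    public static void main(String[] args) {
        Map<Character, List<String>> grouped =
                Arrays.asList("apple", "avocado", "banana").stream().collect(byFirstChar());
        System.out.println(grouped.get('a')); // prints "[apple, avocado]"
        System.out.println(grouped.get('b')); // prints "[banana]"
    }
}
```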

Partitioning example

The third example matches a use-case I encountered this week: given a collection and a predicate, dispatch elements that match into a collection and elements that do not into another.

  1. As the supplier returns a single instance, a new data structure, e.g. DoubleList, should first be designed to hold both collections
  2. The accumulator must be initialized with the predicate, so that its accept() method keeps the signature the contract requires.
  3. As in the example above, the finisher should return the DoubleList itself

Source is available on GitHub.
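One possible shape for this collector is sketched below. It is an illustration under assumptions rather than the linked source: the DoubleList fields (matching, others), the class name PartitioningSketch and the method dispatching() are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collector;

class PartitioningSketch {

    // Hypothetical DoubleList: one list for elements that match, one for the rest.
    static final class DoubleList<T> {
        final List<T> matching = new ArrayList<>();
        final List<T> others = new ArrayList<>();
    }

    // The predicate is captured here, so the accumulator keeps the (container, element) signature.
    static <T> Collector<T, DoubleList<T>, DoubleList<T>> dispatching(Predicate<? super T> predicate) {
        return Collector.of(
                DoubleList::new,                           // supplier: a single container with two lists
                (lists, element) ->                        // accumulator: pick the list, then add
                        (predicate.test(element) ? lists.matching : lists.others).add(element),
                (left, right) -> {                         // combiner: only needed for parallel streams
                    left.matching.addAll(right.matching);
                    left.others.addAll(right.others);
                    return left;
                },
                lists -> lists);                           // finisher: return the DoubleList itself
    }

    public static void main(String[] args) {
        Predicate<Integer> isEven = n -> n % 2 == 0;
        DoubleList<Integer> result = Arrays.asList(1, 2, 3, 4, 5, 6).stream()
                .collect(PartitioningSketch.<Integer>dispatching(isEven));
        System.out.println(result.matching); // prints "[2, 4, 6]"
        System.out.println(result.others);   // prints "[1, 3, 5]"
    }
}
```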

Final consideration

Developing a custom collector is not that hard, provided one understands the basic concepts behind it.

The real issue behind collectors is the whole Stream API. Streams need to be created first and then collected afterwards. Newer languages designed around the functional programming paradigm from the start, such as Scala or Kotlin, provide collections with such capabilities baked in.

For example, to filter the entries of a map in Java:

map.entrySet().stream()
        .filter( entry -> entry.getKey().length() == 4)
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

That would translate as the following in Kotlin:

map.entries.filter { it.key.length == 4 }
Nicolas Fränkel
