/ EXERCISE, PROGRAMMING, STYLE

Exercises in Programming Style, Kwisatz Haderach-style

In previous posts, we had a simple problem to handle - find and sort the 25 most frequent words from a file. Then, we had to comply with different sets of constraints regarding the code.

This week, the constraint is to achieve the goal with the shortest code possible. Of course, this depends a lot on the programming language, and on the API available - whether baked-in or from a standard library.

This is the 3rd post in the Exercises in Programming Style focus series.Other posts include:

  1. Introducing Exercises in Programming Style
  2. Exercises in Programming Style, stacking things up
  3. Exercises in Programming Style, Kwisatz Haderach-style (this post)
  4. Exercises in Programming Style, recursion
  5. Exercises in Programming Style with higher-order functions
  6. Composing Exercises in Programming Style
  7. Exercises in Programming Style, back to Object-Oriented Programming
  8. Exercises in Programming Style: maps are objects too
  9. Exercises in Programming Style: Event-Driven Programming
  10. Exercises in Programming Style and the Event Bus
  11. Reflecting over Exercises in Programming Style
  12. Exercises in Aspect-Oriented Programming Style
  13. Exercises in Programming Style: FP & I/O
  14. Exercises in Relational Database Style
  15. Exercises in Programming Style: spreadsheets
  16. Exercises in Concurrent Programming Style
  17. Exercises in Programming Style: sharing data among threads
  18. Exercises in Programming Style with Hazelcast
  19. Exercises in MapReduce Style
  20. Conclusion of Exercises in Programming Style
The name of this post is a reference found in the Dune book by Frank Herbert, where Kwisatz Haderach means "The Shortening of the Way".

For that, the usage of Kotlin’s stdlib is more than welcome. Without further ado, here’s the code:

fun run(filename: String) = (read(filename)
    .flatMap { it.toLowerCase().split("\\W|_".toRegex()) }
    .filter { it.isNotBlank() && it.length >= 2 }
        - read("stop_words.txt").flatMap { it.split(",") })
    .groupBy { it }
    .map { it.key to it.value.size }
    .sortedBy { it.second }
    .takeLast(25)
    .toMap()

Let’s analyze the previous snippet.

read(filename).flatMap { it.toLowerCase().split("\\W|_".toRegex()) }

Read the file’s content into a list of strings, one element per line. Then, transform each line into lower case, and split those lines along non-word tokens.

filter { it.isNotBlank() && it.length >= 2 }

Remove blank strings, as well as those that have a length of 1.

- read("stop_words.txt").flatMap { it.split(",") }

Read the stop words file, and dispatch its single-string comma-separated list into a list of individual stop words. Remove them from the list of words.

groupBy { it }

Group the words by individual word. The key is the word itself, the value the repeated list of the same word.

map { it.key to it.value.size }

Transform each entry into a pair, the first item being the key, the second item the list’s size.

I believe the rest of the code to be pretty self-explanatory.

There’s an alternative using fold(), which can replace the groupBy() and map() function calls:

.fold(mutableMapOf<String, Int>()) { map, word ->
    map.merge(word, 1) { existing, new -> existing + new }
    map
}.toList()

Fold’s signature - also called reduce in some languages is:

fun <T, R> Iterable<T>.fold(initial: R, operation: (acc: R, T) -> R): R

It requires two parameters:

  1. An initial value of type R
  2. A function that transforms:
    • a R value
    • and the next item in the collection
    • into another R

Here, R is a mutable map. The reason to prefer a mutable data structure over an immutable one is because of the merge() method. This allows to put values in a map, handling the case whether the key already exists or not, in a very concise way.

Conclusion

While previous chapters required to think in a different way, this one requires to know one’s language API and libraries. This is a requirement that I’ve observed in other areas domains: Functional Programming in general, every kind of Reactive Programming API, the Clojure language, etc. This is the shortest I could come with, but I welcome you to add your suggestions in the comments.

The complete source code for this post can be found on Github.
Nicolas Fränkel

Nicolas Fränkel

Developer Advocate with 15+ years experience consulting for many different customers, in a wide range of contexts (such as telecoms, banking, insurances, large retail and public sector). Usually working on Java/Java EE and Spring technologies, but with focused interests like Rich Internet Applications, Testing, CI/CD and DevOps. Also double as a trainer and triples as a book author.

Read More
Exercises in Programming Style, Kwisatz Haderach-style
Share this