/ EXERCISE, PROGRAMMING, STYLE

Exercises in Programming Style: FP & I/O

In this week in Exercises in Programming Style’s post, we will explore one of the foundation of Functional Programming, I/O and how to cope with it.

This is the 13th post in the Exercises in Programming Style focus series.Other posts include:

  1. Introducing Exercises in Programming Style
  2. Exercises in Programming Style, stacking things up
  3. Exercises in Programming Style, Kwisatz Haderach-style
  4. Exercises in Programming Style, recursion
  5. Exercises in Programming Style with higher-order functions
  6. Composing Exercises in Programming Style
  7. Exercises in Programming Style, back to Object-Oriented Programming
  8. Exercises in Programming Style: maps are objects too
  9. Exercises in Programming Style: Event-Driven Programming
  10. Exercises in Programming Style and the Event Bus
  11. Reflecting over Exercises in Programming Style
  12. Exercises in Aspect-Oriented Programming Style
  13. Exercises in Programming Style: FP & I/O (this post)
  14. Exercises in Relational Database Style
  15. Exercises in Programming Style: spreadsheets
  16. Exercises in Concurrent Programming Style
  17. Exercises in Programming Style: sharing data among threads
  18. Exercises in Programming Style with Hazelcast
  19. Exercises in MapReduce Style
  20. Conclusion of Exercises in Programming Style

Purity, side-effects, and I/O

In FP, functions should be pure: a pure function’s return value must only depend on its input parameters. Once a function is pure, it also has referential transparency:

Referential transparency means that an expression can be replaced with its value. This requires that the expression is pure, that is to say the expression must be deterministic and side-effect free.

This works well in theory. As per the definition above, side-effects make functions "impure". Example of side effects include:

  • Reading the standard in
  • Reading from a file
  • Writing to the standard out
  • Writing to a file
  • etc.

When functions are pure, they can be composed. For example, here’s a sample of functions composed together:

A sample of functions composed together

Because of referential transparency, the previous design can be changed to the following design:

An equivalent sample of functions composed together

Let’s repeat it again. For that to work, every function involved needs to be pure: it should always return the same value when a specific parameter is passed, and be free of side-effects.

However, the real-world code is riddled with side-effects. For example, programs read parameters e.g. read from the command-line, do some transformation, and output the results e.g. write to the database. To cope with that, FP applied in day-to-day programming pushes I/O at the boundaries of the functions' "chain".

A real-world sample of functions composed together

Applying the theory

The original Python code reads from the standard input, and prints the results on the standard output. Compared to this, in all previous exercises, we already updated the design for automated testing purposes.

Function Description Pure

extractWords()

Read the words from a file

removeStopWords()

Read the stop words from a file and filter them out from the words list

frequencies()

Transform the words list into a frequencies map

sorted()

Transform the frequencies map into a list and sort the list by frequency

top()

Keep the top 25 words and transform the frequencies back into a list

Functions composition applied to the word frequencies problem

One can improve the design by pushing the I/O further: the removeStopWords() function is impure because it reads stop words from a file, a strong side-effect. It can be broken into an impure extracting part, and a pure removing part.

Better functions composition applied to the word frequencies problem

Higher-Order functions: purity and laziness

Yet, functions that are at the boundaries are still impure. One straightforward way to achieve purity across the whole chain is to wrap every function…​ in a function. That’s what Higher-Order functions are for. When a function always returns the same function, it’s pure by definition.

Moreover, using higher-order functions allows for laziness: it’s very cheap to pass computations around. Executing such a computation can be deferred until absolutely necessary - or not all, if not.

For the exercise, let’s model a simple list of functions of type (Any) → Any. There should be two features available:

  • One to add a function at the end of the list
  • One to execute the functions' pipeline
class Quarantine(private val filename: String) {

  private var functions = listOf<(Any) -> Any>()

  fun bind(function: Function1<*, Any>): Quarantine {
    functions = functions + (function as (Any) -> Any)               (1)
    return this
  }

  fun execute(): Any {
    tailrec fun internalExecute(functions: List<Function1<*, Any>>, result: Any): Any =
      if (functions.isEmpty()) result                                (2)
      else {
        val function = functions.first()
        internalExecute(functions - function, (function as (Any) -> Any)(result))
      }
    return internalExecute(functions, filename)
  }
}
1 Do nothing but store the computation
2 Actually kickstart the whole computation pipeline using recursion

This class is generic enough to be relevant in different contexts. Functions themselves are pretty straightforward. Here’s one as an example:

fun frequencies(): (List<String>) -> Map<String, Int> = { words ->
  words.groupingBy { it }
       .eachCount()
}

Handling different function signatures

The above Quarantine class binds any function that accepts a single parameter and returns a value. With that design, it’s possible to add the extractWords() function, but not the extractStopWords() one:

fun extractWords(): (String) -> List<String> = {                         (1)
  read(it)
    .flatMap { it.toLowerCase().split("\\W|_".toRegex()) }
    .filter { it.isNotBlank() && it.length >= 2 }
}

fun extractStopWords(): () -> List<String> = { read("stop_words.txt") }  (2)
1 Accepts a parameter, and conforms to (Any) → Any
2 Doesn’t accept a parameter

Even worse, those functions shouldn’t be added one by one to the pipeline, because they are not meant to be called sequentially. Hence, they should be first "merged", and added as such to the beginning of the composition chain. This requires the following code:

class Quarantine(private val filename: String,
                         extractWords: (String) -> List<String>,
                     extractStopWords: () -> List<String>) {

  private var functions: List<(Any) -> Any>

  init {
    functions = arrayListOf({it: String ->                    (1)
      extractWords.invoke(it) to extractStopWords.invoke()
    } as (Any) -> Any)
  }
  // Same as above
}
1 Compose a function of type (String) → List<String> and a function of type () → List<String> into a function of type (String) → Pair<List<String>, List<String>>

Conclusion

Purity is one of the foundations of FP. Side-effects in general, and I/O in particular, breaks purity. A straightforward way to cope with I/O is to wrap functions into other functions to create higher-order functions. Even better, higher-order functions provide laziness, and all their associated benefits.

The complete source code for this post can be found on Github.
Nicolas Fränkel

Nicolas Fränkel

Nicolas Fränkel is a Developer Advocate with 15+ years experience consulting for many different customers, in a wide range of contexts (such as telecoms, banking, insurances, large retail and public sector). Usually working on Java/Java EE and Spring technologies, but with focused interests like Rich Internet Applications, Testing, CI/CD and DevOps. Currently working for Hazelcast. Also double as a teacher in universities and higher education schools, a trainer and triples as a book author.

Read More