/ EXERCISE, PROGRAMMING, STYLE, RDBMS

Exercises in Relational Database Style

So far, we solved the word frequencies problem with code alone. This week, we will solve it with the help of infrastructure. And with our specific problem at hand, what better fit than a relational database?

This is the 14th post in the Exercises in Programming Style focus series.Other posts include:

  1. Introducing Exercises in Programming Style
  2. Exercises in Programming Style, stacking things up
  3. Exercises in Programming Style, Kwisatz Haderach-style
  4. Exercises in Programming Style, recursion
  5. Exercises in Programming Style with higher-order functions
  6. Composing Exercises in Programming Style
  7. Exercises in Programming Style, back to Object-Oriented Programming
  8. Exercises in Programming Style: maps are objects too
  9. Exercises in Programming Style: Event-Driven Programming
  10. Exercises in Programming Style and the Event Bus
  11. Reflecting over Exercises in Programming Style
  12. Exercises in Aspect-Oriented Programming Style
  13. Exercises in Programming Style: FP & I/O
  14. Exercises in Relational Database Style (this post)
  15. Exercises in Programming Style: spreadsheets
  16. Exercises in Concurrent Programming Style
  17. Exercises in Programming Style: sharing data among threads
  18. Exercises in Programming Style with Hazelcast
  19. Exercises in MapReduce Style
  20. Conclusion of Exercises in Programming Style

We can approach the problem by loading the data read from the files in the database, and retrieving the top 25 words with the relevant query. In SQL, this query would simply translate into the following:

SELECT value, COUNT(*) as count
  FROM words
  GROUP BY value
  ORDER BY count DESC
  LIMIT 25;

Both Python or Kotlin offer low-level database access API natively. However, none of them are very interesting, as it’s just connecting to the database and then sending SQL strings for queries. Instead, let’s take advantage of the occasion to use Exposed, the Kotlin SQL Framework:

Exposed is a prototype for a lightweight SQL library written over JDBC driver for Kotlin language. It does have two layers of database access: typesafe SQL wrapping DSL and lightweight data access objects.

Querying

Here’s a simplified diagram of Exposed’s querying feature:

Exposed’s querying API

This is how it can be used:

object Words : Table() {
  val id = long("id").primaryKey().autoIncrement()
  val docId = long("docId")
  val value = varchar("value", 50)
}

Words.slice(Words.value, Words.value.count()).selectAll()
  .groupBy(Words.value)
  .orderBy(Words.id.count(), SortOrder.DESC)
  .limit(25)
  .map {
    it[Words.value] to it[Words.value.count()]
  }.toMap()

DDL, connections and transactions

Of course, Exposed doesn’t only provide querying capabilities, but also a lot of other features, such as connecting to the database. Here’s a diagram of those used in the code:

Exposed’s general API

Here’s a sample of provided features, with code samples:

Table creation
object Words : Table() {                              (1)
  val id = long("id").primaryKey().autoIncrement()
  val docId = long("docId")
  val value = varchar("value", 50)
}

SchemaUtils.create(Words)                             (2)
1 Define a table by creating a singleton extending the provided Table class
2 Create the table in the database
Database connection
Database.connect(
  "jdbc:h2:mem:test",            (1)
  "org.h2.Driver"                (2)
)
1 JDBC URL
2 JDBC drive class name. The JAR containing the driver must be accessible
Transactions handling
transaction {
  createDbSchema()
  loadFileIntoDatabase(filename)
  query()
}

Every code snippet defined in the transaction block will be executed inside…​ a transaction. By default, the transaction is stored - and retrieved - from the thread local.

Conclusion

Loading text files in the database to execute queries is far from optimal, to say the least. Actually, that’s a very bad use-case for it.

However, in most cases, letting the database do all the work is the way to go: be sure to explore all capabilities of your datastore instead of doing the work on the memory side through the API provided by the language.

The complete source code for this post can be found on Github.
Nicolas Fränkel

Nicolas Fränkel

Nicolas Fränkel is a Developer Advocate with 15+ years experience consulting for many different customers, in a wide range of contexts (such as telecoms, banking, insurances, large retail and public sector). Usually working on Java/Java EE and Spring technologies, but with focused interests like Rich Internet Applications, Testing, CI/CD and DevOps. Currently working for Hazelcast. Also double as a teacher in universities and higher education schools, a trainer and triples as a book author.

Read More