Scala cheatsheet part 1 – collections

As a follow-up to point 4 of my previous article, here’s a first little cheatsheet on the Scala collections API. As in Java, knowing the API is a big step toward writing code that is more relevant, productive and maintainable. Collections play such an important part in Scala that knowing the collections API is a big step toward better Scala knowledge.

Type inference

In Scala, collections are typed, which means you have to be extra careful with the type of their elements. Fortunately, constructors and companion object factory methods are able to infer the type by themselves (most of the time). For example:

scala> val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")
countries: List[String] = List(France, Switzerland, Germany, Spain, Italy, Finland)

Now, the countries value is of type List[String] since all elements of the collection are Strings.

As a corollary, if you create an empty collection without explicitly setting its type, you’ll get a collection typed with Nothing.

scala> val empty = List()
empty: List[Nothing] = List()

scala> 1 :: empty
res0: List[Int] = List(1)

scala> "1" :: empty
res1: List[String] = List(1)

Adding a new element to the empty list returns a new list, typed according to the added element. The same happens when an element of another type is added to a typed collection: the element type of the result is widened to a common supertype of both, here Any.

scala> 1 :: countries
res2: List[Any] = List(1, France, Switzerland, Germany, Spain, Italy, Finland)
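To avoid ending up with List[Nothing] or an accidental List[Any], the element type can be stated explicitly. The following is only a minimal sketch: the object name TypeInferenceDemo and the value names are made up for illustration, and the List[Any] annotation mirrors the REPL output above.

```scala
object TypeInferenceDemo {
  // An explicit type parameter avoids List[Nothing] for an empty list
  val names: List[String] = List.empty[String]

  // Prepending a String keeps the element type
  val one: List[String] = "France" :: names

  // Prepending an Int needs a wider element type; it is annotated here
  // as Any, the type reported by the REPL output above
  val mixed: List[Any] = 1 :: one

  def main(args: Array[String]): Unit = {
    println(one)   // List(France)
    println(mixed) // List(1, France)
  }
}
```

Annotating the expected type also turns an unintended widening into a compile error: declaring mixed as List[String] would be rejected.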

Default immutability

In Functional Programming, state is banished in favor of “pure” functions. Scala being both Object-Oriented and Functional in nature, it offers both mutable and immutable collections under the same name but in different packages: scala.collection.mutable and scala.collection.immutable. For example, Set and Map are found in both packages (interestingly enough, there’s a scala.collection.immutable.List but a scala.collection.mutable.MutableList). By default, the collections in scope are the immutable ones, through aliases defined in the scala package object and in the scala.Predef object (both of which are imported implicitly).
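The difference between the two flavors can be sketched as follows. The object and value names are hypothetical; the only assumption is the standard scala.collection.mutable package mentioned above.

```scala
import scala.collection.mutable

object ImmutabilityDemo {
  // Without an import, Set resolves to scala.collection.immutable.Set
  val fixed = Set("France", "Spain")

  // "+" returns a new set; the original one is left untouched
  val extended = fixed + "Italy"

  // The mutable variant has to be asked for explicitly
  val growing = mutable.Set("France", "Spain")
  growing += "Italy" // updates the set in place

  def main(args: Array[String]): Unit = {
    println(fixed.size)    // 2
    println(extended.size) // 3
    println(growing.size)  // 3
  }
}
```

Importing the mutable package rather than mutable.Set itself keeps the call site explicit, so a reader can tell at a glance which flavor is in use.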

The collections API

The heart of the matter lies in the API itself. Beyond expected methods also found in Java (like size() and indexOf()), Scala brings to the table a unique functional approach to collections.

Filtering and partitioning

Scala collections can be filtered so that they return:

  • either a new collection that retains only the elements that satisfy a predicate (filter())
  • or one that retains only those that do not (filterNot())

Both take a function that takes the element as a parameter and returns a Boolean. The following example returns a collection which only retains countries whose name has more than 6 characters.

scala> countries.filter(_.length > 6)
res3: List[String] = List(Switzerland, Germany, Finland)

Additionally, the same function type can be used to partition the original collection into a pair of two collections, one that satisfies the predicate and one that doesn’t.

scala> countries.partition(_.length > 6)
res4: (List[String], List[String]) = (List(Switzerland, Germany, Finland),List(France, Spain, Italy))
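filterNot() is not illustrated above, so here is a small sketch of it next to partition(), reusing the same list of countries. The object name FilterDemo and the value names are only illustrative.

```scala
object FilterDemo {
  val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")

  // filterNot keeps the elements that do NOT satisfy the predicate
  val shortNames = countries.filterNot(_.length > 6)

  // partition returns both halves at once: (satisfying, not satisfying)
  val (kept, rejected) = countries.partition(_.length > 6)

  def main(args: Array[String]): Unit = {
    println(shortNames) // List(France, Spain, Italy)
    println(kept)       // List(Switzerland, Germany, Finland)
  }
}
```

When both halves are needed, partition() saves writing the predicate twice, once for filter() and once for filterNot().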

Taking, dropping and splitting

  • Taking from a collection means returning a collection that keeps only the first n elements of the original one.
    scala> countries.take(2)
    res5: List[String] = List(France, Switzerland)
  • Dropping from a collection consists of returning a collection that keeps all elements but the first n elements of the original one.
    scala> countries.drop(2)
    res6: List[String] = List(Germany, Spain, Italy, Finland)
  • Splitting a collection consists of returning a pair of collections: the first holds the elements before the specified index, the second holds the elements from that index onward.
    scala> countries.splitAt(2)
    res7: (List[String], List[String]) = (List(France, Switzerland),List(Germany, Spain, Italy, Finland))

Scala also offers takeRight(Int) and dropRight(Int) variant methods that do the same but start with the end of the collection.

Additionally, there are takeWhile(f: A => Boolean) and dropWhile(f: A => Boolean) variant methods that respectively take and drop elements from the collection sequentially (starting from the left) while the predicate is satisfied.
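These variants can be sketched on the same list of countries. The object and value names are hypothetical, and the predicate (a name longer than 5 characters) is chosen only for the sake of the example.

```scala
object TakeDropDemo {
  val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")

  // Same as take/drop, but counting from the end of the list
  val lastTwo = countries.takeRight(2)       // List(Italy, Finland)
  val allButLastTwo = countries.dropRight(2) // List(France, Switzerland, Germany, Spain)

  // Both stop at "Spain", the first name that is not longer than 5 characters
  val leadingLong = countries.takeWhile(_.length > 5) // List(France, Switzerland, Germany)
  val rest = countries.dropWhile(_.length > 5)        // List(Spain, Italy, Finland)

  def main(args: Array[String]): Unit =
    println((lastTwo, allButLastTwo, leadingLong, rest))
}
```

Note that takeWhile() is not a filter: "Finland" has more than 5 characters but is not taken, because the scan stops at the first element that fails the predicate.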

Grouping

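The elements of a Scala collection can be grouped into a map of key/value pairs, where each key is computed from the elements and each value is the collection of elements sharing that key. The following example groups countries by their name’s first character.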

scala> countries.groupBy(_(0))
res8: scala.collection.immutable.Map[Char,List[String]] = Map(F -> List(France, Finland), S -> List(Switzerland, Spain), G -> List(Germany), I -> List(Italy))
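Any function of the element can serve as the key. As a small sketch, here is the same list grouped by name length instead; the object name GroupByDemo and the value names are hypothetical.

```scala
object GroupByDemo {
  val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")

  // The key is the length of the name; each value keeps the original order
  val byLength: Map[Int, List[String]] = countries.groupBy(_.length)

  def main(args: Array[String]): Unit = {
    println(byLength(7))                      // List(Germany, Finland)
    println(byLength.getOrElse(3, List.empty)) // List(), no 3-letter names
  }
}
```

The order of the keys in the resulting map should not be relied upon, which is why the sketch looks values up by key rather than printing the whole map.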

Set algebra

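Three operations are available in the set algebra domain:

  • union (::: and union())
  • difference (diff())
  • intersection (intersect())

Those are pretty self-explanatory, with one caveat: on a List, ::: and union() simply concatenate, so duplicates are kept. Convert the lists to Sets when true set semantics are needed.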
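A minimal sketch of the three operations follows, using two hypothetical lists (north and south) that share one element. On lists, concatenation keeps the duplicate; going through Set removes it.

```scala
object SetAlgebraDemo {
  val north = List("Finland", "Germany", "France")
  val south = List("France", "Spain", "Italy")

  // On lists, ::: is plain concatenation: "France" appears twice
  val concatenated = north ::: south

  // Elements of north that are not in south
  val onlyNorth = north.diff(south) // List(Finland, Germany)

  // Elements present in both lists
  val common = north.intersect(south) // List(France)

  // A true union, without duplicates, requires sets
  val northSet: Set[String] = north.toSet
  val southSet: Set[String] = south.toSet
  val allCountries = northSet.union(southSet)

  def main(args: Array[String]): Unit = {
    println(concatenated.size) // 6
    println(allCountries.size) // 5
  }
}
```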

Map

The map(f: A => B) method returns a new collection of the same length as the original one, whose elements are the results of applying the function f to each original element.

For example, the following returns a new collection in which each name is reversed.

scala> countries.map(_.reverse)
res9: List[String] = List(ecnarF, dnalreztiwS, ynamreG, niapS, ylatI, dnalniF)
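Since the function goes from A to B, the element type of the result may differ from the original one. Here is a small sketch mapping each name to its length; the object and value names are hypothetical.

```scala
object MapDemo {
  val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")

  // String => Int: the result is a List[Int] of the same size
  val lengths: List[Int] = countries.map(_.length)

  def main(args: Array[String]): Unit =
    println(lengths) // List(6, 11, 7, 5, 5, 7)
}
```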

Folding

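Folding is the operation of, starting from an initial value, repeatedly applying a function to a pair composed of an accumulator and the element under scrutiny; the result becomes the accumulator for the next element. Considering that, it can be used like the above map if the accumulator is a collection. Note that since each element is prepended to the accumulator, the resulting list is in reverse order: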

scala> countries.foldLeft(List[String]())((list, x) => x.reverse :: list)
res10: List[String] = List(dnalniF, ylatI, niapS, ynamreG, dnalreztiwS, ecnarF)

Alternatively, you can provide other types of accumulator, like a string, to get different results:

scala> countries.foldLeft("")((concat, x) => concat + x.reverse)
res11: java.lang.String = ecnarFdnalreztiwSynamreGniapSylatIdnalniF
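Two more variations, as a sketch: a numeric accumulator, and foldRight(), which traverses the list from the right so that prepending preserves the original order. The object and value names are hypothetical.

```scala
object FoldDemo {
  val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")

  // An Int accumulator: the total number of characters in all names
  val totalLength = countries.foldLeft(0)((sum, x) => sum + x.length)

  // foldRight starts from the last element, so prepending keeps the order;
  // note that the accumulator is the SECOND parameter of the function
  val reversedNames =
    countries.foldRight(List.empty[String])((x, list) => x.reverse :: list)

  def main(args: Array[String]): Unit = {
    println(totalLength)   // 41
    println(reversedNames) // List(ecnarF, dnalreztiwS, ynamreG, niapS, ylatI, dnalniF)
  }
}
```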

Zipping

Zipping creates a list of pairs, from a list of single elements. There are two variants:

  • zipWithIndex() pairs each element with its index (the element comes first, the index second), like so:
    scala> countries.zipWithIndex
    res12: List[(java.lang.String, Int)] = List((France,0), (Switzerland,1), (Germany,2), (Spain,3), (Italy,4), (Finland,5))

    Note: zipping with the index is very handy when you want to iterate over a collection but still need the index of each element. It keeps you from declaring a counter variable outside the iteration and incrementing it inside.

  • Additionally, you can also zip two lists together:
    scala> countries.zip(List("Paris", "Bern", "Berlin", "Madrid", "Rome", "Helsinki"))
    res13: List[(java.lang.String, java.lang.String)] = List((France,Paris), (Switzerland,Bern), (Germany,Berlin), (Spain,Madrid), (Italy,Rome), (Finland,Helsinki))

Note that the original collections don’t need to have the same size. The returned collection’s size will be the min of the sizes of the two original collections.
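Both points can be sketched as follows: iterating with the index without an external counter, and zipping lists of different sizes. The object name ZipDemo, the value names and the two-element capitals list are assumptions made for the example.

```scala
object ZipDemo {
  val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")

  // The index comes from zipWithIndex, so no mutable counter is needed
  val numbered =
    for ((country, index) <- countries.zipWithIndex)
      yield s"${index + 1}. $country"

  // Only two capitals: the extra countries are silently dropped
  val capitals = List("Paris", "Bern")
  val pairs = countries.zip(capitals)

  def main(args: Array[String]): Unit = {
    numbered.foreach(println) // 1. France, 2. Switzerland, ...
    println(pairs)            // List((France,Paris), (Switzerland,Bern))
  }
}
```

Because the truncation is silent, it may be worth checking the sizes beforehand when a mismatch would indicate a bug.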

The reverse operation is also available, in the form of the unzip() method, which returns a pair of lists when called on a list of pairs. unzip3() does the same for a list of triples and returns three lists.
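As a short sketch, with hypothetical sample data (the two-letter codes are only there to make up a third component):

```scala
object UnzipDemo {
  val pairs = List(("France", "Paris"), ("Spain", "Madrid"))

  // A list of pairs becomes a pair of lists
  val (countries, capitals) = pairs.unzip

  val triples = List(("France", "Paris", "FR"), ("Spain", "Madrid", "ES"))

  // A list of triples becomes three lists
  val (names, cities, codes) = triples.unzip3

  def main(args: Array[String]): Unit = {
    println(capitals) // List(Paris, Madrid)
    println(codes)    // List(FR, ES)
  }
}
```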

Conclusion

I’ve written this article in the form of a simple fact-oriented cheat sheet, so you can use it as such. In the coming months, I’ll try to add other such cheatsheets.

  1. loic
    November 12th, 2012 at 00:23 | #1

    hey Nicolas!
    During the Scala course on Coursera, I’ve read a post from M. Odersky saying that the operator ++ should be preferred to ::
    Of course "::" still works but will be deleted in future Scala releases.
    cheers,

  2. November 12th, 2012 at 15:52 | #2

    Hi Loïc,

    Thanks for taking the time to read my blog despite your current situation. However, I think that :: and ++ are different operators: the former prepends a single element while the latter concatenates a whole sequence. See the API for more details.
