
# The sieve of Eratosthenes

Recently, I stumbled upon a Reddit thread pointing to a repository comparing the performances of implementations of the Sieve of Eratosthenes in different languages. In short, the Sieve is an (ancient) algorithm to find all prime numbers up to a given limit.

Results were intriguing, to say the least:

| Language   | Command                                                                                  | real       | user       | sys       |
|------------|------------------------------------------------------------------------------------------|------------|------------|-----------|
| C          | `clang -fblocks -o prime_c prime.c && time ./prime_c &>/dev/null`                        | 0m0.501s   | 0m0.490s   | 0m0.006s  |
| Go 1.11    | `go build -o prime_go prime.go && time ./prime_go &>/dev/null`                           | 0m8.915s   | 0m31.161s  | 0m0.227s  |
| Python 3.6 | `time python3 prime.py &>/dev/null`                                                      | 0m10.850s  | 0m10.739s  | 0m0.066s  |
| Ruby 2.4   | `time ruby prime.rb &>/dev/null`                                                         | 8m1.470s   | 7m46.913s  | 0m3.977s  |
| Kotlin 1.3 | `kotlinc prime.kt -include-runtime -d prime_kt.jar && time java -jar prime_kt.jar &>/dev/null` | 33m54.561s | 33m18.257s | 0m11.193s |

The biggest - and worst - surprise came from Kotlin. I couldn’t believe my eyes!

Then, I looked at the code:

```kotlin
fun main(args: Array<String>) {
    fun make_sieve(src: Sequence<Int>, prime: Int) = src.filter { it % prime != 0 }
    var sieve = sequence {
        var x = 2
        while (true) yield(x++)
    }
    for (i in 1..10000) {
        val prime = sieve.first()
        println(prime)
        sieve = make_sieve(sieve, prime)
    }
}
```

After some comments from Redditors, the author replaced `Sequence` with `Iterator`, and the updated code runs in about 2 seconds. That is still a lot. I believe it's possible to do much better, so let's try.
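Why does swapping `Sequence` for `Iterator` help so much? Every call to `first()` on a `Sequence` re-evaluates the whole chain of stacked filters from the source, starting over at 2, whereas an `Iterator` is stateful and simply continues from where it left off. The repository's exact fix may differ, but a sketch of the idea could look like this (`filterIterator` and `firstPrimes` are names of my own invention):

```kotlin
// Hypothetical sketch of the Iterator-based fix. Each filter wraps the
// live iterator, so already-consumed numbers are never revisited.
fun Iterator<Int>.filterIterator(predicate: (Int) -> Boolean): Iterator<Int> =
    iterator {
        for (x in this@filterIterator) if (predicate(x)) yield(x)
    }

fun firstPrimes(count: Int): List<Int> {
    // Stateful stream of candidates: 2, 3, 4, ...
    var sieve: Iterator<Int> = generateSequence(2) { it + 1 }.iterator()
    val primes = mutableListOf<Int>()
    repeat(count) {
        val prime = sieve.next()
        primes += prime
        // Cross out all multiples of the prime just found.
        sieve = sieve.filterIterator { it % prime != 0 }
    }
    return primes
}

fun main() {
    println(firstPrimes(10))  // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
}
```

This is still the trial-division variant of the sieve, only without the repeated re-evaluation; the real Sieve of Eratosthenes below avoids the division entirely.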

## Naive reference implementation

Before trying to do better, let’s get back to the algorithm itself, in pseudo-code:

```
Input: an integer n > 1.

Let A be an array of Boolean values, indexed by integers 2 to n,
initially all set to true.

for i = 2, 3, 4, …, not exceeding √n:
    if A[i] is true:
        for j = i², i²+i, i²+2i, i²+3i, …, not exceeding n:
            A[j] := false

Output: all i such that A[i] is true.
```

In order to compare, a reference implementation is required. The following snippet is a direct Kotlin translation of the naive algorithm above:

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

fun sieveSimple(n: Int): Array<Boolean> {
    val indices = Array(n) { true }
    // Candidates above √n have already been crossed out by smaller factors
    val limit = sqrt(n.toDouble()).toInt()
    val range = 2..limit
    for (i in range) {
        if (indices[i]) {
            // Start at i², smaller multiples were crossed out earlier
            var j = i.toDouble().pow(2).toInt()
            while (j < n) {
                indices[j] = false
                j += i
            }
        }
    }
    return indices
}
```

Running this code on my machine takes about 9 milliseconds. This will be the baseline for improvements.
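Note that the returned array only marks composites `false`: indices 0 and 1 are never crossed out, so a caller has to skip them explicitly. A minimal usage sketch (the baseline is repeated so the snippet stands on its own; `primesBelow` is a helper name of my own choosing):

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// Baseline sieve from above, repeated so the snippet compiles on its own.
fun sieveSimple(n: Int): Array<Boolean> {
    val indices = Array(n) { true }
    val limit = sqrt(n.toDouble()).toInt()
    for (i in 2..limit) {
        if (indices[i]) {
            var j = i.toDouble().pow(2).toInt()
            while (j < n) {
                indices[j] = false
                j += i
            }
        }
    }
    return indices
}

// Indices 0 and 1 stay true in the raw array, so filter them out here.
fun primesBelow(n: Int): List<Int> =
    sieveSimple(n).withIndex()
        .filter { (i, isPrime) -> i >= 2 && isPrime }
        .map { it.index }

fun main() {
    println(primesBelow(30))  // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
}
```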

## Coroutines first draft

Kotlin coroutines allow running code concurrently. An easy first attempt at optimization is to run the computation for a specific `i` in a coroutine, and aggregate the results into one dedicated array.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking
import kotlin.math.pow
import kotlin.math.sqrt

fun sieveCoroutines(n: Int): Array<Boolean> {
    val indices = Array(n) { true }
    val limit = sqrt(n.toDouble()).toInt()
    val range = 2..limit
    runBlocking {
        async {                                       // (1)
            for (i in range) {
                val result = computeForSingleNumber(i, n)
                mergeBooleanArrays(indices, result)   // (2)
            }
        }.await()
    }
    return indices
}

fun computeForSingleNumber(i: Int, n: Int): Array<Boolean> {
    val indices = Array(n) { true }
    if (indices[i]) {
        var j = i.toDouble().pow(2).toInt()
        while (j < n) {
            indices[j] = false
            j += i
        }
    }
    return indices
}

fun mergeBooleanArrays(a1: Array<Boolean>, a2: Array<Boolean>) {
    val range = 0 until a1.size
    for (i in range) a1[i] = a1[i] && a2[i]
}
```
1. Run the instructions of the block in a coroutine
2. Merge the computation results of the coroutine into the `indices` array

The above code executes in about 300 ms on my laptop. While much better than the original poster's code, it is still over 30 times slower than the baseline! Coroutines are not to blame: merging the results after each run is what takes the time, since every single `i` allocates a fresh n-element array and then scans the full length of both arrays to combine them.

## Coroutines and shared mutable state

Merging different arrays after each run is not the way to go.

Obviously, computations do not depend on one another: a single computation is enough to cross out a potential prime. The worst that can happen is that two different computations cross out the same number, which doesn’t change the result.

Hence, the array can be safely shared among all coroutines. The updated code looks like the following:

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking
import kotlin.math.pow
import kotlin.math.sqrt

fun sieveSharedMutableState(n: Int): Array<Boolean> {
    val indices = Array(n) { true }
    val limit = sqrt(n.toDouble()).toInt()
    val range = 2..limit
    runBlocking {
        async {                                   // (1)
            for (i in range) {
                computeForSingleNumber(i, indices) // (2)
            }
        }.await()
    }
    return indices
}

fun computeForSingleNumber(i: Int, indices: Array<Boolean>) {
    val n = indices.size
    if (indices[i]) {
        var j = i.toDouble().pow(2).toInt()
        while (j < n) {
            indices[j] = false                     // (3)
            j += i
        }
    }
}
```
1. Run the instructions of the block in a coroutine
2. Pass the shared mutable array to the coroutine
3. Update the shared mutable array from the coroutine

Running the above code on my machine takes ~4 milliseconds. A whopping improvement of 50% over my original baseline!
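All timings in this post come from my machine; to reproduce such measurements yourself, `kotlin.system.measureTimeMillis` from the standard library is enough. A minimal harness, shown with the baseline only so it compiles without the kotlinx-coroutines dependency (the `n = 1_000_000` problem size is my own choice, the original benchmark may use a different one):

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt
import kotlin.system.measureTimeMillis

// Baseline sieve from above, repeated so the snippet compiles on its own.
fun sieveSimple(n: Int): Array<Boolean> {
    val indices = Array(n) { true }
    val limit = sqrt(n.toDouble()).toInt()
    for (i in 2..limit) {
        if (indices[i]) {
            var j = i.toDouble().pow(2).toInt()
            while (j < n) {
                indices[j] = false
                j += i
            }
        }
    }
    return indices
}

fun main() {
    val n = 1_000_000
    repeat(5) { sieveSimple(n) }  // warm up the JIT before measuring
    val elapsed = measureTimeMillis { sieveSimple(n) }
    println("sieveSimple($n): $elapsed ms")
}
```

On the JVM, a handful of warm-up runs matter: the first executions are interpreted and would otherwise dominate the measurement.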

## Conclusions

Just as with the standard Java concurrency APIs, coroutines guarantee nothing by themselves: they have to be used with the right data structures. Sequences are lazy, but when they are re-evaluated at every iteration, they can become far slower than their eager, mutable counterparts.

The complete source code for this post can be found on GitHub.

#### Nicolas Fränkel

Developer Advocate with 15+ years of experience consulting for many different customers, in a wide range of contexts (such as telecoms, banking, insurance, large retail and the public sector). Usually working on Java/Java EE and Spring technologies, with focused interests such as Rich Internet Applications, testing, CI/CD and DevOps. Also doubles as a trainer and triples as a book author.
