Devoxx Fr 2012 Day 1

Though your humble writer is a firm believer in abstraction from underlying mechanics, I chose the following conference because there are use-cases where performance issues come from a mismatch between the developer’s intent and its implementation. Fixing them requires a better understanding of what really happens.

From Runnable and synchronize to parallel() and atomically() by Jose Paumard

The talk is all about concurrent programming and its associated side-effects: performance and bugs. The title spans past and future: Runnable comes from JDK 1.1 (1995) and parallel() from JDK 8 (2014?). In particular, we’ll see how hardware has an impact on software and on how software developers work.

The central question is: how do we leverage processors’ power? As a corollary, what are the problems of parallel applications (think concurrency, synchronization and immutability)? Finally, what are the tools available to Java developers, today and in the near future, to help us develop parallel applications?


In 1995, processors had a single core. Multithreading is about sharing the processor’s time between execution threads. A processor executes another thread when the current thread has finished, or when it’s waiting for a slow or locked resource. At the time, parallelizing meant all processors executing the same instruction at the same time, but on different data. This is very fast because threads don’t need to interact with each other (no data to pass around).

From an API point of view, Java provides Runnable and Thread, plus the synchronized and volatile keywords. Until 2004, there was no need to go beyond that.

From 2005 on, multicore processors became generally available. From this point on, power no longer comes from an increase in core frequency, but from doubling the number of cores. To address that, Java 5 brings java.util.concurrent to the table (as a side note, the EDU.oswego backport covers Java 4). Many new notions are available in this package.

Today, every consumer computer has 2 to 8 cores. Professional hardware, such as the SPARC T3, goes up to 128 hardware threads. Tomorrow, the scale will be in the tens of thousands of cores, or even more.

Problems of multithreading

  • The sheer increase in raw data makes parallelism necessary: we have to deal with it
  • Race conditions. Here comes a demonstration of a race condition in the Singleton pattern, and of the double-checked locking pattern (which doesn’t work).

    The Java language specification defines a principle: a read from a variable should return the last value written to this variable. In order to have determinism, there should be a "happens before" link between accesses to the same variable. When there’s no such link, there’s no way to know the control flow for sure: and there we have a nice definition of a bug. The solution is to put the volatile keyword on the variable to create the "happens before" link. Unluckily, it drastically reduces performance, down to the level of a single core.

    As a result, the only foolproof way to implement a Singleton is an enum:

        public enum Singleton {
            INSTANCE;
        }
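To make the "happens before" discussion concrete, here is a sketch (mine, not the speaker’s) of the double-checked locking idiom made correct by volatile; without the volatile keyword, this exact code is broken:

```java
public class LazySingleton {
    // volatile creates the "happens before" link: a thread that reads a
    // non-null instance is guaranteed to see its fully constructed state
    private static volatile LazySingleton instance;

    private LazySingleton() { }

    public static LazySingleton getInstance() {
        LazySingleton local = instance;              // single volatile read
        if (local == null) {
            synchronized (LazySingleton.class) {
                local = instance;                    // re-check under the lock
                if (local == null) {
                    instance = local = new LazySingleton();
                }
            }
        }
        return local;
    }
}
```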

Now, when you get down to the hardware level, it’s even more complex. Each core has two levels of cache (L1 and L2), and a single cache is shared between all cores (L3). Here is a table displaying the time needed to pass data from a core to the different caches:

Destination — Time (nanoseconds)

[the actual figures didn’t make it into my notes]
In conclusion, without the need to synchronize, a program can be executed at its optimal speed. With it, you have to take into account the way the processor manages memory (including caches), which is wired into the processor’s architecture! That’s exactly what we’re trying to avoid with Java (write once, run anywhere). Perhaps Java is not ready to go down this path.


Java 5’s java.util.concurrent brings a new way to launch threads. Before, we instantiated a Runnable, wrapped it in a new Thread and started the latter. The run() method has no return value and cannot throw checked exceptions.

With Java 5, we instantiate a Callable: its call() method has a return value and can throw an exception. We pass it to an executor service which manages its own thread pool (and probably reuses an already existing thread). We delegate thread management to a service that can abstract away (and adapt to) the underlying hardware. There are also other new notions.
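As a sketch of the pattern just described (the class and task are mine, written Java 5-style with an anonymous class):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableDemo {

    public static int compute() {
        // The service owns the thread pool; we never touch Thread directly
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Unlike Runnable.run(), call() returns a value and may throw
            Future<Integer> future = executor.submit(new Callable<Integer>() {
                @Override
                public Integer call() {
                    return 21 + 21;
                }
            });
            return future.get(); // blocks until the result is available
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```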

  • Lock: locks a block for one thread. It provides a way to acquire the lock in one method and release it in another. Besides, there are two methods to acquire a lock, lock() and tryLock(). Finally, there’s ReadWriteLock, which allows as many concurrent readers as desired, but a single writer that blocks the readers: reads are not synchronized, writes are.
  • Semaphore: locks a block for n threads
  • Barrier: synchronizes threads
  • Latch: opens/closes a code block for all threads
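A minimal illustration of the ReadWriteLock semantics mentioned above (class and method names are mine):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadMostlyCache {

    private final Map<String, String> map = new HashMap<String, String>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public String get(String key) {
        lock.readLock().lock();      // many readers may hold this at once
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String key, String value) {
        lock.writeLock().lock();     // exclusive: blocks readers and writers
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```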

Also, new collections are available: BlockingQueue (and BlockingDeque), bounded collections with blocking and timeout methods. They are thread-safe and are typically used for Producer/Consumer implementations.
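A minimal producer/consumer sketch on a bounded ArrayBlockingQueue (names are mine): put() blocks while the queue is full, take() while it is empty.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumer {

    // Produces 1..count on one thread, sums them on the calling thread
    public static int sumOf(final int count) {
        final BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(4); // bounded
        Thread producer = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    for (int i = 1; i <= count; i++) {
                        queue.put(i); // blocks while the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += queue.take(); // blocks while the queue is empty
            }
            producer.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return sum;
    }
}
```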

Finally, CopyOnWriteArrayList (and CopyOnWriteArraySet) provide concurrency-friendly collections that let every thread read without locking, and lock only when writing. They are optimized for workloads with few writes, since the underlying array is copied and the memory pointer swapped on each write. Hash and set structures can be implemented to operate the same way (copy, then swap the pointer).

Alternatives to synchronize

Memory transactions

Akka provides a Software Transactional Memory API. With it, we don’t manipulate objects directly, but references to objects. Besides, we need Atomic objects that manage the STM for us. Memory is managed with optimistic locking: proceed as if nothing changed, and commit if that’s indeed the case; otherwise, replay the transaction. The best use-case is low concurrency of access, because transactions are replayed when they fail: with highly concurrent access, there will be a lot of replays.
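Akka’s STM API isn’t shown here; as a rough illustration of the optimistic "try, commit or replay" idea using only the JDK, here is a compare-and-set retry loop (all names are mine):

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticCounter {

    // The reference plays the role of an STM "ref": threads never mutate
    // the value in place, they try to swap in a new one
    private final AtomicReference<Integer> value = new AtomicReference<Integer>(0);

    public void increment() {
        while (true) {
            Integer current = value.get();          // read a snapshot
            Integer next = current + 1;             // compute off to the side
            if (value.compareAndSet(current, next)) {
                return;                             // "commit" succeeded
            }
            // another thread committed first: replay the "transaction"
        }
    }

    public int get() {
        return value.get();
    }
}
```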

Some implementations use locks, others don’t; these are not the same lock semantics as in the JLS. Also, some implementations are very memory hungry, and memory use scales with the number of concurrent transactions. Intel recently announced that the Haswell architecture will support Transactional Synchronization Extensions: this will probably have an impact on the Java platform, sooner or later.


Actors

Actors are components that receive data in the form of read-only messages. They compute a result and may send it as a message to other actors (including, possibly, their caller). Everything is based on immutability, so there’s no need for synchronization.

Akka also provides an Actor API: writing an Akka actor is very similar to writing a Java 5 Callable. Besides, CPU usage is about the same, as is execution time (to be precise, with no significant difference). So, why use actors instead of executor services? To bring in the power of STM, through transactional actors. The provided example is transfers between bank accounts.
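This is not Akka’s API, but a toy illustration of the principle: an actor is a mailbox drained by a single thread, so the message handler never runs concurrently and needs no synchronization (all names are mine):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SummingActor {

    private final BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<Integer>();
    private final Thread worker;
    private int total; // only ever touched by the worker thread

    public SummingActor() {
        worker = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    while (true) {
                        int message = mailbox.take();
                        if (message < 0) {
                            return; // poison pill stops the actor
                        }
                        total += message;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.start();
    }

    public void send(int message) {
        mailbox.offer(message);
    }

    public int stopAndGet() {
        mailbox.offer(-1);
        try {
            worker.join(); // join() establishes "happens before": total is safe to read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total;
    }
}
```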

Parallel computing

In Java 7, there’s the Fork/Join framework. Alternatively, there are parallel arrays, and Java 8 will provide the parallel() method. The goal is to automate parallel computing on arrays and collections.

Fork Join

It’s based on tasks. If a task decides it’s too big, it splits into smaller subtasks: this is known as forking. A system-managed thread pool is available, each thread owning a task queue. When a thread’s queue is empty, it can steal tasks from other threads’ queues and put them in its own. When a task finishes, a blocking operation known as join aggregates the results of the subtasks forked earlier.

As a consequence, a task implementation must not only describe its business code, but also know how to decide whether it’s too big, and how to fork itself. Moreover, Fork/Join can be used recursively or iteratively; the latter approach showed a 30% performance gain over the former. The conclusion is to use recursion only when you do not know the boundaries of your system in advance.
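A classic recursive Fork/Join sketch (the threshold and names are mine, not from the talk): the task sums an array, splitting itself in two while the slice is too big.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {

    private static final int THRESHOLD = 1_000; // arbitrary "too big" limit
    private final long[] data;
    private final int from, to;

    public SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {           // small enough: compute directly
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int middle = (from + to) / 2;           // too big: fork two subtasks
        SumTask left = new SumTask(data, from, middle);
        SumTask right = new SumTask(data, middle, to);
        left.fork();                            // queued, possibly stolen
        return right.compute() + left.join();   // join aggregates both halves
    }

    public static long sum(long[] data) {
        return new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
    }
}
```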

Parallel arrays

Parallel arrays are described in JSR 166y. [And so end my notes on the talk, along with my battery life] #fail


The speaker not only knows his subject, he knows how to hold an audience captive. Aside from the points described above, I think the most important moment of the talk was his answer to the last question:

"Given the levels of abstraction that Java isolates developers from, isn’t it logical to use alternative languages when faced with the new multicore architectures?

- Probably yes"


Reduce pressure on memory allocation by Olivier Lamy and Benoit Perroud

The goal of Apache Direct Memory is to decrease the latency caused by the garbage collector. It’s currently in the Apache Incubator.

Cache off Heap

As a reminder, Java’s memory is segmented into different spaces (young, tenured and perm). Some of it is allocated to the JVM; the rest is native, used for networking and the like. The problem with the heap is that it’s managed by the JVM: to free unused objects, the garbage collector has to clean up. This is a process that freezes program execution and makes the program non-deterministic (you don’t know when the GC will start and stop).

Since RAM is very cheap nowadays, many applications use local cache to increase performance. Two caches are possible:

  • On-heap: think one big hash map. It increases GC processing time.
  • Off-heap: objects are stored in native memory through serialization (with the associated penalty), but it decreases heap memory usage, and hence garbage collection time

Apache Direct Memory

ADM is based on the ByteBuffer class: buffers are allocated en masse and then split up on demand. The architecture is layered, with a Cache API on top.

Strategies to allocate memory have to address exactly the same problems as disk fragmentation: either merge adjacent free byte buffers, or use fixed-size byte buffers (some memory is 'lost', but there’s no fragmentation).
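ADM’s actual allocator is more sophisticated; this is only a sketch of the fixed-size-chunk strategy on top of a single direct ByteBuffer (all names are mine):

```java
import java.nio.ByteBuffer;

public class FixedSizeSlab {

    private final ByteBuffer slab;
    private final int chunkSize;
    private int nextOffset;

    public FixedSizeSlab(int chunkCount, int chunkSize) {
        // One big native (off-heap) allocation, invisible to the GC
        this.slab = ByteBuffer.allocateDirect(chunkCount * chunkSize);
        this.chunkSize = chunkSize;
    }

    // Hand out fixed-size chunks: some memory may be wasted per chunk,
    // but the slab can never fragment
    public ByteBuffer allocateChunk() {
        if (nextOffset + chunkSize > slab.capacity()) {
            throw new IllegalStateException("slab exhausted");
        }
        slab.limit(nextOffset + chunkSize);
        slab.position(nextOffset);
        nextOffset += chunkSize;
        return slab.slice(); // independent view over the chunk’s bytes
    }
}
```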

In real life, there should be a small portion of cached objects on-heap, and the rest off-heap. This is very similar to what Terracotta’s BigMemory does. This means that one can use EhCache’s API and delegate to ADM’s implementation.

Next steps include JSR 107 implementation, benchmarks, integration into other components (Cassandra, Lucene, Tomcat), hot configuration, monitoring and management features, etc. For more information, see the ADM site.

Note that ADM only uses Java’s public API under the covers (ByteBuffer), and no com.sun hacks.

Very interesting talk, and a nice alternative to Terracotta’s product. I don’t think ADM is production-ready yet, but it’s definitely something to keep an eye on.

Though I’m no fan of applications choked by JavaScript, fact is, they are here. Better to have some tools to manage them, instead of suffering from them. Perhaps a well-managed JavaScript application can be the equal of a well-managed Java application (evil laugh).

Take care of your JavaScript code by Romain Linsolas

In essence, this talk is about how to write tests for JavaScript, analyze the code, and manage it with Jenkins CI. In reality, only a small fraction of teams already do this for their JavaScript code.

Testing JavaScript

Jasmine is a behavior-driven testing library for JavaScript, while Underscore.js is a utility library. Both will be used to create our test harness. Akquinet Maven archetypes can help us bootstrap our project.

describe is the equivalent of a test class, while it is a test function. Also, expect lets us assert. beforeEach plays the role of the Before annotations in Java (JUnit or TestNG). Other Jasmine functions mimic their Java equivalents.

Analyzing JavaScript

Let us keep proven tools here. If you use the aforementioned Maven archetype, it’s already compatible with Sonar! In order to measure test coverage, a nice library called js-test-driver is available. It’s a JUnit look-alike. No need to throw away our previous Jasmine tests; we just need to launch them with js-test-driver, which entails a few prerequisites, implying some POM updates:

  • a Jasmine adapter
  • the JAR to compute coverage
  • an available port to launch a web server

The metrics are readily available in Sonar! What is done through the Maven CLI can easily be reproduced in Jenkins.

Nice talk, though developing JavaScript is at odds with my approach to applications. At least, I will be ready to tackle those JavaScript-based applications when I encounter them.

Nicolas Fränkel


Developer Advocate with 15+ years of experience consulting for many different customers, in a wide range of contexts (such as telecoms, banking, insurance, large retail and the public sector). Usually working on Java/Java EE and Spring technologies, with focused interests like Rich Internet Applications, Testing, CI/CD and DevOps. Also doubles as a trainer and triples as a book author.
