Aug 12, 2018 / SERIALIZATION

Java serialization

In one of my recent courses, we talked about Java 5 annotations. I told my students that before annotations existed, one had to use a marker interface instead: an interface without any method. I showed the Serializable interface as an example and started to explain it, then realized I would need a lot of time to cover it fully. This post is an attempt at that.

Serialization is the process of transforming an existing in-memory Java object into a stream of bytes. That stream can then be transferred over the network, or written to a file. Deserialization is the reverse process: it rebuilds an object from such a stream.
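To make that definition concrete, here is a minimal sketch of a round trip through a byte array, using the standard ObjectOutputStream and ObjectInputStream classes. The RoundTripDemo and Person names are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RoundTripDemo {

    // Hypothetical class, used only for this illustration
    static class Person implements Serializable {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Serialization: in-memory object -> bytes
    static byte[] toBytes(Object object) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(object);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Deserialization: bytes -> new in-memory object
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person original = new Person("Alice", 30);
        Person copy = (Person) fromBytes(toBytes(original));
        // The copy holds the same state, but it is a distinct instance
        System.out.println(copy.name + " " + copy.age + " sameInstance=" + (copy == original));
    }
}
```

The byte array stands in for the network or the file: anything that can carry bytes would do.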

Use-case(s)

I’ve no proof to back that up, but I believe that serialization was initially meant to transfer objects from one running JVM instance to another one. For example, a long time ago, EJBs were meant to cross JVM boundaries.

For an EJB to move from one JVM to another, it had to be serialized. Hence, the first version of EJBs had to be Serializable.

It’s not the case anymore.

The only current serialization use-case I know about is the storage of session data between runs in the Tomcat servlet/JSP container: when Tomcat stops, a shutdown hook writes session data to disk. When it starts again, session data is read from disk, so that users still have access to their session after the restart. In that regard, the process is pretty similar to the Windows hibernate feature.

This behavior is obviously not enforced by the API, as HttpSession.setAttribute() accepts a value of type Object. However, it’s recommended that objects stored in the session implement Serializable to cover all bases.

Requirements

As stated above, Serializable is a marker interface: it has no methods. However, there are requirements, even if they are not related to interface implementation:

Every attribute of a Serializable class must be one of:
1. A primitive type
2. A Serializable type
3. Marked transient, so that it won’t be serialized

In addition, the closest non-serializable superclass of a Serializable class must offer an accessible no-arg constructor. It is called during deserialization to initialize the part of the object that this superclass declares.
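The requirements above can be sketched as follows. Base, Account and RequirementsDemo are hypothetical names; Thread is only used as a convenient example of a JDK type that is not Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RequirementsDemo {

    // Not Serializable: its state is never written to the stream
    static class Base {
        String label;

        // No-arg constructor, called during deserialization
        Base() {
            this.label = "set by Base()";
        }

        Base(String label) {
            this.label = label;
        }
    }

    static class Account extends Base implements Serializable {
        int balance;              // 1. primitive type
        String owner;             // 2. Serializable type
        transient Thread worker;  // 3. not Serializable, hence transient

        Account(String label, String owner, int balance) {
            super(label);
            this.owner = owner;
            this.balance = balance;
            this.worker = new Thread(() -> { });
        }
    }

    static Account roundTrip(Account account) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(account);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = roundTrip(new Account("custom label", "Alice", 100));
        // balance and owner are restored, worker is null,
        // and label is whatever Base() assigned
        System.out.println(copy.owner + " " + copy.balance + " " + copy.worker + " " + copy.label);
    }
}
```

Note that the Account constructor is not called during deserialization: only the no-arg constructor of Base runs, which is why the label passed at creation time is lost.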

Customizing serialization

Sometimes, an object has to be serialized, but its class cannot satisfy rule #2 above: an attribute is of a type that is not Serializable and is outside the developer’s control, e.g. InitialContext.

To overcome that issue, Java allows customizing the serialization process via 3 methods:

private void readObject(java.io.ObjectInputStream stream) throws IOException, ClassNotFoundException
private void writeObject(java.io.ObjectOutputStream stream) throws IOException
private void readObjectNoData() throws ObjectStreamException

While the first 2 methods are pretty self-explanatory, the last one deserves a description:

The readObjectNoData method is responsible for initializing the state of the object for its particular class in the event that the serialization stream does not list the given class as a superclass of the object being deserialized. This may occur in cases where the receiving party uses a different version of the deserialized instance’s class than the sending party, and the receiver’s version extends classes that are not extended by the sender’s version. This may also occur if the serialization stream has been tampered; hence, readObjectNoData is useful for initializing deserialized objects properly despite a "hostile" or incomplete source stream.

Source: ObjectInputStream JavaDoc
https://docs.oracle.com/javase/9/docs/api/java/io/ObjectInputStream.html

By design, all of the methods above must have the private modifier, so that they cannot be overridden in subclasses.
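As an illustration of the first two methods, here is a minimal sketch. Resource is a hypothetical stand-in for a type that is not Serializable and cannot be changed; Client and CustomSerializationDemo are likewise invented for the example. The idea is to mark the problematic attribute transient, write just enough data to rebuild it, and recreate it when reading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CustomSerializationDemo {

    // Hypothetical type that is not Serializable and outside our control
    static class Resource {
        final String url;

        Resource(String url) {
            this.url = url;
        }
    }

    static class Client implements Serializable {
        String name;
        transient Resource resource; // skipped by default serialization

        Client(String name, Resource resource) {
            this.name = name;
            this.resource = resource;
        }

        private void writeObject(ObjectOutputStream stream) throws IOException {
            stream.defaultWriteObject();   // writes the non-transient fields (name)
            stream.writeUTF(resource.url); // writes what is needed to rebuild the resource
        }

        private void readObject(ObjectInputStream stream) throws IOException, ClassNotFoundException {
            stream.defaultReadObject();                // restores name
            resource = new Resource(stream.readUTF()); // rebuilds the transient field
        }
    }

    static Client roundTrip(Client client) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(client);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Client) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Client copy = roundTrip(new Client("reporting", new Resource("ldap://example.org")));
        System.out.println(copy.name + " -> " + copy.resource.url);
    }
}
```

The data must be read back in the same order it was written, so the two methods have to be kept in sync.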

Externalizable

Externalizable is a specialization of Serializable that relies on interface implementation to customize serialization.

The de/serialization process checks whether a Serializable is also an Externalizable. If so, it calls the writeExternal() and readExternal() methods declared by the interface. If not, it falls back to default serialization.
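A minimal sketch of an Externalizable class could look like the following. Point and ExternalizableDemo are hypothetical names. Unlike a plain Serializable class, an Externalizable class needs a public no-arg constructor, because the instance is created through it before readExternal() is called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {

    public static class Point implements Externalizable {
        int x;
        int y;

        // Required: used to create the instance during deserialization
        public Point() {
        }

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Same order as in writeExternal()
            x = in.readInt();
            y = in.readInt();
        }
    }

    static Point roundTrip(Point point) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(point);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y);
    }
}
```

Here the methods are public and part of an interface, so they can be overridden in subclasses, in contrast to the private readObject() and writeObject().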

Class version

In the Tomcat session serialization scenario above, I made an implicit assumption: that the Class of the object being serialized will be the same as the Class of the one being deserialized.

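Although infrequent, that might not always be the case. For example, Tomcat was stopped to update the webapp, and the class has changed in the new version. To handle that, every Serializable class carries a version number. A class can declare it explicitly in a static final long serialVersionUID field; if it doesn’t, the serialization runtime computes a default value from the structure of the class.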

Any access modifier is allowed, but private should be preferred.

The serialization process will write the serialVersionUID value along with the object. During deserialization, that value will be compared with the one of the class currently on the classpath. If they differ, deserialization will fail with an InvalidClassException.

There’s no guarantee that an unchanged class will yield the same computed serialVersionUID across compiler versions and over time. Hence, it’s recommended to write that value yourself, and to change it only for incompatible class changes, e.g.:

adding an instance method is considered compatible
removing an attribute is not

Given that information, one may now understand why generating a random value with the IDE will fix the IDE warning but is utterly useless.
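To see which value the serialization runtime associates with a class, one can ask ObjectStreamClass. The sketch below uses two hypothetical classes: Invoice declares its version explicitly, while Receipt leaves it to the runtime.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class VersionDemo {

    // Version written by hand: stays stable until the developer changes it
    static class Invoice implements Serializable {
        private static final long serialVersionUID = 2L;
        int amount;
    }

    // No declared version: the runtime computes one from the class structure
    static class Receipt implements Serializable {
        int amount;
    }

    // Returns the version that is written to, and checked against, the stream
    static long versionOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Invoice: " + versionOf(Invoice.class)); // the declared value, 2
        System.out.println("Receipt: " + versionOf(Receipt.class)); // a computed value
    }
}
```

The JDK also ships a serialver command-line tool that prints the same information for a class on the classpath.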

Conclusion

Java serialization is seldom necessary. However, when it is, it’s important to know about its finer points.

To go further:
