Aug 6, 2017 / CONTAINER, BUILD, OPTIMIZATION

Strategies for optimizing Maven Docker images

Last week, I wrote on how to design a generic Docker image for Maven-based auto-executable webapps. The designed build file has 3 different stages: checkout from Github, build with Maven and execute with Java. The Maven build stage takes quite a long time, mostly due to:

Executing tests
Downloading dependencies

Tests can be executed earlier in the build chain - then skipped for Docker, this post will focus on speeding up the download of the dependencies. Let’s check each option in turn.

Mount a volume

The easiest option is to mount a volume to the local $HOME/.m2 folder when running Maven. This means dependencies are downloaded on the first run, and they are available on every run afterwards.

The biggest constraint of going this way is that mounting volumes is only possible for run, not `build`. This is to enforce build stability. In the previous build file, the Maven build was not the final step. This is never the case in classical multi-stage builds. Hence, mounting volumes is definitely a no-go with multi-stage builds.

Parent image with dependencies

With build immutability in mind, an option is to "inherit" from a Maven image with a repository that is already provisioned with the required dependencies.

Naive first approach

Dependencies are specific to an app, so that they must be known early in the build stage. This requires reading the build descriptor file i.e. the pom.xml. Tailoring the parent image dependencies adds an additional build step, with no real benefits: dependencies will be downloaded nonetheless, just one step earlier.

Eagerly download the Internet

All dependencies - from all Maven repositories used by the organization (at least repo1), may be provisioned in the parent image. This image has to be refreshed to account for new versions being released. The obvious choice is to schedule building it at regular intervals. This unfortunately puts an unreasonable load on source repositories. For this reason, downloading the entire repo1 is frowned upon by the Maven community.

DO NOT wget THE ENTIRE REPOSITORY!

Please take only the jars you need. We understand this is may entail more work, but grabbing more than 1,7 TiB of binaries really kills our servers.

Intermediate proxy

In order to improve upon the previous solution, it’s possible to add an intermediate enterprise proxy repository i.e. Artifactory or Nexus. This way, the flow will look like the following:

Whether the enteprise repo is Dockerized or not is irrevelant and plays no role in the whole process.

Create the parent image: At regular intervals, the image is created not from remote repositories but from the proxy. At first, there will be no dependencies in the proxy. But after the initial build, the image will contain the required dependencies.
Initial build: The enterprise repository is empty. Dependencies will be downloaded from remote repositories, through the enterprise proxy, feeding it in the process.
Next builds: The enterprise repository is filled. Dependencies are already present in the image: no download is required. Note that if in the meantime a new dependency (or a new version thereof) is required, it will be downloaded during app build, but only it - which makes download time short. This allows to provision it into the enterprise repo, available for the next app and finally into the image.

The only "trick" is for the App image to use the latest Maven Dependencies image as the parent i.e. FROM dependencies:latest , so as to benefit from dependencies as they are added into the image.

Custom solution

Depending on the app stack, there might be dedicated solutions. For example, regarding the Spring Boot framework, there’s the Thin Launcher. It allows to download dependencies at first run, instead of build time. As an added benefit, it keeps images very small in size as dependencies are not packaged in each one.

Conclusion

Barring an existing hack for a specific stack, putting an enterprise repository in front of remote ones allows for fastest downloads. However, scheduling the creation of a Docker image at regular intervals allows to completely skip downloads.

Follow me Follow me