Jul 30, 2017 / DOCKER, CONTAINER, BUILD

A Dockerfile for Maven-based Github projects

Since the Docker "revolution", I’ve been interested in creating a Dockerfile for Spring applications. I’m far from an ardent practitioner, but the principle of creating a Dockerfile is dead simple. However, as in Java, and probably any programming language, while it’s easy to achieve something that works, it’s much harder to create something that works well.

Multi-stage builds

In Docker, one of the main issues is the size of the final image. It’s not uncommon to end up with images over 1 GB, even for simple Java applications. Since version 17.05 of Docker, it’s possible to have multiple build stages in a single Dockerfile, and to access the output of a previous stage from the current one. Those are called multi-stage builds. The final image is based on the last build stage.

Let’s imagine the code is hosted on Github, and that it’s based on Maven. Build stages would be as follows:

Clone the code from Github
Copy the folder from the previous stage; build the app with Maven
Copy the JAR from the previous stage; run it with java -jar

Here is a build file to start from:

FROM alpine/git
WORKDIR /app
RUN git clone https://github.com/spring-projects/spring-petclinic.git (1)

FROM maven:3.5-jdk-8-alpine
WORKDIR /app
COPY --from=0 /app/spring-petclinic /app (2)
RUN mvn install (3)

FROM openjdk:8-jre-alpine
WORKDIR /app
COPY --from=1 /app/target/spring-petclinic-1.5.1.jar /app (4)
CMD ["java", "-jar", "spring-petclinic-1.5.1.jar"] (5)

It maps to the above build stages:

1	Clone the Spring PetClinic git repository from Github
2	Copy the project folder from the previous build stage
3	Build the app
4	Copy the JAR from the previous build stage
5	Run the app

Improving readability

Notice that in the previous build file, build stages are referenced via their index (starting from 0), e.g. COPY --from=0. While not a real issue, it’s always better to have something semantically meaningful. Docker allows labeling stages with the as keyword and referencing those labels in later stages.

FROM alpine/git as clone (1)
WORKDIR /app
RUN git clone https://github.com/spring-projects/spring-petclinic.git

FROM maven:3.5-jdk-8-alpine as build (2)
WORKDIR /app
COPY --from=clone /app/spring-petclinic /app (3)
RUN mvn install

FROM openjdk:8-jre-alpine
WORKDIR /app
COPY --from=build /app/target/spring-petclinic-1.5.1.jar /app
CMD ["java", "-jar", "spring-petclinic-1.5.1.jar"]

1	Labels the first stage as `clone`
2	Labels the second stage as `build`
3	References the first stage using the label

Choosing the right image(s)

Multi-stage builds help tremendously with image size management, but they are not the only factor. The image to start with has a big impact on the final image. Beginners generally use full-blown operating system images, such as Ubuntu, inflating the size of the final image for no good reason. Yet, there are lightweight distributions that are very well suited to Docker images, such as Alpine Linux.

It’s also a great fit for security purposes, as the attack surface is limited.

In the above build file, images are the following:

alpine/git
maven:3.5-jdk-8-alpine
openjdk:8-jre-alpine

They all inherit transitively or directly from alpine.

Image sizes are as follows:

REPOSITORY                      TAG                 IMAGE ID            CREATED             SIZE
nfrankel/spring-petclinic       latest              293bd333d60c        10 days ago         117MB
openjdk                         8-jre-alpine        c4f9d77cd2a1        4 weeks ago         81.4MB

The difference in size between the JRE and the app images is around 36 MB, which is roughly the size of the JAR itself.

Exposing the port

The Spring Pet Clinic is a webapp, so the image should expose the HTTP port the app binds to. The relevant Docker directive is EXPOSE. I chose 8080 as the port number to match the default of the embedded Tomcat container, but it could be anything. The last stage should be modified as such:

FROM openjdk:8-jre-alpine
WORKDIR /app
COPY --from=build /app/target/spring-petclinic-1.5.1.jar /app
EXPOSE 8080
CMD ["java", "-jar", "spring-petclinic-1.5.1.jar"]

Parameterization

At this point, it appears the build file can be used for building any webapp with the following features:

The source code is hosted on Github
The build tool is Maven
The resulting output is an executable JAR file

Of course, that suits Spring Boot applications very well, but this is not a hard requirement.

Parameters include:

The Github repository URL
The project name
Maven’s artifactId and version
The artefact name (as it might differ from the artifactId, depending on the specific Maven configuration)

Let’s use those to design a parameterized build file. In Docker, parameters can be declared using either the ENV or the ARG directive. ARG values are set using the --build-arg option on the command line; an ENV value cannot be set that way directly, but it can be derived from an ARG. Differences are the following:

Type	`ENV`	`ARG`
Found in the image	Yes	No
Default value	Required	Optional

FROM alpine/git as clone
ARG url (1)
WORKDIR /app
RUN git clone ${url} (2)

FROM maven:3.5-jdk-8-alpine as build
ARG project (3)
WORKDIR /app
COPY --from=clone /app/${project} /app
RUN mvn install

FROM openjdk:8-jre-alpine
ARG artifactid
ARG version
ENV artifact ${artifactid}-${version}.jar (4)
WORKDIR /app
COPY --from=build /app/target/${artifact} /app
EXPOSE 8080
CMD ["java -jar ${artifact}"] (5)

1	`url` must be passed on the command line to set which Github repo to clone
2	`url` is replaced by the passed value
3	`project` must be passed on the command line, same as `url`
4	`artifact` must be an `ENV`, so as to be persisted in the final app image
5	Use the `artifact` value at runtime
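
To make the ENV/ARG difference concrete, here is a minimal sketch, separate from the Pet Clinic build file. The `app_version` variable name and the default value are assumptions chosen for illustration only.

```dockerfile
FROM alpine

# ARG exists only at build time; the default is optional
# and can be overridden with --build-arg version=...
ARG version=1.5.1

# ENV is persisted in the image; here its value is derived from the ARG
ENV app_version=${version}

# During the build, both values are visible to RUN
RUN echo "building ${version} (${app_version})"

# At run time, only the ENV should still be set
CMD ["sh", "-c", "echo version=${version:-unset} app_version=${app_version}"]
```

Running a container from this image should show `app_version` with the value received at build time, while `version` is reported as unset. That is the reason `artifact` is declared as an ENV in the build file above.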

Building

The Spring Pet Clinic image can now be built using the following command-line:

docker build --build-arg url=https://github.com/spring-projects/spring-petclinic.git\
  --build-arg project=spring-petclinic\
  --build-arg artifactid=spring-petclinic\
  --build-arg version=1.5.1\
  -t nfrankel/spring-petclinic - < Dockerfile

Since the image doesn’t depend on the filesystem, no context needs to be passed and the Dockerfile can be piped from the standard input.

To build another app, parameters can be changed accordingly e.g.:

docker build --build-arg url=https://github.com/heroku/java-getting-started.git\
  --build-arg project=java-getting-started\
  --build-arg artifactid=java-getting-started\
  --build-arg version=1.0\
  -t nfrankel/java-getting-started - < Dockerfile

Running

Running an image built with the above command is quite easy:

docker run -ti -p8080:8080 nfrankel/spring-petclinic

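Unfortunately, it fails with the following error message: starting container process caused "exec: \"java -jar ${artifact}\": executable file not found in $PATH. In the exec (JSON array) form, CMD is not processed by a shell: the whole string is taken as the name of the executable, and ${artifact} is not expanded. The trick is to use the ENTRYPOINT directive to start a shell, which then receives the CMD string as the command to run. The updated Dockerfile looks as follows: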

FROM openjdk:8-jre-alpine
ARG artifactid
ARG version
ENV artifact ${artifactid}-${version}.jar
WORKDIR /app
COPY --from=build /app/target/${artifact} /app
EXPOSE 8080
ENTRYPOINT ["sh", "-c"]
CMD ["java -jar ${artifact}"]

At this point, running the container will (finally) work.

Note that the second image uses a different port, so the command should be: docker run -ti -p8080:5000 nfrankel/java-getting-started.
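
As a variation on the fix above, the shell can also be invoked from CMD itself. The following is only a sketch of the last stage, assuming the `clone` and `build` stages stay as shown earlier. The `exec` prefix is an optional addition, so that the Java process replaces the shell and directly receives signals such as the one sent by docker stop.

```dockerfile
FROM openjdk:8-jre-alpine
ARG artifactid
ARG version
ENV artifact ${artifactid}-${version}.jar
WORKDIR /app
COPY --from=build /app/target/${artifact} /app
EXPOSE 8080
# Exec form with an explicit shell: sh expands ${artifact} when the container starts,
# then exec replaces the shell with the java process
CMD ["sh", "-c", "exec java -jar ${artifact}"]
```

Compared to ENTRYPOINT ["sh", "-c"], this leaves the image without an entry point, so arguments passed to docker run replace the whole command instead of being handed to sh -c.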

Food for thought

External Git: The current build clones a repository, and hence doesn’t need to send a context to Docker. An alternative would be to clone outside the build file, e.g. in a continuous integration chain, and start from the context. That could be useful to build the image during development on developers’ machines.
Latest version or not: For the Git image, I used the latest version, while for the JDK and JRE images, I used a specific major version. It’s important for the Java version to be fixed to a major version, not so much for Git. It depends on the nature of the image.
Building from master: There’s no configuration to change branch after cloning. This is wrong, as most of the time, builds are executed on a dedicated tag, e.g. v1.0.0. Obviously, there should be an additional command, as well as an additional build argument, to check out a specific tag before the build.
Skipping tests: It takes an awfully long time for the Pet Clinic application to build, as it has a huge test harness. Executing those tests takes time, and Maven must go through the test phase to reach the install phase. Depending on the specific continuous integration chain, tests might have been executed earlier and needn’t be executed again during the image build.
Maven repository download: Spring Boot apps have a lot of dependencies. It takes a long time for the image to build, as all dependencies need to be downloaded every time for every app. There are a couple of solutions to handle that. It probably will be the subject of another post.
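
The "Building from master" and "Skipping tests" remarks above could be addressed along these lines. This is a sketch under assumptions, not a tested build: the `tag` build argument is a hypothetical name, and it assumes the repository actually has a tag (or branch) with the given name.

```dockerfile
FROM alpine/git as clone
ARG url
# Hypothetical build argument: the tag (or branch) to build, e.g. v1.0.0
ARG tag
WORKDIR /app
# --branch also accepts a tag name; --depth 1 skips the history, which the build doesn't need
RUN git clone --branch ${tag} --depth 1 ${url}

FROM maven:3.5-jdk-8-alpine as build
ARG project
WORKDIR /app
COPY --from=clone /app/${project} /app
# -DskipTests still compiles the tests, but doesn't run them
RUN mvn install -DskipTests

FROM openjdk:8-jre-alpine
ARG artifactid
ARG version
ENV artifact ${artifactid}-${version}.jar
WORKDIR /app
COPY --from=build /app/target/${artifact} /app
EXPOSE 8080
ENTRYPOINT ["sh", "-c"]
CMD ["java -jar ${artifact}"]
```

It would be built as before, with one more option such as --build-arg tag=v1.0.0. Skipping tests only makes sense if they have already been executed earlier in the chain.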

The complete source code for this post can be found on Github.
