Archive

Posts Tagged ‘container’
  • Strategies for optimizing Maven Docker images

    :page-liquid:
    :icons: font
    :experimental:
    :imagesdir: /assets/resources/strategies-optimizing-docker-maven-image

    Last week, I link:{% post_url 2017-07-30-dockerfile-maven-based-github-projects %}[wrote^] about how to design a generic Docker image for Maven-based auto-executable webapps. The resulting build file has 3 stages: checkout from GitHub, build with Maven and execute with Java. The Maven build stage takes quite a long time, mostly due to:

    1. Executing tests
    2. Downloading dependencies

    Tests can be executed earlier in the build chain and then skipped in the Docker build, so this post focuses on speeding up the download of dependencies. Let’s check each option in turn.

    == Mount a volume

    The easiest option is to mount a volume to the local $HOME/.m2 folder when running Maven. This means dependencies are downloaded on the first run, and they are available on every run afterwards.
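As a rough sketch of this option, the function below runs Maven inside the same image used in the build file, with the host's local repository mounted as a volume. The function name and the /usr/src/app working directory are assumptions made for illustration; /root/.m2 is where the official Maven image keeps its local repository when running as root.

```shell
# Hypothetical helper: runs a Maven goal inside the maven image while
# reusing the host's local repository, so dependencies are downloaded once.
mvn_in_docker() {
  docker run --rm \
    -v "$HOME/.m2":/root/.m2 \
    -v "$PWD":/usr/src/app \
    -w /usr/src/app \
    maven:3.5-jdk-8-alpine \
    mvn "$@"
}

# Usage (assumes Docker is installed and the current directory has a pom.xml):
#   mvn_in_docker install
```

The first call populates $HOME/.m2 on the host; later calls find the dependencies already there.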

    The biggest constraint of going this way is that https://docs.docker.com/engine/tutorials/dockervolumes/#mount-a-host-file-as-a-data-volume[mounting volumes^] is only possible for run, not build. This is to enforce build stability. In the previous build file, the Maven build was not the final step. This is never the case in classical multi-stage builds. Hence, mounting volumes is definitely a no-go with multi-stage builds.

    == Parent image with dependencies

    With build immutability in mind, an option is to “inherit” from a Maven image with a repository that is already provisioned with the required dependencies.

    === Naive first approach

    Dependencies are specific to an app, so they must be known early in the build stage. This requires reading the build descriptor file, i.e. the pom.xml. Tailoring the parent image’s dependencies to the app adds a build step with no real benefit: dependencies will be downloaded nonetheless, just one step earlier.

    === Eagerly download the Internet

    All dependencies from all Maven repositories used by the organization (at least repo1) may be provisioned in the parent image. This image has to be refreshed to account for new versions being released. The obvious choice is to schedule building it at regular intervals. This unfortunately puts an unreasonable load on source repositories. For this reason, downloading the entire repo1 is frowned upon by the https://maven.apache.org/community.html[Maven community^].

    [quote]
    ____
    DO NOT wget THE ENTIRE REPOSITORY!

    Please take only the jars you need. We understand this is may entail more work, but grabbing more than 1,7 TiB of binaries really kills our servers.
    ____

    === Intermediate proxy

    To improve upon the previous solution, it’s possible to add an intermediate enterprise proxy repository, e.g. https://www.jfrog.com/artifactory/[Artifactory^] or http://www.sonatype.org/nexus/[Nexus^]. This way, the flow looks like the following:

    image::image-hierarchy.svg[Components diagram,444,501,align="center"]

    ////
    skinparam component {
      backgroundColor LightCyan
      backgroundColor<<image>> #FEFECE
    }
    skinparam componentStyle uml2

    [Git] <<image>> as git
    [Maven] <<image>> as mvn
    [Maven Dependencies] <<image>> as deps
    [Enterprise Maven Proxy] as proxy
    [App] <<image>> as app
    mvn -up-|> git : FROM
    deps -up-|> mvn : FROM
    app -up-|> deps : FROM
    deps .right.> proxy
    ////

    NOTE: Whether the enterprise repo is Dockerized or not is irrelevant and plays no role in the whole process.

    Create the parent image:: At regular intervals, the image is created not from remote repositories but from the proxy. At first, there will be no dependencies in the proxy. But after the initial build, the image will contain the required dependencies.
    Initial build:: The enterprise repository is empty. Dependencies are downloaded from remote repositories through the enterprise proxy, feeding it in the process.
    Next builds:: The enterprise repository is filled. Dependencies are already present in the image: no download is required. Note that if in the meantime a new dependency (or a new version thereof) is required, it will be downloaded during the app build, but only that one, which keeps download time short. It is thereby provisioned into the enterprise repo, becomes available for the next app, and finally lands in the image.
    +
    The only “trick” is for the App image to use the latest Maven Dependencies image as the parent, i.e. FROM dependencies:latest, so as to benefit from dependencies as they are added into the image.
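The periodic rebuild of the parent image could be sketched as follows. This is only an illustration: the function name, the directory argument and the crontab entry are assumptions, and the Dockerfile of the Maven Dependencies image itself is not shown in this post. Only the dependencies:latest tag comes from the text above.

```shell
# Hypothetical helper: rebuilds the "Maven Dependencies" parent image.
# $1: directory holding that image's Dockerfile (assumed to exist).
# --no-cache forces the dependency download step to run again, so that
# artifacts added to the enterprise proxy since the last build are picked up.
# --pull refreshes the base image first.
refresh_dependencies_image() {
  docker build --pull --no-cache -t dependencies:latest "$1"
}

# Example crontab entry (assumption): rebuild every night at 02:00
#   0 2 * * * /usr/local/bin/refresh-dependencies-image.sh
```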

    == Custom solution

    Depending on the app stack, there might be dedicated solutions. For example, for the Spring Boot framework, there’s the https://github.com/dsyer/spring-boot-thin-launcher[Thin Launcher^]. It downloads dependencies at first run instead of at build time. As an added benefit, it keeps images very small, as dependencies are not packaged in each one.

    == Conclusion

    Barring an existing hack for a specific stack, putting an enterprise repository in front of remote ones makes downloads faster. Going further, scheduling the creation of a Docker image at regular intervals makes it possible to skip most downloads completely.

    Categories: Development Tags: container, build, optimization
  • A Dockerfile for Maven-based Github projects


    :page-liquid:
    :icons: font
    :experimental:

    Since the Docker “revolution”, I’ve been interested in creating a Dockerfile for Spring applications. I’m far from an ardent practitioner, but the principle of creating the Dockerfile is dead simple. However, as in Java or probably any programming language, while it’s easy to achieve something that works, it’s much harder to create something that works well.

    == Multi-stage builds

    In Docker, one of the main issues is the size of the final image. It’s not uncommon to end up with images over 1 GB even for simple Java applications. Since version 17.05 of Docker, it’s possible to have multiple builds in a single Dockerfile, and to access the output of the previous build from the current one. Those are called multi-stage builds. The final image will be based on the last build stage.

    Let’s imagine the code is hosted on GitHub, and that it’s based on Maven. Build stages would be as follows:

    1. Clone the code from Github
    2. Copy the folder from the previous stage; build the app with Maven
    3. Copy the JAR from the previous stage; run it with java -jar

    Here is a build file to start from:

    [source]
    ----
    FROM alpine/git
    WORKDIR /app
    RUN git clone https://github.com/spring-projects/spring-petclinic.git <1>

    FROM maven:3.5-jdk-8-alpine
    WORKDIR /app
    COPY --from=0 /app/spring-petclinic /app <2>
    RUN mvn install <3>

    FROM openjdk:8-jre-alpine
    WORKDIR /app
    COPY --from=1 /app/target/spring-petclinic-1.5.1.jar /app <4>
    CMD ["java -jar spring-petclinic-1.5.1.jar"] <5>
    ----

    It maps the above build stages:

    <1> Clone the Spring PetClinic git repository from GitHub
    <2> Copy the project folder from the previous build stage
    <3> Build the app
    <4> Copy the JAR from the previous build stage
    <5> Run the app

    == Improving readability

    Notice that in the previous build file, build stages are referenced via their index (starting from 0), e.g. COPY --from=0. While not a real issue, it’s always better to have something semantically meaningful. Docker allows naming stages and referencing those names in later stages.

    [source]
    ----
    FROM alpine/git as clone <1>
    WORKDIR /app
    RUN git clone https://github.com/spring-projects/spring-petclinic.git

    FROM maven:3.5-jdk-8-alpine as build <2>
    WORKDIR /app
    COPY --from=clone /app/spring-petclinic /app <3>
    RUN mvn install

    FROM openjdk:8-jre-alpine
    WORKDIR /app
    COPY --from=build /app/target/spring-petclinic-1.5.1.jar /app
    CMD ["java -jar spring-petclinic-1.5.1.jar"]
    ----

    <1> Labels the first stage as clone
    <2> Labels the second stage as build
    <3> References the first stage using the label

    == Choosing the right image(s)

    Multi-stage builds help tremendously with image size management, but they are not the only factor. The image to start with has a big impact on the final image. Beginners generally use full-blown operating system images, such as https://hub.docker.com/_/ubuntu/[Ubuntu^], inflating the size of the final image for no good reason. Yet, there are lightweight OSes that are very well suited to Docker images, such as https://alpinelinux.org/[Alpine Linux^].

    NOTE: It’s also a great fit for security purposes, as the attack surface is limited.

    In the above build file, images are the following:

    1. https://hub.docker.com/r/alpine/git/[alpine/git^]
    2. https://hub.docker.com/_/maven/[maven:3.5-jdk-8-alpine^]
    3. https://hub.docker.com/_/openjdk/[openjdk:8-jre-alpine]

    They all inherit transitively or directly from https://hub.docker.com/_/alpine/[alpine].

    Image sizes are as follows:

    [source]
    ----
    REPOSITORY                  TAG            IMAGE ID       CREATED       SIZE
    nfrankel/spring-petclinic   latest         293bd333d60c   10 days ago   117MB
    openjdk                     8-jre-alpine   c4f9d77cd2a1   4 weeks ago   81.4MB
    ----

    The difference in size between the JRE and the app images is around 36 MB (117 MB minus 81.4 MB), which is the size of the JAR itself.

    == Exposing the port

    The Spring Pet Clinic is a webapp, so the HTTP port it binds to must be exposed. The relevant Docker directive is EXPOSE. I chose 8080 as the port number to match the embedded Tomcat container, but it could be anything. The last stage should be modified as follows:

    [source]
    ----
    FROM openjdk:8-jre-alpine
    WORKDIR /app
    COPY --from=build /app/target/spring-petclinic-1.5.1.jar /app
    EXPOSE 8080
    CMD ["java -jar spring-petclinic-1.5.1.jar"]
    ----

    == Parameterization

    At this point, it appears the build file can be used for building any webapp with the following features:

    1. The source code is hosted on Github
    2. The build tool is Maven
    3. The resulting output is an executable JAR file

    Of course, that suits Spring Boot applications very well, but this is not a hard requirement.

    Parameters include:

    • The Github repository URL
    • The project name
    • Maven’s artifactId and version
    • The artefact name (as it might differ from the artifactId, depending on the specific Maven configuration)

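    Let’s use those to design a parameterized build file. In Docker, parameters can be passed using either ENV or ARG instructions. ARG values are set with the --build-arg option on the command line at build time, while ENV values are set in the Dockerfile itself (possibly from an ARG) and persist in the image. Differences are the following: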

    [cols="3*"]
    |===
    h| Type
    | ENV
    | ARG

    h| Found in the image
    | icon:check-square-o[]
    | icon:square-o[]

    h| Default value
    | Required
    | Optional
    |===

    [source]
    ----
    FROM alpine/git as clone
    ARG url <1>
    WORKDIR /app
    RUN git clone ${url} <2>

    FROM maven:3.5-jdk-8-alpine as build
    ARG project <3>
    WORKDIR /app
    COPY --from=clone /app/${project} /app
    RUN mvn install

    FROM openjdk:8-jre-alpine
    ARG artifactid
    ARG version
    ENV artifact ${artifactid}-${version}.jar <4>
    WORKDIR /app
    COPY --from=build /app/target/${artifact} /app
    EXPOSE 8080
    CMD ["java -jar ${artifact}"] <5>
    ----

    <1> url must be passed on the command line to set which GitHub repo to clone
    <2> url is replaced by the passed value
    <3> Same as <1>
    <4> artifact must be an ENV, so as to be persisted in the final app image
    <5> Use the artifact value at runtime

    == Building

    The Spring Pet Clinic image can now be built using the following command-line:

    [source]
    ----
    docker build --build-arg url=https://github.com/spring-projects/spring-petclinic.git \
      --build-arg project=spring-petclinic \
      --build-arg artifactid=spring-petclinic \
      --build-arg version=1.5.1 \
      -t nfrankel/spring-petclinic - < Dockerfile
    ----

    NOTE: Since the image doesn’t depend on the filesystem, no context needs to be passed and the Dockerfile can be piped from the standard input.

    To build another app, parameters can be changed accordingly e.g.:

    [source]
    ----
    docker build --build-arg url=https://github.com/heroku/java-getting-started.git \
      --build-arg project=java-getting-started \
      --build-arg artifactid=java-getting-started \
      --build-arg version=1.0 \
      -t nfrankel/java-getting-started - < Dockerfile
    ----
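Since only the four build arguments and the image tag change from one app to the next, the command can be wrapped in a small function. This is a sketch under assumptions: the function name, the argument order and the DOCKERFILE variable are made up for illustration and are not part of the original build.

```shell
# Hypothetical wrapper around the parameterized build.
# Arguments: <git-url> <project> <artifactId> <version> <image-tag>
# The Dockerfile path defaults to ./Dockerfile and can be overridden
# through the DOCKERFILE variable (an assumption of this sketch).
build_image() {
  if [ "$#" -ne 5 ]; then
    echo "usage: build_image <url> <project> <artifactid> <version> <tag>" >&2
    return 1
  fi
  docker build \
    --build-arg url="$1" \
    --build-arg project="$2" \
    --build-arg artifactid="$3" \
    --build-arg version="$4" \
    -t "$5" - < "${DOCKERFILE:-Dockerfile}"
}

# Usage:
#   build_image https://github.com/heroku/java-getting-started.git \
#     java-getting-started java-getting-started 1.0 nfrankel/java-getting-started
```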

    == Running

    Running an image built with the above command is quite easy:

    [source]
    ----
    docker run -ti -p8080:8080 nfrankel/spring-petclinic
    ----

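    Unfortunately, it fails with the following error message: starting container process caused "exec: \"java -jar ${artifact}\": executable file not found in $PATH. The exec form of CMD treats its single string as the name of an executable, without splitting it into words or expanding ${artifact}. The trick is to use the ENTRYPOINT directive to run the command through a shell, which does both. The updated Dockerfile looks as follows: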

    [source]
    ----
    FROM openjdk:8-jre-alpine
    ARG artifactid
    ARG version
    ENV artifact ${artifactid}-${version}.jar
    WORKDIR /app
    COPY --from=build /app/target/${artifact} /app
    EXPOSE 8080
    ENTRYPOINT ["sh", "-c"]
    CMD ["java -jar ${artifact}"]
    ----

    At this point, running the container will (finally) work.
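The difference between the failing and the working variant can be reproduced in a plain shell, without Docker. The snippet below is only an analogy: echo stands in for java so that no JRE is needed, and both function names are made up. The first function treats the whole string as a single command name, much like an exec-form CMD with one element; the second hands the same string to sh -c, as the ENTRYPOINT does.

```shell
# Same variable as the ENV instruction sets in the image
export artifact=spring-petclinic-1.5.1.jar

# Analogue of CMD ["java -jar ${artifact}"] alone: the single-quoted string
# is looked up as ONE executable name, with no word splitting and no
# variable expansion, so the lookup fails.
run_exec_form() {
  'echo java -jar ${artifact}'
}

# Analogue of ENTRYPOINT ["sh", "-c"] plus the same CMD: a shell parses
# the string, splits it into words and expands ${artifact}.
run_via_shell() {
  sh -c 'echo java -jar ${artifact}'
}
```

Calling run_exec_form fails with a "not found" error, while run_via_shell prints the fully expanded command line.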

    NOTE: the second image uses a different port, so the command should be: docker run -ti -p8080:5000 nfrankel/java-getting-started.

    == Food for thought

    External Git:: The current build clones a repository, and hence doesn’t need to send the context to Docker. An alternative would be to clone outside the build file, e.g. in a continuous integration chain, and start from the context. That could be useful to build the image during development on developers’ machines.
    Latest version or not:: For the Git image, I used the latest version, while for the JDK and JRE images, I used a specific major version. It’s important for the Java version to be fixed to a major version, not so much for Git. It depends on the nature of the image.
    Building from master:: There’s no configuration to change branch after cloning. This is wrong, as most of the time, builds are executed on a dedicated tag, e.g. v1.0.0. Obviously, there should be an additional command, as well as an additional build argument, to check out a specific tag before the build.
    Skipping tests:: It takes an awfully long time for the Pet Clinic application to build, as it has a huge test harness. Executing those tests takes time, and Maven must go through the test phase to reach the install phase. Depending on the specific continuous integration chain, tests might have been executed earlier and needn’t be executed again during the image build.
    Maven repository download:: Spring Boot apps have a lot of dependencies. It takes a long time for the image to build, as all dependencies need to be downloaded every time for every app. There are a couple of solutions to handle that. It will probably be the subject of another post.
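Two of the points above, building from a tag and skipping tests, boil down to changing the commands run in the first two stages. The functions below sketch those commands in isolation; the function names are hypothetical, and in the build file the tag would arrive through an additional build argument that does not exist yet.

```shell
# Hypothetical helper: clones a single tag (or branch) instead of the
# default branch. --depth 1 fetches only that revision's snapshot.
# $1: repository URL, $2: tag or branch name
clone_at_tag() {
  git clone --branch "$2" --depth 1 "$1"
}

# Hypothetical helper: builds the app without running the tests, assuming
# they already ran earlier in the continuous integration chain.
# -DskipTests still compiles the test classes but does not execute them.
build_without_tests() {
  mvn install -DskipTests
}

# Usage, with the tag name taken from the example above:
#   clone_at_tag https://github.com/spring-projects/spring-petclinic.git v1.0.0
```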

    Categories: Development Tags: docker, container, build
  • Spring can inject Servlets too!

    In this article, I will show you that Spring’s dependency injection mechanism is not restricted solely to Spring-managed beans: Spring can inject its beans into objects created with the new keyword, servlets instantiated by the servlet container, and pretty much anything you like. Spring’s classical mode is to be an object factory. That is, Spring creates every object in your application, using the provided constructor. Yet, some of the objects you use are outside Spring’s perimeter. Two simple examples:

    • servlets are instantiated by the servlet container. As such, they cannot be injected out-of-the-box
    • some business objects are not parameterized in Spring but rather created by your own code with the new keyword

    Both these examples show you can’t delegate to Spring every object instantiation.

    There was a time when I foolishly thought Spring was a closed container. Either your beans were managed by Spring, or they weren’t: if they were, you could inject them with other Spring-managed beans. If they weren’t, tough luck! Well, this is dead wrong. Spring can inject its beans into pretty much anything, provided you’re okay with using AOP.

    In order to do this, there are only 2 steps to take:

    Use AOP

    It is done in your Spring configuration file.

    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:context="http://www.springframework.org/schema/context"
      xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
        http://www.springframework.org/schema/context
    	http://www.springframework.org/schema/context/spring-context-2.5.xsd">
      <!-- This does the magic -->
      <context:spring-configured />
      <!-- These are for classical annotation configuration -->
      <context:annotation-config />
      <context:component-scan base-package="ch.frankel.blog.spring.outcontainer" />
    </beans>
    

    You also need to configure which aspect engine to use to weave the compiled bytecode. In this case, it is AspectJ, which is the AOP component used by Spring. Since I’m using Maven as my build tool of choice, this is easily done in my POM. Ant users will have to do it in their build.xml:

    <project xmlns="http://maven.apache.org/POM/4.0.0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
        http://maven.apache.org/maven-v4_0_0.xsd">
    ...
      <properties>
        <spring-version>2.5.6.SEC01</spring-version>
      </properties>
      <dependencies>
        <dependency>
          <groupId>org.springframework</groupId>
          <artifactId>spring-aop</artifactId>
          <version>${spring-version}</version>
        </dependency>
        <dependency>
          <groupId>org.springframework</groupId>
          <artifactId>spring-aspects</artifactId>
          <version>${spring-version}</version>
        </dependency>
        <dependency>
          <groupId>aspectj</groupId>
          <artifactId>aspectjrt</artifactId>
          <version>1.5.4</version>
          <scope>test</scope>
        </dependency>
      </dependencies>
      <build>
        <plugins>
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>aspectj-maven-plugin</artifactId>
            <configuration>
              <complianceLevel>1.5</complianceLevel>
              <aspectLibraries>
                <aspectLibrary>
                  <groupId>org.springframework</groupId>
                  <artifactId>spring-aspects</artifactId>
                </aspectLibrary>
              </aspectLibraries>
            </configuration>
            <executions>
              <execution>
                <goals>
                  <goal>compile</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </project>
    

    Configure which object creation to intercept

    This is done with the @org.springframework.beans.factory.annotation.Configurable annotation on your injectable object.

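    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.beans.factory.annotation.Configurable;

    @Configurable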
    public class DomainObject {
    
      /** The object to be injected by Spring. */
      private Injectable injectable;
    
      public Injectable getInjectable() {
        return injectable;
    
      }
    
      @Autowired
      public void setInjectable(Injectable injectable) {
        this.injectable = injectable;
      }
    }
    

    Now with only these few lines of configuration (no code!), I’m able to inject Spring-managed beans into my domain object. I leave it to you to implement the same with regular servlets (which are much harder to demonstrate in a unit test).

    You can find the Maven project used for this article here. The unit test packaged shows the process described above.

    To go further: