Archive

Posts Tagged ‘build’
  • Strategies for optimizing Maven Docker images

    :page-liquid:
    :icons: font
    :experimental:
    :imagesdir: /assets/resources/strategies-optimizing-docker-maven-image

    Last week, I link:{% post_url 2017-07-30-dockerfile-maven-based-github-projects %}[wrote^] about how to design a generic Docker image for Maven-based auto-executable webapps. The build file has three different stages: checkout from Github, build with Maven, and execution with Java. The Maven build stage takes quite a long time, mostly due to:

    1. Executing tests
    2. Downloading dependencies

    Tests can be executed earlier in the build chain and skipped in the Docker build, so this post will focus on speeding up the download of dependencies. Let's check each option in turn.

    == Mount a volume

    The easiest option is to mount the local $HOME/.m2 folder as a volume when running Maven. This means dependencies are downloaded on the first run and are available on every run afterwards.
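
    For instance, a minimal sketch could look like the following (the image name, mounted paths and goal are only illustrative):

    [source]
    ----
    # reuse the host's local repository inside the container
    docker run -it --rm -v "$HOME/.m2":/root/.m2 -v "$PWD":/app -w /app maven:3.5-jdk-8-alpine mvn package
    ----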

    The biggest constraint of going this way is that https://docs.docker.com/engine/tutorials/dockervolumes/#mount-a-host-file-as-a-data-volume[mounting volumes^] is only possible for run, not build - this is to enforce build stability. In the previous build file, the Maven build was not the final step, and it never is in classical multi-stage builds: it runs at build time, where no volume can be mounted. Hence, mounting volumes is definitely a no-go with multi-stage builds.

    == Parent image with dependencies

    With build immutability in mind, an option is to “inherit” from a Maven image with a repository that is already provisioned with the required dependencies.

    === Naive first approach

    Dependencies are specific to an app, so they must be known early in the build. This requires reading the build descriptor file, i.e. the pom.xml. Tailoring the parent image's dependencies adds an additional build step, with no real benefit: dependencies will be downloaded nonetheless, just one step earlier.
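
    As an illustration, a sketch of this naive approach could look like this, assuming the pom.xml is available in the build context:

    [source]
    ----
    FROM maven:3.5-jdk-8-alpine
    WORKDIR /app
    COPY pom.xml .
    # resolves dependencies one step earlier, but still downloads them from remote repositories
    RUN mvn dependency:go-offline
    ----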

    === Eagerly download the Internet

    All dependencies from all Maven repositories used by the organization (at least repo1, i.e. Maven Central) may be provisioned in the parent image. This image has to be refreshed to account for new versions being released, and the obvious choice is to schedule building it at regular intervals. Unfortunately, this puts an unreasonable load on the remote repositories. For this reason, downloading the entire repo1 is frowned upon by the https://maven.apache.org/community.html[Maven community^].

    [quote]
    ____
    DO NOT wget THE ENTIRE REPOSITORY!

    Please take only the jars you need. We understand this may entail more work, but grabbing more than 1.7 TiB of binaries really kills our servers.
    ____

    === Intermediate proxy

    In order to improve upon the previous solution, it's possible to add an intermediate enterprise proxy repository, e.g. https://www.jfrog.com/artifactory/[Artifactory^] or http://www.sonatype.org/nexus/[Nexus^]. This way, the flow will look like the following:

    image::image-hierarchy.svg[Components diagram,444,501,align="center"]

    ////
    skinparam component {
      backgroundColor LightCyan
      backgroundColor<<image>> #FEFECE
    }
    skinparam componentStyle uml2

    [Git] <<image>> as git
    [Maven] <<image>> as mvn
    [Maven Dependencies] <<image>> as deps
    [Enterprise Maven Proxy] as proxy
    [App] <<image>> as app

    mvn -up-|> git : FROM
    deps -up-|> mvn : FROM
    app -up-|> deps : FROM
    deps .right.> proxy
    ////

    NOTE: Whether the enterprise repo is Dockerized or not is irrelevant and plays no role in the whole process.

    Create the parent image::
    At regular intervals, the image is created not from remote repositories but from the proxy. At first, there will be no dependencies in the proxy, but after the initial build, the image will contain the required dependencies.
    Initial build::
    The enterprise repository is empty. Dependencies will be downloaded from remote repositories through the enterprise proxy, feeding it in the process.
    Next builds::
    The enterprise repository is filled. Dependencies are already present in the image: no download is required. Note that if a new dependency (or a new version thereof) is required in the meantime, it will be downloaded during the app build - but only that one, which keeps the download time short. Through the proxy, it becomes available for the next app build and finally ends up in the image.
    +
    The only "trick" is for the App image to use the latest Maven Dependencies image as its parent, i.e. FROM dependencies:latest, so as to benefit from dependencies as they are added into the image.
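
    As an illustration, the Maven Dependencies image could be built along these lines - a sketch only, where the settings.xml and the aggregator pom.xml are hypothetical:

    [source]
    ----
    FROM maven:3.5-jdk-8-alpine
    # hypothetical settings.xml whose <mirror> points to the enterprise proxy
    COPY settings.xml /root/.m2/settings.xml
    WORKDIR /app
    # hypothetical aggregator pom.xml listing the organization's common dependencies
    COPY pom.xml .
    # fills the image's local repository from the proxy, feeding the proxy on the first run
    RUN mvn dependency:go-offline
    ----

    App images would then start with FROM dependencies:latest (or whatever tag the scheduled build publishes) to benefit from the pre-filled repository.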

    == Custom solution

    Depending on the app stack, there might be dedicated solutions. For example, for the Spring Boot framework, there's the https://github.com/dsyer/spring-boot-thin-launcher[Thin Launcher^]. It allows downloading dependencies at first run instead of at build time. As an added benefit, it keeps images very small, as dependencies are not packaged in each one.
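
    As a sketch - assuming the thin.root property, which to my knowledge sets the location of the launcher's local repository cache - running such an app could look like:

    [source]
    ----
    # dependencies are resolved on first startup and cached under the given root
    java -Dthin.root=/opt/thin/repo -jar spring-petclinic-1.5.1.jar
    ----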

    == Conclusion

    Barring an existing hack for a specific stack, putting an enterprise repository in front of the remote ones allows for the fastest downloads. Better yet, scheduling the creation of a dependencies Docker image at regular intervals allows skipping downloads entirely.

    Categories: Development Tags: container, build, optimization
  • A Dockerfile for Maven-based Github projects

    Docker logo

    :page-liquid:
    :icons: font
    :experimental:

    Since the Docker "revolution", I've been interested in creating a Dockerfile for Spring applications. I'm far from an ardent practitioner, but the principle of creating a Dockerfile is dead simple. However, as in Java - or probably any programming language - while it's easy to achieve something that works, it's much harder to create something that works well.

    == Multi-stage builds

    In Docker, one of the main issues is the size of the final image. It's not uncommon to end up with images over 1 GB, even for simple Java applications. Since version 17.05 of Docker, it's possible to have multiple builds in a single Dockerfile and to access the output of a previous build stage in the current one. Those are called multi-stage builds. The final image will be based on the last build stage.

    Let's imagine the code is hosted on Github and that it's based on Maven. Build stages would be as follows:

    1. Clone the code from Github
    2. Copy the folder from the previous stage; build the app with Maven
    3. Copy the JAR from the previous stage; run it with java -jar

    Here is a build file to start from:

    [source]
    ----
    FROM alpine/git
    WORKDIR /app
    RUN git clone https://github.com/spring-projects/spring-petclinic.git <1>

    FROM maven:3.5-jdk-8-alpine
    WORKDIR /app
    COPY --from=0 /app/spring-petclinic /app <2>
    RUN mvn install <3>

    FROM openjdk:8-jre-alpine
    WORKDIR /app
    COPY --from=1 /app/target/spring-petclinic-1.5.1.jar /app <4>
    CMD ["java -jar spring-petclinic-1.5.1.jar"] <5>
    ----

    It maps to the above build stages:

    <1> Clone the Spring PetClinic git repository from Github
    <2> Copy the project folder from the previous build stage
    <3> Build the app
    <4> Copy the JAR from the previous build stage
    <5> Run the app

    == Improving readability

    Notice that in the previous build file, build stages are referenced via their index (starting from 0), e.g. COPY --from=0. While not a real issue, it's always better to have something semantically meaningful. Docker allows labeling stages and referencing those labels in later stages.

    [source]
    ----
    FROM alpine/git as clone <1>
    WORKDIR /app
    RUN git clone https://github.com/spring-projects/spring-petclinic.git

    FROM maven:3.5-jdk-8-alpine as build <2>
    WORKDIR /app
    COPY --from=clone /app/spring-petclinic /app <3>
    RUN mvn install

    FROM openjdk:8-jre-alpine
    WORKDIR /app
    COPY --from=build /app/target/spring-petclinic-1.5.1.jar /app
    CMD ["java -jar spring-petclinic-1.5.1.jar"]
    ----

    <1> Labels the first stage as clone
    <2> Labels the second stage as build
    <3> References the first stage using its label

    == Choosing the right image(s)

    Multi-stage builds help tremendously with image size management, but they're not the only criterion. The image to start from has a big impact on the final image. Beginners generally use full-blown Operating System images, such as https://hub.docker.com/_/ubuntu/[Ubuntu^], inflating the size of the final image for no good reason. Yet, there are lightweight OSes that are very well suited to Docker images, such as https://alpinelinux.org/[Alpine Linux^].

    NOTE: It’s also a great fit for security purposes, as the attack surface is limited.

    In the above build file, images are the following:

    1. https://hub.docker.com/r/alpine/git/[alpine/git^]
    2. https://hub.docker.com/_/maven/[maven:3.5-jdk-8-alpine^]
    3. https://hub.docker.com/_/openjdk/[openjdk:8-jre-alpine]

    They all inherit transitively or directly from https://hub.docker.com/_/alpine/[alpine].

    Image sizes are as follows:

    [source]
    ----
    REPOSITORY                  TAG             IMAGE ID        CREATED        SIZE
    nfrankel/spring-petclinic   latest          293bd333d60c    10 days ago    117MB
    openjdk                     8-jre-alpine    c4f9d77cd2a1    4 weeks ago    81.4MB
    ----

    The difference in size between the JRE and the app images is around 36 MB, which is the size of the JAR itself.

    == Exposing the port

    The Spring Pet Clinic is a webapp, so it requires exposing the HTTP port it binds to. The relevant Docker directive is EXPOSE. I chose 8080 as the port number to match the embedded Tomcat container, but it could be anything. The last stage should be modified as such:

    [source]
    ----
    FROM openjdk:8-jre-alpine
    WORKDIR /app
    COPY --from=build /app/target/spring-petclinic-1.5.1.jar /app
    EXPOSE 8080
    CMD ["java -jar spring-petclinic-1.5.1.jar"]
    ----

    == Parameterization

    At this point, it appears the build file can be used for building any webapp with the following features:

    1. The source code is hosted on Github
    2. The build tool is Maven
    3. The resulting output is an executable JAR file

    Of course, that suits Spring Boot applications very well, but this is not a hard requirement.

    Parameters include:

    • The Github repository URL
    • The project name
    • Maven’s artifactId and version
    • The artifact name (as it might differ from the artifactId, depending on the specific Maven configuration)

    Let's use those to design a parameterized build file. In Docker, parameters can be passed using either the ENV or the ARG directive. ARG values are set with the --build-arg command-line option, and an ENV can be derived from them. The differences are the following:

    [cols="3*"]
    |===
    h|Type
    |ENV
    |ARG

    h|Found in the image
    |icon:check-square-o[]
    |icon:square-o[]

    h|Default value
    |Required
    |Optional
    |===
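
    A quick way to see the difference - a hypothetical experiment, assuming a Dockerfile that declares both an ARG and an ENV with the names used below - is to inspect the resulting container's environment:

    [source]
    ----
    docker build --build-arg some_arg=42 -t arg-vs-env .
    docker run --rm arg-vs-env env | grep some_env   # the ENV value is baked into the image
    docker run --rm arg-vs-env env | grep some_arg   # prints nothing: ARG values exist at build time only
    ----

    With those differences in mind, here's the parameterized build file: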

    [source]
    ----
    FROM alpine/git as clone
    ARG url <1>
    WORKDIR /app
    RUN git clone ${url} <2>

    FROM maven:3.5-jdk-8-alpine as build
    ARG project <3>
    WORKDIR /app
    COPY --from=clone /app/${project} /app
    RUN mvn install

    FROM openjdk:8-jre-alpine
    ARG artifactid
    ARG version
    ENV artifact ${artifactid}-${version}.jar <4>
    WORKDIR /app
    COPY --from=build /app/target/${artifact} /app
    EXPOSE 8080
    CMD ["java -jar ${artifact}"] <5>
    ----

    <1> url must be passed on the command line to set which Github repo to clone
    <2> url is replaced by the passed value
    <3> Same as <1>
    <4> artifact must be an ENV, so as to be persisted in the final app image
    <5> Use the artifact value at runtime

    == Building

    The Spring Pet Clinic image can now be built using the following command-line:

    [source]
    ----
    docker build --build-arg url=https://github.com/spring-projects/spring-petclinic.git\
        --build-arg project=spring-petclinic\
        --build-arg artifactid=spring-petclinic\
        --build-arg version=1.5.1\
        -t nfrankel/spring-petclinic - < Dockerfile
    ----

    NOTE: Since the image doesn’t depend on the filesystem, no context needs to be passed and the Dockerfile can be piped from the standard input.

    To build another app, parameters can be changed accordingly e.g.:

    [source]
    ----
    docker build --build-arg url=https://github.com/heroku/java-getting-started.git\
        --build-arg project=java-getting-started\
        --build-arg artifactid=java-getting-started\
        --build-arg version=1.0\
        -t nfrankel/java-getting-started - < Dockerfile
    ----

    == Running

    Running an image built with the above command is quite easy:

    [source]
    ----
    docker run -ti -p8080:8080 nfrankel/spring-petclinic
    ----

    Unfortunately, it fails with the following error message: starting container process caused "exec: \"java -jar ${artifact}\": executable file not found in $PATH. The trick is to use the ENTRYPOINT directive. The updated Dockerfile looks like the following:

    [source]
    ----
    FROM openjdk:8-jre-alpine
    ARG artifactid
    ARG version
    ENV artifact ${artifactid}-${version}.jar
    WORKDIR /app
    COPY --from=build /app/target/${artifact} /app
    EXPOSE 8080
    ENTRYPOINT ["sh", "-c"]
    CMD ["java -jar ${artifact}"]
    ----

    At this point, running the container will (finally) work.

    NOTE: The second image uses a different port, so the command should be: docker run -ti -p8080:5000 nfrankel/java-getting-started.

    == Food for thought

    External Git::
    The current build clones a repository, and hence doesn't need sending the context to Docker. An alternative would be to clone outside the build file, e.g. in a continuous integration chain, and start from the context. That could be useful to build the image during development, on developers' machines.
    Latest version or not::
    For the Git image, I used the latest version, while for the JDK and JRE images, I used a specific major version. It's important for the Java version to be fixed to a major version, not so much for Git. It depends on the nature of the image.
    Building from master::
    There's no configuration to change the branch after cloning. This is wrong, as most of the time builds are executed on a dedicated tag, e.g. v1.0.0. Obviously, there should be an additional command - as well as an additional build argument - to checkout a specific tag before the build (see the sketch after this list).
    Skipping tests::
    It takes an awful lot of time for the Pet Clinic application to build, as it has a huge test harness. Executing those tests takes time, and Maven must go through the test phase to reach the install phase. Depending on the specific continuous integration chain, tests might have been executed earlier and mustn't be executed again during the image build.
    Maven repository download::
    Spring Boot apps have a lot of dependencies. It takes a long time for the image to build, as all dependencies need to be downloaded every time for every app. There are a couple of solutions to handle that. It will probably be the subject of another post.
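
    Regarding building from a tag, a sketch of the clone stage with a hypothetical tag build argument could look like this:

    [source]
    ----
    FROM alpine/git as clone
    ARG url
    ARG project
    # hypothetical additional build argument, defaulting to master
    ARG tag=master
    WORKDIR /app
    RUN git clone ${url} && cd ${project} && git checkout ${tag}
    ----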

    Categories: Development Tags: docker, container, build
  • Polyglot everywhere - part 1

    This is the era of polyglot! Proponents of this practice spread the word that you have to choose the language best adapted to the problem at hand. And with a single team dedicated to a microservice, this might make sense.

    My pragmatic side tells me it means that developers get to choose the language they develop with and don't care how it will be maintained when they leave… On the other hand, my shiny-loving side just wants to try - albeit in a more controlled environment, such as this blog!

    Introduction

    In this 3-part series, I'll try to go polyglot on a project:

    • The first part is about the build system
    • The second part will be about the server side
    • The final part will be about the client-side

    My example will use a Vaadin project built with Maven and using a simple client-side extension. You can follow the project on Github.

    Polyglot Maven

    Though it may have been largely ignored, Maven has been able to talk many different languages since version 3.3.1, thanks to an improved extension mechanism. In the end, the system is quite easy:

    • Create a .mvn folder at the root of your project
    • Create an extensions.xml file
    • Set the type of language you’d like to use:
    <?xml version="1.0" encoding="UTF-8"?>
    <extensions>
      <extension>
        <groupId>io.takari.polyglot</groupId>
        <artifactId>polyglot-yaml</artifactId>
        <version>0.1.8</version>
      </extension>
    </extensions>
    

    Here, I set the build “language” as YAML.

    In the end, the translation from XML to YAML is very straightforward:

    modelVersion: 4.0.0
    groupId: ch.frankel.blog.polyglot
    artifactId: polyglot-example
    packaging: war
    version: 1.0.0-SNAPSHOT
    dependencies:
        - { groupId: com.vaadin, artifactId: vaadin-spring, version: 1.0.0.beta2 }
    build:
        plugins:
            - artifactId: maven-compiler-plugin
              version: 3.1
              configuration:
                source: 1.8
                target: 1.8
            - artifactId: maven-war-plugin
              version: 2.2
              configuration:
                failOnMissingWebXml: false
    
    

    The only problem I had was with the YAML syntax itself: just make sure to align the elements of a plugin with the plugin declaration (e.g. align version with artifactId).

    Remember to check the POM on Github with each new part of the series!

    Categories: Development Tags: build, maven, polyglot
  • Stop the f... about Gradle

    Stop the f… about #Spring & #Hibernate migrating to #Gradle. Repeat after me: "my project do NOT have the same requirements" #Maven

    This was my week's hate tweet, and I take full responsibility for every character in it. While that may seem like a troll, Twitter is not really the place to have a good-natured debate with factual arguments, so here is the follow-up.

    Before going into full-blown rhetoric mode, let me first say that despite popular belief, I'm open to shiny new things. For example, despite being a Vaadin believer - which is a stateful server-side technology - I'm also interested in AngularJS - which is its exact opposite. I'm also in favor of TestNG over JUnit, and so on. I even went as far as attending a Gradle session at Devoxx France! So please, hear my arguments out, and only then think them over.

    So far, I’ve heard only two arguments in favor of Gradle:

    1. It's flexible (implying Maven is not)
    2. Spring and Hibernate use it

    Both are facts; let's go over each of them in detail to check why they are not arguments.

    Gradle is flexible

    There’s no denying that Gradle is flexible: I mean, it’s Groovy with a build DSL. Let us go further: how is flexibility achieved? My take is that it comes from the following.

    • Providing very fine-grained - almost atomic - operations: compilation, copying, moving, packaging, etc.
    • Allowing to define lists of those operations - tasks: packaging a JAR would mean copying classes and resource files into a dedicated folder and zipping it
    • Enabling dependencies between those: packaging depends on compilation

    If you look at it closely, what I said can be applied to Gradle, of course, but also to Ant! Yes, both operate at the same level of granularity.

    Now the problem is that Gradle proponents present its flexibility as if it were a desirable quality. Let me say it: flexibility is not a quality for a build tool; I would even say it's a big disadvantage. It's the same as having your broken arm put in a cast: that's a definite lack of flexibility (to say the least), but it's for your own good. The cast prevents you from moving in a painful way, just as inflexible build tools (such as Maven) make strange things expensive to do.

    If you need to do something odd over and over because of your specific context, it's a recurring requirement. In Maven, you'd create a plugin to address it and be done with it. If it's a one-shot requirement, it's probably not a requirement but a quirk. Rethink the way you do it; it's a smell that something is definitely fishy.

    Spring and Hibernate both use Gradle

    That one really makes me laugh: because some frameworks chose a build tool, we should just use theirs, no questions asked? Did you really check why they migrated in the first place?

    I won’t even look at the Hibernate case, because it annoys me to no end to read arguments such as “I personally hate…” or “…define the build and directories the way that seemed to make sense to me”. That’s no good reason to change a build tool (but a real display of inflated ego in the latter case).

    For Spring, well… Groovy is just in their strategic path and has been for years. SpringSource Tool Suite fully supports Groovy, so I guess using Gradle is a way to spread the Groovy love all over the world. A more technical reason I've heard is that Spring must be compatible with different Java versions and that Maven cannot address that. I'm too lazy to check for myself, but even if that's true, it has only a slight chance of applying to your current project.

    Gradlew is one cool feature

    The only feature I know of that I currently lack - and would definitely love to have - is pinning the build engine version once and for all, to be able to run my build 10 years from now if I need to. It's amazing how many software products reach their first maintenance cycle after years and are at a loss to build from sources. Believe me, it happened to me (in a VB environment), but I had the fortune of having a genius at hand, something I unfortunately cannot count on.

    In Gradle parlance, this feature is achieved through something known as the Gradle Wrapper. Using it downloads the build engine itself, so you can put it into your version control. Too bad nobody ever raised this as an argument :-) though it's not enough to make me want to migrate.
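
    For the record, bootstrapping and committing the wrapper boils down to something like this (a sketch; it assumes Gradle is installed once to generate the wrapper files):

    gradle wrapper                  # generates gradlew, gradlew.bat and the gradle/wrapper/ files
    git add gradlew gradlew.bat gradle/wrapper
    ./gradlew build                 # downloads the Gradle version pinned in gradle-wrapper.properties, then builds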

    Note: during this writing, I just searched for a port of this feature to Maven and I found maven-wrapper. Any feedback?

    Wrap-up

    TL;DR:

    • Gradle is just Ant with Groovy instead of XML
    • Your context is (probably) different from those of frameworks which are using Gradle

    Both points have only one logical conclusion: there's no real reason to use Gradle. Stop the f… about it and go back to developing the next Google Search - there's much more value in that!

    Categories: Java Tags: ant, build, gradle, maven