Archive

Posts Tagged ‘optimization’
  • Strategies for optimizing Maven Docker images

    :page-liquid: :icons: font :experimental: :imagesdir: /assets/resources/strategies-optimizing-docker-maven-image

    Last week, I link:{% post_url 2017-07-30-dockerfile-maven-based-github-projects %}[wrote^] on how to design a generic Docker image for Maven-based auto-executable webapps. The designed build file has 3 different stages: checkout from Github, build with Maven and execute with Java. The Maven build stage takes quite a long time, mostly due to:

    1. Executing tests
    2. Downloading dependencies

    Tests can be executed earlier in the build chain - then skipped for Docker, this post will focus on speeding up the download of the dependencies. Let’s check each option in turn.

    == Mount a volume

    The easiest option is to mount a volume to the local $HOME/.m2 folder when running Maven. This means dependencies are downloaded on the first run, and they are available on every run afterwards.

    The biggest constraint of going this way is that https://docs.docker.com/engine/tutorials/dockervolumes/#mount-a-host-file-as-a-data-volume[mounting volumes^] is only possible for run, not build. This is to enforce build stability. In the previous build file, the Maven build was not the final step. This is never the case in classical multi-stage builds. Hence, mounting volumes is definitely a no-go with multi-stage builds.

    == Parent image with dependencies

    With build immutability in mind, an option is to “inherit” from a Maven image with a repository that is already provisioned with the required dependencies.

    === Naive first approach

    Dependencies are specific to an app, so that they must be known early in the build stage. This requires reading the build descriptor file i.e. the pom.xml. Tailoring the parent image dependencies adds an additional build step, with no real benefits: dependencies will be downloaded nonetheless, just one step earlier.

    === Eagerly download the Internet

    All dependencies - from all Maven repositories used by the organization (at least repo1), may be provisioned in the parent image. This image has to be refreshed to account for new versions being released. The obvious choice is to schedule building it at regular intervals. This unfortunately puts an unreasonable load on source repositories. For this reason, downloading the entire repo1 is frowned upon by the https://maven.apache.org/community.html[Maven community^].

    [quote] __ DO NOT wget THE ENTIRE REPOSITORY!

    Please take only the jars you need. We understand this is may entail more work, but grabbing more than 1,7 TiB of binaries really kills our servers. __

    === Intermediate proxy

    In order to improve upon the previous solution, it’s possible to add an intermediate enterprise proxy repository i.e. https://www.jfrog.com/artifactory/[Artifactory^] or http://www.sonatype.org/nexus/[Nexus^]. This way, the flow will look like the following:

    image::image-hierarchy.svg[Components diagram,444,501,align=”center”]

    //// skinparam component { backgroundColor LightCyan backgroundColor«image» #FEFECE } skinparam componentStyle uml2

    [Git] «image» as git [Maven] «image» as mvn [Maven Dependencies] «image» as deps [Enterprise Maven Proxy] as proxy [App] «image» as app mvn -up-|> git : FROM deps -up-|> mvn : FROM app -up-|> deps : FROM deps .right.> proxy ////

    NOTE: Whether the enteprise repo is Dockerized or not is irrevelant and plays no role in the whole process.

    Create the parent image:: At regular intervals, the image is created not from remote repositories but from the proxy. At first, there will be no dependencies in the proxy. But after the initial build, the image will contain the required dependencies. Initial build:: The enterprise repository is empty. Dependencies will be downloaded from remote repositories, through the enterprise proxy, feeding it in the process. Next builds:: The enterprise repository is filled. Dependencies are already present in the image: no download is required. Note that if in the meantime a new dependency (or a new version thereof) is required, it will be downloaded during app build, but only it - which makes download time short. This allows to provision it into the enterprise repo, available for the next app and finally into the image. + The only “trick” is for the App image to use the latest Maven Dependencies image as the parent i.e. FROM dependencies:latest , so as to benefit from dependencies as they are added into the image.

    == Custom solution

    Depending on the app stack, there might be dedicated solutions. For example, regarding the Spring Boot framework, there’s the https://github.com/dsyer/spring-boot-thin-launcher[Thin Launcher^]. It allows to download dependencies at first run, instead of build time. As an added benefit, it keeps images very small in size as dependencies are not packaged in each one.

    == Conclusion

    Barring an existing hack for a specific stack, putting an enterprise repository in front of remote ones allows for fastest downloads. However, scheduling the creation of a Docker image at regular intervals allows to completely skip downloads.

    Categories: Development Tags: containerbuildoptimization
  • WebJars and wro4j integration

    WebJars is an easy way for server-side developers (such as your humble servant) to manage client-side resources such as Bootstrap, jQuery and their like, within the same package management tool they use for their server-side libraries.

    In essence, what WebJars does is package a set version of the client-side resource (CSS or JavaScript) in the META-INF/resources of a JAR and upload it on Maven Central. Then, any Java EE compatible web-container makes the resource available under a static URL. For example, for a JAR packaging META-INF/resources/webjars/bootstrap.3.0.3/js/bootstrap.js, it can be referenced by webjars/bootstrap/3.0.3/css/bootstrap.css.

    Most providers offer a minified version of their resource, and it is packaged along in the JAR, so using a minified resource is a no-brainer (if the minified resource is packages along, of course). However, when using multiple WebJars, this increases the number of browser requests. Outside the context of WebJars, minimizing the number of requests could easily be achieved by using wro4j, a tool that manages both minifying and merging resources through lists of pre- and post-processors. A typical wro4j use-case was already described in an earlier article.

    The good thing is that WebJars and wro4j can be integrated with ease through wro4j.xml configuration file. As it stands, wro4j.xml configures resources merging. Those resources may come from a variety of sources; typically, they are internal resource and are referenced by their path relative from the webapp root:

    /sample.css
    

    However, the power of wro4j is to be able to reference other kind of resources, including those packaged inside JARs:

    classpath:META-INF/resources/webjars/bootstrap/3.0.3/css/bootstrap.css
    

    And with this configuration line only, we merge the resource inside the WebJar with other resources. From this point on, the merged resource can be referenced as the single resource inside our webapp. The following displays a wro4j configuration file that creates a compound.css file from an internal sample.css and the Bootstrap WebJar.

    <?xml version="1.0" encoding="UTF-8"?>
    <groups xmlns="http://www.isdc.ro/wro">
        <group name="compound">
            <css>classpath:META-INF/resources/webjars/bootstrap/3.0.3/css/bootstrap.css</css>
            <css>/sample.css</css>
        </group>
    </groups>
    

    An example project in Maven/IntelliJ format is provided as an attachment.

    Warning: though having a single minified resource for JavaScript (and one for CSS) improves performance with HTTP/1.1, it seems it might not be the case with HTTP/2.0.

    Categories: Java Tags: cssjavascriptoptimizationwebjarswro4j