• Strategies for optimizing Maven Docker images

    Last week, I wrote about how to design a generic Docker image for Maven-based auto-executable webapps. The designed build file has 3 different stages: checkout from GitHub, build with Maven, and execution with Java. The Maven build stage takes quite a long time, mostly due to:

    1. Executing tests
    2. Downloading dependencies

    Since tests can be executed earlier in the build chain - and then skipped during the Docker build - this post will focus on speeding up the download of dependencies. Let’s check each option in turn.

    Mount a volume

    The easiest option is to mount the local $HOME/.m2 folder as a volume when running Maven in the container. This way, dependencies are downloaded on the first run and are available on every run afterwards.
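
    For example, a run-time build along these lines would reuse the host’s local repository - the image and paths are illustrative:

    # Mount the host's local repository and the project folder into a Maven container
    docker run --rm -v "$HOME/.m2":/root/.m2 -v "$PWD":/app -w /app maven:3.5-jdk-8-alpine mvn install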

    The biggest constraint of this approach is that mounting volumes is only possible for docker run, not docker build - this is by design, to enforce build reproducibility. In the previous build file, the Maven build was not the final stage, and it never is in classical multi-stage builds. Hence, mounting volumes is definitely a no-go with multi-stage builds.

    Parent image with dependencies

    With build immutability in mind, an option is to "inherit" from a Maven image with a repository that is already provisioned with the required dependencies.

    Naive first approach

    Dependencies are specific to an app, so they must be known early in the build process. This requires reading the build descriptor file, i.e. the pom.xml. Tailoring the parent image dependencies to the app adds an additional build step, with no real benefit: dependencies will be downloaded nonetheless, just one step earlier.

    Eagerly download the Internet

    All dependencies - from all Maven repositories used by the organization (at least repo1) - may be provisioned in the parent image. This image has to be refreshed to account for new versions being released; the obvious choice is to schedule its build at regular intervals. Unfortunately, this puts an unreasonable load on the remote repositories. For this reason, downloading the entire repo1 is frowned upon by its maintainers:

    DO NOT wget THE ENTIRE REPOSITORY!

    Please take only the jars you need. We understand this may entail more work, but grabbing more than 1.7 TiB of binaries really kills our servers.

    Intermediate proxy

    In order to improve upon the previous solution, it’s possible to add an intermediate enterprise proxy repository, e.g. Artifactory or Nexus. This way, the flow looks like the following:

    Components diagram
    Whether the enterprise repo is Dockerized or not is irrelevant and plays no role in the whole process.
    Create the parent image

    At regular intervals, the image is created not from remote repositories but from the proxy. At first, there will be no dependencies in the proxy. But after the initial build, the image will contain the required dependencies.

    Initial build

    The enterprise repository is empty. Dependencies will be downloaded from remote repositories, through the enterprise proxy, feeding it in the process.

    Next builds

    The enterprise repository is filled. Dependencies are already present in the image: no download is required. Note that if in the meantime a new dependency (or a new version thereof) is required, it will be downloaded during the app build - but only that one, which keeps the download time short. It also gets provisioned into the enterprise repo, available for the next app build, and finally baked into the next parent image.

    The only "trick" is for the App image to use the latest Maven Dependencies image as the parent i.e. FROM dependencies:latest , so as to benefit from dependencies as they are added into the image.

    Custom solution

    Depending on the app stack, there might be dedicated solutions. For example, for the Spring Boot framework, there’s the Thin Launcher. It allows dependencies to be downloaded at first run instead of at build time. As an added benefit, it keeps images very small, as dependencies are not packaged in each one.

    Conclusion

    Barring a dedicated solution for a specific stack, putting an enterprise repository in front of remote ones allows for faster downloads. Combined with a parent image rebuilt at regular intervals, it allows skipping downloads during the app build almost entirely.

    Categories: Development Tags: container, build, optimization
  • A Dockerfile for Maven-based Github projects

    Docker logo

    Since the Docker "revolution", I’ve been interested in creating a Dockerfile for Spring applications. I’m far from an ardent practitioner, but the principle of creating a Dockerfile is dead simple. However, as in Java - or probably any programming language - while it’s easy to achieve something that works, it’s much harder to create something that works well.

    Multi-stage builds

    In Docker, one of the main issues is the size of the final image. It’s not uncommon to end up with images over 1 GB, even for simple Java applications. Since version 17.05 of Docker, it’s possible to have multiple builds in a single Dockerfile, and to access the output of a previous build in the current one. Those are called multi-stage builds. The final image is based on the last build stage.

    Let’s imagine the code is hosted on GitHub and that it’s built with Maven. Build stages would be as follows:

    1. Clone the code from Github
    2. Copy the folder from the previous stage; build the app with Maven
    3. Copy the JAR from the previous stage; run it with java -jar

    Here is a build file to start from:

    FROM alpine/git
    WORKDIR /app
    RUN git clone https://github.com/spring-projects/spring-petclinic.git (1)
    
    FROM maven:3.5-jdk-8-alpine
    WORKDIR /app
    COPY --from=0 /app/spring-petclinic /app (2)
    RUN mvn install (3)
    
    FROM openjdk:8-jre-alpine
    WORKDIR /app
    COPY --from=1 /app/target/spring-petclinic-1.5.1.jar /app (4)
    CMD ["java -jar spring-petclinic-1.5.1.jar"] (5)

    It maps the above build stages:

    1 Clone the Spring PetClinic git repository from Github
    2 Copy the project folder from the previous build stage
    3 Build the app
    4 Copy the JAR from the previous build stage
    5 Run the app

    Improving readability

    Notice that in the previous build file, build stages are referenced by their index (starting from 0), e.g. COPY --from=0. While not a real issue, it’s always better to have something semantically meaningful. Docker allows labeling stages and referencing those labels in later stages.

    FROM alpine/git as clone (1)
    WORKDIR /app
    RUN git clone https://github.com/spring-projects/spring-petclinic.git
    
    FROM maven:3.5-jdk-8-alpine as build (2)
    WORKDIR /app
    COPY --from=clone /app/spring-petclinic /app (3)
    RUN mvn install
    
    FROM openjdk:8-jre-alpine
    WORKDIR /app
    COPY --from=build /app/target/spring-petclinic-1.5.1.jar /app
    CMD ["java -jar spring-petclinic-1.5.1.jar"]
    1 Labels the first stage as clone
    2 Labels the second stage as build
    3 References the first stage using the label

    Choosing the right image(s)

    Multi-stage builds help tremendously with image size management, but they are not the only criterion. The image to start from has a big impact on the final image. Beginners generally use full-blown Operating System images, such as Ubuntu, inflating the size of the final image for no good reason. Yet, there are lightweight OSes that are very well suited to Docker images, such as Alpine Linux.

    It’s also a great fit for security purposes, as the attack surface is limited.

    In the above build file, images are the following:

    1. alpine/git
    2. maven:3.5-jdk-8-alpine
    3. openjdk:8-jre-alpine

    They all inherit transitively or directly from alpine.

    Image sizes are as follow:

    REPOSITORY                      TAG                 IMAGE ID            CREATED             SIZE
    nfrankel/spring-petclinic       latest              293bd333d60c        10 days ago         117MB
    openjdk                         8-jre-alpine        c4f9d77cd2a1        4 weeks ago         81.4MB

    The difference in size between the JRE image and the app image is around 36 MB, which is the size of the JAR itself.

    Exposing the port

    The Spring Pet Clinic is a webapp, so it requires exposing the HTTP port it binds to. The relevant Docker directive is EXPOSE. I chose 8080 as the port number to match the embedded Tomcat container’s default, but it could be anything. The last stage should be modified as such:

    FROM openjdk:8-jre-alpine
    WORKDIR /app
    COPY --from=build /app/target/spring-petclinic-1.5.1.jar /app
    EXPOSE 8080
    CMD ["java -jar spring-petclinic-1.5.1.jar"]

    Parameterization

    At this point, it appears the build file can be used for building any webapp with the following features:

    1. The source code is hosted on Github
    2. The build tool is Maven
    3. The resulting output is an executable JAR file

    Of course, that suits Spring Boot applications very well, but this is not a hard requirement.

    Parameters include:

    • The Github repository URL
    • The project name
    • Maven’s artifactId and version
    • The artifact name (as it might differ from the artifactId, depending on the specific Maven configuration)

    Let’s use those to design a parameterized build file. In Docker, parameters can be passed using either ENV or ARG. ARG values are set with the --build-arg option on the command line; ENV values can be derived from them inside the build file. The differences are the following:

    Type    Found in the image    Default value
    ENV     Yes                   Required
    ARG     No                    Optional

    FROM alpine/git as clone
    ARG url (1)
    WORKDIR /app
    RUN git clone ${url} (2)
    
    FROM maven:3.5-jdk-8-alpine as build
    ARG project (3)
    WORKDIR /app
    COPY --from=clone /app/${project} /app
    RUN mvn install
    
    FROM openjdk:8-jre-alpine
    ARG artifactid
    ARG version
    ENV artifact ${artifactid}-${version}.jar (4)
    WORKDIR /app
    COPY --from=build /app/target/${artifact} /app
    EXPOSE 8080
    CMD ["java -jar ${artifact}"] (5)
    1 url must be passed on the command line to set which Github repo to clone
    2 url is replaced by the passed value
    3 Same as (1)
    4 artifact must be an ENV, so as to be persisted in the final app image
    5 Use the artifact value at runtime

    Building

    The Spring Pet Clinic image can now be built using the following command-line:

    docker build --build-arg url=https://github.com/spring-projects/spring-petclinic.git\
      --build-arg project=spring-petclinic\
      --build-arg artifactid=spring-petclinic\
      --build-arg version=1.5.1\
      -t nfrankel/spring-petclinic - < Dockerfile
    Since the image doesn’t depend on the filesystem, no context needs to be passed and the Dockerfile can be piped from the standard input.

    To build another app, parameters can be changed accordingly e.g.:

    docker build --build-arg url=https://github.com/heroku/java-getting-started.git\
      --build-arg project=java-getting-started\
      --build-arg artifactid=java-getting-started\
      --build-arg version=1.0\
      -t nfrankel/java-getting-started - < Dockerfile

    Running

    Running an image built with the above command is quite easy:

    docker run -ti -p8080:8080 nfrankel/spring-petclinic

    Unfortunately, it fails with the following error message: starting container process caused "exec: \"java -jar ${artifact}\": executable file not found in $PATH. The exec form of CMD doesn’t go through a shell, so no variable substitution takes place. The trick is to use the ENTRYPOINT directive with sh -c. The updated Dockerfile looks like the following:

    FROM openjdk:8-jre-alpine
    ARG artifactid
    ARG version
    ENV artifact ${artifactid}-${version}.jar (4)
    WORKDIR /app
    COPY --from=build /app/target/${artifact} /app
    EXPOSE 8080
    ENTRYPOINT ["sh", "-c"]
    CMD ["java -jar ${artifact}"] (5)

    At this point, running the container will (finally) work.

    The second image uses a different port, so the command should be: docker run -ti -p8080:5000 nfrankel/java-getting-started.

    Food for thought

    External Git

    The current build clones a repository, and hence doesn’t need the context to be sent to Docker. An alternative would be to clone outside the build file, e.g. in a continuous integration chain, and start from the context. That could be useful to build the image during development, on developers’ machines.

    Latest version or not

    For the Git image, I used the latest version, while for the JDK and JRE images, I used a specific major version. It’s important for the Java version to be fixed to a major version; it matters less for Git. It depends on the nature of the image.

    Building from master

    There’s no configuration to change the branch after cloning. This is wrong, as most of the time, builds are executed on a dedicated tag, e.g. v1.0.0. Obviously, there should be an additional command - as well as an additional build argument - to check out a specific tag before the build.
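
    As a sketch, the clone stage could accept a hypothetical tag argument - assuming the project argument is also made available in this stage:

    FROM alpine/git as clone
    ARG url
    ARG project
    # Hypothetical additional build argument, defaulting to the main branch
    ARG tag=master
    WORKDIR /app
    RUN git clone ${url} && git -C ${project} checkout ${tag}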

    Skipping tests

    It takes an awfully long time for the Pet Clinic application to build, as it has a huge test harness. Executing those tests takes time, and Maven must go through the test phase to reach the install phase. Depending on the specific continuous integration chain, tests might have been executed earlier and mustn’t be executed again during the image build.
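
    In that case, a possible tweak is to skip test execution in the Maven stage, e.g. by changing the RUN instruction:

    RUN mvn install -DskipTests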

    Maven repository download

    Spring Boot apps have a lot of dependencies. It takes a long time for the image to build, as all dependencies need to be downloaded every time for every app. There are a couple of solutions to handle that. It probably will be the subject of another post.

    Categories: Development Tags: docker, container, build
  • Kotlin collections

    This post should have been the 5th in the Scala vs Kotlin series.

    Unfortunately, I must admit I have a hard time reading the documentation of Scala collections e.g.:

    trait LinearSeq [+A] extends Seq[A] with collection.LinearSeq[A] with GenericTraversableTemplate[A, LinearSeq] with LinearSeqLike[A, LinearSeq[A]]

    Hence, I will only describe collections from the Kotlin side.

    Iterator

    At the root of Kotlin’s Collection API lies the Iterator interface, similar to Java’s. But the similarity stops there. The features of java.util.ListIterator are broken down into different contracts:

    1. ListIterator to move the iterator index forward and backward
    2. MutableIterator to remove content from the iterator
    3. MutableListIterator inherits from the 2 interfaces above to mimic the entire contract of java.util.ListIterator
    Iterator API

    Collection, List and Set

    The hierarchy of collections in Kotlin is very similar to Java’s: Collection, List and Set. (I won’t detail maps, but they follow the same design.) The only - but huge - difference is that it’s divided between mutable and immutable types. Mutable types have methods to change their contents (e.g. add() and set()), while immutable types don’t.
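
    The split shows up directly in function signatures - as a small sketch, read-only access only needs the immutable contract, while mutation requires the mutable one:

    fun printAll(items: Collection<Any>) = items.forEach { println(it) }  // works with both flavors

    fun addDefault(items: MutableCollection<String>) {
        items.add("default")                                              // requires the mutable contract
    }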

    Of course, the hierarchy is a bit more detailed compared to Java, but that’s expected from a language that benefits from its parent’s experience.

    Collection, List and Set

    Implementations

    IMHO, the important bit about Kotlin collections is not their hierarchy - though it’s important to understand the difference between mutable and immutable types.

    As Java developers know, there’s no such thing as an out-of-the-box immutable collection type in Java. When an immutable collection is required, a mutable collection must be wrapped into an unmodifiable one via a call to the relevant Collections.unmodifiableXXX() method. But the unmodifiable types are not public - they are private to Collections: the returned types are the generic ones (the List or Set interfaces). It means they implement all methods of the standard collections; immutability comes from the mutation methods throwing exceptions. At compile time, there’s no way to differentiate between a mutable and an immutable collection.

    By contrast, Kotlin offers a clean separation between mutable and immutable types. It also provides dedicated functions to create objects of the relevant type:

    Collections-creating functions
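
    For example - the compiler enforces the split, even though both calls below return Java collections under the hood:

    val readOnly: List<Int> = listOf(1, 2, 3)                  // immutable interface
    val mutable: MutableList<Int> = mutableListOf(1, 2, 3)     // mutable interface

    mutable.add(4)       // compiles
    // readOnly.add(4)   // doesn't compile: List declares no add()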

    As opposed to Scala, Kotlin doesn’t implement its own collection types, it reuses those from Java. That means that even when the compile-time type is immutable, the runtime type is a mutable Java one. The downside is that it’s possible to change the collection’s elements by casting it to the runtime type (a short sketch follows the list below). IMHO, this is no more severe than what standard reflection already allows. There are several advantages, though:

    1. Faster time-to-market
    2. Java collection types benefited from years of improvement
    3. The underlying implementation can be changed in the future with full backward compatibility
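
    A minimal sketch of that downside - the cast succeeds because the runtime type is a plain java.util.ArrayList:

    fun main() {
        // A read-only view over what is, at runtime, a java.util.ArrayList
        val readOnly: List<Int> = ArrayList(listOf(1, 2, 3))

        // readOnly.add(4)                    // doesn't compile
        (readOnly as MutableList<Int>).add(4) // compiles, and succeeds at runtime
        println(readOnly)                     // prints [1, 2, 3, 4]
    }
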
    Categories: Development Tags: API
  • A use-case for Google Fusion Tables

    I’ve been traveling somewhat during the last few years, and I wanted to check on a map which countries I already visited. There’s one requirement: I want the whole area of countries I’ve visited to be highlighted. There are a couple of solutions for that:

    The Elastic stack

    Data can be stored in ElasticSearch, and there’s one world map visualization available for the Kibana dashboard. This works, but:

    1. The visualization handles geo-location, but not highlighting an entire country
    2. It requires some setup, and I’m a developer - read that I’m lazy.
    Dedicated app

    To be honest, I didn’t search but I assume "there is an app for that". But by using a dedicated app, you’re not the owner of your own data anymore and that doesn’t suit me.

    Google Maps

    Google Maps allows adding layers and dedicated data. This solution would be a great fit if it were possible to easily highlight countries. Still, there’s a lot of JavaScript to write.

    Fusion Tables is an experimental service offered by Google.

    Creating the table

    While it’s possible to input the data directly into Fusion Tables, it’s easier to do that in a simple Google Spreadsheet. The following is sample data:

    Date     Country  Place       Type        Event
    2017/06  Denmark  Copenhagen  Conference  JDK.io
    2017/06  Sweden   Malmö       Conference  JDK.io
    2017/05  Ukraine  Kiev        Conference  JEEConf
    2017/05  Greece   Athens      Conference  Voxxed Days

    Such a spreadsheet can easily be imported into Fusion Tables.

    1. Connect Fusion Tables on Google Drive. From the Drive’s homepage, go on My Drive ▸ More ▸ Connect More Apps. Search for "fusion". Click on Connect.
      Connect Google Fusion Table on Drive
    2. Create a new Fusion Tables document. From the Drive’s homepage, click on New ▸ More ▸ Google Fusion Tables. Then select Google Spreadsheet.
      Import Google Spreadsheet into Fusion Tables
    3. Select the desired spreadsheet and click Select.
      Import new table
    4. Name the table accordingly and click Finish. It yields something akin to the following:
      Import new table

    Out-of-the-box, there’s a Map of Country tab that displays each data line on a world map. Unfortunately, it’s a simple dot at the center of the country. It doesn’t fulfil the initial requirement of highlighting the entire country area.

    Default world map view

    Changing the Location field to "Place" instead of "Country" will place dots at the correct location instead of the country center, but still no highlighting.

    Merging multiple Fusion Tables

    Fusion Tables supports geometries that can be defined using the Keyhole Markup Language (KML) format. That could fulfil the highlighting requirement. Yet, it would mean manually defining the geometry of each country visited - an effort I’m not prepared to make. Fortunately, it’s possible to "join" multiple tables - it’s called merging. Merging creates a new table, with both tables associated in it. Even better, if any of the initial tables’ data changes, it’s reflected in the merged table.

    Good news: there’s an existing publicly accessible table defining all country geometries. Let’s merge the existing table with it via File ▸ Merge. In the Or paste a web address here field, paste the URL of the world countries table above. Click Next. The opening pop-up requires defining the "join" columns of the two tables.

    Default world map view

    Click Next. In the opening pop-up, tick the checkboxes of columns that will be part of the merged table. Click Merge. Wait for the merge to happen. Click View table.

    Now, on the world map tab, changing the Location field to "geometry" yields the expected result.

    Highlighted world map view

    At this point, the requirement is fulfilled. Further refinements would be to access the data via its REST API.

    Conclusion

    Fusion Tables is a no-fluff, just-stuff cloud service that makes it easy to display data in various ways. With its ability to join with other tables, it’s easy to reuse existing tabular data.

    Categories: Technical Tags: database, data visualization
  • Scala vs Kotlin: Multiple Inheritance and the Diamond problem

    This post is the 4th part in the series dedicated to comparing Scala and Kotlin:

    1. Pimp my library
    2. Operator overloading
    3. inline and infix

    Inheritance is one of the basic tenets of Object-Oriented Programming, along with encapsulation and polymorphism. Alongside simple inheritance, there is multiple inheritance:

    Multiple inheritance is a feature of some object-oriented computer programming languages in which an object or class can inherit characteristics and features from more than one parent object or parent class. It is distinct from single inheritance, where an object or class may only inherit from one particular object or class.
    --Wikipedia

    C++ is famous for allowing multiple inheritance, and for the diamond problem that comes with it. The problem arises when a child class inherits from multiple classes that define the same method.

    C++ has its own way of coping with the diamond problem. In order to avoid it, Java completely disallows multiple inheritance. Let’s check how Scala and Kotlin fare.

    Scala

    Scala doesn’t allow multiple class inheritance per se, but allows extending multiple traits.

    Traits are used to share interfaces and fields between classes. They are similar to Java 8’s interfaces. Classes and objects can extend traits but traits cannot be instantiated and therefore have no parameters.
    --Scala Documentation

    The above diagram translates into the following code:

    trait Openable {
      def open() { ... }
    }
    
    trait Window extends Openable {
      override def open() { ... }
    }
    
    trait Door extends Openable {
      override def open() { ... }
    }
    
    class WindowDoor extends Door with Window {
      ...
    }
    

    Scala resolves the diamond problem through class linearization: among the super traits providing an implementation, the one mixed in last - i.e. the right-most in the extends …​ with …​ clause - takes precedence, provided the conflicting methods are marked override.

    Hence, in the above example, WindowDoor.open() will by default use the code from Window.open(). Of course, nothing prevents us from overriding the method in WindowDoor itself.

    Kotlin

    As in Scala, Kotlin doesn’t allow extending multiple superclasses. Yet, interfaces can have concrete functions.

    Interfaces in Kotlin are very similar to Java 8. They can contain declarations of abstract methods, as well as method implementations. What makes them different from abstract classes is that interfaces cannot store state.
    --Kotlin Documentation

    The following is the code above translated into Kotlin:

    interface Openable {
        fun open() { ... }
    }
    
    interface Window : Openable {
        override fun open() { ... }
    }
    
    interface Door : Openable {
        override fun open() { ... }
    }
    
    class WindowDoor : Door, Window {
        override fun open() { ... }
    }
    

    Kotlin takes another path to solve the diamond problem: explicit overriding. The compiler detects diamond occurrences and raises an error if a function is implemented by multiple parent types. To fix this, the developer must explicitly code the desired behavior.
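
    For instance - with concrete bodies in place of the ellipses above - the explicit override can delegate to one or both parent implementations using the super<…> qualifier:

    interface Openable { fun open() = println("Openable") }
    interface Window : Openable { override fun open() = println("Window") }
    interface Door : Openable { override fun open() = println("Door") }

    class WindowDoor : Door, Window {
        // Required by the compiler; super<X> selects a specific parent implementation
        override fun open() {
            super<Door>.open()
            super<Window>.open()
        }
    }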

    Conclusion

    While Scala’s approach is more elegant, Kotlin’s is consistent with its philosophy: being explicit and readable before being concise.

    Categories: Development Tags: scala, kotlin
  • Managing publications with Jekyll

    Jekyll logo

    Some time ago, I migrated my WordPress blog to Jekyll for different reasons, including performance, security, and hosting costs - but mainly because I lost too much time keeping the platform and the plugins up-to-date. So far, I’m very happy with the result.

    But I had to change the way I’m writing posts I intend to publish later. In WordPress, the process is very simple: write a draft anytime you want; edit it to your heart’s content and when ready, hit publish. Done.

    Jekyll is quite different. The site is generated statically so that there’s no magical button to click in order to publish. Posts are written in Markdown (or Asciidoc): in order to get the HTML, the site needs to be generated.

    There are several ways to manage publications with Jekyll. This post is dedicated to a few of them.

    Using future dates

    Posts are stored in a specific _posts folder. Each post is tagged with a publish date. When generating the HTML site, only posts whose publish date is not in the future are generated.

    Given the above behavior, the easiest way to publish is just to create posts with a date set in the future. That way, on D-day, re-generating the site will publish the new post.

    In order to preview the generated post before that date, the generation process can be launched with the --future option.

    This strategy requires to know the date a post will be published in advance. If no publication date can be planned, the next strategy is useful.

    Using drafts

    In order for a post to be generated, it needs to have its type meta-data set to… post. This is the case by default.

    But if the type is set to draft, the post won’t be generated unless the Jekyll build is launched with the --drafts option. For better management, all such drafts can (should?) be stored in a _drafts folder.
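
    Combined with the previous option, drafts and future-dated posts can be previewed locally with something like:

    jekyll serve --drafts --future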

    When one wants to publish a draft, the type is changed from draft to post - and the file optionally moved to the main posts folder.

    Using a VCS

    When using a Version Control System - and I do hope that’s the case - posts/drafts can also be written in a specific publication branch, e.g. feature/new_posts.

    Switching to that branch allows previewing the new posts with no consequence for the master branch. Publishing just requires cherry-picking the commit of the wanted post. Of course, it’s possible to combine cherry-picking with the above options to achieve full flexibility.

    As an added benefit to cherry-pick new publications from a dedicated branch, this branch can be cleaned up from time to time to keep the number of posts low, and the generation time as well.

    My way

    Here’s how I currently manage my publishing flow:

    If I planned a specific publication date
    1. Commit the new post in the publication branch, feature/new_posts
    2. Cherry-pick it on the day of publication
    3. Generate the site
    If there’s none
    1. Same as above
    2. Optionally move the file from the _drafts to the _posts folder
    3. Amend the cherry-pick to:
      • Change the type from draft to post
      • Add the correct date
    4. Generate the site as above
    This way works quite well for multiple writers, with the editing process handled through Pull Requests.
    Categories: Technical Tags: jekyll
  • A SonarQube plugin for Kotlin - Creating the plugin proper

    SonarQube Continuous Inspection logo

    This is the 3rd post in a series about creating a SonarQube plugin for the Kotlin language:

    • The first post was about creating the parsing code itself.
    • The 2nd post detailed how to use the parsing code to check for two rules.

    In this final post, we will be creating the plugin proper using the code of the 2 previous posts.

    The Sonar model

    The Sonar model is based on the following abstractions:

    Plugin
    Entry-point for plugins to inject extensions into SonarQube
    A plugin points to the other abstraction instances to make the SonarQube platform load them
    AbstractLanguage
    Pretty self-explanatory. Represents a language - Java, C#, Kotlin, etc.
    ProfileDefinition
    Define a profile which is automatically registered during sonar startup
    A profile is a mutable set of fully-configured rules. While not strictly necessary, having a Sonar profile pre-registered allows users to analyze their code without further configuration. Every language plugin offered by Sonar has at least one profile attached.
    RulesDefinition
    Defines some coding rules of the same repository
    Defines an immutable set of rule definitions in a repository. While a rule definition defines available parameters, default severity, etc., the rule (from the profile) defines the exact values for parameters, a specific severity, etc. In short, the rule implements the rule definition.
    Sensor
    A sensor is invoked once for each module of a project, starting from leaf modules. The sensor can parse a flat file, connect to a web server... Sensors are used to add measure and issues at file level.
    The sensor is the entry-point where the magic happens.

    Starting to code the plugin

    Every abstraction above needs a concrete subclass. Note that the API classes themselves are all fairly decoupled. It’s the role of the Plugin child class to bind them together.

    class KotlinPlugin : Plugin {
    
        override fun define(context: Context) {
            context.addExtensions(
                    Kotlin::class.java,
                    KotlinProfile::class.java,
                    KotlinSensor::class.java,
                    KotlinRulesDefinition::class.java)
        }
    }
    

    Most of the code is boilerplate, except for the ANTLR-related part.

    Wiring the ANTLR parsing code

    On one hand, the parsing code is based on generated listeners. On the other hand, the sensor is the entry-point to the SonarQube parsing. There’s a need for a bridge between the 2.

    In the first article, we used an existing grammar for Kotlin to generate parsing code. SonarQube provides its own lexer/parser generating tool (SonarSource Language Recognizer). A sizeable part of the plugin API is based on it. Describing the grammar is no small feat for any real-life language, so I preferred to design my own adapter code instead.

    AbstractKotlinParserListener
    Subclass of the generated ANTLR KotlinParserBaseListener. It has an attribute to store violations, and a method to add such a violation.
    Violation
    The violation only contains the line number, as the rest of the required information will be stored into a KotlinCheck instance.
    KotlinCheck
    Abstract class that wraps an AbstractKotlinParserListener. Defines what constitutes a violation. It handles the ANTLR boilerplate code itself.

    This can be represented as the following:
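
    As a rough sketch in code - the names follow the descriptions above, but the signatures are assumptions rather than the actual plugin sources:

    import java.io.File
    import org.antlr.v4.runtime.CharStreams
    import org.antlr.v4.runtime.CommonTokenStream
    import org.antlr.v4.runtime.tree.ParseTreeWalker
    import org.sonar.api.rule.RuleKey

    data class Violation(val lineNumber: Int)

    abstract class AbstractKotlinParserListener : KotlinParserBaseListener() {
        val violations = mutableListOf<Violation>()
        fun addViolation(violation: Violation) { violations.add(violation) }
    }

    abstract class KotlinCheck(val message: String) {

        abstract fun ruleKey(): RuleKey
        protected abstract fun listener(): AbstractKotlinParserListener

        // Wraps the ANTLR boilerplate: parse the file, walk the tree, collect violations
        fun violations(file: File): List<Violation> {
            val lexer = KotlinLexer(CharStreams.fromPath(file.toPath()))
            val parser = KotlinParser(CommonTokenStream(lexer))
            val listener = listener()
            ParseTreeWalker().walk(listener, parser.kotlinFile())
            return listener.violations
        }
    }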

    The sensor proper

    The general pseudo-code should look something akin to:

    FOR EACH source file
        FOR EACH rule
            Check for violation of the rule
            FOR EACH violation
                Call the SonarQube REST API to create a violation in the datastore
    

    This translates as:

    class KotlinSensor(private val fs: FileSystem) : Sensor {
    
        val sources: Iterable<InputFile>
            get() = fs.inputFiles(MAIN)
    
        override fun execute(context: SensorContext) {
            sources.forEach { inputFile: InputFile ->
                KotlinChecks.checks.forEach { check ->
                    val violations = check.violations(inputFile.file())
                    violations.forEach { (lineNumber) ->
                        with(context.newIssue().forRule(check.ruleKey())) {
                            val location = newLocation().apply {
                                on(inputFile)
                                message(check.message)
                                at(inputFile.selectLine(lineNumber))
                            }
                            at(location).save()
                        }
                    }
                }
            }
        }
    }
    

    Finally, the run

    Let’s create a dummy Maven project with 2 classes, Test1 and Test2 in one Test.kt file, with the same code as last week. Running mvn sonar:sonar yields the following output:

    Et voilà, our first SonarQube plugin for Kotlin, checking for our custom-developed violations.

    Of course, it has (a lot of) room for improvement:

    • Rules need to be activated through the GUI - I couldn’t find how to do it programmatically
    • Adding new rules needs updates to the plugin. Rules in 3rd-party plugins are not added automatically, as could be the case for standard SonarQube plugins.
    • So far, code located outside of classes seems not to be parsed.
    • The walk through the parse tree is executed for every check. An obvious performance gain would be to walk only once and do every check from there.
    • A lot of the above improvements could be achieved by replacing ANTLR’s grammar with Sonar’s internal SSLR
    • No tests…

    That still makes the project a nice starting point for a full-fledged Kotlin plugin. Pull requests are welcome!

    Categories: Technical Tags: code quality, SonarQube, Kotlin, plugin, ANTLR
  • A SonarQube plugin for Kotlin - Analyzing with ANTLR

    SonarQube Continuous Inspection logo

    Last week, we used ANTLR to generate a library to be able to analyze Kotlin code. It’s time to use the generated API to check for specific patterns.

    API overview

    Let’s start by having a look at the generated API:

    • KotlinLexer: Executes lexical analysis.
    • KotlinParser: Wraps classes representing all Kotlin tokens, and handles parsing errors.
    • KotlinParserVisitor: Contract for implementing the Visitor pattern on Kotlin code. KotlinParserBaseVisitor is its empty implementation, to ease the creation of subclasses.
    • KotlinParserListener: Contract for callback-related code when visiting Kotlin code, with KotlinParserBaseListener its empty implementation.

    Class diagrams are not the greatest help in writing code. The following snippet is a very crude analysis implementation. I’ll be using Kotlin, but any JVM language interoperable with Java could be used:

     1  val stream = CharStreams.fromString("fun main(args : Array<String>) {}")
     2  val lexer = KotlinLexer(stream)
     3  val tokens = CommonTokenStream(lexer)
     4  val parser = KotlinParser(tokens)
     5  val context = parser.kotlinFile()
     6  ParseTreeWalker().apply {
     7      walk(object : KotlinParserBaseListener() {
     8          override fun enterFunctionDeclaration(ctx: KotlinParser.FunctionDeclarationContext) {
     9              println(ctx.SimpleName().text)
    10          }
    11      }, context)
    12  }
    

    Here’s the explanation:

    1. Create a CharStream to feed the lexer on the next line. The CharStreams class offers plenty of static fromXXX() methods, each accepting a different type (String, InputStream, etc.)
    2. Instantiate the lexer, with the stream
    3. Instantiate a token stream over the lexer. The class provides streaming capabilities over the lexer.
    4. Instantiate the parser, with the token stream
    5. Define the entry point into the code. In that case, it’s a Kotlin file - and probably will be for the plugin.
    6. Create the overall walker that will visit each node in turn
    7. Start the visiting process by calling walk and passing the desired behavior as an object
    8. Override the desired function. Here, it will be invoked every time a function node is entered
    9. Do whatever is desired e.g. print the function name

    Obviously, lines 1 to 7 are just boilerplate to wire all components together. The behavior that needs to be implemented should replace lines 8 and 9.

    First simple check

    In Kotlin, if a function returns Unit - i.e. nothing - then explicitly declaring its return type is optional. Checking that there’s no such explicit return type would make a great rule. The following snippets, both valid Kotlin code, are equivalent - one with an explicit return type and the other without:

    fun hello1(): Unit {
        println("Hello")
    }
    
    fun hello2() {
        println("Hello")
    }
    

    Let’s use grun to graphically display the parse tree (grun was explained in the previous post). It yields the following:

    As can be seen, the snippet with an explicit return type has a type branch under functionDeclaration. This is confirmed by the snippet from the KotlinParser ANTLR grammar file:

    functionDeclaration
      : modifiers 'fun' typeParameters?
          (type '.' | annotations)?
          SimpleName
          typeParameters? valueParameters (':' type)?
          typeConstraints
          functionBody?
          SEMI*
      ;
    

    The rule should check that if such a return type exists, then it shouldn’t be Unit. Let’s update the above code with the desired effect:

     1  ParseTreeWalker().apply {
     2      walk(object : KotlinParserBaseListener() {
     3          override fun enterFunctionDeclaration(ctx: KotlinParser.FunctionDeclarationContext) {
     4              if (ctx.type().isNotEmpty()) {
     5                  val typeContext = ctx.type(0)
     6                  with(typeContext.typeDescriptor().userType().simpleUserType()) {
     7                      val typeName = this[0].SimpleName()
     8                      if (typeName.symbol.text == "Unit") {
     9                          println("Found Unit as explicit return type " +
    10                              "in function ${ctx.SimpleName()} at line ${typeName.symbol.line}")
    11                      }
    12                  }
    13              }
    14          }
    15      }, context)
    16  }
    

    Here’s the explanation:

    • Line 4: Check there’s an explicit return type, whatever it is
    • Line 5: Strangely enough, the grammar allows for a multi-valued return type. Just take the first one.
    • Line 6: Follow the parse tree up to the final type name - refer to the above parse tree screenshot for a graphical representation of the path.
    • Line 8: Check that the return type is Unit
    • Line 9: Prints a message in the console. In the next step, we will call the SonarQube API there.

    Running the above code correctly yields the following output:

    Found Unit as explicit return type in function hello1 at line 1
    

    A more advanced check

    In Kotlin, the following snippets are all equivalent:

    fun hello1(name: String): String {
        return "Hello $name"
    }
    
    fun hello2(name: String): String = "Hello $name"
    
    fun hello3(name: String) = "Hello $name"
    

    Note that in the last case, the return type can be inferred by the compiler and omitted by the developer. That would make a good check: in the case of an expression body, the return type should be omitted. The same technique as above can be used:

    1. Display the parse tree from the snippet using grun:

    2. Check for differences. Obviously:
      • Functions that do not have an explicit return type miss a type node in the functionDeclaration tree, as above
      • Functions with an expression body have a functionBody whose first child is = and whose second child is an expression
    3. Refer to the initial grammar, to make sure all cases are covered.
      functionBody
        : block
        | '=' expression
        ;
      
    4. Code!
    ParseTreeWalker().apply {
        walk(object : KotlinParserBaseListener() {
            override fun enterFunctionDeclaration(ctx: KotlinParser.FunctionDeclarationContext) {
                val bodyChildren = ctx.functionBody().children
                if (bodyChildren.size > 1
                        && bodyChildren[0] is TerminalNode && bodyChildren[0].text == "="
                        && ctx.type().isNotEmpty()) {
                    val firstChild = bodyChildren[0] as TerminalNode
                    println("Found explicit return type for expression body " +
                            "in function ${ctx.SimpleName()} at line ${firstChild.symbol.line}")
                }
            }
        }, context)
    }
    

    The code is pretty self-explanatory and yields the following:

    Found explicit return type for expression body in function hello2 at line 5
    
    Categories: Technical Tags: code quality, SonarQube, Kotlin, plugin
  • A SonarQube plugin for Kotlin - Paving the way

    SonarQube Continuous Inspection logo

    Since I started my journey into Kotlin, I wanted to use the same libraries and tools I use in Java. For libraries - Spring Boot, Mockito, etc. - it’s straightforward, as Kotlin is 100% interoperable with Java. For tools, well, it depends. For example, Jenkins works flawlessly, while SonarQube lacks a dedicated plugin. The SonarSource team has limited resources: Kotlin, though on the rise - and even more so since Google I/O 17 - is not in their pipeline. This post series is about creating such a plugin, and this first post is about parsing Kotlin code.

    ANTLR

    In the realm of code parsing, ANTLR is a clear leader in the JVM world.

    ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.

    Designing the grammar

    ANTLR is able to generate parsing code for any language thanks to a dedicated grammar file. However, creating such a grammar from scratch is not trivial for a real-life language. Fortunately, thanks to the power of the community, a grammar for Kotlin already exists on GitHub.

    With this existing grammar, ANTLR is able to generate Java parsing code to be used by the SonarQube plugin. The steps are the following:

    • Clone the Github repository
      git clone git@github.com:antlr/grammars-v4.git
      
    • By default, classes will be generated under the root package, which is discouraged. To change that default:
      • Create a src/main/antlr4/<fully>/<qualified>/<package> folder such as src/main/antlr4/ch/frankel/sonarqube/kotlin/api
      • Move the g4 files there
      • In the POM, remove the sourceDirectory and includes bits from the antlr4-maven-plugin configuration to use the default
    • Build and install the JAR in the local Maven repo
      cd grammars-v4/kotlin
      mvn install
      

    This should generate a KotlinLexer and a KotlinParser class, as well as several related classes in target/classes. As Maven goes, it also packages them in a JAR named kotlin-1.0-SNAPSHOT.jar in the target folder - and in the local Maven repo as well.

    Testing the parsing code

    To test the parsing code, one can use the grun command. It’s an alias for the following:

    java -Xmx500M -cp "<path/to/antlr/complete/>.jar:$CLASSPATH" org.antlr.v4.gui.TestRig
    

    Create the alias manually or install the antlr package via Homebrew on OSX.

    With grun, Kotlin code can be parsed and then displayed in different ways, textual and graphical. The following expects an input in the console:

    cd target/classes
    grun Kotlin kotlinFile -tree
    

    After having typed valid Kotlin code, it yields its parse tree in text. By replacing the -tree option with the -gui option, it displays the tree graphically instead. For example, the following tree comes from this snippet:

    fun main(args : Array<String>) { 
        val firstName : String = "Adam"
        val name : String? = firstName 
        print("$name") 
    }
    

    In order for the JAR to be used later in the SonarQube plugin, it has been deployed on Bintray. In the next post, we will be doing proper code analysis to check for violations.

    Categories: Technical Tags: code quality, SonarQube, Kotlin, plugin, ANTLR
  • It depends

    Light bulb

    In the industry, there’s this widespread joke. Whatever the question you ask a consultant, the answer will be:

    It depends.

    This joke is meant to highlight that consultants never give a straight answer to a simple question, because they don’t want to take any responsibility. While I understand the frustration of the business when faced with this situation, I’d like to write about the other side of the fence.

    Let’s enlarge the scope: in the IT industry, it’s probably more about every developer rather than only consultants. But why is this answer so common? For one single reason: if the question - deemed simple by the person asking - doesn’t provide enough context (and they rarely do), then the answer cannot be any other.

    Let’s highlight that with a simple question:

    What’s the best transportation?

    Guess what? There’s no right answer to that, because no context has been provided. For example, context parameters that could help refine the answer include:

    • the distance: do you expect to move 200 meters or 200 kilometers?
    • the nature of the terrain, road vs. land vs. water
    • the time constraint
    • the expected comfort level
    • the cost
    • the physical condition - it’s hard to consider running/biking when in bad shape
    • for some, the pollution
    • etc.

    I hope this pretty down-to-earth example makes those who ask questions realize that the quality of the answer is heavily constrained by the quality of the question - garbage in, garbage out.

    I could stop there and everyone would ponder how stupid the business is, but every coin has 2 sides. In general, developers don’t react in a constructive way to context-less questions: "It depends", "I don’t know", "I have no clue", "Nobody can answer that", or even "That’s a stupid question" are comments I’ve heard already. If you’re a developer and reading this post, know it’s your job to guide the business. What do they know about scalability, databases, and so on? Probably nothing because it’s not their job. So, pro-actively guide them through questions. With the example above, the following could be an actual dialog:

    "What’s the best transportation?
    - Where do you want to go?
    - Munich
    - Ok, what’s your starting point?
    - Home
    - Ah… Where’s your home located?
    - Germany
    - Which city in Germany?
    - etc."

    I’m very aware that most developers - including myself - don’t like to talk to the business that much. I mean, that’s the reason most of us chose to work in IT: we are more comfortable talking to computers than to people. But in essence, it changes nothing.

    Be professional, ask for context instead of ditching the question.

    Categories: Miscellaneous Tags: communication