
Your own Kubernetes controller - Improving and deploying

In the first post of this series, we described the concept behind a Kubernetes controller. In short, it’s just a plain control loop that reconciles the desired state of the cluster with its current state. In the second post, we implemented a sidecar controller in Java. This third and last post will be focused on where to deploy this Java controller and how to improve it to be on par with a Go one.
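As a quick refresher, the reconciliation loop at the heart of any controller can be sketched in a few lines of plain Java. This is an illustrative toy, not code from the actual controller: the set of strings stands in for cluster state, and `desired()`, `reconcile()`, etc. are made-up names.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Bare-bones sketch of a reconciliation loop: diff the desired state
// against the current state, then create/delete to close the gap.
public class ReconcileLoop {

    static final Set<String> cluster = new HashSet<>();    // stand-in for the real cluster state

    static Set<String> desired() { return Set.of("sidecar-a", "sidecar-b"); }

    static void reconcile() {
        for (String pod : desired()) {                     // create missing resources
            cluster.add(pod);
        }
        cluster.removeIf(pod -> !desired().contains(pod)); // delete unwanted ones
    }

    public static void main(String[] args) {
        cluster.add("orphaned-sidecar");                   // simulate drift from the desired state
        reconcile();
        System.out.println(new TreeSet<>(cluster));        // sorted for stable output
        // prints [sidecar-a, sidecar-b]
    }
}
```

A real controller does the same thing, except the state lives in the API server and the loop is triggered by watch events rather than called by hand.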

Running outside the cluster or inside?

As mentioned in the first post, there’s no requirement regarding the location of a controller. It can run outside the cluster, as long as it’s able to communicate with it. By default, both the official Kubernetes client and the Fabric8 client try to use the credentials stored in the ~/.kube/config configuration file. That means that if a specific user can access the cluster with the kubectl command, they can run the controller as well.

The deployment itself can take any form: a standalone JAR, a webapp deployed in an application server, or even a directory with a bunch of compiled classes. The downside of this approach is that one has to take care of all the usual tasks associated with the chosen packaging.

On the other hand, running a containerized app inside a Kubernetes cluster has many benefits: automation, monitoring, auto-scaling, self-healing, etc. There’s no reason why our controller shouldn’t benefit from those features as well. In order to go along with this approach, we need to containerize our controller first.

Containerizing the controller

The most straightforward way to containerize a simple Java application is with the Jib plugin. It’s available for Maven (and Gradle); it’s compatible with plain Java applications, as well as Spring Boot and Micronaut applications. Moreover, the resulting image is organized in different layers: the top layer contains the classes, while the underlying layer contains the libraries. The image takes care of managing the classpath of the "exploded" app. This approach speeds up the image generation: when classes are updated with new bytecode, only the topmost layer is replaced.

Here’s a sample Jib configuration:

<plugin>
    <groupId>com.google.cloud.tools</groupId>
    <artifactId>jib-maven-plugin</artifactId>
    <version>1.8.0</version>
    <configuration>
        <from>
            <image>gcr.io/distroless/java:debug</image>       (1)
        </from>
        <to>
            <image>jvm-operator:${project.version}</image>    (2)
        </to>
    </configuration>
    <executions>
        <execution>
            <phase>compile</phase>                            (3)
            <goals>
                <goal>dockerBuild</goal>                      (4)
            </goals>
        </execution>
    </executions>
</plugin>
1 The default parent image provides no shell. To allow debugging with shell access, one should use an image with the :debug tag appended.
2 The target image is tagged with the version from the POM.
3 The plugin is bound to the compile phase. Note that since the image runs the app in exploded format, the package phase is not necessary.
4 Two goals are available: build and dockerBuild. The former requires no local Docker installation and pushes the image to a configured remote registry (e.g. DockerHub); the latter builds the image into the local Docker daemon.

At this point, it’s easy to write a Kubernetes configuration:

deploy.yml
apiVersion: v1
kind: Pod
metadata:
  namespace: jvmoperator
  name: custom-operator
spec:
  containers:
    - name: custom-operator
      image: jvm-operator:1.8
      imagePullPolicy: Never

The above snippet schedules a simple Pod for brevity; a real-world configuration would involve a Deployment.

Applying it is a no-brainer:

kubectl apply -f deploy.yml

Unfortunately, the result of the above command is a failure, with the following stack trace:

java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
  at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)

Adding authorizations

The reason for that failure is that, when running as a pod inside the cluster, the application has no specific privileges. Sending commands to the Kubernetes API server is seen as a security risk, and rightly so. By default, every request to the API server ends up in an HTTP 403 error. Thus, the container needs to be given the adequate permissions:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole                           (1)
metadata:
  name: operator-example
rules:
  - apiGroups:
      - ""
    resources:
      - pods                                (2)
    verbs:
      - watch                               (3)
      - create                              (3)
      - delete                              (3)
---
apiVersion: v1
kind: ServiceAccount                        (4)
metadata:
  namespace: jvmoperator
  name: operator-service
---
kind: ClusterRoleBinding                    (5)
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: operator-example
subjects:
  - kind: ServiceAccount
    name: operator-service
    namespace: jvmoperator
roleRef:
  kind: ClusterRole
  name: operator-example
  apiGroup: rbac.authorization.k8s.io
1 Create a named role, with the listed permissions
2 Resources on which permissions are given
3 List of required permissions
4 Create a service account
5 Bind the role to the service account

The authorization model in Kubernetes is based on RBAC. In all honesty, it’s a huge beast. Please refer to the relevant documentation if you want to have more than just an overview.

Once the above snippet has been applied, the pod can run with the newly created service account. This needs a slight adjustment to the Pod configuration:

apiVersion: v1
kind: Pod
metadata:
  namespace: jvmoperator
  name: custom-operator
spec:
  serviceAccountName: operator-service      (1)
  containers:
    - name: custom-operator
      image: jvm-operator:1.8
      imagePullPolicy: Never
1 The magic happens here: the pod is now associated with the operator-service service account

Pitfalls of containerizing JVM-based applications

Early versions of the Java Virtual Machine returned the memory and the number of cores of the host, not of the container. This led to OutOfMemoryError, because the JVM tried to claim memory that was not there to begin with. In turn, Kubernetes killed what it perceived as faulty pods. If those were part of replica sets or deployments, new ones were spawned. While working in a sense, that was far from ideal. From JDK 10 on, this bug has been resolved (and the fix has been backported down to JDK 8 as well).
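To check what a given JVM actually sees, one can print the relevant values at startup. On a container-aware JDK, these reflect the container's cgroup limits; on an older JVM, they mistakenly report the host's resources:

```java
public class ContainerAwareness {
    public static void main(String[] args) {
        // On a container-aware JDK, these values honor the cgroup limits of
        // the container; older JVMs report the host's resources instead
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("Max heap: " + maxHeapMb + " MB, CPUs: " + cpus);
    }
}
```

Running this inside a memory-limited pod on an affected JVM makes the mismatch obvious at a glance.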

On one side, the JVM platform is able to adapt the application’s compiled code to the workload. This is a benefit compared to statically-compiled native executables. On the flip side, the platform requires a lot of additional memory to achieve that. Also, it’s a well-known fact that the startup time of the JVM is quite long. Additionally, because adapting the compiled code takes time, performance is not on par until some time after startup. That’s the reason why performance metrics on the JVM should always be measured after a lengthy warm-up time. Finally, the container is much bigger than a native executable, as it embeds the JVM platform itself.

REPOSITORY            TAG          IMAGE ID            CREATED             SIZE
jvm-operator          1.8          bdaa419c75e2        50 years ago        141MB

For all those reasons, the JVM platform is not a great foundation for containerized apps.

Coping with the JVM limitations

There are two ways to cope with the limitations described above:

  1. One way is to make use of the Java Platform Module System introduced in Java 9. Through the jlink tool, the JDK provides a way to assemble a custom runtime image containing only the modules actually used, discarding the rest. This makes the resulting runtime smaller.
  2. Another way is to use SubstrateVM, part of Graal VM.

In a few words:

Substrate VM is a framework that allows Ahead-Of-Time compilation of Java applications under closed-world assumption into executable images.

With Graal VM, one can:

  • Package the application into a single fat JAR
  • Create a native executable from the JAR
  • Containerize the native executable

Unfortunately, Jib offers no GraalVM integration yet. It’s time to move the build to a proper multi-stage Dockerfile to:

  1. Build the JAR
  2. Create the native executable from the JAR
  3. Containerize the native executable

Dockerfile
ARG VERSION=1.10

FROM zenika/alpine-maven:3 as build
COPY src src
COPY pom.xml pom.xml
RUN mvn package

FROM oracle/graalvm-ce:19.2.1 as native
ARG VERSION
COPY --from=build /usr/src/app/target/jvm-operator-$VERSION.jar \
                  /var/jvm-operator-$VERSION.jar
WORKDIR /opt/graalvm
RUN gu install native-image \                                           (1)
 && native-image -jar /var/jvm-operator-$VERSION.jar \                  (2)
 && mv jvm-operator-$VERSION /opt/jvm-operator-$VERSION

FROM scratch                                                            (3)
ARG VERSION
WORKDIR /home
COPY --from=native /opt/jvm-operator-$VERSION operator
ENTRYPOINT ["./operator"]
1 SubstrateVM is not included by default in the Graal VM distribution. It first needs to be installed.
2 Execute the native-image process on the previously created JAR
3 Start from the scratch image (i.e. an empty image). This requires the native-image compilation process to also link dependency libraries statically into the executable, via the --static flag

This divides the size of the final container by a factor of almost 3.

REPOSITORY            TAG          IMAGE ID            CREATED             SIZE
jvm-operator          1.10         340d4d9a767e        6 weeks ago         52.7MB

Note that SubstrateVM offers a lot of configuration options. For the above to work, those were set inside the codebase itself. Here’s the whole configuration, for reference:

native-image.properties
Args=  -J-Xmx3072m \
       --static \
       --allow-incomplete-classpath \
       --no-fallback \
       --no-server \
       -H:EnableURLProtocols=https \
       -H:ConfigurationFileDirectories=/var/config

Coping with reflection

Note that the AOT process has several limitations, and reflection is only one of them. Depending on how the underlying code was written, it might be subject to more than just that. In some cases, there are different ways to fix those issues; they will be covered in a future post. Let’s focus on reflection proper for now.

In Java, a lot of code relies to some degree on runtime reflection. Unfortunately, that means that at compile time, SubstrateVM will remove code that it deems not required, even though it actually is. This can be configured, though, via JSON files. Given the number of calls relying on reflection, manual configuration is a daunting task.
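To see why static analysis falls short, consider a call where the target class is only known at run time. This is a contrived example; in a real controller, the culprits usually hide inside libraries such as the Kubernetes client's deserialization code:

```java
import java.lang.reflect.Method;

public class ReflectiveCall {
    public static void main(String[] args) throws Exception {
        // The target class is chosen at run time, so an AOT compiler cannot
        // statically prove it is reachable; without extra configuration,
        // native-image would drop it and this would fail at run time
        String className = args.length > 0 ? args[0] : "java.util.ArrayList";
        Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
        Method size = instance.getClass().getMethod("size");
        System.out.println(className + ".size() = " + size.invoke(instance));
        // prints java.util.ArrayList.size() = 0
    }
}
```

On the JVM, this works for any class name passed as an argument; under a closed-world AOT compilation, only the classes listed in the reflection configuration survive.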

SubstrateVM offers a better alternative, though: it provides a Java agent that can be set on the command line of the running controller. This agent intercepts every reflection call made inside the controller application, and records it in a dedicated reflect-config.json file. That first requires building a container image with this agent, then exercising every nook and cranny of the containerized app. Leave no stone unturned!
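The recorded file is a plain JSON array of per-class entries. An entry for a hypothetical watcher class might look like this (the class name is purely illustrative; the keys are the standard reflect-config.json attributes):

```json
[
  {
    "name": "org.example.operator.SidecarWatcher",
    "allDeclaredConstructors": true,
    "allDeclaredMethods": true,
    "allDeclaredFields": true
  }
]
```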

At a later stage, this file (along with other similar ones) can be fed to the compilation process, so that the code accessed through reflection is kept. One way to feed them is through the command line. Another is to package them inside the JAR in a dedicated folder (META-INF/native-image): this allows library providers to offer AOT-compatible JARs, and should be the preferred way.

Additional steps might be required depending on the specific application. For more information, please check How to cope with incompatible code in Graal VM AOT compilation.

Conclusion

In this series, we detailed what a Kubernetes controller is. By developing our own, we proved it’s not a daunting task. Icing on the cake, one can reuse one’s technology stack and associated tools.

In this final post, we scheduled the Java controller developed earlier on a Kubernetes cluster. In order to achieve that, we used Graal VM to create a native executable. While it makes the build process more complex, using such native executable removes some limitations that come with the JVM platform: it drastically decreases the image size, the memory consumption, as well as the startup time.

The complete source code for this post can be found on GitHub in Maven format.
Nicolas Fränkel
