Monthly Archives: April 2017

Creating Smaller Java Image using Docker Multi-stage Build

Two of the announcements at DockerCon 2017 directly relevant to Java developers are:

This blog explains the purpose of Docker multi-stage build and provide examples of how they help us generate smaller and more efficient Java Docker images.

Docker Multi-stage Build

Just show me the code: github.com/arun-gupta/docker-java-multistage.

What is the issue?

Building a Docker image for a Java application typically involves building the application and package the generated artifact into an image. A Java developer would likely use Maven or Gradle to build a JAR or WAR file. If you are using the Maven base image to build the application then it will download the required dependencies from the configured repositories and keep them in the image. The number of JARs in the local repository could be significant depending upon the number of dependencies in the pom.xml. This could leave a lot of cruft in the image.

Let’s take a look at a sample Dockerfile:

In this Dockerfile:

  • maven:3.5-jdk-8 is used as the base image
  • Application source code is copied to the image
  • Maven is used to build the application artifact
  • WildFly is downloaded and installed
  • Generated artifact is copied to the deployments directory of WildFly
  • Finally, WildFly is started

There are several issues with this kind of flow:

  • Using maven as the base image restricts on what functionality is available in the image. This requires WildFly to be downloaded and configured explicitly.
  • Building the artifact downloads all Maven dependencies. These stay in the image and are not needed at runtime. This causes an unnecessary bloat in the image size at runtime.
  • Change in WildFly version will require to update the Dockerfile. This would’ve been much easier if we could use the jboss/wildfly base image by itself.
  • In addition, unit tests may run before packaging the artifact and integration tests after the image is created. The test dependencies and results is again not needed to live in the production image.

There are other ways to build the Docker image. For example, splitting the Dockerfile into two files. The first file will then build the artifact and copy the artifact to a common location using volume mapping. The second file will then pick up the generated artifact and then use the lean base image. This approach has also has issues where multiple Dockerfiles need to be maintained separately. Additional, there is an out-of-band hand-off between the two Dockerfiles.

Let’s see how these issues are resolved with multi-stage build.

What are Docker multi-stage build?

Multi-stage build allows multiple FROM statements in a Dockerfile. The instructions following each FROM statement and until the next one, creates an intermediate image. The final FROM statement is the final base image. Artifacts from intermediate stages can be copied using COPY --from=<image-number>, starting from 0 for the first base image. The artifacts not copied over are discarded. This allows to keep the final image lean and only include the relevant artifacts.

FROM syntax is updated to specify stage name using as <stage-name>. For example:

This allows to use the stage name instead of the number with --from option.

Let’s take a look at a sample Dockerfile:

In this Dockerfile:

  • There are two FROM instructions. This means this is a two-stage build.
  • maven:3.5-jdk-8 is the base image for the first build. This is used to build the WAR file for the application. The first stage is named as BUILD.
  • jboss/wildfly:10.1.0.Final is the second and the final base image for the build. WAR file generated in the first stage is copied over to this stage using COPY --from syntax. The file is directly copied in the WildFly deployments directory.

Let’s take a look at what are some of the advantages of this approach.

Advantages of Docker multi-stage build

  • One Dockerfile has the entire build process defined. There is no need to have separate Dockerfiles and then coordinate transfer of artifact between “build” Dockerfile and “run” Dockerfile using volume mapping.
  • Base image for the final image can be chosen appropriately to meet the runtime needs. This helps with reduction of the overall size of the runtime image. Additionally, the cruft from build time is discarded during intermediate stage.
  • Standard WildFly base image is used instead of downloading and configuring the distribution manually. This makes it a lot easier to update the image if a newer tag is released.

Size of the image built using a single Dockerfile is 816MB. In contrast, the size of the image built using multi-stage build is 584MB.

Docker Multi-stage Java Image

So, using a multi-stage helps create a much smaller image.

Is this a typical way of building Docker image? Are there other ways by which the image size can be reduced?

Sure, you can use docker-maven-plugin as shown at github.com/arun-gupta/docker-java-sample to build/test the image locally and then push to repo. But this mechanism allows you to generate and package artifact without any other dependency, including Java.

Sure, maven:jdk-8-alpine image can be used to create a smaller image. But then you’ll have to create or find a WildFly image built using jdk-8-alpine, or something similar, as well. But the cruft, such as maven repository, two Dockerfiles, sharing of artifact using volume mapping or some other similar technique would still be there.

There are other ways to craft your build cycle. But if you are using Dockerfile to build your artifact then you should seriously consider multi-stage builds.

Read more discussion in PR #31257.

As mentioned earlier, the complete code for this is available at github.com/arun-gupta/docker-java-multistage.

Sign up for Docker Online Meetup to get a DockerCon 2017 recap.

Docker Remote API on Windows and OSX

There are multiple ways to monitor Docker containers.

  • Docker CLI provides the docker container stats API that gives basic information about the running containers.
  • Docker Remote API provides more detailed information about the containers.
  • Starting with Docker 1.13, there is an experimental feature with a Prometheus endpoint
  • cAdvisor is an open source tool that provides last container usage and performance characteristics. This data can be stored in a time series database, such as InfluxDB. This data can then be shown in a fancy graph using a Kibana dashboard.

These options were covered in detail in an earlier blog.

There are other commercial options like Docker EE, Sysdig, Datadog, New Relic, App Dynamics and others. If you are running containers on AWS, then CloudWatch can provide integrated monitoring.

OSX is my primary development platform. But recently, I needed a way to monitor Docker containers using the Remote API (aka REST API) on a Windows machine. The output of the REST API is exactly same independent of the operating system. But the way to access the Docker REST API using curl is different on an OSX and a Windows machine. This blog will explain how to exactly access this API on these two operating systems.

Check out 1.27 swagger specification to learn more about the capabilities of the REST API. A nicer and a more readable version of the API can be seen using Swagger UI. This is broken until #32649 is fixed.

Complete details about how the REST API corresponds to different Docker versions is explained in Docker REST API Versioning.

We’ll dig into this a bit later but first let’s take a look on how this API can be accessed.

Docker Remote API on OSX

On OSX, curl connects using a Unix domain socket as shown:

A WildFly container can be started as:

Stats can then be obtained using the command:

This will start printing stats as:

These stats are refreshed every second.

Any other REST API can be invoked very easily with this. This is simple and straight forward.

Install Docker on Windows

Docker installation on Windows depends on the flavor of your operating system.

Are you using Windows 10+ Pro 64-bit, then use Docker for Windows.

If using any older version of Windows, then you need to use Docker Toolbox.

Are you installing Windows in a virtual machine? Then Virtual Box cannot be used to create the VM. This is because Virtual Box does not support nested virtualization. This is required as Docker Toolbox uses Virtual Box to create and start Docker Machine. VMWare Fusion seems to work fine here though.

Now that you know that VMWare Fusion needs to be used, make sure you enable nested virtualization before starting the virtual machine.

Many thanks to Stefan Scherer (@stefscherer) for helping me understand the Windows configuration details.

Let’s see how the Docker REST API can now be invoked.

Docker Remote API on Windows 7/8

This section shows how to invoke the REST API using curl on Windows 7/8.

The REST API can be invoked using curl as shown:

First of all, the REST API, /containers/<name-or-id>/stats, is exactly the same. The way this API can be invoked is a bit different.

There are four differences: The first three parameters specify security credentials for the Docker Machine generated by Docker Toolbox on your Windows box:

  • <CERT> is the SSL certificate for Docker Machine
  • <CA_CERT> is the Certificate Authority certificate for the Docker Machine
  • <KEY> is the key generated for the Docker Machine

The values of these configuration parameters is those generated by docker-machine CLI.

The last change is that the protocol is changed from http to https.

Here is the exact command that worked on Windows 7 VM:

This invocation will print the exact same stats output on Windows 7 VM.

Now that you know how to use this API on OSX and Windows, you can also this API to do everything that Docker CLI. This is because the Docker CLI is just a convenient wrapper over the REST API. So a docker container run command is invoking the appropriate REST API on the Docker Host.

Docker Remote API on Windows 10

If you are using Windows 10, then use Docker for Windows. After that, you need to figure out which curl command to be used. There are couple of options:

  • Use Bash shell on Windows. It has curl command that works like Unix command that we all know pretty well. In this case, the REST API can be invoked as:
    Docker for Windows listens on port 2375 on Windows.
  • If you are Powershell user, then install curl command as:
    Now the command to invoke the REST API is:
    Note, there is a curl alias in Powershell and that is an alias for Invoke-WebRequest. So make sure to use curl.exe to invoke the REST API as this is the command installed using Chocolatey.

This blog provided different options of how to invoke the Docker Remote API using curl on Windows and OSX.

 

Bye Bye Couchbase, Hello Amazon Web Services!

After spending a little over 18 months at Couchbase, the future is cloudy, very cloudy!

Friday, April 7th, 2017, was my last day at Couchbase. This Monday, April 10th, 2017, is my first day at Amazon.

aws

What will I be doing?

I’ll be part of the newly formed Open Source team at Amazon Web Services. I’m super excited to be working with Adrian Cockroft (@adrianco) and Zaheda Bhorat (@zahedab).

As a Principal Open Source Technologist, my initial focus will be to make sure AWS continues to be the best platform for running your containerized solutions. Yes, we’d like you to use EC2 Container Service. But if you want to use Docker, Kubernetes, DC/OS or any other open source orchestration framework, so be it! We will continue to work with our partners and the open source community, including contributing to these projects, to make sure AWS remains the best place to run your containerized workloads.

In addition, there are numerous other opportunities around open source and AWS like mxnet, Blox and likely many more to be created.

Why change?

I had a lot of fun working at a Silicon Valley startup. The amount of learning in terms of implementing the pipeline from adoption -> engagement -> monetization was immense. Working with different teams very closely, learning their machinery and helping them understand the relevance of community was quite a thrilling experience. Having significant part of the company colocated in a single location allowed a different level of interaction altogether. Working with Developer Advocacy team to meet, and quite often exceed, the metrics every month was a lot of fun.

However, for those who’ve followed me speaking at conferences and read my content over the past couple of years know that I’m passionate about containers. As Oprah Winfrey said:

Passion is energy!

Feel the power that comes from focusing on what excites you

This opportunity at AWS allows me to follow my heart and passion.

Some other quotes that truly symbolize my state of mind at this time …

passion1 passion3

This personal change is by no means any indication on the quality of Couchbase products. Both Couchbase Server and Couchbase Mobile are very well positioned for enterprise adoption. N1QL allows database developers to leverage their SQL skills and apply them to a NoSQL document database. Couchbase Mobile is a unique offering that provides offline capability for mobile applications and synchronization with a backend database when online. It will continue to blaze new trails and bring new types of customers. I wish all of them good luck!

A popular saying is “change is the only constant”. And so here I go again making another change in my career. Looking forward to see you at conferences and meetups around the world.

This also means, the blog title will change to Miles to go 4.0 now (2.0, 3.0)!

Where will you see me?

Some upcoming speaking engagements are DockerCon (Austin), GIDS (Bangalore), OSCON (Austin). Amazon Web Services is a gold sponsor at DockerCon and OSCON and so you can find me at the booth as well.

You’ll also see me at some AWS Summits and re:Invent.

And of course, you can follow me on twitter @arungupta to find out what’s keeping me busy!

In the meanwhile, here are some links for you to learn more about AWS:

Looking forward to AWSome and exciting weeks/months/years ahead!