Building a Docker image for a Java application typically involves building the application and packaging the generated artifact into an image. A Java developer would likely use Maven or Gradle to build a JAR or WAR file. If you use the Maven base image to build the application, it downloads the required dependencies from the configured repositories and keeps them in the image. The number of JARs in the local repository could be significant, depending upon the number of dependencies in the pom.xml. This can leave a lot of cruft in the image.
Let’s take a look at a sample Dockerfile:
RUN mvn -f /usr/src/myapp/pom.xml clean package
RUN cd $WILDFLY_HOME && curl http://download.jboss.org/wildfly/$WILDFLY_VERSION/wildfly-$WILDFLY_VERSION.tar.gz | tar zx && mv $WILDFLY_HOME/wildfly-$WILDFLY_VERSION $WILDFLY_HOME/wildfly
The generated artifact is copied to the deployments directory of WildFly
Finally, WildFly is started
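Pieced together, the complete single-stage Dockerfile might look like this sketch (the Maven tag, application path, WAR name, and WildFly bootstrap command are illustrative):

# Build the artifact using the Maven base image
FROM maven:3.5-jdk-8
ENV WILDFLY_VERSION 10.1.0.Final
ENV WILDFLY_HOME /opt
COPY . /usr/src/myapp
RUN mvn -f /usr/src/myapp/pom.xml clean package
# Download and unpack WildFly explicitly
RUN cd $WILDFLY_HOME && curl http://download.jboss.org/wildfly/$WILDFLY_VERSION/wildfly-$WILDFLY_VERSION.tar.gz | tar zx && mv $WILDFLY_HOME/wildfly-$WILDFLY_VERSION $WILDFLY_HOME/wildfly
# Copy the generated artifact to the deployments directory and start WildFly
RUN cp /usr/src/myapp/target/myapp.war $WILDFLY_HOME/wildfly/standalone/deployments/
CMD ["/opt/wildfly/bin/standalone.sh", "-b", "0.0.0.0"]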
There are several issues with this kind of flow:
Using maven as the base image restricts what functionality is available in the image. This requires WildFly to be downloaded and configured explicitly.
Building the artifact downloads all Maven dependencies. These stay in the image but are not needed at runtime, unnecessarily bloating the runtime image.
A change in the WildFly version requires updating the Dockerfile. This would be much easier if we could use the jboss/wildfly base image by itself.
In addition, unit tests may run before packaging the artifact, and integration tests after the image is created. The test dependencies and results, again, do not need to live in the production image.
There are other ways to build the Docker image. For example, by splitting the Dockerfile into two files. The first file builds the artifact and copies it to a common location using volume mapping. The second file picks up the generated artifact and uses a lean base image. This approach has its own issues: multiple Dockerfiles need to be maintained separately, and there is an out-of-band hand-off between the two Dockerfiles.
Let’s see how these issues are resolved with a multi-stage build.
What are Docker multi-stage builds?
A multi-stage build allows multiple FROM statements in a Dockerfile. The instructions following each FROM statement, up until the next one, create an intermediate image. The final FROM statement starts the final base image. Artifacts from intermediate stages can be copied using COPY --from=<image-number>, starting from 0 for the first base image. Artifacts not copied over are discarded. This keeps the final image lean, containing only the relevant artifacts.
The FROM syntax is updated to specify a stage name using as <stage-name>.
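For example, here is a sketch of such a two-stage Dockerfile (the application path and WAR name are illustrative):

# Stage 1: build the WAR using the Maven image
FROM maven:3.5-jdk-8 as BUILD
COPY . /usr/src/myapp
RUN mvn -f /usr/src/myapp/pom.xml clean package

# Stage 2: copy only the WAR into the WildFly image
FROM jboss/wildfly:10.1.0.Final
COPY --from=BUILD /usr/src/myapp/target/myapp.war /opt/jboss/wildfly/standalone/deployments/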
This allows the stage name, instead of the number, to be used with the --from option.
There are two FROM instructions, which makes this a two-stage build.
maven:3.5-jdk-8 is the base image for the first stage. It is used to build the WAR file for the application. The first stage is named BUILD.
jboss/wildfly:10.1.0.Final is the second and final base image for the build. The WAR file generated in the first stage is copied over to this stage using the COPY --from syntax. The file is copied directly into the WildFly deployments directory.
Let’s take a look at some of the advantages of this approach.
Advantages of Docker multi-stage build
One Dockerfile has the entire build process defined. There is no need to have separate Dockerfiles and then coordinate the transfer of the artifact between the “build” Dockerfile and the “run” Dockerfile using volume mapping.
The base image for the final image can be chosen appropriately to meet the runtime needs. This helps reduce the overall size of the runtime image. Additionally, the cruft from build time is discarded with the intermediate stages.
The standard WildFly base image is used instead of downloading and configuring the distribution manually. This makes it a lot easier to update the image if a newer tag is released.
The size of the image built using a single Dockerfile is 816MB. In contrast, the size of the image built using the multi-stage build is 584MB.
So, using a multi-stage build helps create a much smaller image.
Is this a typical way of building Docker image? Are there other ways by which the image size can be reduced?
Sure, you can use docker-maven-plugin, as shown at github.com/arun-gupta/docker-java-sample, to build/test the image locally and then push it to a repo. But the multi-stage mechanism allows you to generate and package the artifact without any other dependency, including Java.
Sure, the maven:jdk-8-alpine image can be used to create a smaller image. But then you’ll have to create or find a WildFly image built using jdk-8-alpine, or something similar, as well. And the cruft, such as the Maven repository, two Dockerfiles, and sharing of the artifact using volume mapping or some other similar technique, would still be there.
There are other ways to craft your build cycle. But if you are using a Dockerfile to build your artifact, then you should seriously consider multi-stage builds.
cAdvisor is an open source tool that provides live resource usage and performance characteristics of running containers. This data can be stored in a time series database, such as InfluxDB. This data can then be shown in a fancy graph using a Kibana dashboard.
There are other commercial options like Docker EE, Sysdig, Datadog, New Relic, App Dynamics and others. If you are running containers on AWS, then CloudWatch can provide integrated monitoring.
OSX is my primary development platform. But recently, I needed a way to monitor Docker containers using the Remote API (aka REST API) on a Windows machine. The output of the REST API is exactly the same regardless of the operating system, but the way to access the Docker REST API using curl differs between an OSX and a Windows machine. This blog will explain exactly how to access this API on these two operating systems.
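As a hedged illustration (the container name, machine name, and certificate paths are assumptions), on OSX with Docker Machine the API is reached over TLS on port 2376:

curl --cert $DOCKER_CERT_PATH/cert.pem --key $DOCKER_CERT_PATH/key.pem --cacert $DOCKER_CERT_PATH/ca.pem https://$(docker-machine ip default):2376/containers/couchbase/stats

On Windows with Docker Toolbox, the certificate files live under the user profile (e.g. %USERPROFILE%\.docker\machine\machines\default), but the endpoint is the same.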
This invocation will print the exact same stats output on a Windows 7 VM.
Now that you know how to use this API on OSX and Windows, you can also use this API to do everything that the Docker CLI does. This is because the Docker CLI is just a convenient wrapper over the REST API. So a docker container run command is invoking the appropriate REST API on the Docker Host.
Docker Remote API on Windows 10
If you are using Windows 10, then use Docker for Windows. After that, you need to figure out which curl command to use. There are a couple of options:
Use the Bash shell on Windows. It has a curl command that works like the Unix command we all know pretty well. In this case, the REST API can be invoked as:
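A minimal sketch that lists the running containers (Docker for Windows listens on port 2375, as noted below):

curl http://localhost:2375/containers/json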
Docker for Windows listens on port 2375 on Windows.
If you are a PowerShell user, then install the curl command as:
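One option (an assumption here; any curl distribution for Windows works) is the Chocolatey package manager:

choco install curl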
Docker 1.13 introduced a new version of Docker Compose. The main feature of this release is that it allows services defined using Docker Compose files to be directly deployed to a Docker Engine with Swarm mode enabled. This enables simplified deployment of multi-container applications on multiple hosts.
This blog will use a simple Docker Compose file to show how services are created and deployed in Docker 1.13.
Here is a Docker Compose v2 definition for starting a Couchbase database node:
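(The file below is a sketch: the service name db matches the scale command that follows; the image and ports are assumptions based on the service output shown later in this blog.)

version: "2"
services:
  db:
    image: arungupta/couchbase:latest
    ports:
      - 8091:8091
      - 8092:8092
      - 8093:8093
      - 11210:11210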
This definition can be started on a Docker Engine without Swarm mode as:
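For example, from the directory containing the file:

docker-compose up -d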
This will start a single replica of the service defined in the Compose file. This service can be scaled as:
docker-compose scale db=2
If the ports are not exposed, this works fine on a single host. If swarm mode is enabled on the Docker Engine, then it shows the message:
WARNING: The Docker Engine you're using is running in swarm mode.
Compose does not use swarm mode to deploy services to multiple nodes in a swarm. All containers will be scheduled on the current node.
To deploy your application across the swarm, use `docker stack deploy`.
Docker Compose gives us multi-container applications but the applications are still restricted to a single host. And that is a single point of failure.
Swarm mode allows you to create a cluster of Docker Engines. With 1.13, the docker stack deploy command can be used to deploy a Compose file to Swarm mode.
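For example (the stack name couchbase is assumed here, based on the service name couchbase_db shown below):

docker stack deploy --compose-file=docker-compose.yml couchbase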
The list of containers running within the service can be seen using the docker service ps command:
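docker service ps couchbase_db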
ID            NAME            IMAGE                       NODE  DESIRED STATE  CURRENT STATE           ERROR  PORTS
rchu2uykeuuj  couchbase_db.1  arungupta/couchbase:latest  moby  Running        Running 52 seconds ago
In this case, a single container is running as part of the service. The node is listed as moby, which is the default name of the Docker Engine when using Docker for Mac.
The service can now be scaled as:
docker service scale couchbase_db=2
The list of containers can then be seen again as:
ID            NAME            IMAGE                       NODE  DESIRED STATE  CURRENT STATE           ERROR  PORTS
rchu2uykeuuj  couchbase_db.1  arungupta/couchbase:latest  moby  Running        Running 3 minutes ago
kjy7l14weao8  couchbase_db.2  arungupta/couchbase:latest  moby  Running        Running 23 seconds ago
Note that the containers are named using the format <service-name>.<n>. Both containers are running on the same host.
Also note, the two containers are independent Couchbase nodes and are not configured in a cluster yet. This has already been explained at Couchbase Cluster using Docker and a refresh of the steps is coming soon.
A service will typically have multiple containers running, spread across multiple hosts. Docker 1.13 introduces a new command, docker service logs <service-name>, to stream the logs of a service across all containers on all hosts to your console. In our case, this can be seen using the command docker service logs couchbase_db and looks like:
The preamble of each log statement uses the format <container-name>.<container-id>@<host>. Then the actual log message from your container shows up.
At first glance, attaching the container id may seem redundant. But Docker services are self-healing. This means that if a container dies, the Docker Engine starts another container to ensure the specified number of replicas at a given time. This new container will have a new id, and the id thus attributes each log message to the right container.
Amazon Web Services introduced Serverless Application Model, or SAM, a couple of months ago. It defines simplified syntax for expressing serverless resources. SAM extends AWS CloudFormation to add support for API Gateway, AWS Lambda and Amazon DynamoDB. This blog will show how to create a simple microservice using SAM. Of course, we’ll use Couchbase instead of DynamoDB!
Defines two resources, both of Lambda Function type, identified by the AWS::Serverless::Function attribute. The name of each Lambda function is defined by its Resources.<resource> key (a sketch of such a template appears after this list).
The class for each handler is defined by the value of the Resources.<resource>.Properties.Handler attribute
The Java 8 runtime is used to run the function, as defined by the Resources.<resource>.Properties.Runtime attribute
Code for the class is uploaded to an S3 bucket, in our case to s3://serverless-microservice/microservice-http-endpoint-1.0-SNAPSHOT.jar
Resources.<resource>.Properties.Environment.Variables.COUCHBASE_HOST attribute value defines the host where Couchbase is running. This can be easily deployed on EC2 as explained at Setup Couchbase.
Each Lambda function is triggered by an API, deployed using AWS API Gateway. The path is defined by the Events.GetResource.Properties.Path attribute, and the HTTP method by the Events.GetResource.Properties.Method attribute.
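Here is a sketch of the template, showing one of the two functions (the handler class, API path, and host value are illustrative; the CodeUri matches the S3 location above):

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  # One of the two AWS::Serverless::Function resources
  GetFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: org.sample.serverless.GetBookFunction::handleRequest
      Runtime: java8
      CodeUri: s3://serverless-microservice/microservice-http-endpoint-1.0-SNAPSHOT.jar
      Environment:
        Variables:
          COUCHBASE_HOST: <couchbase-host>
      Events:
        GetResource:
          Type: Api
          Properties:
            Path: /books
            Method: get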
The aws lambda create-event-source-mapping CLI command allows you to create an event source for a Lambda function. As of AWS CLI version 1.11.21, only an Amazon Kinesis stream or an Amazon DynamoDB stream can be used. But for this blog, we’ll use the IoT button as a trigger, and this has to be configured using the AWS Lambda Console.
The IoT Button is only supported in a limited number of regions. For example, it is not supported in the us-west-1 region, but us-west-2 works.
Regions that are not supported are greyed out in the following list:
Serverless architecture runs custom code in ephemeral containers that are fully managed by a third party. The custom code is typically a small part of a complete application and is also called a function. This gives serverless architecture its other name: Function as a Service (FaaS). The container is ephemeral because it may only last for one invocation. The container may be reused, but that’s not something you can rely upon. As a developer, you upload the code to the FaaS platform, and the service then handles all the capacity, scaling, patching, and administration of the infrastructure to run your code.
An application built using serverless architecture follows the event-driven approach. For example, an activity in the application, such as a click, generates an event, and that event triggers a function.
This is very different from a classical architecture, where the application code is typically deployed in an application server such as Tomcat or WildFly. Scaling your application means starting additional instances of the application server or spinning up additional containers with the packaged application server. The load balancer needs to be updated with the new IP addresses. The operating system needs to be patched, upgraded, and maintained.
Serverless Architectures explains the difference between the classical programming model and this new serverless architecture.
With a FaaS platform, your application is divided into multiple functions, and each function is deployed to the FaaS. The service spins up additional compute instances to meet the scalability demands of your application. The FaaS platform provides the execution environment and takes care of starting and tearing down the containers that run your function.
This blog will show how to write your first AWS Lambda function.
What is AWS Lambda?
AWS Lambda is the FaaS service from Amazon Web Services. It runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code monitoring, and logging.
AWS Lambda charges you for the duration your code runs, in increments of 100ms. There is no cost associated with storing the Lambda function in AWS. The first million requests per month are free, and the pricing after that is nominal. Read more details on Lambda pricing. It also provides visibility into performance with real-time metrics and logs in AWS CloudWatch. All you need to do is write the code!
Also check out Serverless Architectural Patterns and Best Practices from AWS ReInvent 2016:
The code you run on AWS Lambda is called a Lambda Function. You upload your code as a zip file or design it using the AWS Lambda Management Console. There is built-in support for the AWS SDK, which simplifies calling other AWS services.
In short, Lambda is scalable, serverless, compute in the cloud.
The handleRequest method is where the function code is implemented. Context provides useful information about the Lambda execution environment. Some of the information from the context is stored in a JSON document. Finally, the Couchbase Java SDK upsert API is used to write the JSON document to the identified Couchbase instance. Couchbase on Amazon EC2 provides complete instructions to install Couchbase on AWS EC2.
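A sketch of such a handler (the class name, request/response types, and bucket wiring are assumptions; the serverless bucket name is taken from the Web Console section below):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

public class HelloCouchbase implements RequestHandler<String, String> {
    @Override
    public String handleRequest(String request, Context context) {
        // Store some of the execution environment details in a JSON document
        JsonObject doc = JsonObject.create()
                .put("awsRequestId", context.getAwsRequestId())
                .put("functionName", context.getFunctionName())
                .put("memoryLimitInMB", context.getMemoryLimitInMB());
        // Connect to the Couchbase node identified by COUCHBASE_HOST
        // (connection setup shown inline for brevity)
        Bucket bucket = CouchbaseCluster
                .create(System.getenv("COUCHBASE_HOST"))
                .openBucket("serverless");
        // Write the document, keyed by the request id
        bucket.upsert(JsonDocument.create(context.getAwsRequestId(), doc));
        return "Hello " + request;
    }
}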
Information about the Couchbase server is obtained as:
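A hedged reconstruction:

CouchbaseCluster cluster = CouchbaseCluster.create(System.getenv("COUCHBASE_HOST"));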
This once again uses the Couchbase Java API, with CouchbaseCluster as the main entry point to the Couchbase cluster. The COUCHBASE_HOST environment variable is passed when the Lambda function is created. In our case, it points to a single-node Couchbase cluster running on AWS EC2. Environment variables were recently introduced in AWS Lambda.
AWS Lambda function needs a deployment package. This package is either a .zip or .jar file that contains all the dependencies of the function. Our application is packaged using Maven, and so we’ll use a Maven plugin to create a deployment package.
The application has pom.xml with the following plugin fragment:
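A sketch of such a fragment, assuming the commonly used maven-shade-plugin (which bundles the classes and all dependencies into a single jar):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <configuration>
        <createDependencyReducedPom>false</createDependencyReducedPom>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
</plugin>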
--function-name provides the function name. The function name is case sensitive.
--role specifies the Amazon Resource Name (ARN) of an IAM role that Lambda assumes when it executes your function to access any other AWS resources. If you’ve executed a Lambda function using the AWS Console then this role is created for you.
--zip-file points to the deployment package that was created in the previous step. fileb is an AWS CLI-specific protocol to indicate that the content uploaded is binary.
--handler is the Java class that is called to begin execution of the function
--publish requests AWS Lambda to create the Lambda function and publish a version as an atomic operation. Otherwise multiple versions may be created and published at a later point.
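Putting these options together (the function name, role ARN, and jar path are illustrative; --runtime java8 is also required):

aws lambda create-function \
  --function-name HelloCouchbase \
  --role arn:aws:iam::<account-id>:role/lambda-role \
  --zip-file fileb://target/hello-couchbase-1.0-SNAPSHOT.jar \
  --handler org.sample.HelloCouchbase \
  --runtime java8 \
  --publish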
Lambda Console shows:
Test AWS Lambda Function
Test the AWS Lambda function using the AWS CLI:
aws lambda invoke \
It shows the output as:
The output from the command is stored in hellocouchbase.out and looks like:
Invoking this function stores a JSON document in Couchbase. Documents stored in Couchbase can be seen using the Couchbase Web Console. The username is Administrator and the password is the EC2 instance id.
All data buckets in this Couchbase instance are shown below:
Note that the serverless bucket is manually created.
Clicking on Documents shows details of different documents stored in the bucket:
Clicking on each document shows more details about the JSON document:
Lambda function can also be tested using the Console:
Update AWS Lambda Function
If the application logic changes, then a new deployment package needs to be uploaded for the Lambda function. In this case, mvn package creates the deployment package and the aws lambda CLI command is used to update the function code:
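A hedged sketch (function name and jar path are illustrative):

mvn package
aws lambda update-function-code \
  --function-name HelloCouchbase \
  --zip-file fileb://target/hello-couchbase-1.0-SNAPSHOT.jar \
  --publish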
While writing this blog, this was often used to debug the function as well. This is because Lambda functions do not have any state or box associated with them, so you cannot log in to a box to check whether the function deployed correctly. You can certainly use CloudWatch log statements once the function is working.
By default, this command displays statistics for all running containers. A list of container names or ids, separated by spaces, can be specified to restrict the stream to a subset of running containers.
For example, stats for only the Couchbase container can be seen as:
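(assuming the container is named couchbase)

docker stats couchbase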
The Docker daemon provides a Remote REST API. This API is used by the Docker Client to communicate with the engine. It can also be invoked by other tools, such as curl or the Chrome Postman REST Client. If you are creating Docker daemons using Docker Machine on OSX Mavericks, then getting this API to work is a bit tricky.
Docker Universal Control Plane (DUCP) allows you to manage and deploy Dockerized distributed applications, all from within the firewall. It integrates with key systems like LDAP/AD to manage users, and provides an interface for IT operations teams to deploy and manage applications. RBAC, SSO integration with Docker Trusted Registry, and a simple, easy-to-use web UI are some of the key features. Read the product overview for the complete set of features.
Have you felt the need to run Docker containers on Amazon?
Amazon Container Service requires extensive setup and manual work. It is meant for programmers who have plenty of time and are willing to debug through multiple steps. For mundane programmers, like me, who like simple and easy-to-use steps, there is Docker Tutum!
What is Docker Tutum?
Docker Tutum is a SaaS that allows you to build, deploy and manage Docker containers in a variety of clouds.
There are three main features:
Build and run your code using Tutum’s free private registry
Deploy applications using Tutum to manage Clusters that are fault tolerant and scalable. Tutum handles the orchestration of your infrastructure and application containers.
Manage your applications through Tutum’s intuitive Dashboard, simple API, or CLI tool. With built-in logs and data monitoring, all the info you need is at your fingertips.
The main party line is:
Experience the simplicity of PaaS with none of its constraints. Enjoy the flexibility of IaaS with none of its complexity.
Key Concepts of Docker Tutum
The main concepts of Docker Tutum are explained below:
(A) Node clusters are logical groups of nodes of the same type. Tutum pools your nodes’ resources, so your apps can run together, thereby reducing complexity and waste. Node Clusters can be easily scaled with a drag of the slider.
(B) Nodes are individual Linux hosts/VMs used to deploy and run your applications. New nodes can be provisioned right from within Tutum to increase the capacity of your Node Clusters.
(C) Containers, (D) Links and (E) Volumes are Docker concepts.
(F) Services are logical groups of Docker containers from the same image. Services make it simple to scale your application across different nodes. Simply drag a slider to increase or decrease the availability, performance, and redundancy of your application.
Deploy Couchbase Docker Container on Amazon using Tutum
Docker Tutum Getting Started provides detailed steps on how to get started. Here is what I did to run Couchbase Docker container in Amazon using Docker Tutum:
Get started for free (at least while it’s in beta) by logging in using your Docker Hub account.
Link Amazon Web Services credentials with Tutum. I just had to specify the Access Key Id and Secret Access Key. If you create a new account for this, then you may have to attach a policy granting privileges such that new instances can be provisioned on your behalf.
A big change in Docker Machine is that the implementations of drivers such as virtualbox, digitalocean, amazonec2, etc. are no longer packaged in the main binary. Instead, the distribution is a zip bundle with multiple drivers packaged alongside and referenced from the main binary. Packaging the drivers separately has the following benefits:
Each driver can evolve rapidly without waiting for merging into upstream
Additional drivers can be written and used without merging into the upstream
New versions of the drivers can be released more frequently. Hopefully more clarity will be available on how these drivers will be distributed.
That’s why installation is slightly different and looks like:
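A sketch of the zip-bundle install (the version and download URL are illustrative):

curl -L https://github.com/docker/machine/releases/download/v0.5.0/docker-machine_darwin-amd64.zip > machine.zip && \
  unzip machine.zip && \
  rm machine.zip && \
  mv docker-machine* /usr/local/bin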
After installation, the Docker Machine can be created as:
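For example, using the virtualbox driver (the machine name default matches the commands below):

docker-machine create --driver virtualbox default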
couchbase2_1 | Starting Couchbase Server -- Web UI available at http://<ip>:8091
couchbase3_1 | Starting Couchbase Server -- Web UI available at http://<ip>:8091
couchbase1_1 | Starting Couchbase Server -- Web UI available at http://<ip>:8091
Configure Couchbase Cluster
Let’s configure these nodes to be part of a cluster now.
Find IP address of the Docker Machine:
>docker-machine ip default
Access the Couchbase Admin Console at http://<DOCKER_MACHINE_IP>:8091. This is http://192.168.99.104:8091 in our case. It will show the output as:
Click on “Setup”.
Each container is given an internal IP address by Docker, and each of these IPs is visible to all other containers running on the same host. We need to use these internal IP addresses when adding a new node to the cluster. Find the IP address of the first container:
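A hedged example (substitute your container id or name):

docker inspect --format '{{ .NetworkSettings.IPAddress }}' <container-id>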
In this CLI, the run command runs the container using the image id specified as the last argument, -p publishes port 8091 from the container to 8091 on the Docker Machine, and -d runs the container in the background and prints the container id.
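Reconstructed, that command might look like (the image name is illustrative):

docker run -p 8091:8091 -d couchbase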
Watch the container status as:
CONTAINER ID   IMAGE       COMMAND                  CREATED         STATUS         PORTS                                                                            NAMES
5b789d231948   couchbase   "/entrypoint.sh couch"   7 minutes ago   Up 7 minutes   8092/tcp, 11207/tcp, 11210-11211/tcp, 0.0.0.0:8091->8091/tcp, 18091-18092/tcp   sick_yonath
Find out IP address of the Docker Machine:
>docker-machine ip default
Access the setup console at 192.168.99.100:8091 (make sure to specify the exact IP address in your case). This will show the screen:
Configure Couchbase Server
The first run of Couchbase Server requires you to configure it, so let’s do that next!
Click on the Setup button. Scroll to the bottom of the screen, change the Data RAM Quota to 500 MB, and click on Next.
In Couchbase, data is stored in buckets. The server comes pre-installed with some sample buckets. Select the travel-sample bucket to install it and click on Next.
Configure the bucket by taking defaults:
Click on Next.
Enter personal details, agree to T&C, click on Next:
Deploying an application in Kubernetes requires creating multiple resources, such as Pods, Services, Replication Controllers, and others. Typically each resource is defined in a configuration file and created using the kubectl script. But if multiple resources need to be created, then you need to invoke kubectl multiple times. So if you need to create the following resources:
Your typical business application would consist of a variety of servers such as WildFly, MySQL, Apache, ActiveMQ, and others. They each have their own log format, with minimal to no consistency across them. A log statement typically consists of some sort of timestamp (which can vary widely) and some text information. Logs can be multi-line. And if you are running a cluster of servers, these logs are decentralized, in different directories.
How do you aggregate these logs? Provide a consistent visualization over them? Make this data available to business users?
This blog will:
Introduce ELK stack
Explain how to start it
Start a WildFly instance to send log messages to the ELK stack (Logstash)
View the messages using ELK stack (Kibana)
What is ELK Stack?
The ELK stack provides a powerful platform to index, search, and analyze your data. It uses Logstash for log aggregation, Elasticsearch for searching, and Kibana for visualizing and analyzing data. In short, the ELK stack:
Collects logs and events data (Logstash)
Makes it searchable in fast and meaningful ways (Elasticsearch)
Uses powerful analytics to summarize data across many dimensions (Kibana)
Logstash is a flexible, open source data collection, enrichment, and transportation pipeline.
Elasticsearch is a distributed, open source search and analytics engine, designed for horizontal scalability, reliability, and easy management.
Kibana is an open source data visualization platform that allows you to interact with your data through stunning, powerful graphics.
How does ELK Stack work?
Logstash can collect logs from a variety of sources (using input plugins), process the data into a common format using filters, and stream the data to a variety of destinations (using output plugins). Multiple filters can be chained to parse the data into a common format. Together, they build a Logstash Processing Pipeline.
Inputs and outputs support codecs that enable you to encode or decode the data as it enters or exits the pipeline without having to use a separate filter.
Logstash can then store the data in Elasticsearch and Kibana provides a visualization of that data. Here is a sample pipeline that can collect logs from different servers and run it through the ELK stack.
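A minimal sketch of such a pipeline configuration for Logstash 1.5 (the ports are taken from the container listing below; filters and the Elasticsearch host are assumptions):

input {
  # Receive log events over TCP and UDP on port 5000
  tcp {
    port => 5000
  }
  udp {
    port => 5000
  }
}
output {
  # Forward the events to Elasticsearch
  elasticsearch {
    host => "elasticsearch"
    protocol => "http"
  }
}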
Start ELK Stack
You can download the individual components of the ELK stack and start them that way. There is plenty of advice on how to configure these components. But I like to start with KISS, and Docker makes it easy to KISS!
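Here is a Compose file reconstructed from the container listing below (a sketch; the links and environment in the original may differ):

kibana:
  image: kibana:4.1.1
  ports:
    - 80:5601
  links:
    - elasticsearch
logstash:
  image: arungupta/logstash:1.5.3
  ports:
    - 5000:5000
    - 5000:5000/udp
  links:
    - elasticsearch
elasticsearch:
  image: arungupta/elasticsearch:1.7.1
  ports:
    - 9200:9200
    - 9300:9300

Starting it with docker-compose up -d in a project named elk, and then running docker ps, shows: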
CONTAINER ID   IMAGE                           COMMAND                CREATED              STATUS          PORTS                                            NAMES
cdc61acc1623   kibana:4.1.1                    "/docker-entrypoint.   59 seconds ago       Up 58 seconds   0.0.0.0:80->5601/tcp                             elk_kibana_1
e184f2efcf95   arungupta/logstash:1.5.3        "/opt/logstash/bin/l   About a minute ago   Up 58 seconds   0.0.0.0:5000->5000/tcp, 0.0.0.0:5000->5000/udp   elk_logstash_1
0622b55e8645   arungupta/elasticsearch:1.7.1   "/opt/elasticsearch/   About a minute ago   Up 59 seconds   0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   elk_elasticsearch_1
It shows all the containers running.
WildFly and ELK
James (@the_jamezp) blogged about Centralized Logging for WildFly with ELK Stack. The blog explains how to configure WildFly to send log messages to Logstash. It uses the highly modular nature of WildFly to install the jboss-logmanager-ext library as a module. The configured log manager adds the @timestamp field to the log messages sent to Logstash. These log messages are then sent to Elasticsearch.
Instead of following the steps manually, let’s KISS with Docker and use a pre-configured image to get you started.
Distributed logging and visualization is a critical component in a microservices world where multiple services would come and go at a given time. A future blog will show how to use ELK stack with a microservices architecture based application.