Docker Swarm Review

In my current series about Kubernetes under DevOps methods I wrote about Kubernetes in general, a short comparison of VMs vs bare metal and most recently about Docker itself, the latter being more of a basic review of standalone Docker, Docker Compose and how to use Puppet to manage standalone Docker.

Overview

This post will shed a bit of light on Docker Swarm, which is the next level on the way to Kubernetes. Standalone Docker typically runs single containers or a handful of unrelated containers, while Docker Compose is a better way of managing connected, multi-container applications in a controlled fashion. The latter is already a great way of running applications through CICD pipelines rather than via the CLI, which makes operations much easier and more resilient to failure.

But it still is not a (good) answer to real scalability. Even if your environment consists of multiple Docker hosts, you still have to carry out much of the work on resource planning, distribution and container availability yourself. Of course it helps a lot for the latter if you use Puppet, Ansible or other Infrastructure as Code mechanisms.

So how about having that scalability and high availability handled in a (mostly) fully automated way? Mostly, because there is no such thing as 'fully automated' as of now. One can always plan the availability of resources, the required infrastructure, the deployment of applications and even a certain degree of fault resistance.

But it is impossible to account for every single possible failure, such as fire, blackouts (and even worse, so-called brownouts, where the power level drops below a certain point and creates all sorts of "spooky" problems that are hard enough to troubleshoot, let alone plan for), human error (still the cause of most problems) and so on.

Scenario

Nonetheless, a certain fault tolerance is possible, so let's look at that in more detail. Let's assume the task at hand is to provide the infrastructure for an application powered by stateless microservices, say Spring Boot: a web service to sell and deliver something to external customers through a smartphone app, a way of authentication, secure payments, and monitoring of the whole setup.

Since you run this in production for thousands of users and need to maintain high availability, there must be a fault tolerance of n+1, meaning you must be able to lose a host and still deliver every aspect of your product.

There will need to be multiple hosts for the containers, multiple instances of each service for fault tolerance and load balancing, and a surrounding support infrastructure, e.g. databases. You need to define a desired state and have a mechanism in place that turns the desired state into the running state at any given time. You will want to automatically configure and deploy your environment and proactively solve typical problems.

To achieve that, you need an orchestration mechanism, which is exactly what Docker Swarm provides (Kubernetes can do that even better, but this article is about Docker Swarm). We also want to utilize Puppet to a certain extent.

Swarm Basics

Docker Swarm basically consists of two types of nodes:

  • Manager nodes: Responsible for resource scheduling, swarm integrity and communication. Manager nodes can also run container workloads, as long as those do not overload the manager's ability to manage resources in the first place. When not running any container load, manager nodes require far fewer resources than worker nodes. Typically you will see an odd number of manager nodes, e.g. 3 or 5, which is required to maintain a proper quorum on the cluster state. Manager nodes elect a leader, which then becomes the master.
  • Worker nodes: Responsible exclusively for running container workload tasks. They cannot take over any of the manager's duties and only run the tasks assigned to them, although you can promote a worker to a manager. It is very helpful to apply labels to worker nodes, so specific tasks can be assigned based on those labels, for example labeling by disk type, i.e. SSD vs HDD (see the examples after this list).

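A few node management commands illustrate this; the node names below are just placeholders, the commands themselves are plain Docker CLI and have to be run on a manager node:

 # list all nodes in the swarm, including role, status and availability
 sudo docker node ls

 # promote a worker to a manager (demote works the other way around)
 sudo docker node promote worker-03

 # label workers by disk type, so services can later be placed accordingly
 sudo docker node update --label-add disk=ssd worker-01
 sudo docker node update --label-add disk=hdd worker-02
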
So when you first play with Docker Swarm just to get to know it, you will probably set up 1 manager and 1 worker, as that gives you everything required to learn the technology. When you set up a production environment, you will most likely run 3 managers and as many workers as needed.

Typically, applications are developed and tested in different stages, say 'development', 'test' and 'production', the latter being the real thing. Often that means 3 different clusters, VLANs and supporting infrastructures. But in fact you can almost always achieve this in one single larger cluster by splitting nodes and networks into the various stages, e.g. through node labels and separate overlay networks (a small sketch follows below). That can make your infrastructure much easier to manage, but it all depends on a variety of things, not least security requirements.
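
As a small sketch of that idea, each stage could get its own overlay network, which stacks then attach to; the network names are made up for illustration:

 # one attachable overlay network per stage
 sudo docker network create --driver overlay --attachable dev_net
 sudo docker network create --driver overlay --attachable test_net
 sudo docker network create --driver overlay --attachable prod_net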

The most important thing when using Docker Swarm is proper resource definitions. Much like with Docker Compose, you use YAML (.yml or .yaml) files, so-called stack files, to define container images, runtime variables, volumes etc. Those are then deployed via docker stack deploy:

 $ sudo docker stack deploy -c your-stack.yaml your-stack
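
Once deployed, the state of the stack and its services can be checked from any manager node, for example:

 # list deployed stacks and the services they contain
 sudo docker stack ls
 sudo docker stack services your-stack

 # show the individual tasks (containers) and the nodes they run on
 sudo docker stack ps your-stack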

There are multiple ways of running the deployment itself:

  • the manual way as shown above
  • using Portainer, a very helpful graphical interface, which comes in a premium business edition and a free community edition and runs as a container itself
  • using CICD pipelines such as Jenkins or GitLab CI (the recommended way, as shown below)

It is pretty easy to set up the Docker Swarm itself through Puppet as well. All we need are the usual Docker binaries, the service management plus the swarm setup. I wrote my own module for that to avoid the overhead of the external module from the Puppet Forge.

The swarm setup itself (swarm init, joining nodes) is better done through CICD or via the CLI. You only want to automate the installation and service management, so the Docker service is running on all required nodes. Docker Swarm handles the orchestration itself internally and you do not want to interfere with that at all, other than through resource definitions. I used Puppet for the installation, Portainer for GUI management and CICD for testing and deployment, and had a very robust and reliable environment over several years.
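
For reference, the manual swarm setup is only a handful of commands; the IP address and token below are placeholders:

 # on the first manager: initialize the swarm
 sudo docker swarm init --advertise-addr 192.168.10.11

 # print the join command (including the token) for workers or additional managers
 sudo docker swarm join-token worker
 sudo docker swarm join-token manager

 # on each further node: join with the printed token
 sudo docker swarm join --token <token> 192.168.10.11:2377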

Deployment through CICD

In our scenario above we want to deploy a few different services on multiple hosts. We want versioned stack file definitions, and for that we use a git repository. We also need to store the image somewhere, so the swarm can pull and run it. GitLab offers all the required features, including a container registry, so we use a self-hosted GitLab CE.

Our git repo contains a few source code files:

  • Dockerfile to build our image
  • app.py which is our application
  • requirements.txt which describes requirements for the app
  • stack.yaml with the definitions of our stack to be deployed
  • Jenkinsfile with instructions for the tasks to be executed on Jenkins
    • e.g. YAML validation for testing purposes
    • deployment task

Our image will be built to our specs, uploaded to GitLab's container registry, and deployed from there as well. As an example, we'll partially use the application described in the Docker documentation.

Note that you will typically also have files like .dockerignore and .gitignore in your repo: the former lists files to be ignored when building the image, the latter keeps unwanted files out of the git index. As they do not play a role in this tutorial, we'll leave them out.

Dockerfile

The Dockerfile is used by the Docker engine to build a Docker image from basic instructions like FROM, ADD, COPY, RUN and CMD. For our app, it looks like this:

 # syntax=docker/dockerfile:1
 FROM python:3.4-alpine
 # copy the application code into the image and make it the working directory
 ADD . /code
 WORKDIR /code
 # install the Python dependencies
 RUN pip install -r requirements.txt
 # start the application when the container runs
 CMD ["python", "app.py"]

In order for this to work, the other files have to be in place.

app.py

The actual application is a simple Python file:

 from flask import Flask
 from redis import Redis

 app = Flask(__name__)
 redis = Redis(host='redis', port=6379)

 @app.route('/')
 def hello():
     count = redis.incr('hits')
     return 'Hello World! I have been seen {} times.\n'.format(count)

 if __name__ == "__main__":
     app.run(host="0.0.0.0", port=8000, debug=True)

requirements.txt

 flask
 redis

stack.yaml

 version: "3.9"

 services:
   web:
     image: 127.0.0.1:5000/stackdemo
     build: .
     ports:
       - "8000:8000"
   redis:
     image: redis:alpine

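The stack file above targets a local registry at 127.0.0.1:5000, as in the Docker docs example, whereas our pipeline below pushes to the GitLab registry instead. Note that docker stack deploy ignores the build key, so the image has to be built and pushed beforehand. For a local smoke test on a single-node swarm, a sketch like the following would do:

 # throwaway local registry, run as a swarm service (as in the Docker docs example)
 sudo docker service create --name registry --publish published=5000,target=5000 registry:2

 # build and push the image referenced in stack.yaml, then deploy the stack
 sudo docker build -t 127.0.0.1:5000/stackdemo .
 sudo docker push 127.0.0.1:5000/stackdemo
 sudo docker stack deploy -c stack.yaml stackdemo

 # quick check that the web service answers
 curl http://localhost:8000
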
Jenkinsfile

With those files in place, we can start putting our CICD pipeline together. Basically we want to achieve the following:

  • build the image
  • tag the image for our container registry
  • push the image to the registry
  • deploy the stack to our swarm

Of course there are many more tests that should run in a proper CICD pipeline, for instance validation against SonarQube, yamllint etc., but that would blow this post up way too much and will be covered separately in future posts.
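
That said, a first validation step does not have to be big; a single yamllint call on the stack file already catches a lot, assuming yamllint is installed on the Jenkins agent:

 # fail early on malformed or badly formatted YAML
 yamllint stack.yaml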

The following is an example declarative pipeline for the Jenkinsfile:

pipeline {
  agent {
    label 'general'
  }

  options {
    gitLabConnection('mygitlabconnection')
  }

  stages {

    stage('manage image') {
      steps {
        // registry login shown without credentials here, see the note below
        sh '''
        docker login mygitlabhost:4567
        docker build -t mygitlabhost:4567/my-app:latest .
        docker push mygitlabhost:4567/my-app:latest
        '''
      }
    }

    stage('deploy stack') {
      steps {
        sh '''
        docker stack deploy --compose-file stack.yaml my-stack
        '''
      }
    }
  }

  post {
    always {
      deleteDir() /* clean up our workspace */
    }
    success {
      updateGitlabCommitStatus state: 'success'
    }
    failure {
      updateGitlabCommitStatus state: 'failed'
      step([$class: 'Mailer', notifyEveryUnstableBuild: true, recipients: '', sendToIndividuals: true])
    }
  }
}

You will likely have to add a proper login mechanism using a username and password or a token. The deploy stage also needs a transport mechanism; otherwise the stack would be deployed locally on the Jenkins agent, which is not the goal. There are many ways to do this, depending on your setup. I often use the sshagent plugin, which allows remote execution via SSH, so the command is executed on the target host, one of the manager nodes in your swarm. If you happen to use Portainer to manage your swarm, its API can be used for that too.
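
One possible transport is to point the Docker CLI at a manager node via SSH; this assumes the Jenkins agent has SSH access to that node and a reasonably recent Docker CLI, and the user and hostname below are placeholders:

 # run the deploy against a remote manager over SSH instead of the local engine
 DOCKER_HOST="ssh://deploy@swarm-manager-01" docker stack deploy --compose-file stack.yaml my-stack

Wrapped in an sshagent block in the pipeline, the required key is provided by Jenkins, and the stack ends up on the swarm rather than on the build agent.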
