This article was originally published on Scene-SI by Tit Petric. With their kind permission, we’re sharing it here for Codeship readers.
Scaling your service has usually been the domain of system operators, who installed servers, and developers, who tweaked the software once the load grew high enough to warrant it. Soon enough you’re looking at tens or even hundreds of instances that take a lot of time to manage.
With the release of Docker 1.12, you now have orchestration built in -- you can scale to as many instances as your hosts allow. And setting up a Docker swarm is easy-peasy.
Initialize Swarm
First off, I’m starting with a clean Docker 1.12.0 installation. I’ll be creating a swarm with a few simple steps:
```
root@swarm1:~$ docker swarm init
Swarm initialized: current node (4i0lko1qdwqp4x1aqwn6o7obh) is now a manager.

To add a worker to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-9ycry5kc20rnw5cbxhyduzg1f \
    10.55.0.248:2377

To add a manager to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
    10.55.0.248:2377
```
I now have a swarm consisting of exactly one manager node. You can attach additional swarm workers or add new managers for high availability. If you’re running a swarm cluster with only one manager and several workers, you’re risking an interruption of service if the manager node fails.
“In Docker Swarm, the Swarm manager is responsible for the entire cluster and manages the resources of multiple Docker hosts at scale. If the Swarm manager dies, you must create a new one and deal with an interruption of service.”
As we’re interested in setting up a two-node swarm cluster, it makes sense to make both nodes managers. If one goes down, the other should take its place.
```
root@swarm2:~# docker swarm join \
> --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
> 10.55.0.248:2377
This node joined a swarm as a manager.
```
To list the nodes in the swarm, run `docker node ls`.
```
root@swarm2:~# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh    swarm1    Ready   Active        Leader
9gyk5t22ngndbwtjof80hpg54 *  swarm2    Ready   Active        Reachable
```
Creating a Service
As you can see, a newly joined manager node is added automatically but not promoted to leader. Let’s start a service that does something we can scale across both hosts; I’ll ping `google.com`, for example. I want five instances of this service available from the start, using the `--replicas` flag.
```
root@swarm2:~# docker service create --replicas 5 --name helloworld alpine ping google.com
31zloagja1dlkt4kaicvgeahn
```
As the service started without problems, we just get back the ID of the service that was started. By using `docker service ls`, we can get more information about the running service.
```
root@swarm2:~# docker service ls
ID            NAME        REPLICAS  IMAGE   COMMAND
31zloagja1dl  helloworld  5/5       alpine  ping google.com
```
Of course, as we’re talking orchestration, the services in the examples are split between the `swarm1` and `swarm2` nodes. You can still use `docker ps -a` on individual nodes to inspect single containers, but there’s the handy `docker service ps [name]`.
```
root@swarm1:~# docker service ps helloworld
ID                         NAME          IMAGE   NODE    DESIRED STATE  CURRENT STATE          ERROR
5fxtllouvmd91tmgzoudtt7a4  helloworld.1  alpine  swarm1  Running        Running 7 minutes ago
cqvgixx3djhvtiahba971ivr7  helloworld.2  alpine  swarm2  Running        Running 7 minutes ago
99425nw3r4rf5nd66smjm13f5  helloworld.3  alpine  swarm2  Running        Running 7 minutes ago
1dj3cs7v5ijc93k9yc2p42bhj  helloworld.4  alpine  swarm1  Running        Running 7 minutes ago
0hy3yzwqzlnee10gat6w2lnp2  helloworld.5  alpine  swarm1  Running        Running 7 minutes ago
```
Testing Fault Tolerance
As we connected two managers to run our service, let’s just bring one of them down. I’m going to power off `swarm1`, the current leader, so that it will hopefully do the following:

- Elect a new leader (`swarm2`).
- Start up additional `helloworld` containers to cover the outage.
```
root@swarm1:~# poweroff
Connection to 10.55.0.248 closed by remote host.
Connection to 10.55.0.248 closed.
```
First off, let’s list the cluster state.
```
root@swarm2:~# docker node ls
Error response from daemon: rpc error: code = 2 desc = raft: no elected cluster leader
```
Uh-oh, this was slightly unexpected. After bringing `swarm1` back up, I saw that `swarm2` had been promoted to leader. But it’s not exactly the failover I imagined: while `swarm1` was offline, the ping service only ran at 2/5 replicas and didn’t automatically scale onto `swarm2` as expected.
```
root@swarm2:~# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh    swarm1    Ready   Active        Reachable
9gyk5t22ngndbwtjof80hpg54 *  swarm2    Ready   Active        Leader
root@swarm2:~# docker service ps helloworld
ID                         NAME             IMAGE   NODE    DESIRED STATE  CURRENT STATE           ERROR
4x0zgeiucsizvmys5orih2bru  helloworld.1     alpine  swarm1  Running        Running 3 minutes ago
5fxtllouvmd91tmgzoudtt7a4  \_ helloworld.1  alpine  swarm1  Shutdown       Complete 3 minutes ago
cqvgixx3djhvtiahba971ivr7  helloworld.2     alpine  swarm2  Running        Running 21 minutes ago
99425nw3r4rf5nd66smjm13f5  helloworld.3     alpine  swarm2  Running        Running 21 minutes ago
5xzldwvoplqpg1qllg28kh2ef  helloworld.4     alpine  swarm1  Running        Running 3 minutes ago
1dj3cs7v5ijc93k9yc2p42bhj  \_ helloworld.4  alpine  swarm1  Shutdown       Complete 3 minutes ago
avm36h718yihd5nomy2kzhy7m  helloworld.5     alpine  swarm1  Running        Running 3 minutes ago
0hy3yzwqzlnee10gat6w2lnp2  \_ helloworld.5  alpine  swarm1  Shutdown       Complete 3 minutes ago
```
So, what went wrong? After a bit of reading, I came across the following explanation of how Docker uses the Raft consensus algorithm for leader election:
Consensus is fault tolerant up to the point where quorum is available. If a quorum of nodes is unavailable, it is impossible to process log entries or reason about peer membership. For example, suppose there are only two peers: A and B. The quorum size is also two, meaning both nodes must agree to commit a log entry. If either A or B fails, it is now impossible to reach quorum.
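The arithmetic behind that quote is worth spelling out: a quorum is a strict majority of managers, floor(N/2) + 1. A minimal shell sketch (plain arithmetic, no swarm required) shows why two managers are no more fault tolerant than one:

```shell
# Quorum (a strict majority) for N managers is floor(N/2) + 1.
# Shell integer division already floors, so n / 2 + 1 is enough.
for n in 1 2 3; do
  echo "N=$n managers: quorum = $(( n / 2 + 1 ))"
done
# → N=1 managers: quorum = 1
# → N=2 managers: quorum = 2
# → N=3 managers: quorum = 2
```

With two managers the quorum is two, so both have to be up; losing either one makes leader election impossible, which is exactly the `raft: no elected cluster leader` error we ran into.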
Adding an Additional Manager to Enable Fault Tolerance
So, if you have three managers, one manager can fail, and the remaining two still represent a majority, which can decide which of the remaining managers gets elected as leader. I quickly added a `swarm3` node to the swarm. You can retrieve the credentials to add nodes by issuing `docker swarm join-token [type]`, where `type` is either `worker` or `manager`.
```
root@swarm2:~# docker swarm join-token manager
To add a manager to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
    10.55.0.238:2377
```
And we run this command on our `swarm3` machine.
```
root@swarm3:~# docker swarm join \
> --token SWMTKN-1-0445f42yyhu8z4k3lgxsbnusnd8ws83urf56t02rv1vdh1zqlj-09swsjxdz80bfxbc1aed6mack \
> 10.55.0.238:2377
This node joined a swarm as a manager.
root@swarm3:~# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh    swarm1    Ready   Active        Reachable
9gyk5t22ngndbwtjof80hpg54    swarm2    Ready   Active        Leader
b9dyyc08ehtnl62z7e3ll0ih3 *  swarm3    Ready   Active        Reachable
```
Yay! Our `swarm3` is ready. I cleared out the container inventory to start with a clean swarm.
Scaling Our Service with Fault Tolerance
I deleted the service with `docker service rm helloworld` and cleaned up the containers with `docker ps -a -q | xargs docker rm`. Now I can start the service again from zero.
```
root@swarm1:~# docker service create --replicas 5 --name helloworld alpine ping google.com
5gmrllue1sgdwl1yd5ubl16md
root@swarm1:~# docker service scale helloworld=10
helloworld scaled to 10
root@swarm1:~# docker service ps helloworld
ID                         NAME           IMAGE   NODE    DESIRED STATE  CURRENT STATE               ERROR
2hb76h8m7oop9pit4jgok2jiu  helloworld.1   alpine  swarm1  Running        Running about a minute ago
5lxefcjclasna9as4oezn34i8  helloworld.2   alpine  swarm3  Running        Running about a minute ago
95cab7hte5xp9e8mfj1tbxms0  helloworld.3   alpine  swarm2  Running        Running about a minute ago
a6pcl2fce4hwnh347gi082sc2  helloworld.4   alpine  swarm2  Running        Running about a minute ago
61rez4j8c5h6g9jo81xhc32wv  helloworld.5   alpine  swarm1  Running        Running about a minute ago
2lobeil8sndn0loewrz8n9i4s  helloworld.6   alpine  swarm1  Running        Running 20 seconds ago
0gieon36unsggqjel48lcax05  helloworld.7   alpine  swarm1  Running        Running 21 seconds ago
91cdmnxarluy2hc2fejvxnzfg  helloworld.8   alpine  swarm3  Running        Running 21 seconds ago
02x6ppzyseak8wsdcqcuq545d  helloworld.9   alpine  swarm3  Running        Running 20 seconds ago
4gmn24kjfv7apioy6t8e5ibl8  helloworld.10  alpine  swarm2  Running        Running 21 seconds ago
```
And powering off `swarm1` gives us:
```
root@swarm2:~# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
4i0lko1qdwqp4x1aqwn6o7obh    swarm1    Ready   Active        Unreachable
9gyk5t22ngndbwtjof80hpg54 *  swarm2    Ready   Active        Leader
b9dyyc08ehtnl62z7e3ll0ih3    swarm3    Ready   Active        Reachable
```
Additional containers have spawned, just as intended:
```
root@swarm2:~# docker service ps helloworld
ID                         NAME             IMAGE   NODE    DESIRED STATE  CURRENT STATE                ERROR
bb8nwud2h75xpvkxouwt8rftm  helloworld.1     alpine  swarm2  Running        Running 26 seconds ago
2hb76h8m7oop9pit4jgok2jiu  \_ helloworld.1  alpine  swarm1  Shutdown       Running 2 minutes ago
5lxefcjclasna9as4oezn34i8  helloworld.2     alpine  swarm3  Running        Running 2 minutes ago
95cab7hte5xp9e8mfj1tbxms0  helloworld.3     alpine  swarm2  Running        Running 2 minutes ago
a6pcl2fce4hwnh347gi082sc2  helloworld.4     alpine  swarm2  Running        Running 2 minutes ago
8n1uonzp2roy608kd6v888y3d  helloworld.5     alpine  swarm3  Running        Running 26 seconds ago
61rez4j8c5h6g9jo81xhc32wv  \_ helloworld.5  alpine  swarm1  Shutdown       Running 2 minutes ago
17czblq9saww4e2wok235kww8  helloworld.6     alpine  swarm2  Running        Running 26 seconds ago
2lobeil8sndn0loewrz8n9i4s  \_ helloworld.6  alpine  swarm1  Shutdown       Running about a minute ago
6f3tm5vvhq07kwqt3zu0xr5mi  helloworld.7     alpine  swarm3  Running        Running 26 seconds ago
0gieon36unsggqjel48lcax05  \_ helloworld.7  alpine  swarm1  Shutdown       Running about a minute ago
91cdmnxarluy2hc2fejvxnzfg  helloworld.8     alpine  swarm3  Running        Running about a minute ago
02x6ppzyseak8wsdcqcuq545d  helloworld.9     alpine  swarm3  Running        Running about a minute ago
4gmn24kjfv7apioy6t8e5ibl8  helloworld.10    alpine  swarm2  Running        Running about a minute ago
```
Move the Services Away From a Specific Node (Drain)
With this setup, we can tolerate the failure of one manager node. But what if we want a more “graceful” way to remove containers from one node? We can set that node’s availability to `drain`, which moves its containers off to the other nodes.
```
root@swarm2:~# docker node update --availability drain swarm3
swarm3
root@swarm2:~# docker service ps helloworld | grep swarm3
5lxefcjclasna9as4oezn34i8  \_ helloworld.2  alpine  swarm3  Shutdown  Shutdown 19 seconds ago
8n1uonzp2roy608kd6v888y3d  \_ helloworld.5  alpine  swarm3  Shutdown  Shutdown 19 seconds ago
6f3tm5vvhq07kwqt3zu0xr5mi  \_ helloworld.7  alpine  swarm3  Shutdown  Shutdown 19 seconds ago
91cdmnxarluy2hc2fejvxnzfg  \_ helloworld.8  alpine  swarm3  Shutdown  Shutdown 19 seconds ago
02x6ppzyseak8wsdcqcuq545d  \_ helloworld.9  alpine  swarm3  Shutdown  Shutdown 19 seconds ago
root@swarm2:~# docker service ps helloworld | grep swarm2 | wc -l
10
```
All the containers on `swarm3` shut down and started up on the remaining node, `swarm2`. Let’s scale the example down to only one instance.
```
root@swarm2:~# docker service scale helloworld=1
helloworld scaled to 1
root@swarm2:~# docker service ps helloworld | grep swarm2 | grep -v Shutdown
17czblq9saww4e2wok235kww8  helloworld.6  alpine  swarm2  Running  Running 7 minutes ago
```
Cleaning up the containers is still very much in the domain of the sysadmin. I started `swarm1` back up and scaled our service to 20 instances.
```
root@swarm2:~# docker service scale helloworld=20
helloworld scaled to 20
root@swarm2:~# docker service ls
ID            NAME        REPLICAS  IMAGE   COMMAND
5gmrllue1sgd  helloworld  2/20      alpine  ping google.com
root@swarm2:~# docker service ls
ID            NAME        REPLICAS  IMAGE   COMMAND
5gmrllue1sgd  helloworld  10/20     alpine  ping google.com
root@swarm2:~# docker service ls
ID            NAME        REPLICAS  IMAGE   COMMAND
5gmrllue1sgd  helloworld  16/20     alpine  ping google.com
root@swarm2:~# docker service ls
ID            NAME        REPLICAS  IMAGE   COMMAND
5gmrllue1sgd  helloworld  20/20     alpine  ping google.com
```
As you can see here, it does take some time for the instances to start up. Let’s see how they were distributed.
```
root@swarm3:~# docker service ps -f "desired-state=running" helloworld | grep swarm | awk '{print $4}' | sort | uniq -c
     10 swarm1
     10 swarm2
```
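Rerunning `docker service ls` by hand to watch the REPLICAS column converge gets old quickly; the check can be scripted. Here’s a minimal sketch (the `check_converged` helper is my own invention; it’s fed the captured output from above rather than a live daemon, but on a real swarm you’d pipe `docker service ls` into it):

```shell
# check_converged reads `docker service ls` output on stdin and exits 0
# once the helloworld line reports all desired replicas running (x/x).
check_converged() {
  awk '$2 == "helloworld" { split($3, r, "/"); exit (r[1] == r[2]) ? 0 : 1 }'
}

echo '5gmrllue1sgd  helloworld  16/20  alpine  ping google.com' \
  | check_converged && echo converged || echo 'still starting'
# → still starting
echo '5gmrllue1sgd  helloworld  20/20  alpine  ping google.com' \
  | check_converged && echo converged || echo 'converged? no'
# → converged
```

On a live swarm you could loop it, e.g. `until docker service ls | check_converged; do sleep 2; done`. Note the sketch treats a missing service line as converged, so it’s an illustration, not production tooling.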
Enabling and Scaling to a New Node
As we put `swarm3` into drain availability, we don’t have any instances running on it. Let’s fix that very quickly by putting it back into active availability mode.
```
root@swarm3:~# docker node update --availability active swarm3
swarm3
```
As the already-running containers stay where they are, we need to scale our service up to populate `swarm3`.
```
root@swarm3:~# docker service scale helloworld=30
helloworld scaled to 30
root@swarm3:~# docker service ps -f "desired-state=running" helloworld | grep swarm | awk '{print $4}' | sort | uniq -c
     10 swarm1
     10 swarm2
     10 swarm3
```
It takes a bit of getting used to, but `docker service` is a powerful way to scale out your microservices. It might be slightly more tricky when it comes to data volumes (mounts), but that’s the subject of another post.
Closing Words
Keep in mind, if you’re provisioning swarm managers, you need a majority to resolve failures gracefully. That means you should run an odd number of managers, with N >= 3. A cluster of N managers can tolerate the failure of (N-1)/2 of them: 3 managers tolerate 1 failed node, 5 managers tolerate 2, 7 managers tolerate 3, and so on.
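The failure-tolerance numbers above can be sanity-checked with plain shell arithmetic (no swarm needed):

```shell
# N managers tolerate the failure of (N-1)/2 of them (integer division,
# matching the majorities listed above).
for n in 3 5 7; do
  echo "$n managers tolerate $(( (n - 1) / 2 )) failure(s)"
done
# → 3 managers tolerate 1 failure(s)
# → 5 managers tolerate 2 failure(s)
# → 7 managers tolerate 3 failure(s)
```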
A worker, in comparison, doesn’t replicate the manager state, and you can’t start or query services from a worker. You have to do that from one of the manager nodes -- the commands will be executed on the leader node.