Dockerception: Part 1

Running and managing docker instances from within docker containers

Here at Edgemesh we run 100% of our backend in Docker containers on Joyent’s Triton. We also have a rather strict policy of Wiping Every Datacenter Every Night, which requires our deployments to run autonomously. That job falls to what we call the datacenter’s Command and Control nodes (C&C nodes). So do we run the C&C nodes on our laptops, or in a Virtual Machine? NO! They run in Docker!

Building a Command and Control Service

Our Command and Control service needs all the required tools to orchestrate a rollout for Edgemesh.

Command and Control with Docker-Compose

For us this means the C&C Docker instances need:

  1. the Docker Client binary
  2. the Docker Compose binary
  3. the Triton CLI utilities.

There may be some additional needs (e.g. DNS update scripts etc) but these 3 components supply the base of an automated deployment service.

Setting up a Docker image with the Docker client is pretty straightforward. Assuming you are using an Ubuntu base image, this is just:

RUN apt-get update && apt-get -yqq install docker.io

We will also need the docker-compose binary, which we can get with:

ENV DOCKER_COMPOSE_VERSION 1.15.0
RUN curl -L https://github.com/docker/compose/releases/download/$DOCKER_COMPOSE_VERSION/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose \
&& chmod +x /usr/local/bin/docker-compose

Finally our last utility is the Triton CLI tools. In our case we’re also going to need the Manta utilities as well, so let’s get everything together now.

# Going to need Node.js @v6.11.1
ENV NPM_CONFIG_LOGLEVEL warn
ENV NODE_VERSION 6.11.1
RUN curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-x64.tar.xz" \
&& tar -xJf "node-v$NODE_VERSION-linux-x64.tar.xz" -C /usr/local --strip-components=1 \
&& rm "node-v$NODE_VERSION-linux-x64.tar.xz" \
&& ln -s /usr/local/bin/node /usr/local/bin/nodejs
# Let's get the Manta utilities and the awesome manta-sync package
RUN npm install manta -g \
&& npm install json -g \
&& npm install bunyan -g \
&& npm install manta-sync -g
# Finally let's add Triton
RUN npm i triton -g

The next step is adding secrets into the instances. There are a number of ways to do this, and I would encourage you to take a look at HashiCorp Vault and the Autopilot Pattern for Vault. The key bit is getting your Joyent authorized key into the instance, as this key file (and its keyId) will allow you to manage your Triton-based instances.

Once that’s set up, we can have our image go ahead and create Triton environments for each datacenter (similar to docker-machine instances from a Docker client point of view). This can be done in the build phase (if you are baking the key into the image) or on startup (via ContainerPilot) if you are fetching the key from Vault etc.

You may put a single datacenter as your base config, then expand this later on. For example you might have a BASE_TRITON.json which looks like:

{
  "name": "us-east-1",
  "url": "https://us-east-1.api.joyent.com",
  "account": "TRITON_ACCOUNT_NAME",
  "keyId": "12:34:56:78:90:ab:cd:ef:gh:ij:kl:mn:op:qr:st:uv"
}

You can then set this as your default Triton env via:

RUN /bin/bash -c 'triton profile create -y -f /BASE_TRITON.json'

From there we want every C&C node to be able to communicate with every Triton datacenter. A little bash will get us there:

RUN triton datacenters -H -o name | while read -r dc; do triton profile get -j us-east-1 | sed "s/us-east-1/$dc/g" | triton profile create -y -f -; done

Once that’s run we will have Triton profiles for each datacenter. You can confirm this with:

> triton profile list
NAME       CURR  ACCOUNT     USER  URL
env              tritonacct  -     https://us-east-1.api.joyent.com
eu-ams-1         tritonacct  -     https://eu-ams-1.api.joyent.com
us-east-1  *     tritonacct  -     https://us-east-1.api.joyent.com
us-east-2        tritonacct  -     https://us-east-2.api.joyent.com
us-east-3        tritonacct  -     https://us-east-3.api.joyent.com
us-sw-1          tritonacct  -     https://us-sw-1.api.joyent.com
us-west-1        tritonacct  -     https://us-west-1.api.joyent.com

We can now interact with each datacenter with a simple

# switch to the us-east-1 datacenter as a control plane
eval "$(triton env us-east-1)"

Our C&C images can now (having eval’d into the profile “us-east-1”) communicate with the Triton us-east-1 datacenter like any old Docker-machine instance. For example, a simple docker ps will return all the currently running docker instances in that datacenter. Pretty sweet :)

Consul (3 nodes) is the first step in deployment

Consul and Locks

Before we go much further we want to talk about Consul and locking.

Obviously we need more than one C&C node running in each datacenter, but we don’t want multiple instances issuing commands to the Joyent control plane simultaneously. Since we are using Consul everywhere, we can lean on its lock function.

Of note, we do not run Consul across datacenters; rather, each datacenter has its own Consul cluster that operates independently of every other datacenter.

Before a deployment (or scaling) command is issued by a Command and Control node, we preface that command with a lock. Note that consul lock executes the command within /bin/sh by default. If you want bash, simply define a SHELL environment variable and point it to /bin/bash .

For example, if we want to use docker-compose to scale a service up, we would do:

# consul lock <lock key> <command to run>
# note: the `scale` command is deprecated
# (see https://docs.docker.com/compose/reference/scale/)
# but I continue to use it anyway
consul lock scale_web docker-compose -f compose.yml scale consul=3

BUT WAIT? How does auto-deploy work if Consul isn’t running? Who deploys Consul? This is the catch: you need to have at least one datacenter deployed the old-fashioned way … e.g. from a laptop ;)

Coordinating the Datacenter Dance: Who deploys whom?

For us, we have a complicated re-deployment model that depends heavily on observed load at the time of Clean Slate (midnight UTC). We also allow Edgemesh C&C to deploy to new datacenters (those without an existing bootstrapped install) under certain conditions. A streamlined flow is shown below:

simplified datacenter re-deploy scheduler

In practice this might work as follows:

  1. At midnight UTC, each datacenter reports its 30-minute time series of active client count. From there we estimate the client load (linear regression) for the next 30 minutes. This helps us account for the load estimate during the actual deployments. For example, us-west-1 may not be the busiest at 24:00 UTC (5:00 PM PDT) but by 5:30 PM PDT it will be.

2. Sort the active datacenters by estimated load 30 minutes out in ascending order. This might return something like:

"us-east-3" #least busy
"us-sw-1"
"eu-ams-1"
"us-east-1"
"us-west-1" #most busy - do this last :)

3. Calculate the estimated latency for clients connecting to each Available Datacenter (defaults to all known datacenters). That might look something like the table below. This says a client normally handled by eu-ams-1 would take ~147ms to reach us-east-1, and the fastest handoff is actually us-east-3 at 96ms.

4. Set the deployment order: we start by taking the first active datacenter in our list, which is us-east-3. This datacenter is the least busy, so if we hit an error and abort, starting here ensures we affect the fewest clients. For us-east-3 we then find the lowest-latency peer from the available datacenter matrix; in this case that’s us-east-2 at a 21ms estimated latency.

So us-east-2 will redeploy us-east-3 , because we want to ensure that us-east-2 is available while us-east-3 is offline. If us-east-2 is doing the re-deployment for us-east-3 then it can’t be offline by definition.

We then remove us-east-3 from the Active Datacenter list and move on to the next least busy datacenter.
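Step 1’s 30-minutes-out projection can be sketched with a simple least-squares fit in awk. This is only an illustration: the minute,count input format and the perfectly linear sample numbers are assumptions, not our production feed.

```shell
# hypothetical sample: "minute,count" pairs for the last 30 minutes
samples=$(for i in $(seq 0 29); do echo "$i,$((100 + 2*i))"; done)

# least-squares fit over the window, then project the client count
# at minute 60, i.e. 30 minutes past the end of the window
echo "$samples" | awk -F, '
  { n++; sx += $1; sy += $2; sxx += $1*$1; sxy += $1*$2 }
  END {
    b = (n*sxy - sx*sy) / (n*sxx - sx*sx)   # slope
    a = (sy - b*sx) / n                     # intercept
    printf "%.0f\n", a + b*60               # projected load
  }'
# prints 220 for this perfectly linear sample (100 + 2*60)
```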

We follow this process and get a deployment map which looks like:

src_dc         dst_dc
us-east-2 -> us-east-3
us-west-1 -> us-sw-1
us-east-3 -> eu-ams-1
us-east-3 -> us-east-1
us-sw-1 -> us-west-1

If there are any source datacenters that are not currently active, our first step is starting new deploys for those datacenters (the least busy Active Datacenter does the deploys). In the scenario above, this means us-east-2 would be deployed (from scratch) from us-east-3. So our full deployment scenario is:

src_dc         dst_dc
us-east-3 +> us-east-2 //create
us-east-2 -> us-east-3 //re-deploy
us-west-1 -> us-sw-1 //re-deploy
us-east-3 -> eu-ams-1 //re-deploy
us-east-3 -> us-east-1 //re-deploy
us-sw-1 -> us-west-1 //re-deploy

Complicated? Yeah, and for most scenarios overly so

A more practical ‘who deploys whom’ model

A simpler option might be to do the following:

  1. Get a list of what data centers currently have deployed instances.
  2. Sort them, and create a ring

For step 1 we can do this with a one-liner:

# list datacenters; for each one, eval into it and count the running
# instances that are Command and Control nodes (we name the service
# CandC in our compose files, so instances are named edgemesh_CandC_N)
# then print "datacenter,count" and grep out any datacenter with
# zero C&C nodes
triton datacenters -H -o name | while read -r dc; do eval "$(triton env $dc)"; echo "$dc,$(triton instance list -H | grep -c CandC)"; done | grep -v ",0"

This will return a list of all datacenters and the number of C&C instances running in each. For example:

#datacenter, count of C&C instances running
eu-ams-1,3
us-east-3,3
us-sw-1,3
us-west-1,3

This tells us that we can issue deployments from any of these 4 datacenters.

Now let’s sort this list and create a ring (we’ll leave this little awk challenge open to the reader). The end result would be:

eu-ams-1 -> us-east-3 -> us-sw-1 -> us-west-1 -> eu-ams-1
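One hedged way to build that ring with sort and awk (the input is hard-coded here from the example counts above; in practice it would come from the one-liner’s output):

```shell
# hypothetical input: the "datacenter,count" lines from the previous step
dcs="eu-ams-1,3
us-east-3,3
us-sw-1,3
us-west-1,3"

# strip the counts, sort the names, then join them into a closed ring
echo "$dcs" | cut -d, -f1 | sort | awk '
  { ring[NR] = $0 }
  END {
    for (i = 1; i <= NR; i++) printf "%s -> ", ring[i]
    print ring[1]   # wrap back to the first datacenter to close the ring
  }'
# prints: eu-ams-1 -> us-east-3 -> us-sw-1 -> us-west-1 -> eu-ams-1
```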

Let’s Dance

Assuming every Command and Control instance in each datacenter arrives at the same deployment model, actually kicking off the deployment is really straightforward.

At midnight UTC each datacenter calculates the deployment model for tonight, and then checks to see if it’s the first datacenter in the chain. So for our deployment chain above, each datacenter would arrive at the chain:

eu-ams-1 -> us-east-3 -> us-sw-1 -> us-west-1 -> eu-ams-1

The C&C nodes then publish this deployment model to a Consul key/value store. Each C&C node has a Consul watcher that looks for updates to this key; on update it reads the key and checks whether its datacenter is listed as the first value in the model. If so, the C&C node will eval into the next value in the chain, then run a Clean Slate Process. So for the above, eu-ams-1 starts us off.
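A minimal sketch of that watcher check, assuming the model is stored as an arrow-separated chain. The helper name and the deploy/model key are hypothetical, not our actual implementation:

```shell
# hypothetical helper: given the published chain and this node's own
# datacenter, print the datacenter this node should deploy next
# (prints nothing if this node is not first in the chain)
next_target() {
  chain=$1; my_dc=$2
  first=$(printf '%s' "$chain" | awk -F' -> ' '{print $1}')
  if [ "$first" = "$my_dc" ]; then
    printf '%s\n' "$chain" | awk -F' -> ' '{print $2}'
  fi
}

# the chain itself would come from the local Consul agent, e.g.:
#   chain=$(curl -s http://localhost:8500/v1/kv/deploy/model?raw)
#   eval "$(triton env "$(next_target "$chain" "$MY_DC")")"
next_target "eu-ams-1 -> us-east-3 -> us-sw-1" "eu-ams-1"
# prints: us-east-3
```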

step 1: All C&C nodes in eu-ams-1 realize they start off the process. They attempt a Consul lock, and whichever instance wins it:

step 2: evals into the us-east-3 datacenter via eval "$(triton env us-east-3)"

step 3: Removes all DNS entries for any services running in us-east-3 to ensure traffic doesn’t get routed there.

step 4: Stops all docker instances and then removes them. Something like the below will do it:
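A hedged sketch (not our exact script): stop every running container in the current, eval’d datacenter context, then remove all containers, running or stopped:

```shell
# stop all running containers in the current datacenter context
docker ps -q  | xargs -r docker stop
# then remove every container (running or stopped)
docker ps -aq | xargs -r docker rm
```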

step 5: deploy via docker compose

docker-compose -f deployment_file.yml up

step 6: Confirm all services are running. We do this with a health-check across all services. Alternatively you could just count the running services and make sure they match what you expect.

docker ps --filter "status=running" --format "{{.Names}}"

step 7: Re-enable DNS (assuming the containers don’t self register)

step 8: Double check services are still running now that traffic is flowing. You can also add some logic to check log files or even Consul health checks. You can also use docker exec here to run commands on instances in the remote datacenter. For example, double check that there are no services in the critical state by asking consul in the remote data center:

curl http://localhost:8500/v1/health/state/critical

step 9: Now let’s go ahead and drop the first datacenter from our deployment chain, and update the deployment model in Consul in us-east-3 (via a docker exec into a us-east-3 C&C instance). This will trigger all C&C nodes in that datacenter to start the process anew and begin deploying us-sw-1. It’s important to do this asynchronously (via a pub/sub model), as we need the deployment to start from a running C&C instance, not the shell we are docker exec’ing from.

Below is a full diagram which shows this flow:

Rolling deployments