Migrating Docker Containers to a Kubernetes Cluster

So far, we have built the following apps:

  • Day 1: Indeed Job scraper (indeed_jobs)
  • Day 4: Android App Builder (android-builder)
  • Day 6: Online JukeBox (chillbeats)

Publishing Docker Images to a Registry

The images were built locally on our old Docker server, so we need a way to share them with our Kubernetes cluster. I decided to use the public Docker registry (Docker Hub) as a broker.

The first step is to push the images to the Docker registry. The command to push is, as you guessed, docker push. Remember the way we built those images? For example:

docker build -t dohsimpson/chillbeats .

The reason we include the 'dohsimpson/' part is to tell docker push whose repository we want to push to. Of course, this is my username; you should put your own username here.

So the command to push chillbeats image is:

docker push dohsimpson/chillbeats

This will push the image to my repository on Docker Hub. I will do the same for the other two images.
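
To make the role of the 'dohsimpson/' prefix concrete, here is a minimal Python sketch of how an image name breaks down into its parts. It is a simplification, not Docker's actual parser: parse_image_ref is a hypothetical helper, it ignores digests (the @sha256:... form), and it only applies the usual defaults (docker.io as the registry, library as the namespace for official images, latest as the tag).

```python
from __future__ import annotations


def parse_image_ref(ref: str) -> dict[str, str]:
    """Split an image name into registry, namespace, repository and tag.

    Hypothetical helper for illustration; digests are not handled.
    """
    # A ':' that appears after the last '/' separates the name from the tag.
    name, tag = ref, "latest"
    if ref.rfind(":") > ref.rfind("/"):
        name, tag = ref.rsplit(":", 1)

    parts = name.split("/")
    registry = "docker.io"
    # The first component is a registry host only if it looks like one.
    if len(parts) > 1 and ("." in parts[0] or ":" in parts[0] or parts[0] == "localhost"):
        registry, parts = parts[0], parts[1:]

    if len(parts) == 1:
        # Official Docker Hub images live under the "library" namespace.
        namespace = "library" if registry == "docker.io" else ""
    else:
        namespace = "/".join(parts[:-1])

    return {
        "registry": registry,
        "namespace": namespace,
        "repository": parts[-1],
        "tag": tag,
    }


info = parse_image_ref("dohsimpson/chillbeats")
print(info["namespace"], info["repository"], info["tag"])
```

With no registry host in the name, the push goes to Docker Hub, and the namespace decides whose account it lands in.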

Deployment, Service, Pod, Oh My!

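It is simple enough to run a Docker image with Kubernetes: kubectl run. Behind the scenes, however, there is actually a lot going on. With the kubectl version used here, Kubernetes creates a deployment, a replicaset, and a pod; a service is added later, when we expose the deployment. (Since kubectl 1.18, kubectl run creates only a pod, and kubectl create deployment is the command that creates a deployment.) If you are familiar with these concepts, feel free to skip this section. Otherwise, let me explain them in simple English: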

A Deployment is a desired state. It describes the way we want to deploy this Docker image on Kubernetes. For example, it describes how many copies (containers) of this image we want to run; it describes the hardware resource requirements for the container runtime; it remembers where to find the image in case we want Kubernetes to fetch image updates. In fact, it is so declarative that it is actually a YAML file. You can view or edit a deployment with kubectl edit, but we won't go into details in this post.

A Replicaset is a controller for running multiple copies of the Docker image. By one copy I mean one container. As you might have guessed, this controller is important for scaling. In Kubernetes, scaling is easy: you just set the desired number of copies, and Kubernetes goes ahead and creates or destroys containers on the Docker servers until exactly that many copies are running.
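
The "set the desired number, let the controller fix the difference" idea can be sketched in a few lines of Python. This is a toy model under my own assumptions, not Kubernetes code: reconcile is a hypothetical function, copies are plain strings, and a real controller would watch the cluster continuously rather than run once.

```python
from __future__ import annotations


def reconcile(desired: int, running: list[str], name: str = "chillbeats") -> list[str]:
    """Return the copies that should be running after one reconciliation pass."""
    running = list(running)  # work on a copy, leave the caller's list alone

    # Too few copies: create new ones under names that are not taken yet.
    counter = 0
    while len(running) < desired:
        candidate = f"{name}-{counter}"
        if candidate not in running:
            running.append(candidate)
        counter += 1

    # Too many copies: destroy the most recently added ones.
    while len(running) > desired:
        running.pop()

    return running


copies = reconcile(3, [])      # scale up from nothing to three copies
copies = reconcile(1, copies)  # scale back down to one copy
print(copies)
```

The caller never says "start two more" or "stop one"; it only states the target, which is what makes the model declarative.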

A Service is a routing configuration. It describes how the containers can be reached by other containers or by the outside world. Load balancing is crucial here: because you can run multiple containers of the same image, you want traffic to be routed to all of them while they share the same endpoint (e.g. an IP address). You can set up different types of load balancers, internal or external, to decide who can reach these containers.
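
As a rough illustration of "one endpoint, many containers", here is a toy Python model. ToyService and the backend names are made up for this sketch, and the round-robin strategy is only an assumption chosen for simplicity; Kubernetes does not necessarily distribute traffic in this order.

```python
import itertools


class ToyService:
    """One stable endpoint in front of several interchangeable backends."""

    def __init__(self, endpoint, backends):
        self.endpoint = endpoint                  # what clients connect to
        self._cycle = itertools.cycle(backends)   # rotate through the backends

    def route(self):
        """Pick the backend that handles the next request."""
        return next(self._cycle)


svc = ToyService("10.59.242.91:80", ["chillbeats-0", "chillbeats-1"])
for _ in range(4):
    print(svc.endpoint, "->", svc.route())
```

Clients only ever see the endpoint, so containers can come and go behind it without anyone having to update an address.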

A Pod is the most basic unit in Kubernetes. In fact, unlike with Docker, you cannot just run a container directly in Kubernetes. You have to wrap the container in this thing called a "pod", and run the pod. Why on earth, you ask? Why not just run the container directly?

The rationale is that sometimes you have two or more containers doing different things that have to reside on the same host, for example in order to access the host filesystem or to use named pipes or Unix sockets. Think of the scenario where you have an FTP container and an nginx web server container, both using a volume to read/write a host directory. When you only have one host, this is not a problem: you just docker run the two containers and mount the same volume. But with Kubernetes, you get containers scattered across different hosts, and your setup no longer works, damn!

With Pod, your legacy application is saved! You bundle the two containers in a Pod, and Kubernetes guarantees that when they run, they always run together, side by side, on the same host.

The rule of thumb (credit: Kubernetes: Up and Running) for whether you should put two containers in one pod or in two pods is this: would they still work if they were running on different hosts? If not, they belong in the same pod.
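
The effect of bundling can be shown with a toy scheduler in Python. Everything here is hypothetical (the schedule function, the pod and host names, the naive placement strategy); the only point it illustrates is that placement happens per pod, so containers in the same pod always share a host.

```python
from __future__ import annotations


def schedule(pods: dict[str, list[str]], hosts: list[str]) -> dict[str, str]:
    """Map every container to a host, placing each pod as a single unit."""
    placement = {}
    for i, (pod, containers) in enumerate(sorted(pods.items())):
        host = hosts[i % len(hosts)]  # naive spreading, one host per pod
        for container in containers:
            placement[container] = host  # the whole pod lands on one host
    return placement


hosts = ["host-1", "host-2"]

# Two separate pods: the FTP and nginx containers may end up on different hosts.
apart = schedule({"ftp-pod": ["ftp"], "web-pod": ["nginx"]}, hosts)

# One pod with both containers: they are always placed together.
together = schedule({"files-pod": ["ftp", "nginx"]}, hosts)

print(apart)
print(together)
```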

Run, Kubernetes, Run!

The following two lines create and run "chillbeats" and "android-builder" as deployments:

$ kubectl run chillbeats --image=dohsimpson/chillbeats
$ kubectl run android-builder --image=dohsimpson/android-builder

The following two lines create services that expose the two deployments to the world:

$ kubectl expose --port 80 deployment chillbeats --type=LoadBalancer
$ kubectl expose --port 80 --target-port 5000 deployment android-builder --type=LoadBalancer --name=android-builder

Wait a bit for the load balancers to be provisioned, then run kubectl get svc to see the external IPs of these two services:

$ kubectl get svc
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                       AGE
android-builder           LoadBalancer   10.59.243.68    35.199.28.149    80:31678/TCP                  10d
chillbeats                LoadBalancer   10.59.242.91    35.230.163.72    80:32195/TCP                  10d

Now we just need to change the DNS to point the corresponding domain names to these IP addresses. But before that, we still need to deploy the "indeed_jobs" image.

Cron in Kubernetes

Cronjob is a thing in Kubernetes! The command to deploy a container as a cronjob is the following:

kubectl run indeedjobs --schedule '1 0 * * *' --restart=OnFailure --image=dohsimpson/indeed_jobs

As you can see, we specified the --schedule option with a schedule in crontab syntax: '1 0 * * *' means one minute past midnight, every day. (Doc)

We can see it's running:

$ kubectl get cronjob
NAME         SCHEDULE    SUSPEND   ACTIVE    LAST SCHEDULE   AGE
indeedjobs   1 0 * * *   False     0         20h             9d
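
To show how the five fields of '1 0 * * *' are read (minute, hour, day of month, month, day of week), here is a small Python sketch. The matches function is a hypothetical helper that only understands plain numbers and '*'; real crontab syntax also has ranges, lists and steps, and treats the two day fields differently when both are restricted.

```python
from datetime import datetime


def matches(schedule: str, when: datetime) -> bool:
    """Check whether a simple crontab schedule fires at the given minute."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")

    # Cron counts weekdays from Sunday = 0; isoweekday() has Sunday = 7.
    actual = (when.minute, when.hour, when.day, when.month, when.isoweekday() % 7)

    # A field matches if it is '*' or equals the corresponding time component.
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


print(matches("1 0 * * *", datetime(2018, 1, 1, 0, 1)))  # one minute past midnight
print(matches("1 0 * * *", datetime(2018, 1, 1, 1, 0)))  # one o'clock sharp
```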

Switching the DNS

By changing the DNS records to point to the new IP addresses, we can achieve a zero-downtime migration. For a managed DNS solution, I turned to CloudFlare's free plan: it includes DNS service as well as a free wildcard ("star") certificate covering one level of subdomains, i.e. SUBDOMAIN.DOMAIN.TLD.

Terraform has a provider for CloudFlare as well! So we can use Terraform to manage CloudFlare DNS.

variables.tf

variable "cloudflare_email" {
  default = "*******"
}

variable "cloudflare_token" {
  default = "*******"
}

enting.org.tf

provider "cloudflare" {
  email = "${var.cloudflare_email}"
  token = "${var.cloudflare_token}"
}

resource "cloudflare_record" "android-builder" {
    domain = "enting.org"
    name = "android-builder"
    value = "35.199.28.149"
    type = "A"
    ttl = 120
    proxied = false
}

chillbeats.tf

# chillbeats.live
resource "cloudflare_record" "chillbeats_root" {
    domain = "chillbeats.live"
    name = "@"
    value = "35.230.163.72"
    type = "A"
    ttl = 120
    proxied = true
}

Run terraform apply and the A records are updated, sweet!

Destroy old servers

It's time to retire our old servers on DigitalOcean. It couldn't be simpler: go to the corresponding Terraform directory, run terraform destroy, and you are done!

Conclusion

With Docker and Terraform, migrating applications between cloud providers is easy.