How to back up Kubernetes and Docker

You don’t have to back up everything about every container, but it’s important to back up the configurations used to run and manage them in case of disaster.

Yes, your container infrastructure needs some type of backup. Kubernetes and Docker will not magically build themselves after a disaster.

As discussed in a separate article, you don’t need to back up the running state of each container, but you will need to back up the configuration used to run and manage your containers.

Here’s a quick reminder of what you’ll need to back up.

Configuration and desired-state information

The Dockerfiles used to build your images and all versions of those files
The images created from the Dockerfile and used to run each container
Kubernetes etcd and other K8s databases that hold information on cluster state
Deployments - YAML files describing each deployment

Persistent data created or changed by containers

Persistent volumes
Databases

Dockerfiles

Docker containers are run from images, and images are built from Dockerfiles. A proper Docker configuration starts with a version-control system, such as a Git repository hosted on GitHub, for all Dockerfiles.

Do not create ad hoc containers using ad hoc images built from ad hoc Dockerfiles. All Dockerfiles should be stored in a repository that allows you to pull historical versions of that Dockerfile should there be a problem with the current build.

You should also have some kind of repository where you store the YAML files associated with each K8s deployment. These are text files that can benefit from a version-control system.

These repositories then need to be backed up. One of the most popular repository hosts is GitHub, which offers a number of ways to back up your repository. There are a variety of scripts that use the provided APIs to download a current backup of your repository. There are also third-party commercial tools you can use to back up GitHub or whatever repository you are using.
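One simple, script-based way to do this is to keep a mirror clone of each repository on storage that your backup system already protects. The sketch below assumes git is installed; the function names, the URL, and the destination path are hypothetical. A mirror clone holds only Git data (branches, tags, and history), not hosting-side data such as issues.

```python
import subprocess
from pathlib import Path


def mirror_command(repo_url: str, dest: Path) -> list[str]:
    """Return the git command that creates or refreshes a mirror of repo_url."""
    if dest.exists():
        # An existing mirror only needs its refs refreshed.
        return ["git", "--git-dir", str(dest), "remote", "update", "--prune"]
    # First run: copy every branch, tag, and other ref.
    return ["git", "clone", "--mirror", repo_url, str(dest)]


def back_up_repo(repo_url: str, dest: Path) -> None:
    """Run the mirror command; raises CalledProcessError if git fails."""
    subprocess.run(mirror_command(repo_url, dest), check=True)
```

Calling back_up_repo("https://github.com/example-org/dockerfiles.git", Path("/backup/git/dockerfiles.git")) from a scheduled job would keep the mirror current; both values are placeholders.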

If you haven’t followed the advice above and have running containers based on images that you no longer have the Dockerfiles for, you can use the docker image history command or a tool such as dfimage to reconstruct an approximate Dockerfile from your current images.

Put those Dockerfiles in a repository and start backing it up! But, honestly, don’t get in this situation. Always store and back up the Dockerfiles and YAML files used to create your environment.
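As a rough illustration of the recovery step, the sketch below turns the CreatedBy column of docker image history into approximate Dockerfile instructions. The function names are hypothetical, and the prefix handling is an assumption based on how the classic builder typically records layers. The result is a starting point only: the original FROM line and any files copied into the image cannot be recovered this way.

```python
import subprocess

# Prefixes the classic builder typically writes into the CreatedBy column.
NOP_PREFIX = "/bin/sh -c #(nop) "
RUN_PREFIX = "/bin/sh -c "


def history_to_dockerfile(created_by_lines: list[str]) -> list[str]:
    """Convert CreatedBy entries (newest first) to instructions (oldest first)."""
    instructions = []
    for entry in reversed(created_by_lines):
        entry = entry.strip()
        if not entry:
            continue
        if entry.startswith(NOP_PREFIX):
            # Metadata instruction such as CMD, ENV, or ADD.
            instructions.append(entry[len(NOP_PREFIX):].strip())
        elif entry.startswith(RUN_PREFIX):
            # Shell command that was executed by a RUN instruction.
            instructions.append("RUN " + entry[len(RUN_PREFIX):].strip())
        else:
            # Newer builders usually record the instruction name already.
            instructions.append(entry)
    return instructions


def read_history(image: str) -> list[str]:
    """Ask Docker for the untruncated CreatedBy column of an image."""
    result = subprocess.run(
        ["docker", "image", "history", "--no-trunc",
         "--format", "{{.CreatedBy}}", image],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

Printing "\n".join(history_to_dockerfile(read_history("myapp:latest"))) gives a draft to review by hand; the image name is a placeholder.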

Docker Images

The current images used to run your containers should also be stored in a repository. Of course, if you’re running Docker images in Kubernetes, you’re already doing that.

You can use a private repo such as a Docker registry, or a public repo like Docker Hub. Cloud providers can also provide you a private repo to store your images. The contents of that repo should then be backed up. A simple Google search such as “Docker Hub backup” can yield a surprising number of options.
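If no registry-level backup is available, one fallback is to export each image to a tar archive with docker save and let your backup system pick up the archives. This is a minimal sketch under that assumption; the function names and the backup directory are hypothetical.

```python
import subprocess
from pathlib import Path


def archive_name(image: str) -> str:
    """Map an image reference to a file name that is safe on disk."""
    return image.replace("/", "_").replace(":", "_") + ".tar"


def save_commands(image: str, backup_dir: Path) -> list[list[str]]:
    """Return the commands that fetch an image and export it to backup_dir."""
    target = backup_dir / archive_name(image)
    return [
        ["docker", "pull", image],
        ["docker", "save", "--output", str(target), image],
    ]


def back_up_image(image: str, backup_dir: Path) -> None:
    """Run the pull and save steps; raises CalledProcessError on failure."""
    for command in save_commands(image, backup_dir):
        subprocess.run(command, check=True)
```

An archive written this way can be restored on another host with docker load.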

If you do not have the current image used to run your containers, you can create one using the docker commit command. You can then create a Dockerfile from that image using docker image history or the tool dfimage.
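The commit step can be scripted so that each rescued image gets a timestamped tag. The sketch below is illustrative only; the function name, the container name, and the rescue/ repository prefix are hypothetical.

```python
import subprocess
from datetime import datetime


def commit_command(container: str, repository: str, when: datetime) -> list[str]:
    """Return the docker commit command that snapshots a container as an image."""
    tag = when.strftime("%Y%m%d-%H%M%S")
    return ["docker", "commit", container, f"{repository}:{tag}"]


def rescue_image(container: str, repository: str) -> None:
    """Commit the container now; raises CalledProcessError on failure."""
    subprocess.run(commit_command(container, repository, datetime.now()), check=True)
```

For example, rescue_image("web-1", "rescue/web") would create an image such as rescue/web:20240102-030405, which can then be pushed to your registry and backed up like any other image.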

Kubernetes etcd

The Kubernetes etcd database is very important and should be backed up using the etcdctl snapshot save snapshot.db command. This will create the file snapshot.db in the current directory. That file should then be backed up to external storage.

If you are using commercial backup software, you can trigger the etcdctl snapshot save command as a pre-backup step, before the software backs up the directory where snapshot.db is created. That’s one way you can integrate this backup into your commercial backup environment. For restores, take a look at the etcd recovery documentation.
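A pre-backup hook for this could look like the sketch below. The endpoint and certificate paths are assumptions (they match a typical kubeadm control-plane node and will differ elsewhere), and the function names and backup directory are hypothetical.

```python
import os
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# Assumed values for a kubeadm-style control-plane node; adjust for your cluster.
ENDPOINT = "https://127.0.0.1:2379"
CA_CERT = "/etc/kubernetes/pki/etcd/ca.crt"
CLIENT_CERT = "/etc/kubernetes/pki/etcd/server.crt"
CLIENT_KEY = "/etc/kubernetes/pki/etcd/server.key"


def snapshot_command(target: Path) -> list[str]:
    """Return the etcdctl command that writes a snapshot to target."""
    return [
        "etcdctl", "snapshot", "save", str(target),
        f"--endpoints={ENDPOINT}",
        f"--cacert={CA_CERT}",
        f"--cert={CLIENT_CERT}",
        f"--key={CLIENT_KEY}",
    ]


def take_snapshot(backup_dir: Path) -> Path:
    """Write a timestamped snapshot into backup_dir and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"snapshot-{stamp}.db"
    # Select the v3 API explicitly; older etcdctl builds default to v2.
    env = dict(os.environ, ETCDCTL_API="3")
    subprocess.run(snapshot_command(target), check=True, env=env)
    return target
```

Timestamped file names keep earlier snapshots from being overwritten, so the backup system can retain several generations.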

Persistent volumes

There are a variety of ways that containers can be given access to persistent storage that can be used to store or create data. Traditional Docker volumes reside in a subdirectory of Docker’s data directory (/var/lib/docker/volumes by default on Linux). Bind mounts are simply any directory on a Docker host that is mounted inside a container (using the -v or --mount option of docker run).

For a variety of reasons, traditional volumes are preferred by the Docker community, but for the purposes of backup traditional volumes and bind mounts are essentially the same. You can also mount a network-file-system (NFS) directory or an object from an object-storage system as a volume inside a container.

The method you use to back up your persistent volumes is going to be based on which of the above options you use for the container. However, all of them will have the same problem: If the data is changing, you will need to deal with that in order to get a consistent backup.

One way is to shut down any containers using that particular volume. This is a bit old-school, but it’s one of the challenges created by the container world, since the typical method of putting a backup agent in the container isn’t really an option. Once shut down, the volume can be backed up.

If it is a traditional Docker volume, you can back it up by mounting it in another container that won’t change its data during the backup, then creating a tar archive of the volume in a bind-mounted directory that you then back up with your regular backup system.
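The approach described above can be sketched as two commands: one to find the containers that use the volume (so they can be stopped first), and one to run a throwaway container that archives it. The busybox helper image, the /source and /backup mount points, and the function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def containers_using_volume_command(volume: str) -> list[str]:
    """Return the command that lists IDs of running containers using the volume."""
    return ["docker", "ps", "--quiet", "--filter", f"volume={volume}"]


def volume_backup_command(volume: str, backup_dir: Path,
                          helper_image: str = "busybox") -> list[str]:
    """Return the command that tars a volume into a bind-mounted directory."""
    return [
        "docker", "run", "--rm",
        # Mount the volume read-only so the helper cannot change its data.
        "--mount", f"type=volume,source={volume},target=/source,readonly",
        # backup_dir must be an existing absolute path on the Docker host.
        "--mount", f"type=bind,source={backup_dir},target=/backup",
        helper_image,
        "tar", "cf", f"/backup/{volume}.tar", "-C", "/source", ".",
    ]


def back_up_volume(volume: str, backup_dir: Path) -> None:
    """Archive the volume; raises CalledProcessError on failure."""
    subprocess.run(volume_backup_command(volume, backup_dir), check=True)
```

The archive lands in backup_dir on the host, where your backup system can collect it. Stopping and restarting the containers that use the volume is left to the caller.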

However, this is really hard to do in Kubernetes. This is one reason stateful information is best stored in a database, not a filesystem. Please consider this issue when designing your K8s infrastructure.

Also, if you’re using a bind-mounted directory, an NFS-mounted filesystem, or an object storage system as your persistent storage system, you can use whatever is the best way to back up that storage system.

This could be a snapshot followed by replication, or simply running your commercial backup software on that system. These methods are likely to provide a much more consistent backup than a typical file-level backup of that same volume.

Databases

The next backup challenge is when a container is using a database to store its data. These databases need to be backed up in a way that will guarantee their integrity.

Depending on the database, the method mentioned above might work: Shut down the container accessing the database, then back up the directory where its files are stored. However, the downtime required by this method may not be appropriate.

Another method is to connect directly to the database engine itself and ask it to run a backup to a file you can then back up. If the database is running inside a container, you will first need to use a bind mount to attach a directory that it can write its backup to, so the backup exists outside the container.

Then run the command that database uses (such as mysqldump) to create a backup. Then make sure to back up the file it creates using your backup system.
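For MySQL, the dump step might be scripted as below. This is a sketch under several assumptions: the container was started with a host directory bind-mounted at /backup, the root password is available inside the container as the MYSQL_ROOT_PASSWORD environment variable (as is common with the official mysql image), and the function names are hypothetical.

```python
import shlex
import subprocess


def dump_command(container: str,
                 dump_path: str = "/backup/all-databases.sql") -> list[str]:
    """Return the docker exec command that dumps all databases to dump_path."""
    script = (
        "mysqldump --all-databases --single-transaction "
        # The variable is expanded by the shell inside the container.
        '--user=root --password="$MYSQL_ROOT_PASSWORD" '
        f"--result-file={shlex.quote(dump_path)}"
    )
    return ["docker", "exec", container, "sh", "-c", script]


def dump_database(container: str) -> None:
    """Run the dump; raises CalledProcessError if mysqldump fails."""
    subprocess.run(dump_command(container), check=True)
```

The --single-transaction option gives a consistent dump for transactional tables such as InnoDB without stopping the database; other storage engines may need a different approach.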

What if you don’t know what containers are using what storage or what databases? One solution might be to use the docker ps command to list the running containers, then use the docker inspect command to display each container’s configuration.

There is a section called Mounts that will tell you what volumes are mounted where. Any bind mounts will also be specified in the YAML files that you submitted to Kubernetes.
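To avoid reading the inspect output by hand, the Mounts section can be summarized with a short script. The sketch below parses the JSON that docker inspect prints; the function names are hypothetical.

```python
import json
import subprocess


def summarize_mounts(inspect_json: str) -> dict[str, list[tuple[str, str, str]]]:
    """Map each container name to (type, volume name or host path, destination)."""
    summary = {}
    for container in json.loads(inspect_json):
        name = container["Name"].lstrip("/")
        summary[name] = [
            # Named volumes carry a Name; bind mounts only have a Source path.
            (mount["Type"], mount.get("Name") or mount["Source"], mount["Destination"])
            for mount in container.get("Mounts", [])
        ]
    return summary


def inspect_running_containers() -> str:
    """Return docker inspect JSON for every running container."""
    ids = subprocess.run(
        ["docker", "ps", "--quiet"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    if not ids:
        return "[]"
    return subprocess.run(
        ["docker", "inspect", *ids],
        check=True, capture_output=True, text=True,
    ).stdout
```

Running summarize_mounts(inspect_running_containers()) on a Docker host yields one entry per running container, which makes it easier to spot volumes and bind mounts that no backup job covers yet.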

Commercial backup solutions

There are a variety of commercial backup solutions that can protect some or all of the data mentioned above. The following is a very quick summary:

Commvault’s virtual server agent can act as a proxy to back up containers and their images
Cohesity offers data protection for K8s namespaces
Heptio (now a VMware company) offers Velero, backup designed for K8s
Contino, Datacore, and Portworx offer storage designed for K8s and containers, and also support backing up that information

Given the variety of ways K8s and Docker can be configured, it’s very difficult to cover all of this in a single article. But hopefully, this has given you something to think about, or maybe helped you realise something you haven’t been backing up but should be. Keep it safe out there.