Contributed"> 3 Ways To Improve Your Container Build Process - The New Stack
TNS
VOXPOP
404 - VOXPOP NOT FOUND
Give use a minute to figure out what's going on here ...
Containers / Operations

3 Ways To Improve Your Container Build Process

To replace an image layer on a central registry and have all the running images update individual layers is a handy protocol.
Jul 22nd, 2024 10:00am by
Featued image for: 3 Ways To Improve Your Container Build Process
Image by M W from Pixabay.

As Kubernetes recently celebrated its 10-year anniversary, container images are at the heart of modern-day infrastructure. These container images tend to become more complicated and heavy; a report found that responded reporting that build time had increased by 15.9% between 2020 and 2021.

In this article, I will explore the gains that can come from managing containers well, which will allow us to build lighter images, faster. Two of these tips will be specific to Dockerfile, while the last one uses Pack, which is a CLI tool that supports the use of buildpacks.

Step Ordering

The order of the steps matters when writing a Dockerfile. Docker has a built-in caching mechanism — called layer caching — that caches each step in your file when you build a container image. So, the next time you build an image from the same Dockerfile, it will reuse the cached layers.

However, as soon as a step changes and the cache cannot be used — for example, because the application code has changed — all the following steps will also need to be run again.

Let’s take a look at a section of a Python app Dockerfile. Here, we start with an Ubuntu base image, copy the application code, and then install system packages.


The issue with this ordering is that every time the application code changes, Docker cannot use its cache for the package installation part and will need to download and install the packages again. What should take less than a second might end up taking minutes.

Therefore the right ordering here would be to flip the instructions simply:


I ran an example on a modest server with 16G of memory and 4 vCores with the following Python application. To mimic the size of enterprise applications, I added numpy, scipy, and pandas as dependencies.

When using the Dockerfile with a wrong ordering — when the application code is copied before Python packages are installed — it took an average of 1.7 minutes to rebuild an image after a code change.

When using the Dockerfile with the correct ordering — when the application code is copied before Python packages are installed — it took an average of 0.3 seconds to rebuild an image after a code change, which is 99% faster.

So don’t forget the rule of thumb for Dockerfile, place what is unlikely to change at the top of your Dockerfile and what is more likely to change at the bottom.

Use Multistages

Multistage builds in Dockerfiles optimize the process of building images by allowing multiple FROM statements in the same Dockerfile. This feature helps create smaller, more efficient Docker images. For example, it can be used to separate the build environment from the runtime environment.

Again, I ran an example to see what improvements we could expect. This time, we use a sample Java application with Spring, Spark, and Kafka as dependencies.

Here is the Dockerfile using multistage:


As you can see, I have two FROM statements, one that I use to build my jar, and the second one that I use to build my images.

When using the Dockerfile without multistage — when compilation and container installation are part of the same stage — the image size ended up being 1Gb.

When using the Dockerfile with multistage — when compilation and container installation are part of the same stage — the image size ended up being 500Mo or 50% lighter. Depending on how many container instances need to load this image, this can provide a massive deployment time gain, and this is especially true if you have a geographically distributed infrastructure.

Only Rebase a Specific Layer

This last tip is less well-known than the other two tips we discussed. Here we will use an additional tool — the pack CLI — which is based on the cloud native buildpacks implementation.

Thanks to the way images are generated with Buildpack, the rebase feature allows you to swap out the OS layers (run image) of a container image with a newer version of the run image without re-building the entire container.

To rebase an image, the command is the following:


Here, again, I ran an experiment using a simple Python application, Dockerfile. When rebuilding a new container image using docker build, I went from Ubuntu 22.04 to the latest, which at the time of writing is 24.04. Rebuilding the image with the latest OS version took, on average, two minutes and 26 seconds.

When using the pack CLI rebase feature, rebuilding the container image with the latest OS version took, on average, 43 seconds, or 70% faster.

This is especially useful when you must urgently apply an OS security patch to your images, which will especially matter for companies with thousands or even hundreds of thousands of images to patch.

Conclusion

All these may not matter when running things locally as a developer; however, build speeds matter when working with development pipelines. What might seem trivial gains when small will matter at scale. When you work in an engineering organization with many pipelines — especially with the proliferation of microservices — build and rebuild speeds will be essential to maintaining high shipping velocity. It could mean shifting dev cycles to happen in minutes instead of hours.

Smaller images will offer storage, network, and throughput performance gains and reduce cost when operating at scale.

Finally, faster update times matter when organizations are using a large amount of containers in production. The ability to replace an image layer on a central registry and have all the running images update individual layers is a handy protocol, especially when the alternative is to individually build and redeploy images.

Group Created with Sketch.
TNS owner Insight Partners is an investor in: Docker, Kubernetes.
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.