Thursday, October 13, 2022

Bitnami Improves Container Catalog Size

Authored by Alejandro Gómez, R&D Manager at VMware

Introduction

In a previous post, we shared details of an analysis we conducted to optimize our container image sizes in order to improve our end users' experience. As a result of that work, on September 9, 2022, we enabled the --squash option when building our container images, and we are already seeing improvements.
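For reference, this is roughly how squashing is enabled at build time. Our actual build pipeline is more elaborate, so take this as a minimal sketch with a hypothetical image name; note that --squash is an experimental Docker feature and requires the daemon to run with experimental features enabled:

#!/bin/bash
# Minimal sketch (hypothetical image name, not our actual build pipeline):
# the experimental --squash flag merges the layers produced by the build
# into a single new layer on top of the base image.
docker build --squash -t myorg/myapp:latest .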

Changes at DockerHub

After enabling the --squash option, we observed a significant change in our DockerHub repositories. For example, before enabling --squash, the compressed sizes of Pytorch and Magento were 616.64MB (2.14GB decompressed) and 391.61MB (1.35GB decompressed) respectively.

Checking the squashed images, on the other hand, we can see that the image sizes are noticeably smaller: Pytorch has a compressed size of 327.58MB (1.12GB decompressed) and Magento a compressed size of 269.51MB (906MB decompressed).

Here are the size reductions we observed for both assets:
  • Pytorch: 46.9% reduction for the compressed size and 47.7% for the decompressed size
  • Magento: 32.2% reduction for the compressed size and 34.64% for the decompressed size
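If you want to reproduce these numbers on your own machine, a quick (unofficial) way is to pull both tags and compare the uncompressed size that Docker reports locally; the compressed size is the one shown on the DockerHub tag page:

#!/bin/bash
# Sketch: compare the on-disk (uncompressed) size of the non-squashed
# and squashed Pytorch tags mentioned above.
for image in bitnami/pytorch:1.12.1-debian-11-r10 bitnami/pytorch:1.12.1-debian-11-r11
do
    docker pull "$image" > /dev/null
    docker image inspect --format '{{index .RepoTags 0}}: {{.Size}} bytes' "$image"
done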


How will end users see the benefits of this change?

End users can easily verify the benefits of this change when they pull the container images. We ran a quick test pulling the non-squashed images mentioned above and their squashed counterparts, and measured how long each set took. The script used for the test was this simple one:

#!/bin/bash

# Remove all local images so both measurements start from an empty cache.
docker system prune -a -f

# Pull the non-squashed images and record the elapsed time.
start_time=$SECONDS
non_squashed_images=(bitnami/magento:2.4.5-debian-11-r9 bitnami/pytorch:1.12.1-debian-11-r10)
for image in "${non_squashed_images[@]}"
do
    docker pull "$image"
done
non_squashed_elapsed=$(( SECONDS - start_time ))

# Clean the cache again and repeat with the squashed images.
docker system prune -a -f

start_time=$SECONDS
squashed_images=(bitnami/magento:2.4.5-debian-11-r10 bitnami/pytorch:1.12.1-debian-11-r11)
for image in "${squashed_images[@]}"
do
    docker pull "$image"
done
squashed_elapsed=$(( SECONDS - start_time ))

echo "Time pulling non-squashed images: $non_squashed_elapsed seconds"
echo "Time pulling squashed images: $squashed_elapsed seconds"



After running the script, the output was quite interesting, not only in terms of time but also regarding the cached base image layer:

Total reclaimed space: 0B
2.4.5-debian-11-r9: Pulling from bitnami/magento
3b5e91f25ce6: Pulling fs layer
… … …
… … …
Digest: sha256:bbdde3cea27eaec4264f0464d8491600e24d5b726365d63c24a92ba156344024
Status: Downloaded newer image for bitnami/magento:2.4.5-debian-11-r9
docker.io/bitnami/magento:2.4.5-debian-11-r9
1.12.1-debian-11-r10: Pulling from bitnami/pytorch
3b5e91f25ce6: Already exists
Digest: sha256:1a238c5f74fe29afb77a08b5fa3aefd8d22c3ca065bbd1d8a278baf93585814d
Status: Downloaded newer image for bitnami/pytorch:1.12.1-debian-11-r10
docker.io/bitnami/pytorch:1.12.1-debian-11-r10
Deleted Images:
untagged: bitnami/magento:2.4.5-debian-11-r9
untagged: bitnami/magento@sha256:bbdde3cea27eaec4264f0464d8491600e24d5b726365d63c24a92ba156344024
… … …
… … …
deleted: sha256:7ec26d70ae9c46517aedc0931c2952ea9e5f30a50405f9466cb1f614d52bbff7
deleted: sha256:d745f418fc70bf8570f4b4ebefbd27fb868cda7d46deed2278b9749349b00ce2
Total reclaimed space: 3.415GB
2.4.5-debian-11-r10: Pulling from bitnami/magento
3b5e91f25ce6: Pulling fs layer
Digest: sha256:7775f3bc1cfb81c0b39597a044d28602734bf0e04697353117f7973739314b9c
Status: Downloaded newer image for bitnami/magento:2.4.5-debian-11-r10
docker.io/bitnami/magento:2.4.5-debian-11-r10
1.12.1-debian-11-r11: Pulling from bitnami/pytorch
3b5e91f25ce6: Already exists
Digest: sha256:3273861a829d49e560396aa5d935476ab6131dc4080b4f9f6544ff1053a36035
Status: Downloaded newer image for bitnami/pytorch:1.12.1-debian-11-r11
docker.io/bitnami/pytorch:1.12.1-debian-11-r11
Total reclaimed space: 1.947GB

Time pulling non-squashed images: 165 seconds
Time pulling squashed images: 91 seconds


As we can observe in the script output:
  • The uncompressed size dropped from 3.415GB to 1.947GB (43% less).
  • Pulling the squashed images was 45% faster than pulling the non-squashed ones.
  • The base image layer with digest 3b5e91f25ce6 was always reused from the cache during the second pull, both for the non-squashed and the squashed images (see the check below).
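One way to double-check that layer sharing, assuming both squashed images are already pulled, is to compare their filesystem layer digests. Note that docker image inspect prints uncompressed diff IDs, so they will not match the short IDs from the pull output, but an identical first entry still indicates a shared base layer:

#!/bin/bash
# Sketch: list the filesystem layers of both squashed images; a common first
# entry confirms that they share the same Debian base layer.
docker image inspect --format '{{json .RootFS.Layers}}' bitnami/magento:2.4.5-debian-11-r10
docker image inspect --format '{{json .RootFS.Layers}}' bitnami/pytorch:1.12.1-debian-11-r11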
So, an end user who runs, for example, Magento or Pytorch would see those savings. However, the previous test used some of the largest images in the catalog, such as Pytorch, which saw big reductions, so what about other solutions that regular end users deploy? Let's run another test with a product like WordPress (which uses the WordPress and MariaDB containers). For this test, we will use the WordPress docker-compose.yml that exists in the Bitnami containers repository, replacing the existing MariaDB and WordPress images with the following tags (a sketch of one way to prepare the two compose variants follows the list):
  • Non-squashed:
    • bitnami/wordpress:6.0.2-debian-11-r2
    • bitnami/mariadb:10.6.9-debian-11-r7
  • Squashed:
    • bitnami/wordpress:6.0.2-debian-11-r3
    • bitnami/mariadb:10.6.9-debian-11-r8
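The post does not include the modified compose files themselves; a possible way to generate them from the upstream docker-compose.yml is simply to pin the image tags. The raw URL and the sed patterns below are assumptions about the upstream file layout, so adjust them if the file has moved or changed:

#!/bin/bash
# Hypothetical helper: derive the two compose files used in the test by pinning
# the image tags in the upstream Bitnami WordPress docker-compose.yml.
curl -sSL -o docker-compose.yml \
  https://raw.githubusercontent.com/bitnami/containers/main/bitnami/wordpress/docker-compose.yml

# Non-squashed variant
sed -e 's|bitnami/wordpress:[^[:space:]]*|bitnami/wordpress:6.0.2-debian-11-r2|' \
    -e 's|bitnami/mariadb:[^[:space:]]*|bitnami/mariadb:10.6.9-debian-11-r7|' \
    docker-compose.yml > wordpress-docker-compose-non-squashed.yml

# Squashed variant
sed -e 's|bitnami/wordpress:[^[:space:]]*|bitnami/wordpress:6.0.2-debian-11-r3|' \
    -e 's|bitnami/mariadb:[^[:space:]]*|bitnami/mariadb:10.6.9-debian-11-r8|' \
    docker-compose.yml > wordpress-docker-compose-squashed.yml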

For this test, we used the following script, which relies on the two docker-compose files created with the images mentioned above:

#!/bin/bash

# Start from an empty local cache.
docker system prune -a -f

# Pull and start the non-squashed WordPress stack, then record the elapsed time.
start_time=$SECONDS
docker-compose -f wordpress-docker-compose-non-squashed.yml up -d
non_squashed_elapsed=$(( SECONDS - start_time ))
docker-compose -f wordpress-docker-compose-non-squashed.yml down

# Clean the cache again and repeat with the squashed stack.
docker system prune -a -f

start_time=$SECONDS
docker-compose -f wordpress-docker-compose-squashed.yml up -d
squashed_elapsed=$(( SECONDS - start_time ))
docker-compose -f wordpress-docker-compose-squashed.yml down

docker system prune -a -f

echo "Time pulling and starting non-squashed WordPress: $non_squashed_elapsed seconds"
echo "Time pulling and starting squashed WordPress: $squashed_elapsed seconds"


With this test, the squashed solution used 974MB of disk space compared to the 1.161GB used by the non-squashed one (16.11% savings) and, in terms of time, the squashed solution took 52 seconds compared to the 60 seconds taken by the non-squashed one (14.43% savings). Even in this simple test the improvement is noticeable, although the major benefits come from data solutions such as Pytorch or Spark. When we analyzed the main use cases, we found that most users benefit from this improvement, though we are aware that, in some scenarios, it makes sense to preserve the individual container layers. Based on our research, however, the change is beneficial for the majority of users.
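If you want to reproduce the disk-usage comparison, one straightforward (if unofficial) way to obtain a similar figure is to bring each stack up and ask Docker how much space the pulled images occupy:

#!/bin/bash
# Sketch: measure the disk space used by one of the WordPress stacks
# (repeat with the other compose file for the comparison).
docker system prune -a -f
docker-compose -f wordpress-docker-compose-squashed.yml up -d
docker images      # per-image on-disk sizes
docker system df   # overall usage summary, including the "Images" total
docker-compose -f wordpress-docker-compose-squashed.yml down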

Will this be the last thing we do to improve the catalog? Definitely not! We have some things "in the oven" that we are cooking up to keep pushing the catalog improvements forward. And remember: our catalog is open source, so feel free to contribute to our containers repository too, as explained in our previous post.