Benefits of repeated apt cache cleans

Question

The main reason people do this is to minimise the amount of data stored in that particular Docker layer. When you pull a Docker image, you have to pull the entire content of every layer in it.

For example, imagine the following two layers in the image:

RUN apt-get update
RUN rm -rf /var/lib/apt/lists/*

The first RUN command produces a layer containing the package lists, and anyone using your image will ALWAYS pull that layer, even though the next command removes those files. The second layer only marks the files as deleted, so they’re not accessible in the final image but still take up space in the first layer. Ultimately those extra files are just a waste of space and time.

On the other hand, consider a single instruction:

RUN apt-get update && rm -rf /var/lib/apt/lists/*

Because everything happens within a single layer, the lists are deleted before the layer is finished, so they are never pushed or pulled as part of the image.
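
In practice the update, the install and the cleanup usually all go into one RUN instruction. A minimal sketch, assuming a debian:bookworm-slim base image and curl as a placeholder package (neither comes from the question):

```dockerfile
# Assumed base image, for illustration only.
FROM debian:bookworm-slim

# Update, install and clean up in one layer, so the package lists
# are gone before the layer is committed.
# "curl" is a placeholder package, not something from the question.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```

Because the lists are removed each time, every later RUN that installs packages has to start with its own apt-get update.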

So why have multiple layers that each use apt-get install? Most likely so that other images can reuse those layers: Docker shares identical layers between images, which saves space on the server and speeds up builds and pulls.
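
One way to picture this sharing is a multi-stage Dockerfile in which two images build on the same install layer. This is only a sketch: the stage names (common, image-a, image-b), the base image and the packages are all assumptions chosen for illustration.

```dockerfile
# Shared stage: its layers are identical for every image built from it.
FROM debian:bookworm-slim AS common
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# First image: adds its own package in a separate layer.
FROM common AS image-a
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Second image: reuses the layers from "common", then adds a different package.
FROM common AS image-b
RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*
```

Building each image with docker build --target image-a and docker build --target image-b should give two images whose base and ca-certificates layers are the same, so those layers are stored and pulled only once.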
