Docker Tips (2): Building a minimal Python image with multi-stage builds

The original Dockerfile looks like this:

FROM python:3.8

WORKDIR /webapp

COPY requirements.txt .

RUN pip install -r requirements.txt

COPY . .

EXPOSE 8000

ENTRYPOINT ["python","main.py"]

The image size is about 1.18 GB.

This is built on the full python image, so the result is very large!

python

The Dockerfile using a multi-stage build looks like this:

###### stage 1 #########
FROM python:3.8 AS compile

RUN python -m venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /webapp

COPY requirements.txt .

RUN pip install -r requirements.txt

###### stage 2 #########
FROM python:3.8 AS runtime

COPY --from=compile /opt/venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /webapp

COPY . .

EXPOSE 8000

ENTRYPOINT ["python","main.py"]

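The image size is about 1.1 GB.

The multi-stage build works by creating a virtualenv in stage 1 and installing all packages into it, then copying the virtualenv into stage 2 and adding its bin directory to PATH. Since both stages still use the full python:3.8 base image, this shrinks the image only slightly.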

There are several ways to use multi-stage builds in a Python project; the most convenient one is virtualenv. Reference: Multi-stage builds #2: Python specifics—virtualenv, –user, and other methods

Another option is to build wheels in stage 1 (this seems to require defining a setup.py first), copy them to stage 2, and install them there. Reference: How do I reduce a python (docker) image size using a multi-stage build? - Stack Overflow
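A minimal sketch of the wheel approach, assuming the dependencies are listed in requirements.txt as in the examples above. It uses pip wheel to build a wheel for every requirement in stage 1 and installs them offline in stage 2. The /wheels directory name and the choice of python:3.8-slim for the runtime stage are arbitrary choices for illustration, not taken from the referenced answer, and the sketch assumes any compiled wheel only needs shared libraries that exist in the slim image. Built this way, the requirements themselves do not need a setup.py; one would only matter if you also wanted to package your own project as a wheel.

```dockerfile
###### stage 1: build wheels #########
FROM python:3.8 AS compile

WORKDIR /wheels

COPY requirements.txt .

# Build (or download) a wheel for every requirement into /wheels
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

###### stage 2: install from the wheels #########
FROM python:3.8-slim AS runtime

# /wheels also contains requirements.txt, copied there in stage 1
COPY --from=compile /wheels /wheels

# --no-index makes pip use only the local wheels in /wheels
RUN pip install --no-cache-dir --no-index --find-links=/wheels -r /wheels/requirements.txt

WORKDIR /webapp

COPY . .

EXPOSE 8000

ENTRYPOINT ["python","main.py"]
```

One trade-off to keep in mind: the copied wheel files stay in an image layer even after installation, which is part of why the virtualenv approach tends to give a smaller result.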

Also, although docker build cannot mount a volume, a multi-stage build lets us copy data from any image. For example, suppose we already have a large Python project image with many packages installed in a venv. We can copy that venv straight into a new image and install new packages on top of it, so we do not have to reinstall every package from scratch each time. Reference: Can You Mount a Volume While Building Your Docker Image to Cache Dependencies? · vsupalov.com
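A sketch of copying a venv out of an existing image. The image name myorg/big-project:latest is hypothetical; the sketch assumes that image keeps its virtualenv at /opt/venv and was built from the same Python base, since a venv's scripts hard-code the interpreter path.

```dockerfile
# "myorg/big-project:latest" is a hypothetical, already-built image that
# keeps its virtualenv at /opt/venv and uses the same Python base image
FROM python:3.8-slim AS runtime

# COPY --from accepts an image name as well as a stage name
COPY --from=myorg/big-project:latest /opt/venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /webapp

COPY requirements.txt .

# Requirements already satisfied by the copied venv are skipped;
# only new or changed ones are installed
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8000

ENTRYPOINT ["python","main.py"]
```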

python slim

The Dockerfile using the python slim image plus a multi-stage build looks like this:

###### stage 1 #########
FROM python:3.8-slim AS compile

RUN python -m venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /webapp

COPY requirements.txt .

RUN pip install -r requirements.txt

###### stage 2 #########
FROM python:3.8-slim AS runtime

COPY --from=compile /opt/venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /webapp

COPY . .

EXPOSE 8000

ENTRYPOINT ["python","main.py"]

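The image size is about 340 MB.

slim is the trimmed-down image officially provided for Python. In general the slim variant is more than enough: the image is small and builds are fast.

It is the recommended choice for production because it causes fewer problems.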
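One case where the two stages pay off with slim: python:3.8-slim ships without a compiler, so a requirement that has no pre-built wheel will fail to install. A possible sketch, assuming such a package exists in requirements.txt, is to install the Debian build-essential package in stage 1 only, so the toolchain never reaches the runtime image:

```dockerfile
###### stage 1 #########
FROM python:3.8-slim AS compile

# Compiler toolchain, needed only while building packages
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

RUN python -m venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /webapp

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

###### stage 2 #########
FROM python:3.8-slim AS runtime

# Only the finished venv is copied; build-essential stays behind in stage 1
COPY --from=compile /opt/venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /webapp

COPY . .

EXPOSE 8000

ENTRYPOINT ["python","main.py"]
```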

But if you want the smallest possible image, you can also try the alpine variant.

python alpine

The Dockerfile using the python alpine image looks like this:

###### stage 1 #########
FROM python:3.8-alpine3.13 AS compile

# Build-time toolchain and headers needed to compile numpy, pandas and cryptography on alpine
RUN apk add --no-cache g++ gcc musl-dev python3-dev libffi-dev openssl-dev cargo

RUN python -m venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /webapp

COPY requirements.txt .

RUN pip install -r requirements.txt

###### stage 2 #########
FROM python:3.8-alpine3.13 AS runtime

RUN apk --no-cache --update-cache add gcc

COPY --from=compile /opt/venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /webapp

COPY . .

EXPOSE 8000

ENTRYPOINT ["python","main.py"]

The image size is about 261 MB.

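This approach installs the dependencies needed for compiling on alpine in stage 1, creates the virtualenv at /opt/venv, and runs pip install. During installation the packages are compiled and placed under /opt/venv, with libraries in /opt/venv/lib and executables in /opt/venv/bin. In stage 2 we copy /opt/venv over and add /opt/venv/bin to PATH so the installed packages can be found. However, gcc still has to be installed in stage 2 for the application to run properly.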
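A likely reason gcc is needed at runtime is that it pulls in shared libraries such as libstdc++ and libgcc, which compiled extensions like pandas load at import time. This is an untested assumption, not something the original setup confirms. Under that assumption, stage 2 could install just the libstdc++ package instead of the whole compiler; the fragment below would replace stage 2 of the Dockerfile above and still relies on its compile stage:

```dockerfile
###### stage 2 (variant) #########
FROM python:3.8-alpine3.13 AS runtime

# Assumption: only the C++ runtime library is needed, not the compiler itself
RUN apk add --no-cache libstdc++

COPY --from=compile /opt/venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /webapp

COPY . .

EXPOSE 8000

ENTRYPOINT ["python","main.py"]
```

If an import still fails with a missing shared library, the error message names the file, which tells you which extra alpine package to add.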

However, compiling packages from source (for example numpy, pandas, cryptography) takes at least 20 minutes, so this approach is not well suited to production use.
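One way to soften the cost of repeated builds, assuming BuildKit is enabled, is a cache mount on pip's cache directory. pip stores the wheels it builds under /root/.cache/pip, so keeping that directory between builds lets later builds reuse them instead of compiling again. The sketch below only changes stage 1; the first build still takes the full time, and the cache lives on the build machine rather than in the image.

```dockerfile
# syntax=docker/dockerfile:1
###### stage 1 #########
FROM python:3.8-alpine3.13 AS compile

RUN apk add --no-cache g++ gcc musl-dev python3-dev libffi-dev openssl-dev cargo

RUN python -m venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /webapp

COPY requirements.txt .

# Keep pip's cache (including locally built wheels) between builds;
# the mounted directory is not stored in the image layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```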

Because the cryptography package is used, the following applies, according to cryptography/installation.rst at main · pyca/cryptography:

Alpine

Warning

The Rust available by default in Alpine < 3.12 is older than the minimum supported version. See the Rust installation instructions for information about installing a newer Rust.

$ sudo apk add gcc musl-dev python3-dev libffi-dev openssl-dev cargo

If you get an error with openssl-dev you may have to use libressl-dev.

So the alpine packages gcc, musl-dev, python3-dev, libffi-dev, openssl-dev and cargo need to be installed.

Without musl-dev, the following error appears:

#14 11.76       compile options: '-Inumpy/core/src/common -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -I/opt/venv/include -I/usr/local/include/python3.8 -c'
#14 11.76       gcc: _configtest.c
#14 11.76       gcc _configtest.o -o _configtest
#14 11.76       /usr/lib/gcc/x86_64-alpine-linux-musl/10.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find Scrt1.o: No such file or directory
#14 11.76       /usr/lib/gcc/x86_64-alpine-linux-musl/10.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find crti.o: No such file or directory
#14 11.76       /usr/lib/gcc/x86_64-alpine-linux-musl/10.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lssp_nonshared
#14 11.76       collect2: error: ld returned 1 exit status
#14 11.76       failure.
#14 11.76       removing: _configtest.c _configtest.o _configtest.o.d
#14 11.76       Traceback (most recent call last):
#14 11.76         File "<string>", line 1, in <module>
#14 11.76         File "/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/setup.py", line 443, in <module>
#14 11.76           setup_package()
#14 11.76         File "/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/setup.py", line 435, in setup_package
#14 11.76           setup(**metadata)
#14 11.76         File "/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/numpy/distutils/core.py", line 171, in setup
#14 11.76           return old_setup(**new_attr)
#14 11.76         File "/opt/venv/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup    
#14 11.76           return distutils.core.setup(**attrs)
#14 11.76         File "/usr/local/lib/python3.8/distutils/core.py", line 148, in setup
#14 11.76           dist.run_commands()
#14 11.76         File "/usr/local/lib/python3.8/distutils/dist.py", line 966, in run_commands
#14 11.76           self.run_command(cmd)
#14 11.76         File "/usr/local/lib/python3.8/distutils/dist.py", line 985, in run_command
#14 11.76           cmd_obj.run()
#14 11.76         File "/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/numpy/distutils/command/install.py", line 62, in run
#14 11.76           r = self.setuptools_run()
executor failed running [/bin/sh -c pip install -r requirements.txt]: exit code: 1
#14 11.76         File "/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/numpy/distutils/command/install.py", line 36, in setuptools_run
#14 11.76           return distutils_install.run(self)
#14 11.76         File "/usr/local/lib/python3.8/distutils/command/install.py", line 545, in run
#14 11.76           self.run_command('build')
#14 11.76         File "/usr/local/lib/python3.8/distutils/cmd.py", line 313, in run_command
#14 11.76           self.distribution.run_command(command)
#14 11.76         File "/usr/local/lib/python3.8/distutils/dist.py", line 985, in run_command
#14 11.76           cmd_obj.run()
#14 11.76         File "/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/numpy/distutils/command/build.py", line 47, in run
#14 11.76           old_build.run(self)
#14 11.76         File "/usr/local/lib/python3.8/distutils/command/build.py", line 135, in run
#14 11.76           self.run_command(cmd_name)
#14 11.76         File "/usr/local/lib/python3.8/distutils/cmd.py", line 313, in run_command
#14 11.76           self.distribution.run_command(command)
#14 11.76         File "/usr/local/lib/python3.8/distutils/dist.py", line 985, in run_command
#14 11.76           cmd_obj.run()
#14 11.76         File "/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/numpy/distutils/command/build_src.py", line 142, in run
#14 11.76           self.build_sources()
#14 11.76         File "/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/numpy/distutils/command/build_src.py", line 153, in build_sources
#14 11.76           self.build_library_sources(*libname_info)
#14 11.76         File "/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/numpy/distutils/command/build_src.py", line 286, in build_library_sources
#14 11.76           sources = self.generate_sources(sources, (lib_name, build_info))
#14 11.76         File "/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/numpy/distutils/command/build_src.py", line 369, in generate_sources
#14 11.76           source = func(extension, build_dir)
#14 11.76         File "numpy/core/setup.py", line 669, in get_mathlib_info
#14 11.76           raise RuntimeError("Broken toolchain: cannot link a simple C program")
#14 11.76       RuntimeError: Broken toolchain: cannot link a simple C program
#14 11.76       ----------------------------------------
#14 11.76   ERROR: Command errored out with exit status 1: /opt/venv/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/setup.py'"'"'; __file__='"'"'/tmp/pip-install-6zhcypre/numpy_2a8e86b381da4601b17c91bd412f560d/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-firl0hv7/install-record.txt --single-version-externally-managed --prefix /tmp/pip-build-env-61w2h4w5/overlay --compile --install-headers /tmp/pip-build-env-61w2h4w5/overlay/include/site/python3.8/numpy Check the logs for full command output.
#14 11.76   ----------------------------------------
#14 11.76 WARNING: Discarding https://files.pythonhosted.org/packages/31/29/ede692aa6547dfc1f07a4d69e8411b35225218bcfbe9787e78b67a35d103/pandas-1.0.5.tar.gz#sha256=69c5d920a0b2a9838e677f78f4dde506b95ea8e4d30da25859db6469ded84fa8 (from https://pypi.org/simple/pandas/) (requires-python:>=3.6.1). Command errored out with exit status 1: /opt/venv/bin/python /tmp/pip-standalone-pip-rkjyspz4/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-61w2h4w5/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools wheel 'Cython>=0.29.13' 'numpy==1.13.3; python_version=='"'"'3.6'"'"' and platform_system!='"'"'AIX'"'"'' 'numpy==1.14.5; python_version=='"'"'3.7'"'"' and platform_system!='"'"'AIX'"'"'' 'numpy==1.17.3; python_version>='"'"'3.8'"'"' and platform_system!='"'"'AIX'"'"'' 'numpy==1.16.0; python_version=='"'"'3.6'"'"' and platform_system=='"'"'AIX'"'"'' 'numpy==1.16.0; python_version=='"'"'3.7'"'"' and platform_system=='"'"'AIX'"'"'' 'numpy==1.17.3; python_version>='"'"'3.8'"'"' and platform_system=='"'"'AIX'"'"'' Check the logs for full command output.

Without cargo, the build reports that a Rust compiler is needed:

#15 1138.2   error: can't find Rust compiler
#15 1138.2
#15 1138.2   If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
#15 1138.2
#15 1138.2   To update pip, run:
#15 1138.2
#15 1138.2       pip install --upgrade pip
#15 1138.2
#15 1138.2   and then retry package installation.
#15 1138.2
#15 1138.2   If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
#15 1138.2
#15 1138.2   This package requires Rust >=1.41.0.

Without g++, pandas cannot be built:

#15 892.7   gcc: fatal error: cannot execute 'cc1plus': execvp: No such file or directory
#15 892.7   compilation terminated.
#15 892.7   error: command 'gcc' failed with exit status 1
#15 892.7   ----------------------------------------
#15 892.7   ERROR: Failed building wheel for pandas
#15 892.7   Building wheel for numpy (PEP 517): started
#15 978.7   Building wheel for numpy (PEP 517): still running...
#15 1044.5   Building wheel for numpy (PEP 517): still running...
#15 1106.0   Building wheel for numpy (PEP 517): still running...
#15 1107.1   Building wheel for numpy (PEP 517): finished with status 'done'
#15 1107.1   Created wheel for numpy: filename=numpy-1.20.2-cp38-cp38-linux_x86_64.whl size=7178729 sha256=dd5e96ba4b271ed49677f1c6f1ccebdaba034c1800d43c40472ab8c753818bb6
#15 1107.1   Stored in directory: /root/.cache/pip/wheels/e6/85/96/b353a55f333c41eb906284e4b5ff81fe65a5512edee8270e8c
#15 1107.2   Building wheel for bcrypt (PEP 517): started
#15 1108.3   Building wheel for bcrypt (PEP 517): finished with status 'done'
#15 1108.3   Created wheel for bcrypt: filename=bcrypt-3.2.0-cp38-cp38-linux_x86_64.whl size=31795 sha256=c46523b5a897ce6054c60309f64d3ddcccf9215e6f67f96df030f9b10f2339e4
#15 1108.3   Stored in directory: /root/.cache/pip/wheels/af/42/cb/78425eb7d565a75b710a82f213c19f7100b873af40ddb372fc
#15 1108.3   Building wheel for cryptography (PEP 517): started
#15 1178.0   Building wheel for cryptography (PEP 517): still running...
#15 1178.3   Building wheel for cryptography (PEP 517): finished with status 'done'
#15 1178.3   Created wheel for cryptography: filename=cryptography-3.4.7-cp38-cp38-linux_x86_64.whl size=545874 sha256=69ea7c35ae46feb993285c61c01c72cc4ea080ec08075e0cd14f05a86f4cb6dc
#15 1178.3   Stored in directory: /root/.cache/pip/wheels/c6/cf/35/a509feedeb06b2ed1d3dd68821de8617c4578a4d7c7de7fcb4
#15 1178.3 Successfully built numpy bcrypt cryptography
#15 1178.3 Failed to build pandas
#15 1178.3 ERROR: Could not build wheels for pandas which use PEP 517 and cannot be installed directly       

python alpine with pre-built package

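Another approach is:

In stage 1, use the alpine package manager apk to install the community pre-built Python packages, such as py3-pandas, py3-numpy, py3-cffi, py3-cryptography and py3-bcrypt. These are installed to /usr/lib/python3.8/site-packages by default, so ENV PYTHONPATH=/usr/lib/python3.8/site-packages has to be set for the subsequent pip install to find them.

In stage 2, copy /usr/lib/ from stage 1 into /opt/venv/lib/ and add /opt/venv/lib to LD_LIBRARY_PATH. The copy places the apk-installed packages in /opt/venv/lib/python3.8/site-packages, which is the virtualenv's own site-packages directory, and LD_LIBRARY_PATH lets the dynamic loader find the shared libraries they depend on. This way the system can find the packages above.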

###### stage 1 #########
FROM python:3.8-alpine3.13 AS compile

RUN echo "http://dl-8.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories
RUN apk add --update --no-cache py3-pandas py3-numpy py3-cffi
RUN apk add --update --no-cache py3-cryptography
RUN apk add --update --no-cache py3-bcrypt

ENV PYTHONPATH=/usr/lib/python3.8/site-packages

RUN python -m venv /opt/venv

ENV PATH="/opt/venv/bin${PATH:+:$PATH}"

WORKDIR /webapp

COPY requirements.txt .

RUN pip install -r requirements.txt

###### stage 2 #########
FROM python:3.8-alpine3.13 AS runtime

COPY --from=compile /opt/venv /opt/venv
COPY --from=compile /usr/lib/ /opt/venv/lib/

ENV PATH="/opt/venv/bin${PATH:+:$PATH}"
ENV LD_LIBRARY_PATH="/opt/venv/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

WORKDIR /webapp

COPY . .

EXPOSE 8000

ENTRYPOINT ["python","main.py"]

This approach is very fast: the build finishes in roughly a few minutes, and the resulting image is 213.07 MB, even smaller than compiling everything yourself!
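Because this setup depends on copied files and two environment variables lining up, a cheap safeguard is an import check at build time. A sketch, assuming the package list installed in stage 1; the line would go in stage 2 after the ENV lines, so a missing module or shared library fails the build instead of the running container:

```dockerfile
# Added to stage 2, after the ENV PATH / ENV LD_LIBRARY_PATH lines:
# fail the build early if any apk-installed package cannot be imported
RUN python -c "import pandas, numpy, cffi, cryptography, bcrypt"
```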

References:

https://stackoverflow.com/a/57485724/1851492

https://gist.github.com/orenitamar/f29fb15db3b0d13178c1c4dd611adce2

alpine package list: https://pkgs.alpinelinux.org/packages