Cache node_modules with GitLab container registry

02 Nov, 2021GitLab, Docker, Node.js, CI/CD

In the .gitlab-ci.yml you can set image to any public Docker image available on https://hub.docker.com, and the CI jobs will run in containers based on it:

build:
  image: node:lts  # https://hub.docker.com/_/node
  script:
    - npm run build

However, the base image doesn't come with the third-party libraries required to build the project, so you'll need to download them in the CI job.

build:
  image: node:lts
  script:
    - npm ci  # download all the dependencies specified in package.json
    - npm run build

The catch here is that you need to download the dependencies in every single job (which takes about 1 minute for a medium-to-large project), not just in the first one, because the Docker runner creates a fresh container for every job.

One common practice is to cache the dependencies with the cache keyword. However, cache requires the runners to have an external storage service configured, such as AWS S3 or Google Cloud Storage (alternatively you can set up a MinIO server as an S3 replacement); otherwise a job won't be able to load the cached dependencies when it lands on a different runner instance. And the speed of uploading and downloading the cached files is not that great (30s - 1min), not to mention that network timeouts can happen when the dependencies get too large. So using cache doesn't really save much time.
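For reference, a cache-based setup (the approach being compared against here) might look like the sketch below, keyed on the lock file so the cache is invalidated whenever the dependencies change. This is a minimal illustration, not a recommendation; cache:key:files requires GitLab 12.5 or later:

```yaml
build:
  image: node:lts
  cache:
    # Reuse the cache as long as package-lock.json is unchanged.
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
  script:
    - npm ci
    - npm run build
```

Even with a cache hit, each job still pays the cost of downloading and extracting the archive from the storage backend.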

Thankfully the Container Registry feature has been available for self-hosted GitLab instances since 12.10, and it provides a great way to cache the dependencies.

The idea is to build your own Docker image with the dependencies installed and push it to the Container Registry. All the subsequent CI jobs then run on that image, which is much faster than downloading the dependencies every time, and the image is usually cached on the Docker runner quite efficiently.

A further improvement is to tag the dependencies image with a checksum of the package*.json files, so that not only are the subsequent jobs in the current pipeline faster, but jobs in other pipelines get the speed boost as well, as long as the dependencies are the same (namely, the checksum of the package*.json files stays unchanged).

So the npm ci step in the earlier example can be replaced with image: $IMAGE_DEPENDENCY:

build:
  image: $IMAGE_DEPENDENCY
  script:
    - npm run build

Step 1. Dockerfile

Create a Dockerfile in your project root directory:

FROM node:lts

WORKDIR /usr/src/app

COPY package*.json ./

# Dependencies are installed in /usr/src/app/node_modules
RUN npm ci --no-optional

RUN rm package*.json

Be aware that the GitLab runner erases the CI_PROJECT_DIR directory (/builds/org-name/project-name by default) before running a CI job, so you don't want to install your dependencies in that directory.

Step 2. Dependencies installation job

Define the job that builds the Docker image and generates the IMAGE_DEPENDENCY variable in the .gitlab-ci.yml:

install:
  image: docker:latest
  before_script:
    # https://docs.gitlab.com/ee/user/packages/container_registry/#build-and-push-by-using-gitlab-cicd
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  script:
    # The tag is based on the combined hash of Dockerfile, package.json and
    # package-lock.json.
    - CHECKSUM=$(sha256sum Dockerfile package.json package-lock.json | sha256sum | head -c 8)

    # https://docs.gitlab.com/ee/ci/variables/predefined_variables.html
    # CI_REGISTRY_IMAGE: predefined variable, equal to the project path.
    - IMAGE_DEPENDENCY=$CI_REGISTRY_IMAGE/dependency:$CHECKSUM

    - docker build --pull --tag $IMAGE_DEPENDENCY .
    - docker push $IMAGE_DEPENDENCY

    - echo "IMAGE_DEPENDENCY=$IMAGE_DEPENDENCY" > deploy.env
  artifacts:
    # https://docs.gitlab.com/ee/ci/yaml/#artifactsreportsdotenv
    # Mark deploy.env as a reports artifact to expose IMAGE_DEPENDENCY as an
    # environment variable to the subsequent jobs.
    reports:
      dotenv: deploy.env
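To see how the tag behaves, here's a quick shell sketch (the file contents are made up for illustration) showing that the tag is stable across runs and changes only when one of the hashed files changes:

```shell
cd "$(mktemp -d)"

# Stand-ins for the real files; contents are illustrative only.
printf 'FROM node:lts\n'         > Dockerfile
printf '{"name":"demo"}\n'       > package.json
printf '{"lockfileVersion":2}\n' > package-lock.json

# Same computation as the CI job: hash the three files, hash the combined
# output, and keep the first 8 hex characters as the image tag.
tag1=$(sha256sum Dockerfile package.json package-lock.json | sha256sum | head -c 8)
tag2=$(sha256sum Dockerfile package.json package-lock.json | sha256sum | head -c 8)

# Touching the lock file changes the tag, so a new image gets built.
printf '{"lockfileVersion":3}\n' > package-lock.json
tag3=$(sha256sum Dockerfile package.json package-lock.json | sha256sum | head -c 8)

echo "$tag1 $tag2 $tag3"
```

Because identical inputs always produce the same tag, any pipeline on any branch with unchanged dependencies resolves to an image that already exists in the registry.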

Step 3. Subsequent jobs

And finally, let's define a global image value and a before_script that links the node_modules for all the subsequent jobs. Leave these two blocks at the top level of the .gitlab-ci.yml (without indentation; they don't have to be at the beginning of the file) so they apply to all the jobs:

image: $IMAGE_DEPENDENCY
before_script:
  # Node.js projects need the dependencies to be installed locally, so we just
  # soft link the node_modules from the image to the project directory.
  # If your project doesn't need the dependencies installed locally, you can
  # skip this step, just make sure the dependencies are installed in the proper
  # location in the `Dockerfile`.
  - ln -s /usr/src/app/node_modules ./node_modules
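The effect of that before_script line can be sketched locally (the paths and the lodash package are just stand-ins for illustration):

```shell
# Simulate the image's pre-installed dependencies (path is illustrative).
deps="$(mktemp -d)"
mkdir -p "$deps/node_modules/lodash"

# Simulate a fresh CI_PROJECT_DIR, which starts with no node_modules.
cd "$(mktemp -d)"

# Same trick as the before_script: symlink node_modules from the image
# location into the project directory so Node.js resolves modules locally.
ln -s "$deps/node_modules" ./node_modules

test -d ./node_modules/lodash && echo "dependencies resolved"
```

Node's module resolution follows the symlink, so tools like npm scripts and bundlers see a normal local node_modules directory.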

Conclusion

With the help of the GitLab Container Registry, the node_modules dependencies are installed only once as long as they stay untouched. It speeds up execution by reducing the preparation phase to only about 5 seconds per job. If your pipeline has 10 jobs and your team runs the pipeline dozens of times daily, this can save your team a few hours of waiting for pipeline results every day. It also helps reduce the cost of running the GitLab CI runners by avoiding the overhead of downloading and installing the same dependencies repeatedly.
