
We have already talked about our werf GitOps tool more than once, but this time we would like to share the experience of building the site with the project's own documentation: werf.io. It is a regular static site, but its build is interesting in that it uses a dynamic number of build artifacts.

We will not go into the nuances of the site structure (generating a menu common to all versions, pages with release information, and so on). Instead, we will focus on the issues and features of the dynamic build and, to a lesser extent, on the accompanying CI/CD processes.

Introduction: how the site is arranged

To begin with, the werf documentation is stored alongside its code. This imposes development requirements that are generally beyond the scope of this article, but at the very least we can say that:

  •     New werf features should not be released without updating the documentation and, conversely, any change in the documentation implies the release of a new werf version;
  •     The project is under active development: new versions may come out several times a day;
  •     Any manual deployment of the site with a new documentation version is tedious at best;
  •     The project has adopted semantic versioning with five stability channels. The release process moves versions sequentially through the channels in order of increasing stability: from alpha to rock-solid.

To hide all this “inner kitchen” from users and offer them something that “just works”, we made a separate tool for installing and updating werf: multiwerf. You only need to specify the release number and the stability channel you are ready to use, and multiwerf checks whether there is a new version on the channel and downloads it if necessary.
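
For example, this is how you could get the current werf from the beta channel of release 1.0 in a shell session (a usage sketch; the same construct appears later in our pipeline, only with the alpha channel):

. $(multiwerf use 1.0 beta --as-file)
werf version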

The latest werf version in each channel is listed in the version selection menu on the site. By default, werf.io/documentation opens the version from the most stable channel of the latest release (it is also what search engines index). Documentation for each channel is available at its own address (for example, werf.io/v1.0-beta/documentation for the 1.0 beta channel).

In total, the site has the following versions:

  •     root (opens by default);
  •     one for each active update channel of each release (for example, werf.io/v1.0-beta).

To generate a specific version of the site, in the general case it is enough to compile it with Jekyll by running the appropriate command (jekyll build) in the /docs directory of the werf repository, after switching to the Git tag of the required version.
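
Done by hand, that looks roughly like this (a sketch, assuming Ruby and Bundler are installed; the tag and output path are just examples):

git clone https://github.com/flant/werf.git && cd werf
git checkout v1.0.4-beta.20      # Git tag of the required version
cd docs
bundle install                   # install Jekyll and the other Gemfile dependencies
bundle exec jekyll build -d /tmp/werf-site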

It remains only to add that:

  •    the werf utility itself is used for the build;
  •     the CI/CD processes are based on GitLab CI;
  •     and all of this, of course, runs in Kubernetes.

Tasks

Now let us formulate the tasks, taking into account all the specifics described above:

  •     After the werf version changes on any update channel, the documentation on the site must be updated automatically.
  •     For development, we need to be able to occasionally preview preliminary versions of the site.

The site must be rebuilt from the corresponding Git tags after the version changes on any channel, but the image build process has the following peculiarities:

  •     Since the list of versions on the channels changes, only the documentation for the channels where the version has changed needs to be rebuilt. Rebuilding everything from scratch every time would not be good at all.
  •     The set of channels for a release may vary. At some point in time, for example, release 1.1 may have no version on channels more stable than early-access, but over time they will appear. How should the build change in this case?

It turns out that the build depends on changing external data.

Approach choice

One alternative is to run each required version in a separate pod in Kubernetes. This option implies a larger number of objects in the cluster, and that number would grow with the number of stable werf releases. It in turn implies more complex maintenance: each version gets its own HTTP server, each under a small load. And, of course, it entails higher resource costs.

We took the path of building all the necessary versions into a single image. The compiled statics of all site versions live in a container with NGINX, and traffic reaches the corresponding Deployment through an NGINX Ingress. The simple structure (a stateless application) makes it easy to scale the Deployment with Kubernetes itself, depending on the load.

To be more precise, we build two images: one for the production environment, the other for the dev environment. The additional image is used (launched) only in the dev environment, alongside the main one, and contains the site version from the review commit; routing between them is done with Ingress resources.

werf vs git clone and artifacts

As already mentioned, to generate the site statics for a specific documentation version, you need to build from the corresponding repository tag. One could do this by cloning the repository on every build and picking the relevant tags from the list. However, that is a rather resource-consuming operation, and it requires non-trivial build instructions. Another serious drawback: with this approach, there is no way to cache anything during the build.

Here the werf utility comes to our aid: it implements smart caching and allows the use of external repositories. Adding code from the repository with werf significantly speeds up the build, since werf essentially clones the repository once and then only runs fetch when necessary. Moreover, when adding data from the repository, we can select only the directories we need (docs in our case), which significantly reduces the amount of data added.
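
A minimal sketch of such a git stanza in werf.yaml (the tag here is an example; the full variant we actually use is shown in the artifact template below):

git:
- url: https://github.com/flant/werf.git
  to: /app
  tag: v1.0.4-beta.20
  includePaths: 'docs'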

Since Jekyll is a tool for compiling statics and is not needed in the final image, it is logical to compile in a werf artifact and import only the compilation result into the final image.

Writing werf.yaml

So, we have decided to compile each version in a separate werf artifact. However, we do not know how many of these artifacts there will be at build time, so we cannot write a fixed build configuration (strictly speaking, we could, but it would not be entirely efficient).

werf allows using Go templates in its configuration file (werf.yaml), which makes it possible to generate the config on the fly depending on external data (just what we need!). In our case, the external data is the information about versions and releases, based on which we build the required number of artifacts and end up with two images: werf-doc and werf-dev, to run in the different environments.

External data is passed in via environment variables. Here they are:

  •     RELEASES – a string with the list of releases and the corresponding current werf version, as space-separated values in the format <RELEASE>%<VERSION>. Example: 1.0%v1.0.4-beta.20
  •     CHANNELS – a string with the list of channels and the corresponding current werf version, as space-separated values in the format <CHANNEL>%<VERSION>. Example: 1.0-beta%v1.0.4-beta.20 1.0-alpha%v1.0.5-alpha.22
  •     ROOT_VERSION – the werf release version to display on the site by default (it is not necessarily the documentation for the highest release number). Example: v1.0.4-beta.20
  •     REVIEW_SHA – the hash of the review commit from which to build the version for the test environment.

These variables are populated in the GitLab CI pipeline; exactly how is described below.

First of all, for convenience, we define Go-template variables in werf.yaml, assigning them values from the environment variables:

{{ $_ := set . "WerfVersions" (cat (env "CHANNELS") (env "RELEASES") | splitList " ") }}
{{ $Root := . }}
{{ $_ := set . "WerfRootVersion" (env "ROOT_VERSION") }}
{{ $_ := set . "WerfReviewCommit" (env "REVIEW_SHA") }}

The artifact description for compiling the statics of a site version is basically the same in all the cases we need (including the root version and the version for the dev environment). Therefore, we place it in a separate block using the define function, for later reuse via include. We will pass the following arguments to the template:

  •     Version – the version to generate (the tag name);
  •     Channel – the name of the update channel for which the artifact is generated;
  •     Commit – the commit hash, if the artifact is generated for a review commit;
  •     Root – the root context.

The artifact template description:

{{- define "doc_artifact" -}}
{{- $Root := index . "Root" -}}
artifact: doc-{{ .Channel }}
from: jekyll/builder:3
mount:
- from: build_dir
  to: /usr/local/bundle
ansible:
  install:
  - shell: |
      export PATH=/usr/jekyll/bin/:$PATH
  - name: "Install Dependencies"
    shell: bundle install
    args:
      executable: /bin/bash
      chdir: /app/docs
  beforeSetup:
{{- if .Commit }}
  - shell: echo "Review SHA - {{ .Commit }}."
{{- end }}
{{- if eq .Channel "root" }}
  - name: "releases.yml HASH: {{ $Root.Files.Get "releases.yml" | sha256sum }}"
    copy:
      content: |
{{ $Root.Files.Get "releases.yml" | indent 8 }}
      dest:  /app/docs/_data/releases.yml
{{- else }}
  - file:
      path: /app/docs/_data/releases.yml
      state: touch
{{- end }}
  - file:
      path: "{{`{{ item }}`}}"
      state: directory
      mode: 0777
    with_items:
    - /app/_main_site/
    - /app/_ru_site/
  - file:
      dest: /app/docs/pages_ru/cli
      state: link
      src: /app/docs/pages/cli
  - shell: |
      echo -e "werfVersion: {{ .Version }}\nwerfChannel: {{ .Channel }}" > /tmp/_config_additional.yml
      export PATH=/usr/jekyll/bin/:$PATH
{{- if and (ne .Version "review") (ne .Channel "root") }}
{{- $_ := set . "BaseURL" ( printf "v%s" .Channel ) }}
{{- else if ne .Channel "root" }}
{{- $_ := set . "BaseURL" .Channel }}
{{- end }}
      jekyll build -s /app/docs  -d /app/_main_site/{{ if .BaseURL }} --baseurl /{{ .BaseURL }}{{ end }} --config /app/docs/_config.yml,/tmp/_config_additional.yml
      jekyll build -s /app/docs  -d /app/_ru_site/{{ if .BaseURL }} --baseurl /{{ .BaseURL }}{{ end }} --config /app/docs/_config.yml,/app/docs/_config_ru.yml,/tmp/_config_additional.yml
    args:
      executable: /bin/bash
      chdir: /app/docs
git:
- url: https://github.com/flant/werf.git
  to: /app/
  owner: jekyll
  group: jekyll
{{- if .Commit }}
  commit: {{ .Commit }}
{{- else }}
  tag: {{ .Version }}
{{- end }}
  stageDependencies:
    install: ['docs/Gemfile','docs/Gemfile.lock']
    beforeSetup: '**/*'
  includePaths: 'docs'
  excludePaths: '**/*.sh'
{{- end }}

The artifact name must be unique. We achieve this, for example, by adding the channel name (the value of the .Channel variable) as a suffix to the artifact name: artifact: doc-{{ .Channel }}. But keep in mind that when importing from the artifacts, you will have to refer to the same names.

The artifact description uses the werf mount feature. Mounting the build_dir service directory keeps the Jekyll cache between pipeline runs, which greatly speeds up rebuilds.

You may also have noticed the releases.yml file: it is a YAML file with release data fetched from github.com (an artifact produced by the pipeline). It is needed to compile the site, but in the context of this article what matters is that only one artifact depends on its state: the artifact of the root site version (the other artifacts do not need it).

This is implemented using the Go-template if conditional and the {{ $Root.Files.Get "releases.yml" | sha256sum }} construct in the beforeSetup stage. It works as follows: when building the artifact for the root version (the .Channel variable equals root), the hash of the releases.yml file affects the signature of the entire stage, since it is part of the Ansible task name (the name parameter). Thus, changing the contents of releases.yml causes the corresponding artifact to be rebuilt.
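
After rendering, the task header might look like this (the hash value is purely illustrative):

  - name: "releases.yml HASH: 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"

Any change to releases.yml changes this name and, with it, the signature of the whole stage.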

Also pay attention to the work with the external repository. Only the docs directory from the werf repository is added to the artifact image, and, depending on the parameters passed, either the data of the required tag or that of the review commit is added.

To use the artifact template to generate artifact descriptions for the channel and release versions passed in, we loop over the WerfVersions variable in werf.yaml:

{{ range .WerfVersions -}}
{{ $VersionsDict := splitn "%" 2 . -}}
{{ dict "Version" $VersionsDict._1 "Channel" $VersionsDict._0 "Root" $Root | include "doc_artifact" }}
---
{{ end -}}

Since the loop generates several artifacts (at least we hope so!), we must account for the separator between them: the --- sequence. As determined earlier, when calling the template in the loop we pass the version and channel parameters and the root context.
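
For example, for the list element 1.0-beta%v1.0.4-beta.20, the loop body expands to the equivalent of:

{{ dict "Version" "v1.0.4-beta.20" "Channel" "1.0-beta" "Root" $Root | include "doc_artifact" }}

which produces an artifact named doc-1.0-beta.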

Similarly, but without a loop, we call the artifact template for the “special cases”: for the root version and for the version from the review commit:

{{ dict "Version" .WerfRootVersion "Channel" "root" "Root" $Root  | include "doc_artifact" }}
---
{{- if .WerfReviewCommit }}
{{ dict "Version" "review" "Channel" "review" "Commit" .WerfReviewCommit "Root" $Root  | include "doc_artifact" }}
{{- end }}

Please note that the artifact for the review commit is only built if the .WerfReviewCommit variable is set.

Artifacts are ready – it’s time to import!

The final image, intended to run in Kubernetes, is a regular NGINX to which the nginx.conf server configuration file and the statics from the artifacts are added. Besides the artifact of the root site version, we need to loop over the .WerfVersions variable once more to import the artifacts of the channel and release versions, following the artifact naming rule we adopted earlier; a sketch follows below. Since each artifact stores the site in two languages, we import them into the places defined by the configuration.
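
Here is a sketch of what the import section of the werf-doc image might look like (the base image is an assumption; the artifact names and target paths follow the naming rule and the baseurl logic described above):

image: werf-doc
from: nginx:stable-alpine
import:
- artifact: doc-root
  add: /app/_main_site
  to: /app/main_site
  before: setup
- artifact: doc-root
  add: /app/_ru_site
  to: /app/ru_site
  before: setup
{{- range .WerfVersions }}
{{ $VersionsDict := splitn "%" 2 . }}
- artifact: doc-{{ $VersionsDict._0 }}
  add: /app/_main_site
  to: /app/main_site/v{{ $VersionsDict._0 }}
  before: setup
- artifact: doc-{{ $VersionsDict._0 }}
  add: /app/_ru_site
  to: /app/ru_site/v{{ $VersionsDict._0 }}
  before: setup
{{- end }}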

The additional image, which is launched in the dev environment alongside the main one, contains only two site versions: the version from the review commit and the root version of the site (it holds the shared assets and, as you remember, the release data). Thus, the additional image differs from the main one only in the import section (and, of course, in the name):

image: werf-dev
...
import:
- artifact: doc-root
  add: /app/_main_site
  to: /app/main_site
  before: setup
- artifact: doc-root
  add: /app/_ru_site
  to: /app/ru_site
  before: setup
{{- if .WerfReviewCommit  }}
- artifact: doc-review
  add: /app/_main_site
  to: /app/main_site/review
  before: setup
- artifact: doc-review
  add: /app/_ru_site
  to: /app/ru_site/review
  before: setup
{{- end }}

As noted above, the artifact for the review commit is only built when the REVIEW_SHA environment variable is set. We could skip generating the werf-dev image entirely when there is no REVIEW_SHA, but then werf's cleanup policies for Docker images would not work for the werf-dev image. So we leave it building with only the root version artifact (which is already built anyway) to keep the pipeline structure simple.

The build is ready: time to move on to CI/CD and its important nuances.

Pipeline in GitLab CI and the specifics of the dynamic build

When starting the build, we need to set the environment variables used in werf.yaml. This does not apply to the REVIEW_SHA variable, which we set when the pipeline is triggered by a GitHub hook.

We generate the necessary external data in the generate_artifacts Bash script, which produces two GitLab pipeline artifacts:

  •     the releases.yml file with the release data;
  •     the common_envs.sh file with the environment variables to export.

You will find the contents of the generate_artifacts file in our example repository. Obtaining the data itself is outside the scope of this article, but the common_envs.sh file matters to us, since werf's operation depends on it. An example of its contents:

export RELEASES='1.0%v1.0.6-4'
export CHANNELS='1.0-alpha%v1.0.7-1 1.0-beta%v1.0.7-1 1.0-ea%v1.0.6-4 1.0-stable%v1.0.6-4 1.0-rock-solid%v1.0.6-4'
export ROOT_VERSION='v1.0.6-4'

You can consume the output of such a script, for example, with the Bash source builtin.
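
This is exactly what happens in the pipeline jobs below:

bash ./generate_artifacts 1> common_envs.sh
source common_envs.sh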

And now the most interesting part. For both the build and the deployment of the application to work correctly, werf.yaml must remain the same at least within a single pipeline. If this condition is not met, the stage signatures that werf calculates during the build and, say, the deployment will differ. This leads to a deployment error, since the image required for the deployment will be missing.

In other words, if the information about releases and versions is one thing while the site image is being built, but a new version comes out by the time of deployment and the environment variables take different values, the deployment will fail with an error: the artifact for the new version has not been built yet.

If the generation of werf.yaml depends on external data (such as the list of current versions, as in our case), then the composition and values of this data must be recorded within the pipeline. This is especially important if the external parameters change often.

We fetch and record the external data at the first stage of the GitLab pipeline (Prebuild) and pass it on as a GitLab CI artifact. This allows running and re-running the pipeline's jobs (Build, Deploy, Cleanup) with the same werf.yaml configuration.

The contents of the Prebuild stage of the .gitlab-ci.yml file:

Prebuild:
  stage: prebuild
  script:
    - bash ./generate_artifacts 1> common_envs.sh
    - cat ./common_envs.sh
  artifacts:
    paths:
      - releases.yml
      - common_envs.sh
    expire_in: 2 weeks

By capturing the external data in an artifact, we can build and deploy using the standard GitLab CI pipeline stages: Build and Deploy. We trigger the pipeline via hooks from the werf GitHub repository (i.e., on changes to the repository on GitHub). The trigger data can be found in the GitLab project settings, in the CI/CD Settings -> Pipeline triggers section; then create the corresponding webhook in GitHub (Settings -> Webhooks).
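
For illustration, triggering such a pipeline from a webhook handler and passing the review commit might look like this (the host, project ID and token are placeholders; the trigger API itself is standard GitLab):

curl -X POST \
     -F "token=<TRIGGER_TOKEN>" \
     -F "ref=master" \
     -F "variables[REVIEW_SHA]=<COMMIT_SHA>" \
     "https://gitlab.example.com/api/v4/projects/<PROJECT_ID>/trigger/pipeline"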

The build stage will look like this:

Build:
  stage: build
  script:
    - type multiwerf && . $(multiwerf use 1.0 alpha --as-file)
    - type werf && source <(werf ci-env gitlab --tagging-strategy tag-or-branch --verbose)
    - source common_envs.sh
    - werf build-and-publish --stages-storage :local
  except:
    refs:
      - schedules
  dependencies:
    - Prebuild

GitLab adds the two artifacts from the Prebuild stage to the Build job, so we export the prepared input variables with the source common_envs.sh construct. We run the build stage in all cases except scheduled pipeline runs: schedules are used for cleanup, which needs no build.

At the deployment stage, we describe two jobs, one each for deploying to the production and dev environments, using a YAML template:

.base_deploy: &base_deploy
  stage: deploy
  script:
    - type multiwerf && . $(multiwerf use 1.0 alpha --as-file)
    - type werf && source <(werf ci-env gitlab --tagging-strategy tag-or-branch --verbose)
    - source common_envs.sh
    - werf deploy --stages-storage :local
  dependencies:
    - Prebuild
  except:
    refs:
      - schedules

Deploy to Production:
  <<: *base_deploy
  variables:
    WERF_KUBE_CONTEXT: prod
  environment:
    name: production
    url: werf.io
  only:
    refs:
      - master
  except:
    variables:
      - $REVIEW_SHA
    refs:
      - schedules

Deploy to Test:
  <<: *base_deploy
  variables:
    WERF_KUBE_CONTEXT: dev
  environment:
    name: test
    url: werf.test.flant.com
  except:
    refs:
      - schedules
  only:
    variables:
      - $REVIEW_SHA

The jobs essentially differ only in the cluster context where werf performs the deployment (WERF_KUBE_CONTEXT) and in the environment settings (environment.name and environment.url), which are then used in the Helm chart templates.
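
For example, the chart can branch on the environment name, which werf exposes to the templates as .Values.global.env (a hypothetical fragment: the hosts are taken from the jobs above, the template itself is an assumption):

{{- if eq .Values.global.env "production" }}
host: werf.io
{{- else }}
host: werf.test.flant.com
{{- end }}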

Final touch

Since werf versions are released quite often, new images will be built frequently, and the Docker Registry will grow constantly. Therefore, automatic image cleanup by policy has to be configured. It is easy to do.

For implementation you will need:

  •      add a cleanup stage to .gitlab-ci.yml;
  •      add a scheduled cleanup task;
  •      set an environment variable with a write-access token.

Add the cleanup stage to .gitlab-ci.yml:

Cleanup:
  stage: cleanup
  script:
    - type multiwerf && . $(multiwerf use 1.0 alpha --as-file)
    - type werf && source <(werf ci-env gitlab --tagging-strategy tag-or-branch --verbose)
    - source common_envs.sh
    - docker login -u nobody -p ${WERF_IMAGES_CLEANUP_PASSWORD} ${WERF_IMAGES_REPO}
    - werf cleanup --stages-storage :local
  only:
    refs:
      - schedules

We have already seen almost all of this above. The only difference for cleanup is that we first need to log in to the Docker Registry with a token that has the rights to delete images there (the job token automatically issued by GitLab CI does not have such rights). The token must be created in GitLab in advance and its value specified in the project's WERF_IMAGES_CLEANUP_PASSWORD environment variable (CI/CD Settings -> Variables).

A cleanup task with the required schedule is added under CI/CD -> Schedules.

That’s it: the project in the Docker Registry will no longer grow indefinitely with unused images.

Result

  1.   We got a logical build structure: one artifact per version.
  2.   The build is universal and requires no manual changes when new werf versions are released: the documentation on the site is updated automatically.
  3.   Two images are built for the different environments.
  4.   It works fast, since caching is used to the maximum: when a new werf version is released or a GitHub hook fires for a review commit, only the artifact for the changed version is rebuilt.
  5.   There is no need to think about deleting unused images: cleanup by werf policies keeps order in the Docker Registry.

Conclusions

  •     werf makes the build fast thanks to caching of both the build itself and the work with external repositories.
  •     Working with external Git repositories removes the need to clone the whole repository every time or to reinvent the wheel with tricky optimization logic: werf uses its cache, clones only once and then uses fetch, and only when necessary.
  •     The ability to use Go templates in the werf.yaml build configuration makes it possible to describe a build whose result depends on external data.
  •     Using mounts in werf significantly speeds up artifact builds, thanks to a cache shared by all pipelines.
  •     werf makes cleanup easy, which is especially relevant for dynamic builds.