In this post, I'd like to document the approach I've developed for a highly automated application development and deployment process, based on various online sources and self-built bash glue.

Uhm... what?

Before going into more detail, let's have a quick look at what all of these terms even mean:

Argo CD

Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes.

Yeah, that doesn't make more sense, does it?

If you're using Kubernetes, your applications generally consist of a lot of different resources that need to be configured - and in general you have more than one application. While you can easily deploy these by hand or using Helm or kustomize, it can quickly get messy if you have a lot of them.

Argo CD manages applications on Kubernetes for you: Declare your repositories and applications, and it takes care of setting up the resources and ensuring they stay the way you want them. Additionally, it provides a convenient web interface for checking the status of your application and the resources that belong to it.
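
To make "declaring an application" a bit more concrete, here is a minimal sketch of a shell helper that prints an Argo CD Application manifest. The function name, the `argocd` namespace, the `default` project and the automated sync policy are assumptions for illustration, not settings taken from my setup:

```shell
# Print an Argo CD Application manifest to stdout.
# Usage: render_argocd_app <app-name> <repo-url> <path-in-repo> <target-namespace>
render_argocd_app() {
    local name="$1" repo_url="$2" path="$3" namespace="$4"
    cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${name}
  namespace: argocd          # assumption: Argo CD runs in the "argocd" namespace
spec:
  project: default
  source:
    repoURL: ${repo_url}
    targetRevision: HEAD
    path: ${path}
  destination:
    server: https://kubernetes.default.svc
    namespace: ${namespace}
  syncPolicy:
    automated:               # assumption: automatic sync with pruning and self-healing
      prune: true
      selfHeal: true
EOF
}
```

The output could then be piped into `kubectl apply -f -`, for example `render_argocd_app myapp git@gitlab.com:YOUR_DEPLOYMENT_REPOSITORY.git kustomize myapp | kubectl apply -f -`.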

GitOps Flow

In the cloud world, the current trend is Infrastructure as Code. This means that everything you need to deploy your application (be it cloud resources like storage or Kubernetes objects) is specified declaratively [1] and managed in a Git repository.

This has multiple advantages:

  • You can easily set up multiple environments (e.g. staging + production), and keep them in sync. At work, we have teams that set up an environment for each Merge Request - and power it down when the changes are merged.

  • You have a full audit log of who changed what and when in the infrastructure.

  • You can fully automate not just your development process, but also the deployment process.

Common tools in these areas are Terraform and Pulumi, and to some extent Ansible.

CI system

To actually run these automations, like building container images and updating deployment repositories, you need a system that can run code on changes. This is generally a CI system like GitHub Actions or GitLab CI.

This approach generally works with any system that can run some scripts after a git commit.

Why all this complexity?

At first, this concept might look quite complicated, especially if you compare it to the classic "Just throw a bunch of PHP files on a webserver".

This is true for solo developers or small teams with a single application. But for larger teams or more complex environments with many microservices, manual processes add a lot of friction to development.

Another major point: Using latest as a container tag is generally not recommended, as you cannot be sure what version you're pulling or running on your systems - any restart could cause a different version to be deployed.

Automation is also documentation: A working (or at least existing) CI pipeline is always better than some outdated wiki pages or a README file not touched in years.

In the end, it even allows you to write code and deploy it from a web IDE like Gitpod, without ever having to run a manual command in your shell. Depending on the policies of an organization, this can also help to implement separation of duties without adding too much friction to the development and deployment process.

The idea

The high-level concept

Before going into the technical details, let's have a high-level view of how the entire process works and the reasoning behind it.

First, we need two git repositories for each software project:

  • A code repository which contains the actual application, as well as a Dockerfile to build a container image from it. The image is built in CI and tagged using a predefined scheme, e.g. version tags or unique pipeline IDs. The latter is useful if you work with Merge Requests.

  • A deployment repository which contains the definition of Kubernetes objects for the deployment in the cluster. This can be done with either Helm or kustomize (my personal favorite lately).

This approach is also briefly described in Argo CD's Best Practices; check them out for further reasons behind it.
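
As a rough illustration of what a kustomize-based deployment repository might contain, here is a minimal sketch that scaffolds one. The function name, the single-Deployment layout and the initial tag `"1"` are assumptions. The registry path is the placeholder used elsewhere in this post:

```shell
# Create a minimal kustomize directory for a deployment repository.
# Usage: init_kustomize_dir <directory> <image-name>
init_kustomize_dir() {
    local dir="$1" image="$2"
    mkdir -p "$dir"

    # A bare-bones Deployment that refers to the image by its short name
    cat > "$dir/deployment.yaml" <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${image}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${image}
  template:
    metadata:
      labels:
        app: ${image}
    spec:
      containers:
        - name: ${image}
          image: ${image}
EOF

    # The "images" entry maps the short name to the real registry path and tag.
    # This is the part the CI pipeline rewrites on every deployment.
    cat > "$dir/kustomization.yaml" <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
images:
  - name: ${image}
    newName: my.registry.example/project/container
    newTag: "1"
EOF
}
```

Calling `init_kustomize_dir ./kustomize myapp` yields a directory that `kustomize build ./kustomize` should be able to render.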

The workflow

The following image shows the high-level steps of the entire workflow:

  1. Developers update the code of the application and push it to the code repository.

  2. The CI system builds a container image from the application code and uploads it to a central container registry, either the one provided by GitLab or your cloud provider.

  3. The CI system clones the deployment repository, updates the image tag to the one just built, and pushes the change into the repository.

  4. Argo CD either detects the changes by itself (by default, it polls the git repository every three minutes), or you can use its API or CLI tools to force a sync.

  5. Kubernetes applies the new configuration to the system, e.g. by replacing Pods with newer versions.
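
If you don't want to wait for the poll in step 4, forcing a sync from the pipeline could look roughly like the sketch below. It assumes the `argocd` CLI is installed and already authenticated (for example through the `ARGOCD_SERVER` and `ARGOCD_AUTH_TOKEN` environment variables). The function name and the `ARGOCD_BIN` override are hypothetical helpers for illustration:

```shell
# Trigger a sync and wait until the application is synced and healthy.
# ARGOCD_BIN is a hypothetical override, handy for dry runs (ARGOCD_BIN=echo).
# Usage: argocd_sync_and_wait <app-name> [timeout-in-seconds]
argocd_sync_and_wait() {
    local app="$1" timeout="${2:-300}"
    "${ARGOCD_BIN:-argocd}" app sync "$app" || return 1
    "${ARGOCD_BIN:-argocd}" app wait "$app" --sync --health --timeout "$timeout"
}
```

Running `ARGOCD_BIN=echo argocd_sync_and_wait myapp 60` only prints the two commands instead of executing them, which is useful when trying this out.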

The secret sauce

While this sounds reasonably easy on the surface, actually implementing it can be a bit tricky. The major blocker for me was getting the tag name of the latest container image into the deployment.

I solved this by cloning the deployment repo after building the image, updating the image tag and committing the changes - using the CI system.

This way, you can use unique tags and have an actual git-based log of every deployed revision - with an easy way to revert to old versions.
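
Because every deployment is a commit, a rollback can be expressed as a plain `git revert`. A minimal sketch, assuming a local clone of the deployment repository in which the most recent commit is the deployment you want to undo (the function name is made up for this example):

```shell
# Revert the most recent commit of a local deployment repository clone.
# The revert is a new commit, so the git log keeps the full deployment history.
# Usage: rollback_last_deployment <path-to-clone>
rollback_last_deployment() {
    local repo_dir="$1"
    git -C "$repo_dir" revert --no-edit HEAD
}
```

After `rollback_last_deployment ./deployment-repo`, pushing the new commit (e.g. `git -C ./deployment-repo push origin main`) lets Argo CD pick up the previous image tag again.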

What you need

To follow this post, you should already have the following in place:

The implementation

Automatic build of container images

In your code repository, add a CI step to build your container. How exactly this is done varies a lot between CI systems and build environments. Here are some tutorials for the systems I mentioned above:

It's important that you tag your images using a dynamic name, e.g. based on the commit hash or the pipeline's ID. This ensures that every built version is distinct, and you can explicitly refer to it later in the deployment repository.

Yaml
# Build a container image and upload it to your registry (GitLab CI syntax)
build:
  stage: build
  variables:
    IMAGE_NAME: my.registry.example/project/container:$CI_PIPELINE_ID
  script:
    - docker build -t "$IMAGE_NAME" .
    - docker push "$IMAGE_NAME"
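
If you want to mix tagging schemes (release tags where available, pipeline IDs otherwise), the tag could be derived in a small helper like the sketch below. The function name and the order of precedence are assumptions. `CI_COMMIT_TAG`, `CI_PIPELINE_ID` and `CI_COMMIT_SHORT_SHA` are GitLab's predefined variables:

```shell
# Print the image tag for the current pipeline.
# Prefers a git tag, then the pipeline ID, then the short commit hash.
image_tag() {
    if [ -n "${CI_COMMIT_TAG:-}" ]; then
        echo "$CI_COMMIT_TAG"
    elif [ -n "${CI_PIPELINE_ID:-}" ]; then
        echo "$CI_PIPELINE_ID"
    elif [ -n "${CI_COMMIT_SHORT_SHA:-}" ]; then
        echo "$CI_COMMIT_SHORT_SHA"
    else
        echo "image_tag: none of the expected CI variables is set" >&2
        return 1
    fi
}
```

Whatever scheme you pick, the build step and the later deployment step need to agree on it, otherwise the deployment repository points at an image that was never pushed.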

Deployment keys

To be able to write from the code repository's pipeline to the deployment repository, we need to create an SSH key pair and configure it in both repositories.

  1. Run the following command on your (Linux or macOS) machine:

    Bash
    # Generate ./deploykey and ./deploykey.pub
    ssh-keygen -q -N "" -t ed25519 -C "deployment key" -f ./deploykey
    

    This generates an SSH key pair without a passphrase in the current working directory.

  2. Add the public key (./deploykey.pub) to the deployment repository with write access.

  3. Add the private key (./deploykey) as a file variable named DEPLOY_SSH_PRIVATE_KEY to your application code repository's CI settings.

Repository     | Application                                | Deployment
Filename       | deploykey                                  | deploykey.pub
GitHub Actions | Settings → Secrets and variables → Actions | Settings → Deploy keys
GitLab CI      | Settings → CI/CD → Variables               | Settings → Repository → Deploy keys
Gitea          | (depends on your CI)                       | Settings → Deploy keys

Tips:

  • If your code hoster doesn't support repository-scoped deploy keys, you can create a technical user instead and add that to the repository.
  • Make sure the value of the CI variable ends with a newline, or you might get weird SSH errors.
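
The missing-newline problem can also be caught inside the job, before the key is handed to SSH. A minimal sketch (the function name is made up, and it assumes the key has already been written to a file):

```shell
# Append a newline to a file if its last byte is not already one.
# Usage: ensure_trailing_newline <file>
ensure_trailing_newline() {
    local file="$1"
    # $(tail -c 1 ...) is empty when the last byte is a newline,
    # because command substitution strips trailing newlines
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
}
```

The function is idempotent, so calling it on a key file that is already fine changes nothing.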

Bonus: Manage deployment keys using Terraform (on GitLab)

The previous steps are quite manual and can become tedious when you're setting up multiple projects.

To automate that, I've used this small Terraform module using the GitLab Terraform provider in the past:

terraform-gitlab.hcl
terraform {
  required_providers {
    gitlab = {
      source  = "gitlabhq/gitlab"
      version = "3.16.1"
    }
  }
}

provider "gitlab" {}

#
# Variables
#
variable "code_repository_project_id" {
  type        = string
  description = "GitLab Project ID of the code repository"
}

variable "deployment_repository_project_id" {
  type        = string
  description = "GitLab Project ID of the deployment repository"
}

variable "private_key_ci_variable_name" {
  type        = string
  description = "CI Variable name for the deploy private key (set in the code repository)"
  default     = "DEPLOY_SSH_PRIVATE_KEY"
}

variable "private_key_ci_variable_protected" {
  type        = bool
  description = "Should the private key only be available to protected branches and tags?"
  default     = false
}

#
# Resources
#
# ED25519 keys take no curve parameter (ecdsa_curve only applies to ECDSA keys)
resource "tls_private_key" "deploykey" {
  algorithm = "ED25519"
}

resource "gitlab_deploy_key" "deployment_repo_deploy_key" {
  project  = var.deployment_repository_project_id
  title    = "GitOps Deployment Key (Terraform Managed)"
  key      = trimspace(tls_private_key.deploykey.public_key_openssh)
  can_push = true
}

resource "gitlab_project_variable" "code_repo_private_key" {
  project       = var.code_repository_project_id
  key           = var.private_key_ci_variable_name
  value         = tls_private_key.deploykey.private_key_openssh
  protected     = var.private_key_ci_variable_protected # Adapt to your environment!
  variable_type = "file"
}

This can now be easily reused for each project you manage:

Hcl
module "sso" {
  source                           = "./gitops-deployment-config"
  code_repository_project_id       = 12345
  deployment_repository_project_id = 67890
}

⚠️ Security note: The deployment private key is saved in plain text in the Terraform state. This means that everyone who has access to your Terraform state is also able to write to the deployment repository. In most cases that's the same people anyway, but this approach might not be feasible for larger organizations with stricter privilege separation.

Split up repositories

At this point, make sure you have your repositories split into two.

If you have already set up Argo CD for your application, I'd recommend disabling auto-sync now or removing the application altogether (without removing its resources!). This allows you to safely experiment with the deployment process, without endangering a running application.
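
With the `argocd` CLI, both options might look like the following sketch. It assumes an authenticated CLI. The function names and the `ARGOCD_BIN` override are hypothetical and only exist to make the commands easy to dry-run:

```shell
# Switch an application to manual sync, so Argo CD stops applying changes on its own.
# Usage: argocd_pause_auto_sync <app-name>
argocd_pause_auto_sync() {
    "${ARGOCD_BIN:-argocd}" app set "$1" --sync-policy none
}

# Remove the application from Argo CD, but leave its Kubernetes resources running.
# Usage: argocd_forget_app <app-name>
argocd_forget_app() {
    "${ARGOCD_BIN:-argocd}" app delete "$1" --cascade=false
}
```

Pausing is the less invasive choice: the application stays visible in the web interface and you can still see how far the cluster has drifted from the repository.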

Update the deployment repository

I've created the following bash script, which takes care of updating the deployment repository from the CI pipeline of the code repository.

It might look bigger than needed, which is mostly due to a ton of validation for easier debugging. This is an important aspect for me, as there is nothing more frustrating than having to debug CI pipelines and trash your git history because of minor configuration errors.

deployscript.sh
#!/bin/bash
#
# Script to automatically update a deployment repository with the latest container image.
#
# Required env variables:
# - KUSTOMIZE_IMAGE_NAME
# - DEPLOY_REPO_URL
# - DEPLOY_TAG
# - DEPLOY_SSH_PRIVATE_KEY
#
# Optional variables:
# - KUSTOMIZE_DIR (default: kustomize)
# - DEPLOY_REPO_BRANCH (default: main)
# - GITOPS_COMMITTER_NAME
# - GITOPS_COMMITTER_EMAIL
#

set -eo pipefail

# Exit the script with an easily spottable error message
function fail() {
    echo "---------------------------------------------------------------------"
    printf "%s\n" "$*"
    echo "---------------------------------------------------------------------"
    exit 1
}

# Verify that a variable's content doesn't contain unresolved variable names
function verify_var() {
    local NAME="$1"
    local VALUE="$2"
    if echo "$VALUE" | grep '\$' >/dev/null; then
        fail "Variable $NAME is not fully resolved ('$VALUE'), check your CI config"
    fi
}

# ----------------{ Validation }----------------

KUSTOMIZE_DIR=${KUSTOMIZE_DIR:-kustomize}
DEPLOY_REPO_BRANCH=${DEPLOY_REPO_BRANCH:-main}

command -v kustomize >/dev/null || fail "Missing kustomize binary from path ($PATH)"

test -n "$KUSTOMIZE_IMAGE_NAME"   || fail "Missing kustomize image name (KUSTOMIZE_IMAGE_NAME)"
test -n "$DEPLOY_REPO_URL"        || fail "Missing deployment repository URL (DEPLOY_REPO_URL)"
test -n "$DEPLOY_TAG"             || fail "Missing variable DEPLOY_TAG"
test -n "$DEPLOY_SSH_PRIVATE_KEY" || fail "Missing variable DEPLOY_SSH_PRIVATE_KEY"

# Some CI systems don't resolve variables recursively. Just an additional sanity check.
verify_var "KUSTOMIZE_IMAGE_NAME" "$KUSTOMIZE_IMAGE_NAME"
verify_var "DEPLOY_REPO_URL" "$DEPLOY_REPO_URL"
verify_var "DEPLOY_TAG" "$DEPLOY_TAG"
verify_var "GITOPS_COMMITTER_NAME" "$GITOPS_COMMITTER_NAME"
verify_var "GITOPS_COMMITTER_EMAIL" "$GITOPS_COMMITTER_EMAIL"

echo "[*] Preparing SSH private key"
DEPLOY_SSH_PRIVATE_KEY_FILE=$(mktemp /tmp/gitops-deploy-key.XXXXXX)
function cleanup() {
    rm -f "$DEPLOY_SSH_PRIVATE_KEY_FILE"
}
trap "cleanup" 0 2 3 15

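chmod 0600 "$DEPLOY_SSH_PRIVATE_KEY_FILE"
# A file-type CI variable holds the path to the key, a plain variable holds the key itself
if test -f "$DEPLOY_SSH_PRIVATE_KEY"; then
    cat "$DEPLOY_SSH_PRIVATE_KEY" > "$DEPLOY_SSH_PRIVATE_KEY_FILE"
else
    echo "$DEPLOY_SSH_PRIVATE_KEY" > "$DEPLOY_SSH_PRIVATE_KEY_FILE"
fi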

if ! grep "PRIVATE KEY" "$DEPLOY_SSH_PRIVATE_KEY_FILE" >/dev/null; then
    fail "Invalid private key (Missing string 'PRIVATE KEY'). Make sure it's a PEM formatted file."
fi

# If we have multiple deploy steps in the same pipeline, the same repo might
# already exist, but in an unknown state.
rm -rf "./deployment-repo"

# ----------------{ Clone }----------------

echo "[*] Cloning repo: $DEPLOY_REPO_URL"

export GIT_SSH_COMMAND="ssh -i '$DEPLOY_SSH_PRIVATE_KEY_FILE' -o 'StrictHostKeyChecking no'"
git clone --depth 1 --branch "$DEPLOY_REPO_BRANCH" "$DEPLOY_REPO_URL" ./deployment-repo || {
    fail "Failed to clone deployment repository. Make sure the SSH public key has been added to the deployment repository as read-write deploy key."
}

# ----------------{ Update }----------------

pushd ./deployment-repo/ >/dev/null
    test -d "$KUSTOMIZE_DIR" || fail "No such directory: '$KUSTOMIZE_DIR' (from KUSTOMIZE_DIR)"

    # Set the git configuration for this repository
    echo "[*] Configuring git"
    # No inner quotes here: inside double quotes they would become part of the value
    git config user.name "${GITOPS_COMMITTER_NAME:-GitOps Deployment}"
    git config user.email "${GITOPS_COMMITTER_EMAIL:-gitops@example.org}"

    pushd "$KUSTOMIZE_DIR" >/dev/null
        echo "[*] Setting kustomize image tag to: $DEPLOY_TAG"
        kustomize edit set image "$KUSTOMIZE_IMAGE_NAME=*:$DEPLOY_TAG"

        if git status --porcelain | grep kustomization >/dev/null; then
            echo "[*] Committing changes to local git repository"
            git add kustomization.yaml
            git commit -m "Deployment: Update to version $DEPLOY_TAG"

            echo "[*] Pushing changed deployment repository"
            git push -u origin "$DEPLOY_REPO_BRANCH" || {
                fail "Failed to upload to the origin. Does the deploy key have write permissions?"
            }
        else
            echo "[!] Image name wasn't changed by kustomize, this mostly happens when retrying pipelines. Not doing anything."
            exit 0
        fi
    popd
popd

echo "[*] Finished gitops deployment successfully!"
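
One thing worth hardening for production use: the script disables SSH host key checking. A possible alternative is to pin the host keys of your git server in a known_hosts file and have SSH verify against it. The following is only a sketch. The function name and the idea of shipping a known_hosts file alongside the script are assumptions and not part of the setup described here:

```shell
# Print a GIT_SSH_COMMAND value that verifies the server against a pinned known_hosts file.
# Usage: pinned_git_ssh_command <private-key-file> <known-hosts-file>
pinned_git_ssh_command() {
    local key_file="$1" known_hosts="$2"
    printf "ssh -i '%s' -o IdentitiesOnly=yes -o UserKnownHostsFile='%s' -o StrictHostKeyChecking=yes" \
        "$key_file" "$known_hosts"
}
```

It could be used as `export GIT_SSH_COMMAND="$(pinned_git_ssh_command "$DEPLOY_SSH_PRIVATE_KEY_FILE" /known_hosts)"`. The known_hosts file can be generated once with `ssh-keyscan` and should be compared against the fingerprints your code hoster publishes.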

As the script has some tool requirements, I've created a Docker image for it (in a central GitOps repository), which is later referenced from my code repositories.

Dockerfile
FROM alpine:latest

RUN apk add --update openssh wget git bash

# Install kustomize
RUN wget -q https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize%2Fv4.5.5/kustomize_v4.5.5_linux_amd64.tar.gz \
 && tar xzf kustomize_v4.5.5_linux_amd64.tar.gz \
 && rm kustomize_v4.5.5_linux_amd64.tar.gz \
 && mv ./kustomize /usr/local/bin/kustomize

# Add our tooling and configs
COPY ./deployscript.sh /deployscript.sh

If you want to use Helm instead, adapt the tool download accordingly and patch your values.yaml in the shell script instead of using kustomize edit.
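
For the Helm variant, the patching step could be as simple as the sketch below. It assumes your values.yaml contains exactly one `tag:` key (typically `image.tag`). The function name is made up, and a YAML-aware tool would be the more robust choice for anything beyond this simple layout:

```shell
# Set the image tag in a Helm values file.
# Assumes the file contains exactly one "tag:" key (e.g. image.tag).
# Usage: set_helm_image_tag <values-file> <tag>
set_helm_image_tag() {
    local values_file="$1" tag="$2" tmp
    tmp=$(mktemp)
    # Rewrite the value of the "tag:" key while keeping its indentation,
    # then copy the result back so the file keeps its permissions
    sed -E "s|^([[:space:]]*tag:).*|\1 \"${tag}\"|" "$values_file" > "$tmp" \
        && cat "$tmp" > "$values_file"
    rm -f "$tmp"
}
```

In the deploy script, `set_helm_image_tag values.yaml "$DEPLOY_TAG"` would take the place of the `kustomize edit set image` call, and the subsequent `git status`, `git add` and `git commit` steps would refer to values.yaml instead of kustomization.yaml.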

Now, run it from your CI after the container has been built. An example for GitLab follows:

Yaml
deploy:
  stage: deploy
  image: TAG_OF_THE_GITOPS_IMAGE
  resource_group: deploy-repository # Prevents concurrent execution
  script:
    - bash /deployscript.sh
  variables:
    DEPLOY_TAG: $CI_COMMIT_TAG # Must be the tag the image was pushed with; only set in pipelines for git tags
    KUSTOMIZE_IMAGE_NAME: myapp
    DEPLOY_REPO_URL: git@gitlab.com:YOUR_DEPLOYMENT_REPOSITORY.git

After everything is committed and pushed, hopefully your pipeline looks like this:

[Image: Successful GitOps pipeline]

Conclusion

Implementing this workflow took quite some time, but in the end I'm quite happy I did it. I've been using it for several of my own projects for nearly a year now, and will soon roll out something similar at my workplace.


  1. In simple terms: You define what you want, and it's the responsibility of the system to ensure the final state. How that happens is not important for the developers.