How to add a new cluster to Operate First#

In this guide we will explore how to onboard a cluster to the Operate First community. This document covers the journey from a standalone cluster to one that is centrally managed by the Operate First Ops team via GitOps.

Prerequisites#
  • The cluster which is being onboarded must already exist and run OpenShift.

  • The cluster is imported to the Operate First’s Advanced cluster management (ACM).

If you need help fulfilling either of the prerequisites, please raise an issue in the support repository.


  • A pull request against the operate-first/apps repository.

This PR enables the following:

  • Operate First’s ArgoCD can manage the cluster.

  • Operate First’s SSO is used as identity provider by OpenShift.

  • Operate First’s integrated metrics and alerting federation is deployed to the cluster.


All manifests for the workloads owned by the Operate First Ops team are maintained in the operate-first/apps repository, following Kustomize best practices.

The cluster-scope folder in this repo stores all privileged resources that regular project admins are usually not allowed to deploy, since they require elevated access such as the cluster-admin role.

If you want to know more about the overall design please consult Operate First’s Architectural Decision Records (ADR) archive.

For each cluster we have a separate overlay in the cluster-scope folder. Clusters are grouped by region. For more information on this topic, see ADR-0009 - Declarative Definitions for Cluster Scoped Resources.
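As a minimal illustration of this layout (the helper function is ours, not part of the repo), the overlay for a cluster's privileged resources is just a region/cluster-keyed directory:

```python
def overlay_path(region: str, cluster: str) -> str:
    """Location of a cluster's privileged-resources overlay, grouped by region."""
    return f"cluster-scope/overlays/prod/{region}/{cluster}"

# e.g. a cluster "demo" in the "emea" region:
print(overlay_path("emea", "demo"))  # cluster-scope/overlays/prod/emea/demo
```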


1. Define important variables#

In this guide we will use a couple of facts about the cluster. To make the guide easier to follow, let’s define these values beforehand.

import json
import os
import uuid

# User variables
GITHUB_USERNAME = os.getenv("JUPYTERHUB_USER")  # when executed within JupyterHub on Operate First, the `JUPYTERHUB_USER` variable holds your GitHub username

# Cluster specific variables
CLUSTER_NAME = "my-cluster"
CLUSTER_REGION = "emea"  # region overlay this cluster belongs to, e.g. "emea"
CLUSTER_DESCRIPTION = "Description of cluster"
CLUSTER_ADMINS_LST = [GITHUB_USERNAME]  # list of GitHub usernames of the cluster admins

# Keycloak lowercases usernames and OpenShift RBAC is case sensitive, so normalize here;
# quotes are escaped so the JSON list survives the shell when passed to yq
CLUSTER_ADMINS = json.dumps([u.lower() for u in CLUSTER_ADMINS_LST]).replace('"', '\\"')

# Shared client secret referenced by both the Keycloak client and the cluster's OAuth secret
UUID = str(uuid.uuid4())

2. Fork and clone the apps repository#

Please fork/clone the operate-first/apps repository. We’ll be working within this repository only.

  1. Go to operate-first/apps.

  2. Click on the Fork button.

  3. Once the fork is created, click the Code button and copy the address of your forked repository.

  4. Run the following command using the copied address:

!git clone{GITHUB_USERNAME}/apps.git
%cd apps
Cloning into 'apps'...
remote: Enumerating objects: 12876, done.
remote: Counting objects: 100% (1567/1567), done.
remote: Compressing objects: 100% (852/852), done.
remote: Total 12876 (delta 701), reused 1449 (delta 633), pack-reused 11309
Receiving objects: 100% (12876/12876), 2.83 MiB | 4.71 MiB/s, done.
Resolving deltas: 100% (6290/6290), done.

3. Enable ArgoCD management in ACM#

The onboarded cluster is already being managed by ACM. Since Operate First manages its applications through ArgoCD and ACM can integrate with ArgoCD, we will use that integration. In the next cell we will let ACM set up a connection to the new cluster from our ArgoCD instance. Since ACM 2.3 this is achieved by declaring the cluster to be managed via an ArgoCD-enabled ClusterSet.

text_input = """\
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: %s
  labels:
    cluster.open-cluster-management.io/clusterset: argocd-managed
""" % (CLUSTER_NAME)

%store text_input >acm/overlays/moc/infra/managedclusters/{CLUSTER_NAME}.yaml

!cd acm/overlays/moc/infra/managedclusters && kustomize edit add resource {CLUSTER_NAME}.yaml
Writing 'text_input' (str) to file 'acm/overlays/moc/infra/managedclusters/demo.yaml'.

4. Enable SSO login#

Next on the list of tasks is enabling SSO for this cluster. Operate First SSO provides users a unified and seamless experience when accessing the cluster. To enable it, we need to set up two things:

  1. Inform the SSO server - a Keycloak instance - that this new cluster exists and that it is indeed a valid client.

  2. Configure the cluster’s OAuth controller to query our SSO for user identity.

1. Configure SSO server#

The cell below will create a Keycloak client definition for the new cluster. The SSO server is managed via the keycloak folder in this repo, hence this cell creates a file at keycloak/overlays/moc/infra/clients/$CLUSTER_NAME.yaml and then encrypts it with sops. The PGP key used for encryption is imported first.

The KeycloakClient resource makes our SSO aware of the cluster’s presence - it registers the cluster as a client of the SSO.
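Later in this section sops is invoked with --encrypted-regex="^secret$" so that only the client secret is encrypted while the rest of the manifest stays readable. As a quick sketch of that selection (the helper is illustrative, not part of sops):

```python
import re

def keys_matching(keys, pattern=r"^secret$"):
    """Mimic sops' --encrypted-regex: only keys matching the regex get encrypted."""
    rx = re.compile(pattern)
    return [k for k in keys if rx.search(k)]

print(keys_matching(["clientId", "secret", "description"]))  # ['secret']
```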

!gpg --recv-keys 0508677DD04952D06A943D5B4DC4116D360E3276

text_input = """\
apiVersion: keycloak.org/v1alpha1
kind: KeycloakClient
metadata:
  name: %s
  labels:
    client: %s
spec:
  client:
    clientId: %s
    defaultClientScopes:
      - profile
    description: %s
    name: %s cluster
    protocol: openid-connect
    secret: %s
    standardFlowEnabled: true
  realmSelector:
    matchLabels:
      realm: operate-first
""" % (CLUSTER_NAME, CLUSTER_NAME, CLUSTER_NAME, CLUSTER_DESCRIPTION, CLUSTER_NAME, UUID)


%store text_input >keycloak/overlays/moc/infra/clients/{CLUSTER_NAME}.yaml

!sops --encrypt --encrypted-regex="^secret$" --pgp="0508677DD04952D06A943D5B4DC4116D360E3276" keycloak/overlays/moc/infra/clients/{CLUSTER_NAME}.yaml >keycloak/overlays/moc/infra/clients/{CLUSTER_NAME}.enc.yaml

!rm keycloak/overlays/moc/infra/clients/{CLUSTER_NAME}.yaml

!yq e -i ".files += [\"clients/{CLUSTER_NAME}.enc.yaml\"]" keycloak/overlays/moc/infra/secret-generator.yaml
gpg: key 4DC4116D360E3276: "Operate-First <>" not changed
gpg: Total number processed: 1
gpg:              unchanged: 1
Writing 'text_input' (str) to file 'keycloak/overlays/moc/infra/clients/demo.yaml'.
[PGP]	 WARN[0000] Deprecation Warning: GPG key fetching from a keyserver within sops will be removed in a future version of sops. See for more information. 

2. Configure SSO as identity provider for the cluster#

Now we need to configure the cluster’s OAuth controller so it uses Operate First SSO as an identity provider.

Below we will create an operate-first-sso-secret Secret resource and encrypt it with sops. This secret contains the cluster’s SSO credentials, which match the SSO server configuration above.

Then we reference this secret in the OAuth configuration for OpenShift. The OAuth resource defines identity providers available to users when authenticating to the cluster.

!mkdir -p cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME}/oauths/

text_input = """\
apiVersion: v1
kind: Secret
metadata:
  name: operate-first-sso-secret
  namespace: openshift-config
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
    argocd.argoproj.io/sync-options: Prune=false
type: Opaque
stringData:
  clientSecret: %s
""" % (UUID)

%store text_input >cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME}/oauths/operate-first-sso-secret.yaml

!sops --encrypt --encrypted-regex="^(data|stringData)$" --pgp="0508677DD04952D06A943D5B4DC4116D360E3276" cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME}/oauths/operate-first-sso-secret.yaml >cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME}/oauths/operate-first-sso-secret.enc.yaml

!rm cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME}/oauths/operate-first-sso-secret.yaml
Writing 'text_input' (str) to file 'cluster-scope/overlays/prod/emea/demo/oauths/operate-first-sso-secret.yaml'.
[PGP]	 WARN[0000] Deprecation Warning: GPG key fetching from a keyserver within sops will be removed in a future version of sops. See for more information. 
text_input = """\
apiVersion: viaduct.ai/v1
kind: ksops
metadata:
  name: secret-generator
files:
  - oauths/operate-first-sso-secret.enc.yaml
"""

%store text_input >cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME}/secret-generator.yaml
Writing 'text_input' (str) to file 'cluster-scope/overlays/prod/emea/demo/secret-generator.yaml'.
text_input = """\
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
    - mappingMethod: claim
      name: operate-first
      openID:
        claims:
          email:
            - email
          name:
            - name
          preferredUsername:
            - preferred_username
        clientID: %s
        clientSecret:
          name: operate-first-sso-secret
        extraScopes: []
      type: OpenID
""" % (CLUSTER_NAME)

%store text_input >cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME}/oauths/cluster_patch.yaml
Writing 'text_input' (str) to file 'cluster-scope/overlays/prod/emea/demo/oauths/cluster_patch.yaml'.

5. Create a cluster admins group#

Now we can assume Operate First SSO is enabled on the cluster, hence we can start using GitHub accounts as user names on the cluster. Let’s use that to declare cluster admins for this particular cluster. While cluster admins have full access to the cluster, all changes should always be made via GitOps. In general, we think of cluster admins as an emergency brake in case we need to investigate or act quickly.

Please be advised that Keycloak converts all usernames to lowercase and OpenShift RBAC is case sensitive, hence we convert all GitHub usernames to lowercase and reference them in OpenShift as such.
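This normalization is exactly what the CLUSTER_ADMINS expression from step 1 does; as a standalone sketch (the function name is ours), including the quote escaping that lets the JSON list pass through the shell to yq:

```python
import json

def normalize_admins(usernames):
    """Lowercase GitHub usernames (Keycloak lowercases them anyway) and escape
    double quotes so the JSON list survives shell interpolation into yq."""
    return json.dumps([u.lower() for u in usernames]).replace('"', '\\"')

print(normalize_admins(["OctoCat", "HexChain"]))
```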

By executing the following cell you will create a file cluster-scope/overlays/prod/$CLUSTER_REGION/$CLUSTER_NAME/groups/cluster-admins.yaml.

!mkdir -p cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME}/groups
input_text = """\
apiVersion: user.openshift.io/v1
kind: Group
metadata:
  name: cluster-admins
users: []
"""
%store input_text  >cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME}/groups/cluster-admins.yaml
!yq e -i ".users = {CLUSTER_ADMINS}" cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME}/groups/cluster-admins.yaml
Writing 'input_text' (str) to file 'cluster-scope/overlays/prod/emea/demo/groups/cluster-admins.yaml'.

6. Stitch things together via Kustomize#

Now we have many different isolated bits and pieces of configuration defined and modified for our cluster. In order to apply those changes we need to render all those manifests together and instruct ArgoCD to deploy it. First things first, let’s combine those manifests now.

We use Kustomize to compose manifests. This tool requires a kustomization.yaml file as the base manifest. This file instructs Kustomize which resource files to pull and how to overlay and render them together. In this particular case it serves us as the single source of truth for what gets configured on each cluster when it comes to the privileged resources.
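Conceptually, an overlay refines base manifests by merging maps key by key, with the overlay winning on conflicts. The toy sketch below captures only that idea; real Kustomize has much richer semantics for lists, patches, and generators:

```python
def merge(base: dict, overlay: dict) -> dict:
    """Naive deep merge: overlay values win, nested dicts merge recursively."""
    out = dict(base)
    for key, val in overlay.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], val)
        else:
            out[key] = val
    return out

# Base Group manifest plus an overlay patch replacing the users list
base = {"kind": "Group", "metadata": {"name": "cluster-admins"}, "users": []}
patch = {"users": ["octocat"]}
print(merge(base, patch))
```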

The following cell will create a kustomization.yaml file in cluster-scope/overlays/prod/$CLUSTER_REGION/$CLUSTER_NAME and bootstrap it with:

  1. All the common configuration specific to the given region - this is specified in the ../common folder.

  2. The OAuth configuration enabling SSO.

  3. A patch to the cluster-admins user group, replacing the users object with the users we’ve specified via the variable above.

text_input = """\
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../common
  - ../../../../base/
  - ../../../../base/

patchesStrategicMerge:
  - groups/cluster-admins.yaml
  - oauths/cluster_patch.yaml

generators:
  - secret-generator.yaml
"""

%store text_input  >cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME}/kustomization.yaml
Writing 'text_input' (str) to file 'cluster-scope/overlays/prod/emea/demo/kustomization.yaml'.

7. Enable monitoring and alerting#

We follow the practices recommended by upstream OpenShift, which means we support User Workload Monitoring on our clusters. And since Operate First is a community cloud, we aim to be transparent about alerts fired by the cluster itself. In this step we will enable User Workload Monitoring, then bring in an alert receiver for GitHub that funnels alerts from the cluster and files them as GitHub Issues.

1. Enable User Workload Monitoring#

User Workload Monitoring can be enabled via a simple configuration change in the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace. Since we apply this change to most clusters, we host it in cluster-scope/base. To apply it to the new cluster, all we have to do is pull the resource into the overlay we created in the previous step.
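For reference, the upstream-documented switch is the enableUserWorkload flag in that ConfigMap. The sketch below reconstructs the manifest for illustration only - in our setup the resource already lives in cluster-scope/base:

```python
import textwrap

# Upstream-documented ConfigMap enabling User Workload Monitoring
cluster_monitoring_config = textwrap.dedent("""\
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true
    """)
print(cluster_monitoring_config)
```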

!cd cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME} && kustomize edit add resource ../../../../base/core/configmaps/cluster-monitoring-config

2. Deploy alert receiver for Github#

The alert receiver configuration is also well known and already defined. However, since this is a standalone application, it needs a separate overlay in the folder which belongs to this application - the alertreceiver folder. In the next cell we will create a new overlay there (patching the static labels assigned to each alert originating from this cluster). Then we’ll update the cluster-scope overlay created in step 6. Stitch things together via Kustomize by requesting a namespace for the alert receiver to be created.
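The label patch used below is an RFC 6902 "replace" operation addressed by a JSON pointer. This toy resolver (not part of Kustomize, no pointer escaping) shows what such an operation does to a Deployment-shaped document:

```python
def json_pointer_replace(doc, pointer, value):
    """Replace the value at an RFC 6901-style pointer like
    /spec/template/spec/containers/0/args/0 (toy version, no ~ escaping)."""
    parts = [p for p in pointer.split("/") if p]
    target = doc
    for part in parts[:-1]:
        target = target[int(part)] if isinstance(target, list) else target[part]
    last = parts[-1]
    if isinstance(target, list):
        target[int(last)] = value
    else:
        target[last] = value
    return doc

deployment = {"spec": {"template": {"spec": {"containers": [
    {"args": ["--label=environment/old"]}]}}}}
json_pointer_replace(deployment, "/spec/template/spec/containers/0/args/0",
                     "--label=environment/emea/demo")
print(deployment["spec"]["template"]["spec"]["containers"][0]["args"][0])
```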

!mkdir -p alertreceiver/overlays/{CLUSTER_REGION}/{CLUSTER_NAME}

kind: Kustomization

  - ../../common

  - patch: |
      - op: replace
        path: /spec/template/spec/containers/0/args/0
        value: --label=environment/%s/%s
      group: apps
      kind: Deployment
      name: github-receiver
      version: v1

%store text_input  >alertreceiver/overlays/{CLUSTER_REGION}/{CLUSTER_NAME}/kustomization.yaml

!cd cluster-scope/overlays/prod/{CLUSTER_REGION}/{CLUSTER_NAME} && kustomize edit add resource ../../../../base/core/namespaces/opf-alertreceiver
Writing 'text_input' (str) to file 'alertreceiver/overlays/emea/demo/kustomization.yaml'.

8. Create ArgoCD apps for this cluster#

At this point we have created, modified and updated all the manifests needed for the cluster to be properly managed. The remaining step is to make ArgoCD aware that those manifests exist and show it how to deploy and monitor them for us.

In this step we will create:

  1. An “App-of-apps” application which deploys other (future) applications to this cluster.

  2. An application which deploys the cluster management related manifests (from the cluster-scope folder).

  3. An application which deploys the alertreceiver (from the alertreceiver folder).

  4. An update enabling this cluster to be targeted by the ArgoCD project for Operate First management applications.

1. Create the App-of-apps#

First we will create the app-of-apps application for this cluster. It’s an application which points to other application manifests. This pattern allows us to automate deployment of future ArgoCD applications that we will want to deploy to this cluster.

Since we treat ArgoCD as yet another regular application, we host the manifests that configure it in the argocd folder. We host a single instance of it, so we’ll be working within the argocd/overlays/moc-infra/ overlay.

In this step we will create an Application resource which points to argocd/overlays/moc-infra/applications/envs/$CLUSTER_REGION/$CLUSTER_NAME. That is exactly where we will keep all other Application resources for this cluster. Once the app-of-apps resource manifest is created, we’ll add it to the Kustomization at argocd/overlays/moc-infra/applications/app-of-apps/kustomization.yaml.

text_input = """\
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: opf-app-of-apps-%s
  namespace: argocd
spec:
  destination:
    name: moc-infra
  project: operate-first
  source:
    path: argocd/overlays/moc-infra/applications/envs/%s/%s
    repoURL: https://github.com/operate-first/apps.git
    targetRevision: HEAD
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - Validate=false
      - ApplyOutOfSyncOnly=true
""" % (CLUSTER_NAME, CLUSTER_REGION, CLUSTER_NAME)

%store text_input >argocd/overlays/moc-infra/applications/app-of-apps/app-of-apps-{CLUSTER_NAME}.yaml

!cd argocd/overlays/moc-infra/applications/app-of-apps && kustomize edit add resource app-of-apps-{CLUSTER_NAME}.yaml
Writing 'text_input' (str) to file 'argocd/overlays/moc-infra/applications/app-of-apps/app-of-apps-demo.yaml'.

As you can see we’ve pointed ArgoCD to a folder within argocd/overlays/moc-infra/applications/envs which does not exist yet. Now is the time to create it.

!mkdir  -p argocd/overlays/moc-infra/applications/envs/{CLUSTER_REGION}/{CLUSTER_NAME}

!cd argocd/overlays/moc-infra/applications/envs/{CLUSTER_REGION}/{CLUSTER_NAME} && kustomize init --namespace argocd --namesuffix -{CLUSTER_NAME}

2. Application for privileged resources#

Now let’s add our first application to this folder. This application should source the overlay in the cluster-scope folder, which we’ve created in step 6. Stitch things together via Kustomize.

!mkdir -p argocd/overlays/moc-infra/applications/envs/{CLUSTER_REGION}/{CLUSTER_NAME}/cluster-management

text_input = """\
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-resources
spec:
  destination:
    name: %s
    namespace: open-cluster-management-agent
  ignoreDifferences:
    - group:
      jsonPointers:
        - /spec/defaultRoute
        - /spec/httpSecret
        - /spec/proxy
        - /spec/requests
        - /spec/rolloutStrategy
      kind: Config
      name: cluster
  project: cluster-management
  source:
    path: cluster-scope/overlays/prod/%s/%s
    repoURL: https://github.com/operate-first/apps.git
    targetRevision: HEAD
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - Validate=false
      - ApplyOutOfSyncOnly=true
""" % (CLUSTER_NAME, CLUSTER_REGION, CLUSTER_NAME)

%store text_input >argocd/overlays/moc-infra/applications/envs/{CLUSTER_REGION}/{CLUSTER_NAME}/cluster-management/cluster-resources.yaml

!cd argocd/overlays/moc-infra/applications/envs/{CLUSTER_REGION}/{CLUSTER_NAME}/cluster-management && kustomize init --resources cluster-resources.yaml

!cd argocd/overlays/moc-infra/applications/envs/{CLUSTER_REGION}/{CLUSTER_NAME} && kustomize edit add resource cluster-management
Writing 'text_input' (str) to file 'argocd/overlays/moc-infra/applications/envs/emea/demo/cluster-management/cluster-resources.yaml'.

3. Application for alert receiver#

Next up is the alert receiver. We need to create an Application resource for it as well. It will point to the alertreceiver overlay we’ve created in step 7. Enable monitoring and alerting.

text_input = """\
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: alertreceiver
spec:
  destination:
    name: %s
    namespace: opf-alertreceiver
  project: cluster-management
  source:
    path: alertreceiver/overlays/%s/%s
    repoURL: https://github.com/operate-first/apps.git
    targetRevision: HEAD
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - Validate=false
""" % (CLUSTER_NAME, CLUSTER_REGION, CLUSTER_NAME)

%store text_input >argocd/overlays/moc-infra/applications/envs/{CLUSTER_REGION}/{CLUSTER_NAME}/cluster-management/alertreceiver.yaml

!cd argocd/overlays/moc-infra/applications/envs/{CLUSTER_REGION}/{CLUSTER_NAME}/cluster-management && kustomize edit add resource alertreceiver.yaml
Writing 'text_input' (str) to file 'argocd/overlays/moc-infra/applications/envs/emea/demo/cluster-management/alertreceiver.yaml'.


Please stage your changes and send them as a PR against the operate-first/apps repository.


Make sure that the following files have been modified or added:

  • Modified acm/overlays/moc/infra/managedclusters/kustomization.yaml

  • Added acm/overlays/moc/infra/managedclusters/$CLUSTER_NAME.yaml

  • Added alertreceiver/overlays/$CLUSTER_REGION/$CLUSTER_NAME/kustomization.yaml

  • Added argocd/overlays/moc-infra/applications/app-of-apps/app-of-apps-$CLUSTER_NAME.yaml

  • Modified argocd/overlays/moc-infra/applications/app-of-apps/kustomization.yaml

  • Added argocd/overlays/moc-infra/applications/envs/$CLUSTER_REGION/$CLUSTER_NAME/cluster-management/alertreceiver.yaml

  • Added argocd/overlays/moc-infra/applications/envs/$CLUSTER_REGION/$CLUSTER_NAME/cluster-management/cluster-resources.yaml

  • Added argocd/overlays/moc-infra/applications/envs/$CLUSTER_REGION/$CLUSTER_NAME/cluster-management/kustomization.yaml

  • Added argocd/overlays/moc-infra/applications/envs/$CLUSTER_REGION/$CLUSTER_NAME/kustomization.yaml

  • Added cluster-scope/overlays/prod/$CLUSTER_REGION/$CLUSTER_NAME/groups/cluster-admins.yaml

  • Added cluster-scope/overlays/prod/$CLUSTER_REGION/$CLUSTER_NAME/kustomization.yaml

  • Added cluster-scope/overlays/prod/$CLUSTER_REGION/$CLUSTER_NAME/oauths/cluster_patch.yaml

  • Added cluster-scope/overlays/prod/$CLUSTER_REGION/$CLUSTER_NAME/oauths/operate-first-sso-secret.enc.yaml

  • Added cluster-scope/overlays/prod/$CLUSTER_REGION/$CLUSTER_NAME/secret-generator.yaml

  • Added keycloak/overlays/moc/infra/clients/$CLUSTER_NAME.enc.yaml

  • Modified keycloak/overlays/moc/infra/secret-generator.yaml

!git status
!git add .
!git commit -m "feat(onboarding): Add cluster {CLUSTER_NAME}"
!git push
On branch master
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   acm/overlays/moc/infra/managedclusters/kustomization.yaml
	modified:   argocd/overlays/moc-infra/applications/app-of-apps/kustomization.yaml
	modified:   keycloak/overlays/moc/infra/secret-generator.yaml

Untracked files:
  (use "git add <file>..." to include in what will be committed)

no changes added to commit (use "git add" and/or "git commit -a")
[master 58d7129] feat(onboarding): Add cluster demo
 16 files changed, 252 insertions(+)
 create mode 100644 acm/overlays/moc/infra/managedclusters/demo.yaml
 create mode 100644 alertreceiver/overlays/emea/demo/kustomization.yaml
 create mode 100644 argocd/overlays/moc-infra/applications/app-of-apps/app-of-apps-demo.yaml
 create mode 100644 argocd/overlays/moc-infra/applications/envs/emea/demo/cluster-management/alertreceiver.yaml
 create mode 100644 argocd/overlays/moc-infra/applications/envs/emea/demo/cluster-management/cluster-resources.yaml
 create mode 100644 argocd/overlays/moc-infra/applications/envs/emea/demo/cluster-management/kustomization.yaml
 create mode 100644 argocd/overlays/moc-infra/applications/envs/emea/demo/kustomization.yaml
 create mode 100644 cluster-scope/overlays/prod/emea/demo/groups/cluster-admins.yaml
 create mode 100644 cluster-scope/overlays/prod/emea/demo/kustomization.yaml
 create mode 100644 cluster-scope/overlays/prod/emea/demo/oauths/cluster_patch.yaml
 create mode 100644 cluster-scope/overlays/prod/emea/demo/oauths/operate-first-sso-secret.enc.yaml
 create mode 100644 cluster-scope/overlays/prod/emea/demo/secret-generator.yaml
 create mode 100644 keycloak/overlays/moc/infra/clients/demo.enc.yaml
Enumerating objects: 73, done.
Counting objects: 100% (73/73), done.
Delta compression using up to 8 threads
Compressing objects: 100% (42/42), done.
Writing objects: 100% (48/48), 8.44 KiB | 2.11 MiB/s, done.
Total 48 (delta 14), reused 1 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (14/14), completed with 11 local objects.
   de2f47a..58d7129  master -> master