
By: Joseph Villarreal

Argo Rollouts Operator: A Post Admission Mutating Hook

I recently worked with a client who wanted more powerful canary testing for their existing Kubernetes-native applications, with minimal changes to the original source code. Argo Rollouts was the right fit, but rather than fully migrating Deployments into Rollout object types in the code base, we took a different approach. This saved a huge amount of time and ensured no previous work was thrown out the window: users simply apply a set of labels, and migration, creation, and maintenance of the object happen directly in the Kubernetes environment. Here's how we did it.

What Are Argo Rollouts?

Argo Rollouts is a controller that extends the native capabilities of Kubernetes with more advanced application deployment and delivery techniques, including blue-green deployments, canary analysis, experiments, and progressive delivery.

The rationale behind Argo Rollouts is that Kubernetes natively offers only a basic set of delivery strategies, with guarantees such as verification during the update and health checks. These strategies have some limitations:

  • Little control over delivery speed
  • No ability to shift traffic between versions
  • No built-in support for stress testing or application tracking
  • No way to integrate testing against external services
  • Perhaps most importantly, no way to stop a delivery in progress or automatically abort one once it has started

This is where Argo Rollouts helps. It extends that set of strategies by including:

  • Blue-Green deployment strategy
  • Canary deployment strategy
  • Ability to shift application traffic in a tiered, fine-grained manner
  • Automated promotions and rollbacks based on analysis and experiments
  • Manual gates for promoting an application
  • Customizable definitions of the metrics against which the business KPI analysis is done

How Do Argo Rollouts Work?

The Rollout object introduced by Argo Rollouts works analogously to the native Kubernetes Deployment object: it is in charge of creating, scaling, and deleting ReplicaSets.

An Argo Rollouts declaration is almost exactly the same as a Deployment definition with three modifications:

  • apiVersion: Argo uses the CRD extended API argoproj.io/v1alpha1
  • Kind: The declared object type changes from Deployment to Rollout
  • Spec.strategy: the definition of the strategy to use; the Rollouts Custom Resource Definition adds canary and blueGreen to the existing Recreate and RollingUpdate strategies
The two definitions below show the same application, first as a Deployment and then as a Rollout.

Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
 name: rollouts-demo
spec:
 replicas: 5
 strategy: {}
 revisionHistoryLimit: 10
 selector:
   matchLabels:
     app: rollouts-demo
 template:
   metadata:
     labels:
       app: rollouts-demo
   spec:
     containers:
     - name: rollouts-demo
       image: nginx
       ports:
       - name: http
         containerPort: 8080
         protocol: TCP
       resources:
         requests:
           memory: 32Mi
           cpu: 5m

Rollout:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
 name: rollouts-demo
spec:
 replicas: 5
 strategy: {}
 revisionHistoryLimit: 10
 selector:
   matchLabels:
     app: rollouts-demo
 template:
   metadata:
     labels:
       app: rollouts-demo
   spec:
     containers:
     - name: rollouts-demo
       image: nginx
       ports:
       - name: http
         containerPort: 8080
         protocol: TCP
       resources:
         requests:
           memory: 32Mi
           cpu: 5m

In a Deployment, spec.strategy's possible types are RollingUpdate or Recreate, for example:

  strategy: 
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

In a Rollout, spec.strategy instead accepts blueGreen or canary, for example:

strategy:
    canary: 
      maxSurge: "25%"
      maxUnavailable: 0
      steps:
      - setWeight: 10
      - setWeight: 20
      - pause: {} 
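For completeness, a blueGreen strategy is declared in the same spec.strategy section. A minimal sketch, assuming two Services named rollouts-demo-active and rollouts-demo-preview already exist, could look like:

```yaml
strategy:
  blueGreen:
    # Service that routes traffic to the stable ReplicaSet
    activeService: rollouts-demo-active
    # Service that routes traffic to the new ReplicaSet for preview/testing
    previewService: rollouts-demo-preview
    # require a manual promotion before cutting traffic over
    autoPromotionEnabled: false
```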

As this comparison shows, the translation between the two object types is quite simple. Two of the modifications are one-line changes; the greater complexity of the migration lies in the strategy definition, since the Rollout adds two new strategy types. The required specification is declared in this section as a compound object with several sub-sections, including the steps that carry out the analysis and the acceptance criteria.

For more information on the details of this definition refer to:

  • https://argoproj.github.io/argo-rollouts/features/bluegreen/
  • https://argoproj.github.io/argo-rollouts/features/canary/

What is the Rollout Operator?

The Rollout operator is a customized tool built using open source components to narrow the gap between the native Kubernetes deployment strategies and what Argo Rollouts offers. 

The operator is installed in your Kubernetes cluster as a Helm chart; it fully automates the translation of your native Deployment manifests into Rollout objects through a set of steps that run inside your cluster as a Workflow.

The goal is that developers who have already invested time and effort declaring their resources don't have to worry about migrating between object types, i.e., converting an existing Deployment resource to a Rollout resource with a defined set of steps.

The main benefit is convenience: developers avoid migrating object types, which saves time and ensures no previous work gets thrown out the window. They simply apply a set of labels, and the operator takes care of migration, creation, and maintenance of the object in the Kubernetes API.

Beyond the convenience of automation, this approach eases other concerns. For example, when migrating a Deployment that is already serving live production traffic, the Rollout should run next to the Deployment before the Deployment is deleted; skipping this step can cause downtime. The Rollout Operator avoids this scenario by letting the Rollout be tested and start intercepting traffic while the original Deployment is still serving.

How Does the Operator Work?

The tool consists of a Kubernetes Operator implementation through Argo Events and Argo Workflows.

One difference from the traditional operator approach is that this operator does not run as an Admission Controller mutating webhook: the transformation of the object happens after it has already been accepted by the Kubernetes admission controller, not before, as in the standard approach. Instead, using a Kubernetes API resource event as the source, the operator leverages a Workflow to translate an existing Deployment definition into a Rollout and create it; in that sense, the operator acts as a "Post Admission Mutating Webhook".

Operator components

The operator has four components:

  • An EventSource
  • An EventBus
  • A Sensor with a Workflow trigger
  • A Workflow definition

EventSource

The first component consists of an "alert message generator" or EventSource. This is a CRD object that locates any Deployment resource through a set of preselected labels in a specific namespace. The EventSource targets Deployments carrying the label rollout: "true" in a particular namespace.

---
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: deployment-rollout
  namespace: {{ .Values.namespace }} 
spec:
  template:
    serviceAccountName: rolloutoperator
  resource:
    deploymentDelta:
      namespace: {{ .Values.namespace }} 
      group: apps
      version: v1
      resource: deployments
      eventTypes:
        - ADD
        - UPDATE
        - DELETE
      filter:
        afterStart: true
        labels:
          - key: rollout
            operation: "=="
            value: "true"
...

Notice how the namespace definition uses a Go template expression, {{ .Values.namespace }}? This placeholder is replaced by Helm at installation time with a user-supplied value.

Once the source object is identified, any event from that Deployment, such as its creation, modification, or deletion, generates a message that is sent to a messaging bus. This message includes the YAML declaration of the source object along with additional context about the event, such as its time, type, and other relevant metadata.

In essence, the EventSource transforms Kubernetes API events into CloudEvents and dispatches them to the EventBus component. The structure of an event dispatched by the EventSource looks like this:

    {
        "context": {
          "type": "type_of_event_source",
          "specversion": "cloud_events_version",
          "source": "name_of_the_event_source",
          "id": "unique_event_id",
          "time": "event_time",
          "datacontenttype": "type_of_data",
          "subject": "name_of_the_configuration_within_event_source"
        },
        "data": {
          "type": "type_of_the_event", // ADD, UPDATE or DELETE
          "body": "resource_body", // JSON format
          "group": "resource_group_name",
          "version": "resource_version_name",
          "resource": "resource_name"
        }
    }

For our operator, the most valuable information comes from data.type and data.body, since this is the information required for our transformations from Deployments into Rollouts.
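As a quick illustration, picking those two fields out of an event payload in Node looks like this (the payload below is a fabricated example, not a real capture):

```javascript
// Minimal sketch: extract the event type and the resource body
// from a cloudevent shaped like the one the EventSource dispatches.
// The payload here is a made-up example for illustration only.
const event = {
  context: { type: "resource", id: "abc123", time: "2021-01-01T00:00:00Z" },
  data: {
    type: "UPDATE",
    // the resource body arrives as a JSON string
    body: JSON.stringify({ kind: "Deployment", metadata: { name: "rollouts-demo" } }),
    group: "apps",
    version: "v1",
    resource: "deployments"
  }
};

const eventType = event.data.type;            // ADD, UPDATE or DELETE
const resource = JSON.parse(event.data.body); // the Deployment manifest

console.log(eventType, resource.metadata.name); // UPDATE rollouts-demo
```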

EventBus

The next component is the messaging bus, another CRD. The current Argo implementation of the EventBus is powered by a NATS streaming service.

This message queue retains the messages sent to it until they are read by a subscriber, allowing the executors to carry out their tasks without the possibility of messages being lost: the bus provides a temporary buffer where messages land, making it the transport layer of the operator.

---
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
  name: default
  namespace: {{ .Values.namespace }} 
spec:
  nats:
    native:
      replicas: 3
      auth: token
...

Sensor

Finally, we have the execution component, known as a Sensor. When invoked, the Sensor carries out our specific actions; in this case, it is in charge of translating the native delivery object into the richer delivery object with analysis, by triggering the execution of an Argo Workflow.

The Sensor subscribes to the messaging bus and continuously waits for new messages. Once a message is identified, the Sensor passes along the information of the received object, which includes, as we saw previously, the Kubernetes definition of the source object and some important metadata.

---
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: resource
  namespace: {{ .Values.namespace }} 
spec:
  template:
    serviceAccountName: rolloutoperator
  dependencies:
    - name: deployment-event
      eventSourceName: deployment-rollout
      eventName: deploymentDelta
  triggers:
    - template:
        name: argo-workflow
        k8s:
          group: argoproj.io
          version: v1alpha1
          resource: workflows
          operation: create
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: "rollout-operator-"
                namespace: {{ .Values.workflows.namespace }}
              spec:
                arguments:
                  parameters:
                    - name: "body"
                      value: "{}"
                    - name: "strategyConfigMap"
                      value: "strategy-store"
                    - name: "strategyConfigMapKey"
                      value: "rollingUpdate"
                    - name: "eventType"
                      value: "UPDATE"
                workflowTemplateRef:
                  name: rollout-operator-submittable
          parameters:
            - src:
                dependencyName: deployment-event
                dataKey: body
              dest: spec.arguments.parameters.0.value
            - src:
                dependencyName: deployment-event
                dataKey: body.metadata.labels.strategyConfigMap 
                value: strategy-store
              dest: spec.arguments.parameters.1.value
            - src:
                dependencyName: deployment-event
                dataKey: body.metadata.labels.strategyConfigMapKey
                value: rollingUpdate
              dest: spec.arguments.parameters.2.value
            - src:
                dependencyName: deployment-event
                dataKey: type
              dest: spec.arguments.parameters.3.value
...

With this information, a sequence of steps is executed that translates the object in a controlled, validated, sequential way. Once the resulting object is obtained, it is applied against the Kubernetes API to create our Rollout.

Workflow and Cluster Workflow Template

The following ClusterWorkflowTemplate performs the transformation:

---
apiVersion: "argoproj.io/v1alpha1"
kind: "ClusterWorkflowTemplate"
metadata:
  name: "rollout-operator-submittable"
  generateName: "rollout-operator-"
spec:
  entrypoint: "init"
  arguments:
    parameters:
      - name: "body"
        value: "{}"
      - name: "strategyConfigMap"
        value: ""
      - name: "strategyConfigMapKey"
        value: ""
      - name: "eventType"
        value: ""
  imagePullSecrets:
    - name: "image-pull-secret"
  serviceAccountName: rolloutoperator
  templates:
    - name: "init"
      steps:
        - - name: "json-parse"
            template: "json-parse"
            arguments:
              parameters:
                - name: "body"
                  value: "{{workflow.parameters.body}}"
                - name: "strategyConfigMap"
                  value: "{{workflow.parameters.strategyConfigMap}}"
                - name: "strategyConfigMapKey"
                  value: "{{workflow.parameters.strategyConfigMapKey}}"
        - - name: "kubectl-apply"
            template: "kubectl-apply"
            when: "{{workflow.parameters.eventType}} != DELETE"
            arguments:
              artifacts: 
                - name: "object"
                  from: "{{steps.json-parse.outputs.artifacts.object}}"
              parameters:
                - name: "obj"
                  value: "{{steps.json-parse.outputs.parameters.obj}}"
        - - name: "kubectl-delete"
            template: "kubectl-delete"
            when: "{{workflow.parameters.eventType}} == DELETE"
            arguments:
              artifacts: 
                - name: "object"
                  from: "{{steps.json-parse.outputs.artifacts.object}}"
              parameters:
                - name: "obj"
                  value: "{{steps.json-parse.outputs.parameters.obj}}"
    - name: "json-parse"
      inputs:
        parameters:
          - name: "body"
          - name: "strategyConfigMap"
          - name: "strategyConfigMapKey"
      script:
        image: registry.gitlab.com/mwpcicd/platform/node:15.5.1-alpine3.10
        resources:
          requests:
            memory: "100Mi"
            cpu: "200m"
        env:
          - name: STRATEGY
            valueFrom:
              configMapKeyRef:
                name: "{{inputs.parameters.strategyConfigMap}}"
                key: "{{inputs.parameters.strategyConfigMapKey}}"
        command: ["node"]
        # volumeMounts:
        #   - name: config-volume
        #     mountPath: /etc/config
        source: |
          // read
          const fs = require('fs');
          //let rawdata = fs.readFileSync('/etc/config/strg1');
          //let strategy = JSON.parse((rawdata));
          const obj = {{inputs.parameters.body}};
          const strategy = JSON.parse(process.env.STRATEGY);
          // cleanup
          delete obj.metadata.annotations;
          delete obj.metadata.creationTimestamp;
          delete obj.metadata.generation;
          delete obj.metadata.managedFields;
          delete obj.metadata.resourceVersion;
          delete obj.metadata.selfLink;
          delete obj.metadata.uid;
          delete obj.status;
          delete obj.metadata.labels.rollout;
          delete obj.metadata.labels.strategy;
          delete obj.metadata.labels.strategyConfigMap;
          delete obj.metadata.labels.strategyConfigMapKey;
          obj.spec.strategy = {};
          // mod
          obj.apiVersion = "argoproj.io/v1alpha1";
          obj.kind = "Rollout";
          const replicas = obj.metadata.labels?.rolloutTargetreplicas;
          if (replicas) {
            obj.spec.replicas = replicas * 1;
          } else {
            obj.spec.replicas = 2;
          }
          obj.spec.strategy = strategy;
          //save
          fs.writeFile('/tmp/object.json', JSON.stringify(obj), function (err) {
            if (err) return console.log(err);
            console.log(obj);
          });
      outputs:
        artifacts:
          - name: "object"
            path: "/tmp/object.json"
        parameters:
          - name: "obj"
            valueFrom:
              path: "/tmp/object.json"
    - name: "kubectl-apply"
      inputs:
        artifacts:
          - name: "object"
            path: /tmp/object.json
        parameters:
          - name: "obj"
      script:
        image: "registry.gitlab.com/mwpcicd/platform/kubectl"
        resources:
          requests:
            memory: "100Mi"
            cpu: "200m"
        command: ["sh"]
        source: |
          kubectl apply -f /tmp/object.json
          kubectl delete workflow -l workflows.argoproj.io/completed=true,workflows.argoproj.io/phase=Succeeded,events.argoproj.io/sensor=resource && kubectl delete pod -l workflows.argoproj.io/completed=true
    - name: "kubectl-delete"
      inputs:
        artifacts:
          - name: "object"
            path: /tmp/object.json
        parameters:
          - name: "obj"
      script:
        image: "registry.gitlab.com/mwpcicd/platform/kubectl"
        resources:
          requests:
            memory: "100Mi"
            cpu: "200m"
        command: ["sh"]
        source: |
          kubectl delete -f /tmp/object.json
          kubectl delete workflow -l workflows.argoproj.io/completed=true,workflows.argoproj.io/phase=Succeeded,events.argoproj.io/sensor=resource && kubectl delete pod -l workflows.argoproj.io/completed=true
...
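Outside the cluster, the heart of the json-parse step can be exercised as a plain Node script. The sketch below is a simplified version of the same transformation, with the strategy passed inline instead of through the ConfigMap-backed STRATEGY environment variable, and only a subset of the cleanup shown:

```javascript
// Simplified sketch of the json-parse step:
// turn a Deployment object into a Rollout object.
function toRollout(deployment, strategy) {
  // deep-copy so the input object is not mutated
  const obj = JSON.parse(JSON.stringify(deployment));

  // cleanup: strip server-populated and operator-only fields
  delete obj.metadata.annotations;
  delete obj.metadata.creationTimestamp;
  delete obj.metadata.resourceVersion;
  delete obj.metadata.uid;
  delete obj.status;
  delete obj.metadata.labels.rollout;
  delete obj.metadata.labels.strategyConfigMap;
  delete obj.metadata.labels.strategyConfigMapKey;

  // mod: retype the object and install the Rollout strategy
  obj.apiVersion = "argoproj.io/v1alpha1";
  obj.kind = "Rollout";
  const replicas = obj.metadata.labels?.rolloutTargetreplicas;
  obj.spec.replicas = replicas ? Number(replicas) : 2;
  obj.spec.strategy = strategy;
  return obj;
}

// a labeled source Deployment, as the EventSource would deliver it
const deployment = {
  apiVersion: "apps/v1",
  kind: "Deployment",
  metadata: {
    name: "rollouts-demo",
    labels: { app: "rollouts-demo", rollout: "true", rolloutTargetreplicas: "6" }
  },
  spec: { replicas: 5, strategy: {}, selector: { matchLabels: { app: "rollouts-demo" } } }
};

const rollout = toRollout(deployment, { canary: { steps: [{ setWeight: 10 }, { pause: {} }] } });
console.log(rollout.kind, rollout.spec.replicas); // Rollout 6
```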

Basic Use

The basic use of this operator consists of adding a set of labels to an existing Deployment, allowing the operator to identify it as a transformation target: an object that must be translated and recreated in the cluster as a Rollout. These labels belong to a pre-established set whose values can be modified easily.

labels: 
  rollout: "true" 
  rolloutTargetreplicas: "6" 
  strategyConfigMap: "strategy-store" 
  strategyConfigMapKey: "inlineCanary"
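The strategy itself lives in the referenced ConfigMap. Since the json-parse step runs JSON.parse(process.env.STRATEGY), each key must hold a JSON-encoded spec.strategy block. A hypothetical strategy-store with the two keys used in this article might look like:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: strategy-store
data:
  # each key holds a JSON-encoded spec.strategy block
  rollingUpdate: |
    {"canary": {"maxSurge": "25%", "maxUnavailable": 0, "steps": [{"setWeight": 50}, {"pause": {}}]}}
  inlineCanary: |
    {"canary": {"steps": [{"setWeight": 10}, {"pause": {"duration": "1m"}}, {"setWeight": 50}, {"pause": {}}]}}
```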

Customization

The default operator includes some analysis templates that are customizable in real time, plus the ability to modify the steps of this analysis through a ConfigMap saved in the cluster. However, users are free to create their own analysis templates from scratch, which gives greater flexibility in terms of:

  • Parameters being used
  • The arguments that the template will receive in real-time
  • The destination of the analysis to be executed (e.g., a Kubernetes Job or calls to an external service like Prometheus)
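As a sketch of what such a template can look like, here is a Prometheus-backed success-rate analysis in the style of the Argo Rollouts documentation; the Prometheus address, metric names, and query below are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 5m
      # keep promoting only while at least 95% of requests succeed
      successCondition: result[0] >= 0.95
      failureLimit: 3
      provider:
        prometheus:
          # placeholder address and query; adapt to your metrics stack
          address: http://prometheus.example.com:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",code!~"5.."}[5m])) /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```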

For more information on how to implement these Analysis templates refer to:

https://argoproj.github.io/argo-rollouts/features/analysis/

Limitations And Next Steps

Inherently, the operator still lacks some capabilities. The first improvement I would recommend is the ability to monitor multiple namespaces simultaneously.

In its current state, the Rollout Operator creates all required resources in a single Kubernetes namespace. With these resources in place, every Deployment with the correct set of labels that lands in that namespace is observed and transformed automatically; the namespace is selected dynamically during Helm chart installation, which allows some flexibility.

Multi-namespace support would allow the EventSources to trigger upon creation or modification of Deployments in several namespaces simultaneously, so that one operator could take care of the entire cluster instead of running multiple releases of the operator, as is necessary right now.

You may also want the ability to redirect traffic using a Service Mesh or Ingress controllers with canary Service endpoints. This feature was not implemented in this case, but it would be a good addition for future versions. In its current state, the operator fully intercepts all incoming traffic sent to the Service by creating a new ReplicaSet and the corresponding Service endpoints managed by a Rollout. Enabling Service Mesh and Ingress control would require more modifications on the application side, for example creating a secondary Service and adding some annotations on the Ingress objects; none of this is extremely complicated, but it would require some changes to the current Workflow logic.

Overall, my big takeaway is the remarkable flexibility of the Argo suite. This powerful, open-source stack allows for incredible customization and interaction with a Kubernetes cluster. While working on the operator, we quickly realized that this approach can be extended to create basically any post-admission webhook: any action on your Kubernetes cluster that involves a native resource can trigger a transformation, or kick off a series of steps that you define at will. The outcome of those steps is up to you.

It really comes down to imagination and the current state of the Kubernetes API — both of which I believe can take us pretty far. 

I hope you enjoyed this overview. Let me know what you think in the comments below.