Flagger Canary

Flagger is a Progressive Delivery Operator for Kubernetes, designed to give developers confidence in automating production releases with progressive delivery techniques.

The benefit of Canary releases is that they let you capacity-test the new version in a production environment, with a safe rollback strategy if issues are found. Canary releases reduce the risk of introducing new software versions in production by gradually shifting traffic to the new version while measuring traffic metrics and running rollout tests.

Flagger can run automated application testing for the following deployment strategies:

  • Canary (progressive traffic shifting)
  • A/B testing (HTTP headers and cookie traffic routing)
  • Blue/Green (traffic switching and mirroring)

The following example shows how to integrate Flagger with Service Mesh Manager and observe progressive delivery on the Service Mesh Manager dashboard. You will configure and deploy the podinfo application for Blue/Green traffic mirror testing, upgrade its version, and watch the Canary release on the Service Mesh Manager dashboard.

Setting up Flagger with Service Mesh Manager

  1. Deploy Flagger into the smm-system namespace and connect it to Istio and Prometheus at the Service Mesh Manager Prometheus address as shown in the following command:

    Note: The Prometheus metrics service is hosted at http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus

    kubectl apply -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml
    helm repo add flagger https://flagger.app
    helm upgrade -i flagger flagger/flagger \
    --namespace=smm-system \
    --set crd.create=false \
    --set meshProvider=istio \
    --set metricsServer=http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus
    
  2. Check the Flagger logs to verify that the Flagger operator was deployed successfully in your Service Mesh Manager cluster:

    kubectl -n smm-system logs deployment/flagger
    

    Expected output:

    {"level":"info","ts":"2022-01-25T19:45:02.333Z","caller":"flagger/main.go:200","msg":"Connected to metrics server http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus"}
    

    At this point, Flagger is integrated with Service Mesh Manager. You can now deploy your own applications to be used for Progressive Delivery.
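
The log check in step 2 can also be scripted. The following is a minimal sketch, not part of Flagger or Service Mesh Manager: the flagger_connected function name is hypothetical, and the check assumes that the log line keeps the format shown in the expected output above.

```shell
# Reads Flagger log lines (one JSON object per line) on stdin and succeeds
# if any line reports a connection to the given metrics server URL.
# $1: the metrics server URL expected in the "msg" field
flagger_connected() {
  grep -F -q "\"msg\":\"Connected to metrics server $1\""
}

# Usage against a live cluster (requires kubectl access to smm-system):
#   kubectl -n smm-system logs deployment/flagger \
#     | flagger_connected "http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus" \
#     && echo "Flagger is connected to Prometheus"
```

The function only inspects text, so you can try it on a saved log file before running it against the cluster.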

Podinfo example with Flagger

Next, try out an example from the Flagger documentation.

  1. Create the “test” namespace and enable sidecar proxy auto-injection for it (use the smm binary downloaded from the SMM download page).

    Then deploy the “podinfo” application, which is the target of the automated canary promotion:

    kubectl create ns test
    smm sidecar-proxy auto-inject on test
    kubectl apply -k https://github.com/fluxcd/flagger//kustomize/podinfo
    
  2. Create an IstioMeshGateway resource:

    kubectl apply -f - << EOF
    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioMeshGateway
    metadata:
      annotations:
        banzaicloud.io/related-to: istio-system/cp-v113x
      labels:
        app: test-imgw-app
        istio.io/rev: cp-v113x.istio-system
      name: test-imgw
      namespace: test
    spec:
      deployment:
        podMetadata:
          labels:
            app: test-imgw-app
            istio: ingressgateway
      istioControlPlane:
        name: cp-v113x
        namespace: istio-system
      service:
        ports:
          - name: http
            port: 80
            protocol: TCP
            targetPort: 8080
        type: LoadBalancer
      type: ingress
    EOF
    
  3. Configure the port and hosts for the IstioMeshGateway using the following Gateway resource.

    kubectl apply -f - << EOF
    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: public-gateway
      namespace: test
    spec:
      selector:
        app: test-imgw-app
        gateway-name: test-imgw
        gateway-type: ingress
        istio.io/rev: cp-v113x.istio-system
      servers:
        - port:
            number: 80
            name: http
            protocol: HTTP
          hosts:
            - "*"
    EOF
    
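  4. Create a Canary custom resource by applying the following manifest, for example by saving it to a file and running kubectl apply -f on that file.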

    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: podinfo
      namespace: test
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      progressDeadlineSeconds: 60
      autoscalerRef:
        apiVersion: autoscaling/v2beta2
        kind: HorizontalPodAutoscaler
        name: podinfo
      service:
        port: 9898
        targetPort: 9898
        portDiscovery: true
        gateways:
          - public-gateway
        hosts:
          - "*"
        trafficPolicy:
          tls:
            mode: ISTIO_MUTUAL
        rewrite:
          uri: /
        retries:
          attempts: 3
          perTryTimeout: 1s
          retryOn: "gateway-error,connect-failure,refused-stream"
      analysis:
        interval: 30s
        threshold: 3
        maxWeight: 80
        stepWeight: 20
        metrics:
          - name: request-success-rate
            thresholdRange:
              min: 99
            interval: 1m
          - name: request-duration
            thresholdRange:
              max: 500
            interval: 30s
    
  5. Wait until Flagger initializes the deployment and sets up a VirtualService for podinfo.

    kubectl -n smm-system logs deployment/flagger -f
    

    Expected output:

    {"level":"info","ts":"2022-01-25T19:54:42.528Z","caller":"controller/events.go:33","msg":"Initialization done! podinfo.test","canary":"podinfo.test"}
    
  6. Get the Ingress IP from IstioMeshGateway:

    export INGRESS_IP=$(kubectl get istiomeshgateways.servicemesh.cisco.com -n test test-imgw -o jsonpath='{.status.GatewayAddress[0]}')
    echo $INGRESS_IP
    

    The output should be an IP address, for example: 34.82.47.210

  7. Verify that podinfo is reachable from the external IP address by running curl http://$INGRESS_IP/

    The output should be similar to:

    {
      "hostname": "podinfo-96c5c65f6-l7ngc",
      "version": "6.0.0",
      "revision": "",
      "color": "#34577c",
      "logo": "https://raw.githubusercontent.com/stefanprodan/podinfo/gh-pages/cuddle_clap.gif",
      "message": "greetings from podinfo v6.0.0",
      "goos": "linux",
      "goarch": "amd64",
      "runtime": "go1.16.5",
      "num_goroutine": "8",
      "num_cpu": "4"
    }
    
  8. Send traffic to the ingress IP. This example uses the hey traffic generator. On macOS, you can install it with the Homebrew package manager:

    brew install hey
    

    You can send traffic from any terminal where the IP address is reachable. The following command sends HTTP requests for 30 minutes from two concurrent workers, each limited to 10 requests per second:

    hey -z 30m -q 10 -c 2 http://$INGRESS_IP/
    

    On the Service Mesh Manager dashboard, select MENU > TOPOLOGY, and select the test namespace to see the generated traffic.

    Image of podinfo traffic
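
To estimate how much load the hey command in step 8 generates, you can multiply its three options. The following is a rough sketch: the hey_max_requests function name is hypothetical, and the result is only an upper bound, because slow responses reduce the rate that each worker actually reaches.

```shell
# Upper bound on the requests sent by: hey -z <duration> -q <qps> -c <workers>
# $1: duration in seconds (-z)
# $2: rate limit per worker, in requests per second (-q)
# $3: number of concurrent workers (-c)
hey_max_requests() {
  echo $(( $1 * $2 * $3 ))
}

# 30 minutes = 1800 seconds, 10 requests per second per worker, 2 workers
hey_max_requests 1800 10 2   # prints 36000
```

With the values used above, the dashboard should show at most about 20 requests per second arriving at podinfo.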

Upgrade image version

The current podinfo version is v6.0.0. Update it to the next version (v6.1.0).

  1. Upgrade the target image to the new version and watch the canary functionality on the Service Mesh Manager dashboard.

    kubectl -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:6.1.0
    

    Expected output:

    deployment.apps/podinfo image updated
    

    You can check the logs as Flagger tests and promotes the new version:

    {"msg":"New revision detected! Scaling up podinfo.test","canary":"podinfo.test"}
    {"msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 20","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 40","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 60","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 80","canary":"podinfo.test"}
    {"msg":"Copying podinfo.test template spec to podinfo-primary.test","canary":"podinfo.test"}
    {"msg":"HorizontalPodAutoscaler podinfo-primary.test updated","canary":"podinfo.test"}
    {"msg":"Routing all traffic to primary","canary":"podinfo.test"}
    {"msg":"Promotion completed! Scaling down podinfo.test","canary":"podinfo.test"}
    
  2. Check the status of the Canary resource by running the kubectl get canaries -n test -o wide command. The output should be similar to:

    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Initializing   0        0              30s                 20                         80          2022-04-11T21:25:31Z
    ..
    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Initialized    0        0              30s                 20                         80          2022-04-11T21:26:03Z
    ..
    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Progressing    0        0              30s                 20                         80          2022-04-11T21:33:03Z
    ..
    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Succeeded      0        0              30s                 20                         80          2022-04-11T21:35:28Z
    
  3. Visualize the entire progressive delivery through the Service Mesh Manager dashboard.

    Traffic from “TEST-IMGW-APP” is shifted from “podinfo-primary” to “podinfo-canary” in steps of 20%, from 20% up to 80% (according to the stepWeight and maxWeight configured in the Canary resource). The following image shows the incoming traffic on the “podinfo-primary” pod: Image of primary podinfo traffic

    The following image shows the incoming traffic on the “podinfo-canary” pod: Image of canary podinfo traffic
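
The traffic steps seen on the dashboard follow from the analysis settings of the Canary resource (interval: 30s, stepWeight: 20, maxWeight: 80). The sketch below reproduces that schedule. The function names are hypothetical, and the timing formula, interval * (maxWeight / stepWeight), is the minimum promotion time as described in the Flagger documentation, so treat the result as a lower bound rather than a guarantee.

```shell
# Prints the canary traffic weights of a rollout, one per line.
# $1: stepWeight, $2: maxWeight (both in percent)
canary_weights() {
  w=$1
  while [ "$w" -le "$2" ]; do
    echo "$w"
    w=$(( w + $1 ))
  done
}

# Minimum time in seconds to validate and promote a canary:
# interval * (maxWeight / stepWeight)
# $1: interval in seconds, $2: stepWeight, $3: maxWeight
min_promotion_seconds() {
  echo $(( $1 * ($3 / $2) ))
}

canary_weights 20 80             # prints 20, 40, 60, 80 on separate lines
min_promotion_seconds 30 20 80   # prints 120
```

The four weights match the "Advance podinfo.test canary weight" log messages shown in step 1.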

You can see that Flagger dynamically shifts the ingress traffic to the canary deployment in steps and performs conformance tests. Once the tests pass, Flagger copies the new version to the primary deployment and shifts the traffic back to it.

Finally, podinfo:6.1.0 replaces podinfo:6.0.0 as the primary deployment and receives all traffic, and Flagger scales down the canary deployment.

The following image shows that the canary image (v6.1.0) was tagged as the primary image (v6.1.0): Image of canary and podinfo traffic
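
Besides the dashboard, you can confirm the promotion from the command line by reading the version that podinfo reports. This is a minimal sketch: the podinfo_version function name is hypothetical, and it assumes the JSON response format shown earlier in this example.

```shell
# Extracts the "version" field from a podinfo JSON response read on stdin.
podinfo_version() {
  sed -n 's/.*"version": *"\([^"]*\)".*/\1/p'
}

# Usage against a live cluster (INGRESS_IP as exported earlier):
#   curl -s http://$INGRESS_IP/ | podinfo_version
```

After a successful promotion, the command should print 6.1.0 instead of 6.0.0.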

Automated rollback

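To test automated rollback in case a canary fails, complete the following steps while a canary analysis is in progress (for example, after changing the podinfo image version again).

  1. Generate HTTP 500 responses and delays by running the following command on the tester pod (or from any terminal where the ingress IP address is reachable):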

    watch "curl -s http://$INGRESS_IP/delay/1 && curl -s http://$INGRESS_IP/status/500"
    
  2. Watch how the Canary release fails. Run kubectl get canaries -n test -o wide

    The output should be similar to:

    NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Progressing   60       1              30s                 20                         80          2022-04-11T22:10:33Z
    ..
    NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Progressing   60       1              30s                 20                         80          2022-04-11T22:10:33Z
    ..
    NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Progressing   60       2              30s                 20                         80          2022-04-11T22:11:03Z
    ..
    NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Progressing   60       3              30s                 20                         80          2022-04-11T22:11:33Z
    ..
    NAME      STATUS   WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Failed   0        0              30s                 20                         80          2022-04-11T22:12:03Z
    
    The Flagger logs show the halted checks and the rollback:
    
    {"msg":"New revision detected! Scaling up podinfo.test","canary":"podinfo.test"}
    {"msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 20","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 40","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 60","canary":"podinfo.test"}
    {"msg":"Halt podinfo.test advancement request duration 917ms > 500ms","canary":"podinfo.test"}
    {"msg":"Halt podinfo.test advancement request duration 598ms > 500ms","canary":"podinfo.test"}
    {"msg":"Halt podinfo.test advancement request duration 1.543s > 500ms","canary":"podinfo.test"}
    {"msg":"Rolling back podinfo.test failed checks threshold reached 3","canary":"podinfo.test"}
    {"msg":"Canary failed! Scaling down podinfo.test","canary":"podinfo.test"}
    
  3. Visualize the canary rollout on the Service Mesh Manager Dashboard.

    As the rollout steps from 0% to 20%, 40%, and 60%, the request duration rises above the 500 ms limit, which halts the rollout. The threshold is set to 3, so after three failed checks Flagger rolls the release back.

    The following image shows the “primary-pod” incoming traffic graph: Image of podinfo traffic

    The following image shows the “canary-pod” incoming traffic graph: Image of podinfo traffic

    The following image shows the status of pod health: Image of podinfo traffic
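
The rollback timing can be checked against the analysis settings (interval: 30s, threshold: 3). The following sketch uses hypothetical function names. It counts the halted checks in the Flagger logs and computes the time until rollback as interval * threshold, which is the formula described in the Flagger documentation for a canary whose checks keep failing.

```shell
# Counts halted advancements in Flagger log lines read on stdin.
count_halts() {
  grep -c '"msg":"Halt '
}

# Time in seconds until rollback while checks keep failing: interval * threshold
# $1: interval in seconds, $2: threshold (number of failed checks)
rollback_seconds() {
  echo $(( $1 * $2 ))
}

# Usage against a live cluster:
#   kubectl -n smm-system logs deployment/flagger | count_halts
rollback_seconds 30 3   # prints 90
```

In the status output above, the first failed check is recorded at 22:10:33 and the Failed status at 22:12:03, which is 90 seconds apart.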

Cleaning up

To clean up your cluster, run the following commands.

  1. Remove the Canary, Gateway, and IstioMeshGateway CRs, and the podinfo deployment.

    kubectl delete -n test canaries.flagger.app podinfo
    kubectl delete -n test gateways.networking.istio.io public-gateway
    kubectl delete -n test istiomeshgateways.servicemesh.cisco.com test-imgw
    kubectl delete -n test deployment podinfo
    
  2. Delete the “test” namespace.

    kubectl delete namespace test
    
  3. Uninstall the Flagger deployment and delete the Flagger CRDs.

    helm delete flagger -n smm-system
    kubectl delete -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml
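
If you run the cleanup more than once, the delete commands in steps 1 and 2 fail for objects that are already gone. The sketch below wraps them in a hypothetical cleanup_flagger_example function and adds the kubectl --ignore-not-found flag. The flag only covers missing objects, so this assumes that the Flagger and Service Mesh Manager CRDs are still installed when the function runs, that is, before step 3.

```shell
# Deletes the resources created in this example.
# --ignore-not-found lets the function be rerun after a partial cleanup.
cleanup_flagger_example() {
  kubectl delete -n test canaries.flagger.app podinfo --ignore-not-found
  kubectl delete -n test gateways.networking.istio.io public-gateway --ignore-not-found
  kubectl delete -n test istiomeshgateways.servicemesh.cisco.com test-imgw --ignore-not-found
  kubectl delete -n test deployment podinfo --ignore-not-found
  kubectl delete namespace test --ignore-not-found
}

# Usage (requires kubectl access to the cluster):
#   cleanup_flagger_example
```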