Deploy a new Alertmanager

SLO-based alerting requires you to configure Service Mesh Manager to use an Alertmanager deployment. This procedure describes how to deploy a new Alertmanager for Service Mesh Manager.

Note: If you want to use an existing Alertmanager deployment instead, see Use an existing Alertmanager.

Prerequisites

This article provides an example configuration suitable for Slack alerts. To configure this notification method, you need a Slack account and a Slack Incoming Webhook.

To configure additional notification targets, see the Prometheus Alertmanager documentation.
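Before wiring the webhook into Alertmanager, you can optionally verify that it works with a quick sanity check. The sketch below assumes the curl CLI is available; `%SLACK_API_URL%` is the same placeholder used throughout this article and must be replaced with your real webhook URL.

```shell
# Post a test message to the Slack Incoming Webhook.
# %SLACK_API_URL% is a placeholder for your real webhook URL;
# Slack responds with the plain-text body "ok" on success.
curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"Alertmanager webhook test"}' \
  "%SLACK_API_URL%"
```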

Steps

  1. Set up Alertmanager. A basic alerting setup is provided in the Example Alertmanager configuration section below. Download the file and replace the %SLACK_API_URL% string with your Slack Incoming Webhook URL.

  2. Apply the resulting file by running this command:

    kubectl apply -f alertmanager.yaml -n smm-system
    
  3. Verify that Alertmanager is running. Run the following command:

    kubectl get pods -n smm-system -l app=alertmanager
    NAME                              READY   STATUS    RESTARTS   AGE
    alertmanager-smm-alertmanager-0   3/3     Running   0          62s
    alertmanager-smm-alertmanager-1   3/3     Running   0          62s
    

    You should see two Alertmanager instances in the Running state.
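You can also query the health endpoint of a running instance directly. This is a sketch, assuming the operator exposes the instances through the alertmanager-operated service on port 9093 (the same service name used in the next step):

```shell
# Forward the operator-managed Alertmanager service to localhost.
kubectl -n smm-system port-forward svc/alertmanager-operated 9093:9093 &
# A healthy instance answers the health probe with HTTP 200.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9093/-/healthy
```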

  4. Configure Prometheus to use this Alertmanager. Service Mesh Manager is controlled by a ControlPlane custom resource named smm, found in the Service Mesh Manager namespace (default: smm-system).

    The following commands set the spec.smm.prometheus.alertmanager value so that Prometheus connects to the Alertmanager instances started in the previous steps. Run the following commands:

    cat > enable-new-alertmanager.yaml <<EOF
    spec:
      smm:
        prometheus:
          enabled: true
          alertmanager:
            static:
              - host: alertmanager-smm-alertmanager-0.alertmanager-operated.smm-system.svc.cluster.local
                port: 9093
              - host: alertmanager-smm-alertmanager-1.alertmanager-operated.smm-system.svc.cluster.local
                port: 9093
    EOF
    
    kubectl patch controlplane --type=merge --patch "$(cat enable-new-alertmanager.yaml)" smm
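To confirm the patch was applied, you can read the value back from the ControlPlane resource. A sketch, assuming the resource is reachable from your current namespace context, as in the patch command above:

```shell
# Print the Alertmanager endpoints now stored in the ControlPlane resource;
# the output should list the two static hosts from the patch.
kubectl get controlplane smm -o jsonpath='{.spec.smm.prometheus.alertmanager.static}'
```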
    
  5. If you are using Service Mesh Manager in operator mode, skip this step.

    Otherwise, execute a reconciliation so Service Mesh Manager updates your Kubernetes cluster to the desired state described by the ControlPlane Custom Resource. Run the following command:

    smm operator reconcile
    

Example Alertmanager configuration

The following is a basic alerting setup for Alertmanager. Download the file and replace the %SLACK_API_URL% string with your Slack Incoming Webhook URL.

kind: Secret
apiVersion: v1
metadata:
  name: smm-alertmanager-config
stringData:
  alertmanager.yaml: |
    global:
      slack_api_url: "%SLACK_API_URL%"
    templates:
      - "*.tmpl"

    route:
      receiver: 'slack-notifications'
      group_by: [ service, severity ]

    receivers:
    - name: 'slack-notifications'
      slack_configs:
      - channel: '#smm-demo-alerts'
        send_resolved: true
        title: '{{ template "slack.title" . }}'
        color: '{{ template "slack.color" . }}'
        text: '{{ template "slack.text" . }}'    

  slack-templates.tmpl: |
    {{ define "__alert_severity_prefix_title" -}}
        {{ if ne .Status "firing" -}}
        :lgtm:
        {{- else if eq .CommonLabels.severity "page" -}}
        :fire:
        {{- else if eq .CommonLabels.severity "ticket" -}}
        :warning:
        {{- else -}}
        :question:
        {{- end }}
    {{- end }}

    {{/* First line of Slack alerts */}}
    {{ define "slack.title" -}}
        [{{ .Status | toUpper -}}
        {{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{- end -}}
        ] {{ template "__alert_severity_prefix_title" . }} {{ .CommonLabels.service }}
    {{- end }}


    {{/* Color of Slack attachment (appears as the line next to the alert) */}}
    {{ define "slack.color" -}}
        {{ if eq .Status "firing" -}}
            {{ if eq .CommonLabels.severity "page" -}}
                danger
            {{- else if eq .CommonLabels.severity "ticket" -}}
                warning
            {{- else -}}
                #439FE0
            {{- end -}}
        {{ else -}}
        good
        {{- end }}
    {{- end }}

    {{/* The text to display in the alert */}}
    {{ define "slack.text" -}}
        {{ range .Alerts }}
            {{- if .Annotations.message }}
                {{ .Annotations.message }}
            {{- end }}
            {{- if .Annotations.description }}
                {{ .Annotations.description }}
            {{- end }}

        {{- end }}
    {{- end }}    

---
kind: Alertmanager
apiVersion: monitoring.smm.cisco.com/v1
metadata:
  name: smm-alertmanager
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/component: prometheus
              app.kubernetes.io/instance: smm
              app.kubernetes.io/name: smm-alertmanager
              smm.cisco.com/cluster-name: master
          topologyKey: topology.kubernetes.io/region
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: prometheus
            app.kubernetes.io/instance: smm
            app.kubernetes.io/name: smm-alertmanager
            smm.cisco.com/cluster-name: master
        topologyKey: kubernetes.io/hostname
  listenLocal: false
  configSecret: smm-alertmanager-config
  replicas: 2
  podMetadata:
    annotations:
      istio.io/rev: cp-v112x.istio-system
      sidecar.istio.io/inject: "true"
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: smm
      app.kubernetes.io/name: smm-alertmanager
      smm.cisco.com/cluster-name: master
    name: "smm-alertmanager"

  resources:
    requests:
      memory: "100Mi"
      cpu: "50m"
      ephemeral-storage: "200Mi"
    limits:
      memory: "200Mi"
      cpu: "200m"
      ephemeral-storage: "200Mi"
  serviceAccountName: smm-prometheus

---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: smm-alertmanager
  namespace: smm-system
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: smm
    app.kubernetes.io/name: smm-alertmanager
    smm.cisco.com/cluster-name: master
spec:
  host: alertmanager-operated.smm-system.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
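Before applying the manifest, you can validate the embedded Alertmanager configuration offline. This is a sketch, assuming the yq (v4) and amtool CLIs are installed and that you saved the manifest as alertmanager.yaml:

```shell
# Extract the config embedded in the Secret (the first document in the
# multi-document manifest) and validate it with Alertmanager's own checker.
yq 'select(.kind == "Secret") | .stringData["alertmanager.yaml"]' alertmanager.yaml > /tmp/am-config.yaml
amtool check-config /tmp/am-config.yaml
```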