Scale SMM Control Plane

Service Mesh Manager follows the microservice architecture pattern. This document describes the scaling properties of the microservices that Service Mesh Manager is built on.

These services (by default) are installed into the smm-system namespace, thus the procedures described in Scale a specific Workload apply to them.

This document omits a few services as they have minimal resource requirements even for large-scale deployments, thus no tuning is necessary for those.

API services

To provide the dashboard functionality, Service Mesh Manager relies on a set of GraphQL API servers. These servers are only used when the dashboard is being used.

Their Memory usage scales linearly with the number of Workloads, Services, and other Kubernetes objects. Since they cache some Kubernetes objects, we recommend setting their Memory limits based on actual measurements specific to the current workload in the cluster.
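One way to take such measurements is to check the current per-container usage in the smm-system namespace. This is a minimal sketch that assumes the Kubernetes metrics-server is installed in the cluster (it is not part of Service Mesh Manager):

```shell
# Requires metrics-server. Shows the current CPU and memory usage of every
# container in the smm-system namespace, sorted by memory usage.
kubectl top pods -n smm-system --containers --sort-by=memory
```

Because the API servers are only used while the dashboard is in use, take the measurement while the dashboard is open.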

Note: As Service Mesh Manager can be used to monitor Workloads that are not part of the Istio mesh, the resource utilization of these services depends on the total number of Kubernetes Workloads, not just the Istio-enabled ones.

Component               Usage                                     Resource setting in ControlPlane
smm-health-api          Provides Health scores on the dashboard   .spec.smm.health.api.resources
smm-sre-api             Provides SLO access on the dashboard      .spec.smm.sre.api.resources
smm                     Provides Istio management                 .spec.smm.application.resources
smm-federation-gateway  Aggregates the API server’s APIs          .spec.smm.federationGateway.resources

For example, to set the resource requirements of smm-health-api, run the following commands:

cat > change-health-resources.yaml <<EOF
spec:
  smm:
    health:
      api:
        resources:
          requests:
            cpu: 500m
            memory: "1500M"
          limits:
            cpu: "1"
            memory: "2000M"
EOF

kubectl patch controlplane --type=merge --patch "$(cat change-health-resources.yaml)" smm

  • If you are using Service Mesh Manager in Operator Mode, then the change is applied automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.
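To confirm that the patch was stored, you can read the value back from the ControlPlane resource. This is a small sketch that assumes the resource is named smm, as in the patch command above:

```shell
# Print the resources block currently set for smm-health-api
# in the ControlPlane resource named smm.
kubectl get controlplane smm -o jsonpath='{.spec.smm.health.api.resources}'
```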

Health controller

The health controller is responsible for collecting outlier detection data for all Services and Workloads in the cluster running Service Mesh Manager. The health controller cannot be scaled horizontally, so it can only be scaled by adjusting its resource settings. Use the Service Mesh Manager dashboard to determine the right CPU and Memory requirements for this component.

The health controller’s resource requirements can be set in the ControlPlane CR’s .spec.smm.health.controller.resources key, in the same way as shown in the example of the API services section.
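Adapted to the health controller, the patch could look like the following sketch. The file name and the CPU and memory values are illustrative assumptions, not recommendations; replace them with values based on what the dashboard shows for your cluster:

```shell
# Illustrative values only: size them from your own measurements.
cat > change-health-controller-resources.yaml <<EOF
spec:
  smm:
    health:
      controller:
        resources:
          requests:
            cpu: 500m
            memory: "1000M"
          limits:
            cpu: "1"
            memory: "1500M"
EOF

# Merge the fragment into the ControlPlane resource named smm.
kubectl patch controlplane --type=merge --patch "$(cat change-health-controller-resources.yaml)" smm
```

As with the API services, run the smm operator reconcile command afterwards if you are using the imperative mode.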

Note: The health controller increases the resource usage of Prometheus by approximately 30%. You can disable the outlier detection system using the .spec.smm.health.enabled setting.
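A possible way to turn the outlier detection system off is sketched below. It assumes that .spec.smm.health.enabled accepts a boolean value; check the ControlPlane CRD before applying it:

```shell
# Assumption: .spec.smm.health.enabled is a boolean field.
kubectl patch controlplane smm --type=merge \
  --patch '{"spec":{"smm":{"health":{"enabled":false}}}}'
```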

SRE controller

The SRE controller is responsible for SLO measurement and alerting. The SRE controller cannot be scaled horizontally, so it can only be scaled by adjusting its resource settings. Use the Service Mesh Manager dashboard to determine the right CPU and Memory requirements for this component.

The SRE controller’s resource requirements can be set in the ControlPlane CR’s .spec.smm.sre.controller.resources key, in the same way as shown in the example of the API services section.

Another component of the alerting subsystem is smm-sre-alert-exporter, which helps Service Mesh Manager visualize historical alerting data. This service has a small footprint, but if it needs more resources, you can adjust them using the .spec.smm.sre.alertExporter.resources setting.
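Both SRE settings can be changed with a single patch. The following is a sketch only: the file name and all CPU and memory values are illustrative assumptions, and either block can be left out if that component does not need tuning:

```shell
# Illustrative values only: size them from your own measurements.
cat > change-sre-resources.yaml <<EOF
spec:
  smm:
    sre:
      controller:
        resources:
          requests:
            cpu: 250m
            memory: "500M"
          limits:
            cpu: 500m
            memory: "1000M"
      alertExporter:
        resources:
          requests:
            cpu: 50m
            memory: "64M"
          limits:
            cpu: 100m
            memory: "128M"
EOF

# Merge the fragment into the ControlPlane resource named smm.
kubectl patch controlplane --type=merge --patch "$(cat change-sre-resources.yaml)" smm
```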

Other components

As you can see from the list of Pods running in the smm-system namespace, Service Mesh Manager uses other components as well. Refer to the definition of the ControlPlane resource to check where to set the resource requirements of those parts.

For details, see The ControlPlane Custom Resource.
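To see which Pods are running and what resource settings they currently have, a command along these lines can help. The column names are arbitrary labels chosen for this sketch:

```shell
# List every Pod in smm-system with the CPU and memory requests and the
# memory limits of its containers (comma-separated for multi-container Pods).
kubectl get pods -n smm-system \
  -o custom-columns='POD:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory,MEM_LIMIT:.spec.containers[*].resources.limits.memory'
```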

You can get the current CRD by running the following command:

kubectl get crd controlplanes.smm.cisco.com -o yaml
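If the CRD publishes a structural schema (an assumption about the installed version), kubectl explain may be a quicker way to inspect a single key than reading the whole CRD. For example, for the health controller key mentioned earlier:

```shell
# Show the documented fields under one resources key of the ControlPlane CR.
kubectl explain controlplane.spec.smm.health.controller.resources
```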