Istio configuration validation

If you’re an active Istio user, there’s a good chance that Istio’s configuration reference is bookmarked in your browser, and that you’ve read the pages on VirtualServices and ServiceEntries over and over, yet still struggle to set up even simple configurations in your mesh.

Istio’s custom resource configuration is very powerful and flexible, but infamous for being overly complex. At its best, its YAML consists of lists of lists, cross-references, conflicting fields, and wildcards.

Even though Istio’s maintainers are aware of this hyper-complexity and, at least in the last few releases, have tried to bring user friendliness into focus, Istio still routinely strands us in quagmires of minutiae and uncertainty. We’re down to ~25 custom resources from ~50 a year ago, and useful CLI features like istioctl analyze are now available, but we feel that there’s more to be done.

That’s why we’ve added our own validation subsystem. The Service Mesh Manager service mesh platform maintains total compatibility with upstream Istio, but also extends its feature set, while avoiding lock-in through a new abstraction layer. A good example of this is its validation subsystem, which takes Istio’s validation system to a whole new level by considering the cluster state as a whole, rather than just Istio’s configuration.

Istio configuration validation in Service Mesh Manager

Validation results can be seen on the MAIN MENU > OVERVIEW page of the UI:


Click the Show YAML configuration icon to display the configuration file. If there are validation errors, the relevant parts are highlighted.


You can check the validation results from the command line as well:

smm analyze
0 validation errors found

You can also run the validation for a specific namespace, for example:

smm analyze --namespace istio-system

The analyze command can also produce JSON output, for example:

smm analyze --namespace istio-system -o json

A sample error output in JSON:

{
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo1": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v113x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo1",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ],
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo2": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v113x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo2",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ]
}
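As a rough sketch of how this JSON could be consumed in a script or CI job, the Python snippet below collects every check whose passed field is false. It assumes the output always has the shape of the sample above (a map from subject key to a list of check results); the function name failed_checks is hypothetical and is not part of the smm CLI.

```python
import json

def failed_checks(report_json):
    """Return (subject, checkID, errorMessage) for every check that did not pass.

    Assumes the report is a JSON object mapping a subject key to a list of
    check results, as in the sample output above.
    """
    report = json.loads(report_json)
    failures = []
    for subject, checks in report.items():
        for check in checks:
            if not check.get("passed", True):
                failures.append(
                    (subject, check.get("checkID", ""), check.get("errorMessage", ""))
                )
    return failures

# One entry copied from the sample output above.
sample = """
{
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo1": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v113x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo1",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ]
}
"""

for subject, check_id, message in failed_checks(sample):
    print(f"{check_id}: {subject}: {message}")
```

A pipeline could feed the output of smm analyze -o json into this function and fail the build when the returned list is not empty.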

Validation examples

Service Mesh Manager performs many validation checks on various aspects of the configuration, both syntactic and semantic. The checks are continuously curated, and new ones are added with every release. The following examples show how helpful this feature is.

Sidecar injection template validation

This check validates whether there are any pods within the environment that run with an outdated sidecar proxy image or configuration. In this example, the global configuration setting of the sidecar proxy image was changed from banzaicloud/istio-proxyv2:1.7.3-bzc to banzaicloud/istio-proxyv2:1.7.3-bzc.1.

smm analyze --namespace smm-demo

An error output looks like this:

destinationrule smm-demo/movies:
  Cluster: ex7gkhfn49gi5
  Error: missing mesh policy
    Control Plane: cp-v113x.istio-system
    Error ID: destinationrule/enabled-mtls/destinationrule/enabled-mtls/missing-mesh-policy
    Severity: error
    Path: host
    Context:
      hostname: movies

✗ 1 validation error found

This helps operators identify outdated proxies within the environment.
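The idea behind such a check can be sketched in a few lines of Python. This is only an illustration of comparing each pod’s sidecar image against the currently configured one, not Service Mesh Manager’s actual implementation. The pod dictionaries mimic the shape of Kubernetes Pod objects, the pod names are made up, and istio-proxy is the container name Istio gives to the injected sidecar.

```python
def outdated_proxies(pods, expected_image, container_name="istio-proxy"):
    """Return (pod name, image) for sidecars that do not run the expected image."""
    outdated = []
    for pod in pods:
        for container in pod["spec"]["containers"]:
            if container["name"] == container_name and container["image"] != expected_image:
                outdated.append((pod["metadata"]["name"], container["image"]))
    return outdated

# Hypothetical pods: one injected before the image change, one after it.
pods = [
    {
        "metadata": {"name": "movies-v1"},
        "spec": {"containers": [
            {"name": "movies", "image": "example/movies:1.0"},
            {"name": "istio-proxy", "image": "banzaicloud/istio-proxyv2:1.7.3-bzc"},
        ]},
    },
    {
        "metadata": {"name": "movies-v2"},
        "spec": {"containers": [
            {"name": "movies", "image": "example/movies:1.0"},
            {"name": "istio-proxy", "image": "banzaicloud/istio-proxyv2:1.7.3-bzc.1"},
        ]},
    },
]

for name, image in outdated_proxies(pods, "banzaicloud/istio-proxyv2:1.7.3-bzc.1"):
    print(f"{name} still runs {image}")
```

A real check would also have to compare the rest of the injected configuration, which this sketch ignores.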

Gateway port protocol configuration conflict validation

This example demonstrates a check for the common mistake of setting conflicting port configurations in different Gateway resources. Istio’s built-in validation does not reject such configurations, but they can cause unwanted behavior at ingress. Here, port 9443 of the same ingress gateway has been set to TCP in one resource and to TLS in another.

The following YAMLs were applied:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-port-conflict-01
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo1.example.com
    port:
      name: tcp
      number: 9443
      protocol: TCP
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-port-conflict-02
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo2.example.com
    port:
      name: tls
      number: 9443
      protocol: TLS
    tls:
      serverCertificate: /certs/cert.pem
      privateKey: /certs/key.pem
      mode: SIMPLE

Check the configuration’s validity by running the CLI tool’s analyze command.

smm analyze --namespace istio-system

The output shows the issue exactly, and provides all the information necessary for the operator to quickly pinpoint the problem in the configuration.

gateway istio-system/demo-gw-port-conflict-01:
    Cluster: master
    Error: Conflicting gateway port protocols
        Control Plane: cp-v113x.istio-system
        Error ID: gateway/port/gateway/port/protocol-conflict
        Path: servers[0]
        Context:
            port: 9443
            protocol: TCP

gateway istio-system/demo-gw-port-conflict-02:
    Cluster: master
    Error: Conflicting gateway port protocols
        Control Plane: cp-v113x.istio-system
        Error ID: gateway/port/gateway/port/protocol-conflict
        Path: servers[0]
        Context:
            port: 9443
            protocol: TLS

✗ 2 validation errors found
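To make the logic of this kind of check concrete, here is a minimal Python sketch that groups Gateway servers by selector and port number and reports every port that is declared with more than one protocol. It is an illustration only and not Service Mesh Manager’s code. As a simplifying assumption, it treats two Gateways as targeting the same ingress only when their selectors are identical. The dictionaries are trimmed-down versions of the two resources above.

```python
from collections import defaultdict

def port_protocol_conflicts(gateways):
    """Return [(port number, [(gateway name, protocol), ...])] for conflicting ports."""
    # (selector, port number) -> set of (gateway name, protocol)
    seen = defaultdict(set)
    for gw in gateways:
        selector = tuple(sorted(gw["spec"]["selector"].items()))
        for server in gw["spec"]["servers"]:
            port = server["port"]
            seen[(selector, port["number"])].add(
                (gw["metadata"]["name"], port["protocol"])
            )

    conflicts = []
    for (_, number), entries in sorted(seen.items()):
        protocols = {protocol for _, protocol in entries}
        if len(protocols) > 1:
            conflicts.append((number, sorted(entries)))
    return conflicts

selector = {"app": "demo-gw", "gateway-name": "demo-gw", "gateway-type": "ingress"}
gateways = [
    {
        "metadata": {"name": "demo-gw-port-conflict-01"},
        "spec": {"selector": selector,
                 "servers": [{"port": {"name": "tcp", "number": 9443, "protocol": "TCP"}}]},
    },
    {
        "metadata": {"name": "demo-gw-port-conflict-02"},
        "spec": {"selector": selector,
                 "servers": [{"port": {"name": "tls", "number": 9443, "protocol": "TLS"}}]},
    },
]

for number, entries in port_protocol_conflicts(gateways):
    print(f"port {number} has conflicting protocols: {entries}")
```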

Multiple gateways with the same TLS certificate validation

Configuring more than one gateway with the same TLS certificate causes browsers that leverage HTTP/2 connection reuse (that is, most browsers) to produce 404 errors when they access a second host after a connection to another host has already been established.

You can read more about this issue in the Istio docs.

Let’s apply the following resources to demonstrate how this issue works:

apiVersion: servicemesh.cisco.com/v1alpha1
kind: IstioMeshGateway
metadata:
  labels:
    app: demo-gw
  name: demo-gw
  namespace: istio-system
spec:
  istioControlPlane:
    name: cp-v113x
    namespace: istio-system
  deployment:
    metadata:
      labels:
        app: demo-gw
    replicas:
      min: 1
      max: 1
      count: 1
  service:
    ports:
      - name: http2
        port: 80
        protocol: TCP
        targetPort: 8080
      - name: https
        port: 443
        protocol: TCP
        targetPort: 8443
    type: LoadBalancer
  runAsRoot: true
  type: ingress
---
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: example-wildcard-cert
  namespace: istio-system
spec:
  secretName: example-wildcard-cert
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  commonName: "test wildcard certificate"
  isCA: false
  keySize: 2048
  keyAlgorithm: rsa
  keyEncoding: pkcs1
  usages:
    - server auth
  dnsNames:
  - "*.example.com"
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-tls-conflict-01
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo1.example.com
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: example-wildcard-cert
      httpsRedirect: false
      mode: SIMPLE
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-tls-conflict-02
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo2.example.com
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: example-wildcard-cert
      httpsRedirect: false
      mode: SIMPLE

The following resources were created:

  • an ingress gateway
  • an *.example.com wildcard certificate
  • two Gateway resources, both of which specify the same wildcard cert

Check the configuration’s validity by running the analyze command in the CLI tool.

smm analyze --namespace istio-system
gateway istio-system/demo-gw-demo1:
    Cluster: master
    Error: multiple gateways configured with same TLS certificate
        Control Plane: cp-v113x.istio-system
        Error ID: gateway/reused-cert/gateway/reused-cert
        Path: port[443]
        Context:
            reusedCertificateSecret: secret:master:istio-system:example-wildcard-cert

gateway istio-system/demo-gw-demo2:
    Cluster: master
    Error: multiple gateways configured with same TLS certificate
        Control Plane: cp-v113x.istio-system
        Error ID: gateway/reused-cert/gateway/reused-cert
        Path: port[443]
        Context:
            reusedCertificateSecret: secret:master:istio-system:example-wildcard-cert

✗ 2 validation errors were found
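The detection itself is simple to sketch. The Python snippet below is an illustration and not Service Mesh Manager’s implementation: it maps each tls.credentialName to the Gateways that reference it and reports the names used by more than one Gateway. The dictionaries are trimmed-down versions of the two Gateway resources above.

```python
from collections import defaultdict

def reused_credentials(gateways):
    """Return {credentialName: [namespace/name, ...]} for secrets used by several Gateways."""
    users = defaultdict(set)
    for gw in gateways:
        key = f'{gw["metadata"]["namespace"]}/{gw["metadata"]["name"]}'
        for server in gw["spec"].get("servers", []):
            credential = server.get("tls", {}).get("credentialName")
            if credential:
                users[credential].add(key)
    return {cred: sorted(names) for cred, names in users.items() if len(names) > 1}

def make_gateway(name, host):
    """Build a minimal Gateway dictionary with the fields the check needs."""
    return {
        "metadata": {"name": name, "namespace": "istio-system"},
        "spec": {"servers": [{
            "hosts": [host],
            "port": {"name": "https", "number": 443, "protocol": "HTTPS"},
            "tls": {"credentialName": "example-wildcard-cert", "mode": "SIMPLE"},
        }]},
    }

gateways = [
    make_gateway("demo-gw-tls-conflict-01", "demo1.example.com"),
    make_gateway("demo-gw-tls-conflict-02", "demo2.example.com"),
]

for credential, names in reused_credentials(gateways).items():
    print(f"{credential} is shared by: {', '.join(names)}")
```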