Monitoring

Monitor Kyverno policy metrics with Prometheus

Introduction

As a cluster administrator, it may benefit you to have monitoring capabilities over both the state and execution of the Kyverno policies applied to your cluster. This includes monitoring any changes made to policies, any activity associated with incoming requests, and any results produced as an outcome. If enabled, monitoring allows you to visualize and alert on applied policies, and is critical to overall cluster observability and compliance.

In addition, you can specify the scope of your monitoring targets to either the rule, policy, or cluster level, which enables you to extract more granular insights from collected metrics.

Installation and Setup

When you install Kyverno via Helm, a service called kyverno-svc-metrics is created in the kyverno namespace. This service exposes metrics on port 8000.

values.yaml

...
metricsService:
  create: true
  type: ClusterIP
  ## Kyverno's metrics server will be exposed at this port
  port: 8000
  ## The Node's port which will allow access Kyverno's metrics at the host level. Only used if service.type is NodePort.
  nodePort:
  ## Provide any additional annotations which may be required. This can be used to
  ## set the LoadBalancer service type to internal only.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
  ##
  annotations: {}
...

By default, the service type is ClusterIP, meaning that metrics can only be scraped by a Prometheus server running inside the cluster.
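
If your Prometheus server runs inside the cluster, a scrape job similar to the following sketch can discover the service. This is a minimal example, assuming the default kyverno namespace and the kyverno-svc-metrics service name; adapt it to your Prometheus setup (or use a ServiceMonitor if you run the Prometheus Operator).

scrape_configs:
  - job_name: kyverno-metrics
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - kyverno
    relabel_configs:
      # Keep only the endpoints that belong to the kyverno-svc-metrics service.
      - source_labels: [__meta_kubernetes_service_name]
        regex: kyverno-svc-metrics
        action: keep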

In some cases, the Prometheus server may run outside your workload cluster as a shared service. In these scenarios, you will want to expose the kyverno-svc-metrics service publicly so that the metrics (available on port 8000) can be reached by your external Prometheus server.

Services can be exposed to external clients via an Ingress, or using LoadBalancer or NodePort service types.

To expose the kyverno-svc-metrics service publicly as a NodePort service on the node's port 8000, configure your values.yaml before Helm installation as follows:

...
metricsService:
  create: true
  type: NodePort
  ## Kyverno's metrics server will be exposed at this port
  port: 8000
  ## The Node's port which will allow access Kyverno's metrics at the host level. Only used if service.type is NodePort.
  nodePort: 8000
  ## Provide any additional annotations which may be required. This can be used to
  ## set the LoadBalancer service type to internal only.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
  ##
  annotations: {}
...
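
With the service exposed as a NodePort, an external Prometheus server can scrape the metrics directly from the nodes. The following is a minimal sketch of a static scrape job; the node addresses are placeholders and must be replaced with your own.

scrape_configs:
  - job_name: kyverno-metrics-nodeport
    static_configs:
      - targets:
          # Placeholder node addresses; replace with the addresses of your cluster nodes.
          - node-1.example.com:8000
          - node-2.example.com:8000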

To expose the kyverno-svc-metrics service using a LoadBalancer type, you can configure your values.yaml before Helm installation as follows:

...
metricsService:
  create: true
  type: LoadBalancer
  ## Kyverno's metrics server will be exposed at this port
  port: 8000
  ## The Node's port which will allow access Kyverno's metrics at the host level. Only used if service.type is NodePort.
  nodePort:
  ## Provide any additional annotations which may be required. This can be used to
  ## set the LoadBalancer service type to internal only.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
  ##
  annotations: {}
...
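
The annotations field can be used to keep the load balancer internal to your network. Which annotation applies depends on your cloud provider; the example below is a sketch for the AWS in-tree provider and should be replaced with your provider's equivalent.

...
metricsService:
  create: true
  type: LoadBalancer
  port: 8000
  annotations:
    # AWS in-tree provider example; use the annotation appropriate for your cloud provider.
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
...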

Configuring the metrics

When installing Kyverno via Helm, you can also configure which metrics to expose.

  • You can configure which namespaces to include and/or exclude for metric export when configuring your Helm chart. This is useful when you want to exclude Kyverno metrics for namespaces of little interest, such as test namespaces you might be dealing with on a regular basis. Likewise, you can include only a set of critical namespaces if you want to monitor Kyverno-related activity just for those. Exporting the right set of namespaces (as opposed to exposing all namespaces) can substantially reduce the memory footprint of Kyverno's metrics exporter.
...
config:
  metricsConfig:
    namespaces: {
      "include": [],
      "exclude": []
    }
  # 'namespaces.include': list of namespaces to capture metrics for. Default: all namespaces included.
  # 'namespaces.exclude': list of namespaces to NOT capture metrics for. Default: [], none of the namespaces excluded.
...

exclude takes precedence over include when a namespace is listed under both.
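
For example, with the configuration below (the namespace names are only illustrative), metrics would be captured for payments but not for staging, since staging is listed under exclude even though it also appears under include.

...
config:
  metricsConfig:
    namespaces: {
      "include": ["payments", "staging"],
      "exclude": ["staging"]
    }
...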

Note: the content below is deprecated.

  • The metric refresh interval is also configurable; it allows the metrics registry to purge itself of all associated metrics after that time frame. This clean-up resets the memory footprint of Kyverno's metric exporter, which is particularly useful if you are concerned about its overall memory usage.
...
config:
  # rate at which metrics should reset so as to clean up the memory footprint of kyverno metrics, if you might be expecting high memory footprint of Kyverno's metrics.
  metricsRefreshInterval: 24h
  # Default: 0, no refresh of metrics
...

You will not lose your previous metrics, as they are persisted in the Prometheus backend.

Metrics and Dashboard


Policies and Rule Counts

This metric can be used to track the number of policies and rules present in the cluster, including both those that are currently active and those that were created in the past but are no longer active.
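
For example, assuming the metric is exported as kyverno_policy_rule_info_total with a policy_name label (as in recent Kyverno releases), a Prometheus recording rule such as the following sketch can report the number of rules per policy:

groups:
  - name: kyverno-policy-info
    rules:
      # Number of rules reported per policy.
      # Assumes the kyverno_policy_rule_info_total metric and a policy_name label.
      - record: kyverno:policy_rule_info:count_by_policy
        expr: sum(kyverno_policy_rule_info_total) by (policy_name)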

Policy and Rule Execution

This metric can be used to track the results associated with rules executed as part of incoming resource requests and background scans. This metric can be further aggregated to track policy-level results as well.
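
As a sketch, assuming the metric is exported as kyverno_policy_results_total with rule_result and policy_name labels, a recording rule like the following aggregates failing rule executions to the policy level:

groups:
  - name: kyverno-policy-results
    rules:
      # Per-policy rate of failing rule executions over the last 5 minutes.
      # Assumes the kyverno_policy_results_total metric with rule_result and policy_name labels.
      - record: kyverno:policy_results_fail:rate5m
        expr: sum(rate(kyverno_policy_results_total{rule_result="fail"}[5m])) by (policy_name)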

Policy Rule Execution Latency

This metric can be used to track the latencies associated with the execution/processing of individual rules whenever they evaluate incoming resource requests or execute background scans. This metric can be further aggregated to present latencies at the policy level.
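
Assuming kyverno_policy_execution_duration_seconds is exported as a Prometheus histogram with a policy_name label, a rule like the following sketch computes a 95th percentile latency aggregated to the policy level:

groups:
  - name: kyverno-rule-latency
    rules:
      # 95th percentile rule execution latency per policy over the last 5 minutes.
      # Assumes kyverno_policy_execution_duration_seconds is a histogram with a policy_name label.
      - record: kyverno:policy_execution_duration_seconds:p95
        expr: histogram_quantile(0.95, sum(rate(kyverno_policy_execution_duration_seconds_bucket[5m])) by (policy_name, le))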

Admission Review Latency

This metric can be used to track the end-to-end latency of each individual admission review, corresponding to an incoming resource request that triggers a set of policies and rules.
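
Assuming kyverno_admission_review_duration_seconds is exported as a histogram, the following sketch records the cluster-wide 95th percentile admission review latency:

groups:
  - name: kyverno-admission-latency
    rules:
      # Cluster-wide 95th percentile end-to-end admission review latency.
      # Assumes kyverno_admission_review_duration_seconds is a histogram.
      - record: kyverno:admission_review_duration_seconds:p95
        expr: histogram_quantile(0.95, sum(rate(kyverno_admission_review_duration_seconds_bucket[5m])) by (le))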

Admission Requests Counts

This metric can be used to track the number of admission requests processed by Kyverno.
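
Assuming the metric is exported as kyverno_admission_requests_total, a simple rate over this counter shows how many admission requests Kyverno is handling:

groups:
  - name: kyverno-admission-requests
    rules:
      # Cluster-wide rate of admission requests handled by Kyverno over the last 5 minutes.
      # Assumes the kyverno_admission_requests_total metric.
      - record: kyverno:admission_requests:rate5m
        expr: sum(rate(kyverno_admission_requests_total[5m]))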

Policy Change Counts

This metric can be used to track the history of all Kyverno policy-related changes such as policy creations, updates, and deletions.
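
Assuming the metric is exported as kyverno_policy_changes_total, an alerting rule like the following sketch can notify you whenever any policy has recently been created, updated, or deleted:

groups:
  - name: kyverno-policy-changes
    rules:
      # Fires when any Kyverno policy was created, updated, or deleted in the last hour.
      # Assumes the kyverno_policy_changes_total metric.
      - alert: KyvernoPolicyChanged
        expr: sum(increase(kyverno_policy_changes_total[1h])) > 0
        labels:
          severity: info
        annotations:
          summary: One or more Kyverno policies changed in the last hour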

Grafana Dashboard

A ready-to-use dashboard for Kyverno metrics.

OpenTelemetry

OpenTelemetry integration in Kyverno.
