# Observability
Monitor the operator and your RADIUS infrastructure with Prometheus metrics.
## Operator Metrics

The operator exposes Prometheus metrics on `:8080/metrics`. These cover the operator's own reconciliation performance — not FreeRADIUS traffic metrics.
### Available Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `freeradius_operator_reconcile_total` | Counter | `namespace`, `name`, `kind`, `result` | Total reconciliation attempts. `result` is `success` or `error`. |
| `freeradius_operator_reconcile_duration_seconds` | Histogram | — | Time spent in each reconciliation loop. |
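As a sketch of how these metrics might be used, the rule below alerts when reconciliations start failing. The rule name and the 10% threshold are illustrative choices, not something the operator ships:

```yaml
groups:
  - name: freeradius-operator
    rules:
      # Illustrative alert: fires when more than 10% of reconcile
      # attempts over the last 10 minutes ended with result="error".
      - alert: OperatorReconcileErrors
        expr: |
          sum(rate(freeradius_operator_reconcile_total{result="error"}[10m]))
            /
          sum(rate(freeradius_operator_reconcile_total[10m])) > 0.1
        for: 10m
        labels:
          severity: warning
```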
### Scrape Configuration

If you're using the Prometheus Operator, create a `ServiceMonitor`:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: freeradius-operator
  namespace: freeradius-system
spec:
  selector:
    matchLabels:
      app: freeradius-operator
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
For plain Prometheus, add a scrape config:
```yaml
scrape_configs:
  - job_name: freeradius-operator
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: freeradius-operator
        action: keep
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "8080"
        action: keep
```
## FreeRADIUS Server Metrics

Every `RadiusCluster` ships with a built-in Prometheus exporter sidecar that
talks to the co-located `freeradius` process over the loopback via the
RFC 5997 Status-Server protocol. The exporter is the same operator binary
invoked with `--mode=exporter`, so there is no extra image to publish or
pin — only the operator image.
### What's exposed

The sidecar listens on TCP port 9812 and exposes `/metrics` and
`/healthz`. Every metric carries a `cluster="<RadiusCluster name>"` label.
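For example, a scrape of the sidecar for a cluster named `production` would include lines along these lines (the values and HELP text here are illustrative):

```
# HELP freeradius_up 1 if the last status-server scrape succeeded.
freeradius_up{cluster="production"} 1
freeradius_scrape_duration_seconds{cluster="production"} 0.004
freeradius_access_requests_total{cluster="production"} 12840
```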
| Metric | Type | Description |
|---|---|---|
| `freeradius_up` | Gauge | `1` if the last scrape of the status-server succeeded, `0` otherwise. |
| `freeradius_scrape_duration_seconds` | Gauge | Duration of the last scrape. Useful for latency SLOs. |
| `freeradius_exporter_scrape_errors_total` | Counter | Total number of scrape attempts that failed. |
| `freeradius_access_requests_total` | Counter | Total Access-Request packets received. |
| `freeradius_access_accepts_total` | Counter | Total Access-Accept packets sent. |
| `freeradius_access_rejects_total` | Counter | Total Access-Reject packets sent. |
| `freeradius_access_challenges_total` | Counter | Total Access-Challenge packets sent. |
| `freeradius_auth_{duplicate,malformed,invalid,dropped,unknown_types}_requests_total` | Counter | Per-listener error/abuse counters. |
| `freeradius_acct_*_total` | Counter | Mirror set for the accounting listener. |
| `freeradius_proxy_access_*_total`, `freeradius_proxy_acct_*_total` | Counter | Mirror sets for proxied traffic. |
| `freeradius_queue_len_{internal,proxy,auth,acct,detail}` | Gauge | Internal queue depths — rising values indicate saturation. |
### Enabling

Server metrics are on by default. The operator injects the exporter
sidecar into every pod that runs an auth listener (standalone mode, and the
auth role in split mode). Acct- or CoA-only pods get no sidecar because
the status-server only binds to the auth listener.
```yaml
apiVersion: radius.operator.io/v1alpha1
kind: RadiusCluster
metadata:
  name: production
spec:
  image: freeradius/freeradius-server:3.2.3
  replicas: 3
  metrics:
    enabled: true                # default
    # port: 9812                 # default
    # image: my-registry/op:v1   # defaults to operator image
    resources:
      requests: { cpu: 25m, memory: 32Mi }
      limits: { cpu: 100m, memory: 64Mi }
  serviceMonitor:
    enabled: true
    interval: 30s
    labels:
      release: kube-prometheus-stack
  prometheusRule:
    enabled: true
    labels:
      release: kube-prometheus-stack
```
Setting `spec.metrics.enabled: false` removes the sidecar and the metrics
Service port.
### ServiceMonitor and PrometheusRule

When `serviceMonitor.enabled: true`, the operator creates a
`monitoring.coreos.com/v1` ServiceMonitor selecting the cluster's auth
Service on the metrics port. Set `labels` so the Prometheus Operator
selects it (for kube-prometheus-stack this is typically
`release: kube-prometheus-stack`).
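The generated object is roughly of this shape. This is an illustrative sketch: the resource name and label selector below are assumptions based on the description above, not the operator's exact output:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: production-metrics            # assumed naming
  labels:
    release: kube-prometheus-stack    # from serviceMonitor.labels
spec:
  selector:
    matchLabels:
      app: production-auth            # assumed auth-Service label
  endpoints:
    - port: metrics
      interval: 30s
```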
When `prometheusRule.enabled: true`, the operator creates a PrometheusRule
with four starter alerts:
| Alert | Expression | For | Severity |
|---|---|---|---|
| `RadiusClusterDown` | `freeradius_up == 0` | 2m | critical |
| `RadiusHighRejectRate` | Reject rate > 50% of total | 5m | warning |
| `RadiusAuthLatencyHigh` | `freeradius_scrape_duration_seconds > 0.5` | 10m | warning |
| `RadiusQueueDepthGrowing` | `freeradius_queue_len_internal > 100` | 10m | warning |
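For reference, the reject-rate alert could be expressed in PromQL roughly as follows, using the counters documented above. This is a sketch of the idea; the operator's actual generated expression may differ:

```yaml
- alert: RadiusHighRejectRate
  expr: |
    sum by (cluster) (rate(freeradius_access_rejects_total[5m]))
      /
    sum by (cluster) (rate(freeradius_access_requests_total[5m])) > 0.5
  for: 5m
  labels:
    severity: warning
```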
Both resources are best-effort: if the Prometheus Operator CRDs are not
installed, the operator logs a single line and skips them. The sidecar and
the `/metrics` endpoint still work — you can scrape with a plain Prometheus
scrape config.
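For plain Prometheus, a scrape config analogous to the operator's own can target the sidecar's default port. The job name is arbitrary, and keeping pods by container port number is one approach among several:

```yaml
scrape_configs:
  - job_name: freeradius-clusters
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only containers exposing the exporter port (9812 by default).
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "9812"
        action: keep
```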
### Grafana dashboard

A starter Grafana dashboard is included at
`docs/dashboards/freeradius.json`. Import it
and select the Prometheus datasource that scrapes the cluster.
### End-to-end example

See `example/metrics/` for a complete working
configuration.
## Status Conditions

Beyond metrics, the operator writes structured conditions to each resource's status. These are queryable with `kubectl` and useful for alerting.
### Check cluster health

```bash
# Quick overview
kubectl get radiuscluster -n radius

# Detailed conditions
kubectl get radiuscluster production -n radius -o jsonpath='{.status.conditions}' | jq .
```
### Alert on degraded clusters

A `Degraded` condition means the operator detected a problem (usually a missing Secret) and is retrying. You can alert on this with a Prometheus rule that watches the `kube_customresource_status_condition` metric (if using kube-state-metrics with CRD support) or by polling the Kubernetes API.
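If kube-state-metrics is configured to export `RadiusCluster` conditions, such a rule might look like the sketch below. The exact metric label set depends on your kube-state-metrics CustomResourceState configuration, so treat the labels here as assumptions:

```yaml
- alert: RadiusClusterDegraded
  # Label names below depend on your CustomResourceState config.
  expr: |
    kube_customresource_status_condition{customresource_kind="RadiusCluster",
      condition="Degraded", status="true"} == 1
  for: 5m
  labels:
    severity: warning
```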
## Useful kubectl Commands

```bash
# List all RADIUS resources
kubectl get rc,rcl,rp -n radius

# Watch reconciliation in real time
kubectl get radiuscluster -n radius -w

# Check pod health
kubectl get pods -n radius -l app.kubernetes.io/managed-by=freeradius-operator

# View operator logs
kubectl logs -n freeradius-system deploy/freeradius-operator -f

# Check pod restart count from status
kubectl get radiuscluster production -n radius \
  -o jsonpath='{.status.podRestarts}'
```