# Observability
Monitor the operator and your RADIUS infrastructure with Prometheus metrics.
## Operator Metrics

The operator exposes Prometheus metrics on `:8080/metrics`. These cover the operator's own reconciliation performance — not FreeRADIUS traffic metrics.
### Available Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `freeradius_operator_reconcile_total` | Counter | `namespace`, `name`, `kind`, `result` | Total reconciliation attempts. `result` is `success` or `error`. |
| `freeradius_operator_reconcile_duration_seconds` | Histogram | — | Time spent in each reconciliation loop. |
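As a sketch of how these metrics might be used, the rule below alerts when reconciliations start failing. The rule name and the 10% threshold are illustrative choices, not something the operator ships:

```yaml
groups:
  - name: freeradius-operator
    rules:
      # Illustrative alert: fires when more than 10% of reconcile
      # attempts over the last 10 minutes ended with result="error".
      - alert: OperatorReconcileErrors
        expr: |
          sum(rate(freeradius_operator_reconcile_total{result="error"}[10m]))
            /
          sum(rate(freeradius_operator_reconcile_total[10m])) > 0.1
        for: 10m
        labels:
          severity: warning
```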
### Scrape Configuration

If you're using the Prometheus Operator, create a `ServiceMonitor`:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: freeradius-operator
  namespace: freeradius-system
spec:
  selector:
    matchLabels:
      app: freeradius-operator
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
For plain Prometheus, add a scrape config:
```yaml
scrape_configs:
  - job_name: freeradius-operator
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: freeradius-operator
        action: keep
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "8080"
        action: keep
```
## FreeRADIUS Server Metrics

Every `RadiusCluster` ships with a built-in Prometheus exporter sidecar that
talks to the co-located `freeradius` process over the loopback via the
RFC 5997 Status-Server protocol. The exporter is the same operator binary
invoked with `--mode=exporter`, so there is no extra image to publish or
pin — only the operator image.
### What's exposed

The sidecar listens on TCP port 9812 and exposes `/metrics` and
`/healthz`. Every metric carries a `cluster="<RadiusCluster name>"` label.
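For example, a scrape of the sidecar for a cluster named `production` would include lines along these lines (the values and HELP text here are illustrative):

```
# HELP freeradius_up 1 if the last status-server scrape succeeded.
freeradius_up{cluster="production"} 1
freeradius_scrape_duration_seconds{cluster="production"} 0.004
freeradius_access_requests_total{cluster="production"} 12840
```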
| Metric | Type | Description |
|---|---|---|
| `freeradius_up` | Gauge | `1` if the last scrape of the status-server succeeded, `0` otherwise. |
| `freeradius_scrape_duration_seconds` | Gauge | Duration of the last scrape. Useful for latency SLOs. |
| `freeradius_exporter_scrape_errors_total` | Counter | Total number of scrape attempts that failed. |
| `freeradius_access_requests_total` | Counter | Total Access-Request packets received. |
| `freeradius_access_accepts_total` | Counter | Total Access-Accept packets sent. |
| `freeradius_access_rejects_total` | Counter | Total Access-Reject packets sent. |
| `freeradius_access_challenges_total` | Counter | Total Access-Challenge packets sent. |
| `freeradius_auth_{duplicate,malformed,invalid,dropped,unknown_types}_requests_total` | Counter | Per-listener error/abuse counters. |
| `freeradius_acct_*_total` | Counter | Mirror set for the accounting listener. |
| `freeradius_proxy_access_*_total`, `freeradius_proxy_acct_*_total` | Counter | Mirror sets for proxied traffic. |
| `freeradius_queue_len_{internal,proxy,auth,acct,detail}` | Gauge | Internal queue depths — rising values indicate saturation. |
### Enabling

Server metrics are on by default. The operator injects the exporter
sidecar into every pod that runs an auth listener (standalone mode, and the
auth role in split mode). Acct- or CoA-only pods get no sidecar because
the status-server only binds to the auth listener.
```yaml
apiVersion: radius.operator.io/v1alpha1
kind: RadiusCluster
metadata:
  name: production
spec:
  image: freeradius/freeradius-server:3.2.3
  replicas: 3
  metrics:
    enabled: true                # default
    # port: 9812                 # default
    # image: my-registry/op:v1   # defaults to operator image
    resources:
      requests: { cpu: 25m, memory: 32Mi }
      limits: { cpu: 100m, memory: 64Mi }
  serviceMonitor:
    enabled: true
    interval: 30s
    labels:
      release: kube-prometheus-stack
  prometheusRule:
    enabled: true
    labels:
      release: kube-prometheus-stack
```
Setting `spec.metrics.enabled: false` removes the sidecar and the metrics
Service port.
### ServiceMonitor and PrometheusRule

When `serviceMonitor.enabled: true`, the operator creates a
`monitoring.coreos.com/v1` ServiceMonitor selecting the cluster's auth
Service on the metrics port. Set `labels` so the Prometheus Operator
selects it (for kube-prometheus-stack this is typically
`release: kube-prometheus-stack`).
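The generated object is roughly of this shape. This is an illustrative sketch: the resource name and label selector below are assumptions based on the description above, not the operator's exact output:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: production-metrics            # assumed naming
  labels:
    release: kube-prometheus-stack    # from serviceMonitor.labels
spec:
  selector:
    matchLabels:
      app: production-auth            # assumed auth-Service label
  endpoints:
    - port: metrics
      interval: 30s
```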
When `prometheusRule.enabled: true`, the operator creates a PrometheusRule
with four starter alerts:
| Alert | Expression | For | Severity |
|---|---|---|---|
| `RadiusClusterDown` | `freeradius_up == 0` | 2m | critical |
| `RadiusHighRejectRate` | Reject rate > 50% of total | 5m | warning |
| `RadiusAuthLatencyHigh` | `freeradius_scrape_duration_seconds > 0.5` | 10m | warning |
| `RadiusQueueDepthGrowing` | `freeradius_queue_len_internal > 100` | 10m | warning |
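For reference, the reject-rate alert could be expressed in PromQL roughly as follows, using the counters documented above. This is a sketch of the idea; the operator's actual generated expression may differ:

```yaml
- alert: RadiusHighRejectRate
  expr: |
    sum by (cluster) (rate(freeradius_access_rejects_total[5m]))
      /
    sum by (cluster) (rate(freeradius_access_requests_total[5m])) > 0.5
  for: 5m
  labels:
    severity: warning
```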
Both resources are best-effort: if the Prometheus Operator CRDs are not
installed, the operator logs a single line and skips them. The sidecar and
the `/metrics` endpoint still work — you can scrape with a plain Prometheus
scrape config.
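For plain Prometheus, a scrape config analogous to the operator's own can target the sidecar's default port. The job name is arbitrary, and keeping pods by container port number is one approach among several:

```yaml
scrape_configs:
  - job_name: freeradius-clusters
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only containers exposing the exporter port (9812 by default).
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "9812"
        action: keep
```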
### Grafana dashboard

A starter Grafana dashboard is included at
`docs/dashboards/freeradius.json`. Import it
and select the Prometheus datasource that scrapes the cluster.
### End-to-end example

See `example/metrics/` for a complete working
configuration.
## Status Conditions

Beyond metrics, the operator writes structured conditions to each resource's status. These are queryable with `kubectl` and useful for alerting.
### Check cluster health

```bash
# Quick overview
kubectl get radiuscluster -n radius

# Detailed conditions
kubectl get radiuscluster production -n radius -o jsonpath='{.status.conditions}' | jq .
```
### Alert on degraded clusters

A `Degraded` condition means the operator detected a problem (usually a missing Secret) and is retrying. You can alert on this with a Prometheus rule that watches the `kube_customresource_status_condition` metric (if using kube-state-metrics with CRD support) or by polling the Kubernetes API.
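If kube-state-metrics is configured to export `RadiusCluster` conditions, such a rule might look like the sketch below. The exact metric label set depends on your kube-state-metrics CustomResourceState configuration, so treat the labels here as assumptions:

```yaml
- alert: RadiusClusterDegraded
  # Label names below depend on your CustomResourceState config.
  expr: |
    kube_customresource_status_condition{customresource_kind="RadiusCluster",
      condition="Degraded", status="true"} == 1
  for: 5m
  labels:
    severity: warning
```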
## Useful kubectl Commands

```bash
# List all RADIUS resources
kubectl get rc,rcl,rp -n radius

# Watch reconciliation in real time
kubectl get radiuscluster -n radius -w

# Check pod health
kubectl get pods -n radius -l app.kubernetes.io/managed-by=freeradius-operator

# View operator logs
kubectl logs -n freeradius-system deploy/freeradius-operator -f

# Check pod restart count from status
kubectl get radiuscluster production -n radius \
  -o jsonpath='{.status.podRestarts}'
```