SLO specification format
sloth-k8s accepts SLO specifications in the Prometheus Sloth format
(version: "prometheus/v1"). This page describes the structure of a specification file
and the fields available within it.
For the authoritative specification, see the upstream Sloth documentation.
Annotated example
# Required. Must be "prometheus/v1" for the Prometheus backend.
version: "prometheus/v1"
# Required. A logical name for the service being measured.
service: "my-service"
# Optional. Arbitrary key/value labels attached to every generated rule.
labels:
  team: my-team
  repo: my-org/my-service
# Required. One or more SLO definitions.
slos:
  - # Required. Unique name for this SLO within the service. Used in rule names.
    name: "requests-availability"
    # Required. Target success percentage over the SLO period (0–100).
    objective: 99.9
    # Optional. Human-readable description included in rule annotations.
    description: "99.9% of HTTP requests succeed."
    sli:
      # "events" SLI: ratio of bad events to total events.
      events:
        # Required. PromQL expression that evaluates to the rate of bad events.
        # Use {{.window}} as a placeholder; Sloth substitutes the correct window duration.
        error_query: |
          sum(rate(http_requests_total{status=~"5.."}[{{.window}}]))
        # Required. PromQL expression that evaluates to the rate of all events.
        total_query: |
          sum(rate(http_requests_total[{{.window}}]))
    alerting:
      # Required (when alerting is enabled). Alert name prefix used in generated rules.
      name: MyServiceHighErrorRate
      # Optional. Extra labels added to generated alert rules.
      labels:
        category: availability
      # Optional. Annotations added to generated alert rules.
      annotations:
        summary: "High error rate on 'my-service' requests"
      # Optional. Override labels/annotations for the page-level alert only,
      # or disable it entirely.
      page_alert:
        labels:
          severity: page
        # disable: true # uncomment to suppress the page alert
      # Optional. Override labels/annotations for the ticket-level alert only,
      # or disable it entirely.
      ticket_alert:
        labels:
          severity: ticket
        # disable: true # uncomment to suppress the ticket alert
Top-level fields
| Field | Required | Description |
|---|---|---|
| version | Yes | Must be "prometheus/v1". |
| service | Yes | Logical name of the service. Included as a label on all generated rules. |
| labels | No | Arbitrary key/value map. Propagated as labels on every generated recording rule and alert rule. |
| slos | Yes | List of SLO definitions. Must contain at least one entry. |
SLO fields (slos[*])
| Field | Required | Description |
|---|---|---|
| name | Yes | Unique identifier within the service. Used in rule group names and rule names. |
| objective | Yes | Target success rate as a percentage (e.g., 99.9, as in the annotated example). |
| description | No | Human-readable description included in generated rule annotations. |
| sli | Yes | Defines the SLI measurement. See SLI fields (slos[*].sli) below. |
| alerting | No | Configures alerting rules. If omitted, no alerts are generated. |
SLI fields (slos[*].sli)
Sloth supports two SLI types. Only one may be specified per SLO.
Events-based SLI (sli.events)
Measures the ratio of bad events to all events. Suitable for request-based services.
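| Field | Required | Description |
|---|---|---|
| error_query | Yes | PromQL expression returning the rate of bad (error) events. Must use {{.window}} as the range-window placeholder. |
| total_query | Yes | PromQL expression returning the rate of all events. The same {{.window}} requirement applies. |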
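The events SLI is not limited to HTTP status codes: any pair of queries works as long as the first counts bad events and the second counts all events. As a rough sketch, a latency SLI could treat every request slower than 500 ms as a bad event. The metric name http_request_duration_seconds and the 0.5 bucket boundary are assumptions for illustration; substitute the histogram your service actually exposes.

```yaml
sli:
  events:
    # Bad events: all requests minus those that finished within 0.5s.
    # http_request_duration_seconds is a hypothetical histogram metric.
    error_query: |
      sum(rate(http_request_duration_seconds_count[{{.window}}]))
      -
      sum(rate(http_request_duration_seconds_bucket{le="0.5"}[{{.window}}]))
    # All events: every observed request.
    total_query: |
      sum(rate(http_request_duration_seconds_count[{{.window}}]))
```

Both queries use the {{.window}} placeholder so that the same expressions can be evaluated over each window Sloth generates rules for.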
Raw-ratio SLI (sli.raw)
Provides a pre-computed error ratio directly, for cases where you already have an expression that
returns a value between 0 and 1. Consult the upstream API reference for the raw field structure.
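For orientation, a raw SLI might look like the sketch below. The field name error_ratio_query is an assumption based on the upstream Sloth specification and is not confirmed by this page, so verify it against the upstream API reference before relying on it.

```yaml
sli:
  raw:
    # Assumed field name (from upstream Sloth); verify before use.
    # The expression must already evaluate to a ratio between 0 and 1.
    error_ratio_query: |
      sum(rate(http_requests_total{status=~"5.."}[{{.window}}]))
      /
      sum(rate(http_requests_total[{{.window}}]))
```

Here the division that the events SLI would perform for you is written out by hand, which is only worthwhile when the ratio comes from a source that cannot be split into separate error and total queries.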
Alerting fields (slos[*].alerting)
| Field | Required | Description |
|---|---|---|
| name | Yes (if alerting is enabled) | Prefix for generated alert rule names. |
| labels | No | Labels applied to all generated alert rules for this SLO. |
| annotations | No | Annotations applied to all generated alert rules. |
| page_alert | No | Overrides for the page-level (high-urgency) alert. Supports labels, annotations, and disable. |
| ticket_alert | No | Overrides for the ticket-level (lower-urgency) alert. Supports labels, annotations, and disable. |
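As an illustration of how the shared and per-alert settings combine, the fragment below keeps the page alert, gives it an extra annotation, and suppresses the ticket alert. It is a sketch built only from the fields described on this page; the runbook_url annotation and its URL are hypothetical.

```yaml
alerting:
  name: MyServiceHighErrorRate
  # Applied to every alert generated for this SLO.
  labels:
    category: availability
  page_alert:
    labels:
      severity: page
    annotations:
      # Hypothetical annotation and URL, for illustration only.
      runbook_url: "https://example.com/runbooks/my-service-errors"
  # Suppress the lower-urgency alert entirely.
  ticket_alert:
    disable: true
```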