Configuration

This document describes the configuration options available for the Sloth Kubernetes operator (the sloth-k8s charm).

SLO Period Configuration

The charm supports configuration options for controlling SLO period windows and alert generation.

slo-period

Type: string

Default: 30d

Description:

The default SLO period for calculations. This determines the time window over which SLO compliance is measured. Common values include:

  • 30d - 30 days (default, recommended for most use cases)

  • 28d - 28 days (4-week rolling window)

  • 7d - 7 days (for shorter-term SLOs)

Important: Sloth only has built-in alert window defaults for the 30d and 28d periods. If you use any other period (such as 7d), you must also configure slo-period-windows; otherwise the charm enters a blocked state.

Example:

# This works - 30d has built-in defaults
juju config sloth-k8s slo-period=30d

# This blocks the charm unless slo-period-windows is also configured
juju config sloth-k8s slo-period=7d
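
The rule above can be sketched as a small pre-flight check to run before changing slo-period. This is a minimal sketch: the function name needs_custom_windows is hypothetical and is not provided by the charm or by juju.

```shell
# Hypothetical pre-flight helper; not part of the charm or of juju.
# Returns 0 (true) when the period has no built-in Sloth alert windows.
needs_custom_windows() {
  case "$1" in
    30d|28d) return 1 ;;  # built-in defaults exist
    *)       return 0 ;;  # slo-period-windows must be configured as well
  esac
}

period=7d
if needs_custom_windows "$period"; then
  echo "slo-period=$period also requires slo-period-windows"
fi
```

If the charm does end up blocked, juju status sloth-k8s shows the application status and its message.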

slo-period-windows

Type: string (YAML format)

Default: "" (empty, uses Sloth defaults)

Required: Yes, when using slo-period other than 30d or 28d

Description:

Custom SLO period windows configuration in YAML format. This allows you to define custom alerting windows that override Sloth’s default alert window calculations.

Sloth only has built-in defaults for the 30d and 28d periods, so any other slo-period needs this option. Without it, the charm enters a blocked state.

When provided, this configuration defines alert windows for four types of alerts (quick page, slow page, quick ticket, slow ticket). See "SLOs and how sloth-k8s works" for an explanation of what each type means and how to choose appropriate thresholds.

The YAML must follow the Sloth AlertWindows specification (apiVersion: sloth.slok.dev/v1, kind: AlertWindows). The charm validates the configuration against the AlertWindows spec to ensure all required fields are present and correctly formatted.

Validation:

The charm validates:

  • kind must be “AlertWindows”

  • apiVersion must be “sloth.slok.dev/v1”

  • sloPeriod must be a valid duration (e.g., “7d”, “30d”)

  • All time windows (shortWindow, longWindow) must use valid Prometheus duration format

  • errorBudgetPercent must be between 0 and 100

  • All required fields (page.quick, page.slow, ticket.quick, ticket.slow) must be present

Invalid configurations are logged as errors and ignored; the charm then behaves as if slo-period-windows were unset.
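
To catch a malformed window before it reaches the charm, the duration rule can be approximated locally. The sketch below assumes the usual Prometheus duration syntax (a number followed by ms, s, m, h, d, w or y, optionally chained from largest to smallest unit). The helper name is_prom_duration is hypothetical, and the charm's own validator may accept or reject edge cases differently.

```shell
# Hypothetical helper: succeed when $1 looks like a Prometheus duration
# such as 5m, 6h, 3d or 1h30m. The charm's real validator may differ.
is_prom_duration() {
  [ -n "$1" ] || return 1  # the regex below would also accept an empty string
  printf '%s\n' "$1" | grep -Eq \
    '^([0-9]+y)?([0-9]+w)?([0-9]+d)?([0-9]+h)?([0-9]+m)?([0-9]+s)?([0-9]+ms)?$'
}

for value in 5m 1h30m 3d 90x; do
  if is_prom_duration "$value"; then
    echo "$value: ok"
  else
    echo "$value: not a valid duration"
  fi
done
```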

Configuration Parameters:

  • sloPeriod: Must match your slo-period config value (e.g., “7d”, “30d”)

  • errorBudgetPercent: Percentage of the error budget that must be consumed within longWindow to trigger the alert (0-100)

  • shortWindow: Shorter time window for detecting transient issues (e.g., “5m”, “30m”)

  • longWindow: Longer time window for overall trend (e.g., “1h”, “6h”)
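
One way to reason about these parameters is the burn rate used in Google's SRE Workbook: consuming a given percentage of the budget within longWindow corresponds to burning the budget at (percent / 100) × sloPeriod / longWindow times the sustainable rate. The helper below is a hypothetical sketch of that arithmetic with both durations converted to hours by hand; the rules Sloth generates may differ in detail.

```shell
# burn_rate PERCENT SLO_PERIOD_HOURS LONG_WINDOW_HOURS
# Hypothetical helper: burn rate = (percent / 100) * period / long window.
burn_rate() {
  awk -v p="$1" -v period="$2" -v window="$3" \
    'BEGIN { printf "%.2f\n", (p / 100) * period / window }'
}

burn_rate 2 720 1   # 30d period (720h), 2% of budget within 1h -> 14.40
burn_rate 8 168 1   # 7d period (168h), 8% of budget within 1h -> 13.44
```

A value well above 1 means the budget would be exhausted long before the SLO period ends, which is why the quick page windows pair a small percentage with a short longWindow.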

Example (7-day SLO period):

juju config sloth-k8s slo-period-windows='
apiVersion: sloth.slok.dev/v1
kind: AlertWindows
spec:
  sloPeriod: 7d
  page:
    quick:
      errorBudgetPercent: 8
      shortWindow: 5m
      longWindow: 1h
    slow:
      errorBudgetPercent: 12.5
      shortWindow: 30m
      longWindow: 6h
  ticket:
    quick:
      errorBudgetPercent: 20
      shortWindow: 2h
      longWindow: 1d
    slow:
      errorBudgetPercent: 42
      shortWindow: 6h
      longWindow: 3d
'
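
Keeping the AlertWindows document in a file can be easier to review than an inline string. The wrapper below is a sketch under assumptions: the function name apply_slo_windows and the file name are hypothetical. It passes both options in one juju config call, which should avoid the intermediate state where slo-period is already 7d but no windows are configured.

```shell
# Hypothetical wrapper; the function name and file path are illustrative.
# Sets the period and its alert windows in a single juju config call.
apply_slo_windows() {
  period="$1"
  windows_file="$2"
  [ -r "$windows_file" ] || { echo "cannot read $windows_file" >&2; return 1; }
  juju config sloth-k8s \
    slo-period="$period" \
    slo-period-windows="$(cat "$windows_file")"
}

# Usage (assumes windows-7d.yaml holds the AlertWindows document shown above):
# apply_slo_windows 7d windows-7d.yaml
```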

Example (custom 30-day thresholds):

juju config sloth-k8s slo-period-windows='
apiVersion: sloth.slok.dev/v1
kind: AlertWindows
spec:
  sloPeriod: 30d
  page:
    quick:
      errorBudgetPercent: 2
      shortWindow: 5m
      longWindow: 1h
    slow:
      errorBudgetPercent: 5
      shortWindow: 30m
      longWindow: 6h
  ticket:
    quick:
      errorBudgetPercent: 10
      shortWindow: 2h
      longWindow: 1d
    slow:
      errorBudgetPercent: 10
      shortWindow: 6h
      longWindow: 3d
'

Notes:

  • The default 30d and 28d periods use the parameters recommended in Google’s SRE Workbook

  • Configure custom windows only if you need different alerting thresholds or use a non-standard SLO period

  • Invalid YAML will be logged as an error and ignored

  • Changes to this configuration trigger rule regeneration

References: