What is the meaning of the container_cpu_cfs_throttled_seconds_total metric?

container_cpu_cfs_throttled_seconds_total is the sum of all throttle durations, i.e. the total time during which the container was throttled (stopped from running) by CFS cgroup bandwidth control.

Since each stopped thread adds its own throttled duration to container_cpu_cfs_throttled_seconds_total, this number can grow far beyond the elapsed wall-clock time and is hard to interpret on its own (unless you have a known, fixed number of threads).
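To make the effect of per-thread summing concrete, here is a small arithmetic sketch in Python. It only follows the description above; the function name and all numbers (8 threads, a 100 ms CFS period, 50 ms of throttling per thread per period) are hypothetical and chosen for illustration.

```python
def summed_throttled_seconds(threads, throttled_ms_per_period, period_ms, wall_clock_ms):
    """Total throttled time (in seconds) if every thread is throttled
    for the same duration in every CFS period (a simplifying assumption)."""
    periods = wall_clock_ms // period_ms  # number of full CFS periods elapsed
    return threads * throttled_ms_per_period * periods / 1000


# Hypothetical workload: 8 threads, each throttled 50 ms per 100 ms period,
# observed over 60 s of wall-clock time.
total = summed_throttled_seconds(threads=8, throttled_ms_per_period=50,
                                 period_ms=100, wall_clock_ms=60_000)
print(total)  # 240.0 throttled seconds within 60 wall-clock seconds
```

Under these assumptions the counter grows four times faster than wall-clock time, which is why the raw value says little without knowing the thread count.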

That is why alerting on CPU throttling is usually based on the throttled percentage := container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total, i.e. the share of CPU periods in which the container ran but was throttled (stopped before it could run for the whole CPU period).
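Both metrics are counters, so in practice the ratio is computed over the increase of each counter within a time window rather than over the raw totals. A minimal Python sketch of that calculation, assuming you already have two samples of each counter (the function name and the sample values are hypothetical):

```python
def throttled_percentage(throttled_start, throttled_end, periods_start, periods_end):
    """Percentage of CFS periods in the window in which the container was throttled.

    Arguments are two samples each of container_cpu_cfs_throttled_periods_total
    and container_cpu_cfs_periods_total, taken at the start and end of the window.
    """
    delta_periods = periods_end - periods_start
    if delta_periods <= 0:
        # No periods elapsed (container idle, or a counter reset): report 0.
        return 0.0
    delta_throttled = throttled_end - throttled_start
    return 100.0 * delta_throttled / delta_periods


# Hypothetical samples: 600 periods elapsed, 150 of them throttled.
print(throttled_percentage(200, 350, 1000, 1600))  # 25.0
```

Unlike the summed throttle duration, this value stays between 0 and 100 regardless of how many threads the container runs, which makes it easier to set an alert threshold on.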

For more detail, you can watch this talk on CFS and CPU scheduling, or read the corresponding article.
