Prometheus Alerting Rule for Redis Clients


Recently, a customer's auto scaling farm got out of hand due to a misconfiguration on their end. It exceeded the maximum number of connections Redis could accept, and a lot of requests failed as a result.

Curiously, they were well below the default limit of 10000 clients.

After some digging, it turned out that each client needs one file descriptor (well, that would have been easy to guess…), so connections started failing once they hit the default nofile limit of 1024.

The Redis documentation states the following:

In Redis 2.6 this limit is dynamic: by default it is set to 10000 clients, unless otherwise stated by the `maxclients` directive in Redis.conf.

However, Redis checks with the kernel what is the maximum number of file descriptors that we are able to open (the _soft limit_ is checked). If the limit is smaller than the maximum number of clients we want to handle, plus 32 (that is the number of file descriptors Redis reserves for internal uses), then the number of maximum clients is modified by Redis to match the amount of clients we are _really able to handle_ under the current operating system limit.
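To make the quoted rule concrete, here is a minimal Python sketch of the arithmetic it describes. The function name `effective_max_clients` is hypothetical, and the model covers only what the paragraph above says; it does not reproduce anything else Redis may do at startup.

```python
import resource  # Unix-only standard library module

RESERVED_FDS = 32  # descriptors Redis reserves for internal use, per the quoted docs


def effective_max_clients(maxclients: int, soft_nofile: int) -> int:
    """Client cap implied by the quoted paragraph (simplified, hypothetical helper)."""
    # If the soft limit cannot cover maxclients plus the reserved descriptors,
    # the cap shrinks to whatever the limit leaves after the reserved ones.
    if soft_nofile < maxclients + RESERVED_FDS:
        return soft_nofile - RESERVED_FDS
    return maxclients


if __name__ == "__main__":
    # Soft nofile limit of the current process, not of a running Redis server.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        print("no soft nofile limit; maxclients applies as configured")
    else:
        print("soft nofile limit:", soft)
        print("implied client cap:", effective_max_clients(10000, soft))
```

With the defaults from this story, `effective_max_clients(10000, 1024)` gives 992, which is why the customer hit the wall long before 10000 clients.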

After the configuration error was fixed, we wanted an early warning before it could happen again, so I wrote this Prometheus alerting rule, based on the excellent set of rules from Awesome Prometheus:

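alert: RedisTooManyConnections
# Assumes process_max_fds{job="node"} reflects the nofile limit Redis runs under,
# and that both sides carry the same `instance` label value so they can be compared.
# The + 32 accounts for the file descriptors Redis reserves for internal use.
expr: |-
  (sum by (instance) (redis_connected_clients + 32))
    > (sum by (instance) (process_max_fds{job="node"} * 0.8))
for: 2m
labels:
  severity: warning
annotations:
  description: |-
    Redis instance has too many connections
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  summary: Redis too many connections (instance {{ $labels.instance }})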

This fires once the connected clients, plus the 32 descriptors Redis reserves, stay above 80% of the file descriptor limit for two minutes.
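As a sanity check on the threshold, the comparison in the rule can be replayed in a few lines of Python. This is only a sketch of the arithmetic: the function names are hypothetical, and it ignores the `for: 2m` clause, which additionally requires the condition to hold continuously for two minutes.

```python
import math

RESERVED_FDS = 32       # the "+ 32" in the rule
THRESHOLD_RATIO = 0.8   # the "* 0.8" in the rule


def alert_condition(connected_clients: int, max_fds: int) -> bool:
    """Mirror of the rule's comparison for a single instance."""
    return connected_clients + RESERVED_FDS > max_fds * THRESHOLD_RATIO


def first_alerting_client_count(max_fds: int) -> int:
    """Smallest client count for which the comparison becomes true."""
    # Smallest integer c such that c + RESERVED_FDS > max_fds * THRESHOLD_RATIO.
    return max(math.floor(max_fds * THRESHOLD_RATIO) - RESERVED_FDS + 1, 0)


if __name__ == "__main__":
    limit = 1024  # the default nofile limit from the incident
    print("first alerting client count:", first_alerting_client_count(limit))
```

For a limit of 1024, the comparison first holds at 788 connected clients. That leaves roughly 200 connections of headroom before the 992 clients that such a limit allows once the 32 reserved descriptors are subtracted.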