Understanding Prometheus Rate Function

What it really does under the hood

Jun 20, 2021

When we build software, it is really important to use a monitoring system to understand what is happening in it in real time.

One of the most popular monitoring tools is Prometheus. It lets us monitor practically anything we want, but some of its functions are not entirely clear when we first try to use them. One of them is the rate() function.

Definition

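The rate() function calculates the per-second average rate of increase of a counter over a time range. It is meant to be used with counters: because a counter only goes up, any decrease is treated as a counter reset (for example, after the process restarts), not as a negative change.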
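To make the definition concrete, here is a minimal Python sketch of the core idea. Python and the helper name simple_rate are my own choices for illustration, not part of Prometheus. It adds up the increases between consecutive samples, treats any drop as a counter reset, and divides by the time covered by the samples. It is a simplification: the real rate() also extrapolates the result towards the edges of the time range, which this sketch leaves out.

```python
def simple_rate(samples):
    """Simplified per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None  # rate() needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # Counter reset: assume the counter restarted from 0,
            # so the whole new value counts as increase.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical samples scraped every 15 s, with a reset between t=30 and t=45.
samples = [(0, 100), (15, 130), (30, 160), (45, 30), (60, 60)]
print(simple_rate(samples))  # 2.0
```

The reset between t=30 and t=45 still counts as 30 ms of increase, so the total of 120 over 60 seconds gives 2.0 per second instead of a negative value.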

Explanation

Let’s look at an example. Imagine we have a metric that shows the accumulated time (a counter) that a process has been running.

Here we can see that the accumulated time of the process always goes up, or at least stays flat while the process is not running. It never goes down, because it is a counter (the only exception is a reset, for example when the process restarts and the counter starts again from 0). This type of metric is very common, but not very useful on its own.
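As a rough illustration with made-up numbers, this is how such a counter builds up: each scrape exposes the running total of CPU milliseconds, so the series can only rise or stay flat. The per-interval busy times below are hypothetical.

```python
from itertools import accumulate

# Hypothetical CPU time (ms) the process used during each 15-second
# scrape interval; 0 means the process was idle in that interval.
busy_ms_per_interval = [80, 95, 0, 0, 120, 60]

# The exposed counter (total_ms_cpu_process in this post) is the running total.
counter = list(accumulate(busy_ms_per_interval))
print(counter)  # [80, 175, 175, 175, 295, 355]

# Outside of resets, a counter never decreases: it only rises or stays flat.
is_monotonic = all(b >= a for a, b in zip(counter, counter[1:]))
print(is_monotonic)  # True
```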

So what we actually want is to be able to answer questions like:

Is my process running?
Is my process ever running all the time (like 100% CPU usage)?

To do so, we are going to use the rate() function. If our metric is named, for example:

total_ms_cpu_process

The query would then look like this:

rate(total_ms_cpu_process[1m])

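Let’s go step by step. The [1m] is a range selector: for each point in the chart, Prometheus takes all the samples (collected at the scrape interval configured in Prometheus) from the preceding 1 minute. You can change this range to any value, but it should cover several scrape intervals: if Prometheus scrapes every 15 seconds, a range of 1m or more is recommended, because rate() needs at least two samples in the window to return a value.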
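The way a range selector picks samples can be sketched in Python as follows. The select_window helper and the data are hypothetical. The sketch uses a window that excludes its left edge and includes its right edge, (t - 60s, t]; the exact boundary handling has varied between Prometheus versions. With a 15-second scrape interval, a 1m window holds about four samples, while a range as short as the scrape interval may hold only one, which is not enough for rate().

```python
def select_window(samples, t, range_s):
    """Return the samples whose timestamp lies in (t - range_s, t]."""
    return [(ts, v) for ts, v in samples if t - range_s < ts <= t]


# Hypothetical counter scraped every 15 s from t=0 to t=180.
samples = [(ts, ts * 2) for ts in range(0, 181, 15)]

one_minute = select_window(samples, t=120, range_s=60)
print([ts for ts, _ in one_minute])  # [75, 90, 105, 120]

# A range as short as the scrape interval leaves a single sample,
# which is not enough to compute a rate.
print(len(select_window(samples, t=120, range_s=15)))  # 1
```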

Then rate() is applied to the samples in each window: it calculates the per-second average rate of increase over that window. So it would look like this:
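Putting both steps together, the following sketch evaluates a simplified rate over a 1-minute window once per minute, which is roughly how each point of the chart is produced. The per-minute CPU usage figures are invented for the example, and this simple_rate ignores counter resets and extrapolation to keep it short.

```python
def simple_rate(samples):
    """Per-second increase between the first and last sample in a window.

    Simplified: no counter-reset handling and no extrapolation.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def select_window(samples, t, range_s):
    """Return the samples whose timestamp lies in (t - range_s, t]."""
    return [(ts, v) for ts, v in samples if t - range_s < ts <= t]


# Hypothetical CPU usage in ms per second for each minute of the run.
ms_per_second_by_minute = [5, 500, 1000]

# Build the counter as it would be scraped every 15 s.
samples = [(0, 0)]
for minute, speed in enumerate(ms_per_second_by_minute):
    for step in range(1, 5):
        ts = minute * 60 + step * 15
        samples.append((ts, samples[-1][1] + speed * 15))

# Evaluate the rate over a [1m] window once per minute, one point per minute.
points = [simple_rate(select_window(samples, t, 60)) for t in (60, 120, 180)]
print(points)  # [5.0, 500.0, 1000.0]
```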

And finally, these numbers plotted on a chart look like this:

How can we read this chart?

Each value, let’s say X, means that during that minute the process ran, on average, X milliseconds in each second. For example:

In the first minute, the process ran on average just 5.3 milliseconds in each second.
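The arithmetic behind that reading is straightforward. Assuming the counter rose by 318 ms during the first minute (a value chosen to match the 5.3 in the example), the rate and the equivalent share of one CPU second are:

```python
# Hypothetical counter values (ms) at the start and end of the first minute.
start_ms, end_ms = 0, 318
window_s = 60

ms_per_second = (end_ms - start_ms) / window_s
print(ms_per_second)  # 5.3

# The same value as a share of one second of wall-clock time.
busy_fraction = ms_per_second / 1000
print(f"{busy_fraction:.2%}")  # 0.53%
```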

So if we have a point with a value of 1000 (1000 ms, i.e. 1 second, per second), the process was running without stopping during the whole minute, assuming it runs on a single CPU core (CPU time summed across several cores can exceed 1000 ms per second). This means our process was working at 100%, which could be a sign that something is going wrong and a good reason for an alert.
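A hypothetical helper like the one below converts the rate into a utilization share and flags saturation. The cores parameter is my own addition to account for multi-core processes, whose summed CPU time can exceed 1000 ms per second; with the default of one core, 1000 ms/s means 100%.

```python
def cpu_utilization(rate_ms_per_s, cores=1):
    """Convert CPU milliseconds per second into a 0..1 utilization share.

    Assumes the counter sums CPU time across all cores, so the maximum
    possible rate is 1000 ms/s per core.
    """
    return rate_ms_per_s / (1000 * cores)


def is_saturated(rate_ms_per_s, cores=1, threshold=1.0):
    """Hypothetical alert condition: utilization at or above threshold."""
    return cpu_utilization(rate_ms_per_s, cores) >= threshold


print(is_saturated(1000))           # True: 100% of one core
print(is_saturated(5.3))            # False
print(is_saturated(1000, cores=2))  # False: only 50% of two cores
```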

And this is the chart we were looking for: one where we can easily answer our questions:

Is my process running? Yes, because if it were not running, the value would stay flat at 0.
Is my process ever running all the time (like 100% CPU usage)? No, because no point reaches a value of 1000.
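Both questions can also be checked mechanically over the values of the chart. This sketch uses made-up points and hypothetical names; 1000 ms/s is the 100% level discussed above.

```python
def answer_questions(points, full_ms_per_s=1000):
    """Answer the two questions from a list of rate values (ms per second).

    Returns (ran_at_all, ever_at_100_percent).
    """
    ran_at_all = any(p > 0 for p in points)
    ever_saturated = any(p >= full_ms_per_s for p in points)
    return ran_at_all, ever_saturated


# Hypothetical chart values, one per minute.
print(answer_questions([5.3, 12.0, 0.0, 250.0]))  # (True, False)
```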

Summary

It is really important to understand this definition: depending on the metric we apply rate() to, the result will have different meanings, but the definition is always the same.

peace :)

Written by Mopitz