Magical Graphite

Or, « What I’ve learned about Graphite configuration ».

Last week, I worked on configuring Graphite and had to understand how it stores and aggregates data. So here are a few facts.

Graphite Retention

The way our data will be stored is described in /opt/graphite/conf/storage-schemas.conf. As an example:

[default]
 pattern = .*
 retentions = 1s:30m,1m:1d,5m:2y

This worked great when I was looking at data from the last 30 minutes.
When I tried to display metrics for the last hour: nothing.
Drawing null as zero only gave me a horizontal line at the bottom of the graph.

The magic of aggregation

This behaviour comes from the file /opt/graphite/conf/storage-aggregation.conf where we find the following lines:

[99_default_avg]
 pattern = .*
 xFilesFactor = 0.5
 aggregationMethod = average

Our problem comes from xFilesFactor. It means that by default, at least 50% of the points in an interval must be non-null for Graphite to store an averaged value. Think about it.

So here, I expect one metric every second for 30 minutes. If Graphite doesn’t receive anything for a given second, the value is set to null. Fine, let’s move forward.
For time ranges longer than 30 minutes (and shorter than a day), Graphite reads from the next archive, which is filled using the configured aggregation. So it averages the data and stores null if fewer than 50% of the values are usable (not null).
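
To make the choice of archive concrete, here is a minimal sketch. It assumes, as far as I understand Whisper's behaviour, that a query is served by the finest archive whose retention covers the requested time range. The function name pick_archive is mine, not part of Graphite.

```python
# Archives for retentions = 1s:30m,1m:1d,5m:2y as
# (secondsPerPoint, retention in seconds), finest first.
# 2y is taken as 2 * 365 days.
ARCHIVES = [(1, 1800), (60, 86400), (300, 63072000)]


def pick_archive(window_seconds, archives=ARCHIVES):
    """Return the first (finest) archive whose retention covers the window."""
    for seconds_per_point, retention in archives:
        if retention >= window_seconds:
            return seconds_per_point, retention
    # Window longer than every retention: fall back to the coarsest archive.
    return archives[-1]


print(pick_archive(30 * 60))  # (1, 1800): raw 1 s points
print(pick_archive(60 * 60))  # (60, 86400): aggregated 1 min points
```

Under that assumption, a "last hour" graph never touches the 1s archive, so it only shows whatever the aggregation managed to store.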

In our case, Graphite tries to average one minute of data (1m:1d) from the 1s-precision points of the first retention rule (1s:30m). To understand why nothing is displayed, consider that Collectd is sending data to Graphite. On average, metrics arrive every 3s. Over a one-minute interval, we gather 20 values but Graphite expects 60, so 40 of them are null. Only 33% (0.33) of the points are usable, which is lower than the 50% Graphite requires, so the averaged value is set to null.
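
The arithmetic above can be replayed with a small sketch. This is not Graphite's code, only an illustration of the rule as I understand it: average the known points, unless the known fraction is below xFilesFactor. The function name aggregate_average and the value 42.0 are made up for the example.

```python
def aggregate_average(points, x_files_factor):
    """Average the non-null points, or return None when too few are known."""
    known = [p for p in points if p is not None]
    if not known or len(known) / len(points) < x_files_factor:
        return None
    return sum(known) / len(known)


# One minute at 1 s resolution, with a value arriving only every 3 s:
# 20 known points out of 60.
minute = [42.0 if second % 3 == 0 else None for second in range(60)]

print(aggregate_average(minute, 0.5))  # None: 0.33 is below 0.5
print(aggregate_average(minute, 0.1))  # 42.0: 0.33 is above 0.1
```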

The art of confusion

Now that we have updated our configuration, set xFilesFactor to 0 to be sure, and restarted carbon-cache, everything should work fine…

But that’s not the case; no change.

In fact, the previous configuration is still being used in the existing .wsp storage files. We can check it with whisper-info.py.

whisper-info.py /opt/graphite/storage/whisper/collectd/test-java01/cpu-0/cpu-user.wsp
 
 maxRetention: 63072000
 xFilesFactor: 0.5
 aggregationMethod: average
 fileSize: 2561812

Archive 0
 retention: 1800
 secondsPerPoint: 1
 points: 1800
 size: 21600
 offset: 52
Archive 1
 retention: 86400
 secondsPerPoint: 60
 points: 1440
 size: 17280
 offset: 21652
Archive 2
 retention: 63072000
 secondsPerPoint: 300
 points: 210240
 size: 2522880
 offset: 38932

See, we still have xFilesFactor: 0.5.
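
To check many files at once, something like the sketch below could help. It relies on an assumption about the on-disk layout: that a .wsp file starts with a big-endian header holding the aggregation type, the max retention, the xFilesFactor (as a 32-bit float) and the archive count. whisper-info.py stays the reference; the function names here are mine.

```python
import os
import struct

# Assumed header layout (big-endian):
# aggregation type, max retention, xFilesFactor, archive count.
METADATA_FORMAT = "!2LfL"
METADATA_SIZE = struct.calcsize(METADATA_FORMAT)


def read_x_files_factor(path):
    """Return the xFilesFactor stored in the header of a .wsp file."""
    with open(path, "rb") as handle:
        raw = handle.read(METADATA_SIZE)
    if len(raw) < METADATA_SIZE:
        raise ValueError(f"{path} is too short to be a Whisper file")
    _agg_type, _max_retention, x_files_factor, _count = struct.unpack(
        METADATA_FORMAT, raw
    )
    return x_files_factor


def find_stale(root, expected, tolerance=1e-6):
    """Yield (path, value) for .wsp files whose xFilesFactor differs."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".wsp"):
                path = os.path.join(dirpath, name)
                value = read_x_files_factor(path)
                if abs(value - expected) > tolerance:
                    yield path, value


if __name__ == "__main__":
    # Lists every file that does not yet carry the new xFilesFactor of 0.
    for path, value in find_stale("/opt/graphite/storage/whisper", 0.0):
        print(path, value)
```

The tolerance is there because the value is stored as a 32-bit float, so 0.1 does not read back as exactly 0.1.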
If you don’t care about previous data, a good solution is to delete the files so that they are recreated with the new parameters (rm -rf /opt/graphite/storage/whisper/collectd/). It may be a little bit overkill, but it is easy and fast.

The other solution is to use whisper-resize.py to enforce the new configuration. Note that the retentions are separated by spaces here, not commas, and that the option takes two hyphens:
whisper-resize.py /opt/graphite/storage/whisper/collectd/test-java01/cpu-0/cpu-user.wsp 3s:30m 1m:1d 5m:2y --xFilesFactor=0.1

The above works fine, but there is another way to express how many points Graphite keeps. It has the format n:i, which means we store one point every n seconds and we keep i points (computed as retention duration / n).

Example: 3s:30m
30m = 1800s
1800 / 3 = 600

3:600

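So 3s:30m,1m:1d,5m:2y gives us 3:600 60:1440 300:210240.

Note that Whisper counts a year as 365 days here: the whisper-info.py output above shows a retention of 63072000 seconds (2 × 365 × 86400) and 210240 points for the 5m:2y archive. It does not use the average Gregorian year:

« An average Gregorian year is 365.2425 days = 52.1775 weeks = 8765.82 hours = 525949.2 minutes = 31556952 seconds (mean solar, not SI). » Wikipedia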
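
The conversion to the n:i format can be sketched in a few lines. This is only an illustration: it handles the unit-suffixed form used in this post, and it takes a year as 365 days, which is what matches the points: 210240 reported by whisper-info.py for 5m:2y. The names UNIT_SECONDS, to_seconds and to_points_format are mine.

```python
# Seconds per unit; a year is taken as 365 days (assumption that matches
# the maxRetention of 63072000 s shown by whisper-info.py for 2y).
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "w": 604800,
    "y": 31536000,
}


def to_seconds(value):
    """Convert a unit-suffixed duration such as '30m' to seconds."""
    return int(value[:-1]) * UNIT_SECONDS[value[-1]]


def to_points_format(retentions):
    """Convert '3s:30m,1m:1d' to ['3:600', '60:1440']."""
    result = []
    for definition in retentions.split(","):
        precision, duration = definition.strip().split(":")
        step = to_seconds(precision)
        result.append(f"{step}:{to_seconds(duration) // step}")
    return result


print(to_points_format("3s:30m,1m:1d,5m:2y"))
# ['3:600', '60:1440', '300:210240']
```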

Note

One thing to remember about storage-schemas.conf (taken from the Graphite documentation):

« Changing this file will not affect already-created .wsp files. Use whisper-resize.py to change those. »
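
Since every existing file has to be resized individually, a small helper can generate one whisper-resize.py command per .wsp file. This is a dry-run sketch under my own assumptions: the function name resize_commands is hypothetical, and the retentions and xFilesFactor are the ones used in this post. It only prints the commands, so they can be reviewed before anything is run.

```python
import os


def resize_commands(root, retentions, x_files_factor):
    """Yield one whisper-resize.py argument list per .wsp file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".wsp"):
                yield [
                    "whisper-resize.py",
                    os.path.join(dirpath, name),
                    *retentions,  # one argument per retention, no commas
                    f"--xFilesFactor={x_files_factor}",
                ]


if __name__ == "__main__":
    for command in resize_commands(
        "/opt/graphite/storage/whisper/collectd",
        ["3s:30m", "1m:1d", "5m:2y"],
        0.1,
    ):
        # Dry run: print only. Each list could be passed to
        # subprocess.run(command, check=True) to actually resize.
        print(" ".join(command))
```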

Author: Victor

A computer engineer by training and by trade, I administer this server and its domain and favor free software in my daily use. I am gradually building my personal "cloud", service by service, to keep some control over my digital data.

One thought on « Magical Graphite »

  1. It took me some time to figure out why this
    whisper-resize.py /opt/graphite/storage/whisper/collectd/test-java01/cpu-0/cpu-user.wsp 3s:30m,1m:1d,5m:2y --xFilesFactor=0.1
    doesn’t work for me when using multiple retentions (not just 3s:30m but 3s:30m,1m:1d,5m:2y).

    After looking at the Python code, I got it! The correct version is the following:
    whisper-resize.py /opt/graphite/storage/whisper/collectd/test-java01/cpu-0/cpu-user.wsp 3s:30m 1m:1d 5m:2y --xFilesFactor=0.1

    Spaces instead of commas! So simple.
