Installing Cabot on Debian

Let’s say we want to install Cabot on a server, but not on AWS, nor on DigitalOcean. And because we like challenges, let’s just use Debian Wheezy 7.5 instead of the recommended Ubuntu 12.04 LTS. Ready?

We will try to follow the Cabot quickstart to perform the installation.
But first, we must set up a few things. I've discovered that during installation, Cabot locks the password of the root account using passwd -l root. So be aware that you won't be able to log in as root over SSH if you haven't set up SSH key authentication. As The Hitchhiker's Guide to the Galaxy would say: « Don't Panic », you can reverse the process if you want using passwd -u root. As a safety net, you could also create another user with adduser mynewuser and give it admin rights by adding it to the sudoers file with visudo.
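
Put together, a minimal sketch of those safety measures (the user name is just an example):

 passwd -u root     # unlock root again if needed (reverses passwd -l root)
 adduser mynewuser  # create a separate admin user
 visudo             # then add a line such as:  mynewuser ALL=(ALL:ALL) ALL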

So let’s create our keys and configure ssh.
Generate the key:

ssh-keygen -t rsa

Now that we have a key named id_rsa.pub in our .ssh directory, we should do the following steps (put together as commands right after this list):

  1. Generate the key-file.
  2. Somehow get the key-file over to the right user on the right host.
  3. If that user doesn't already have an « .ssh » directory, create one AND set its permissions to « 700 » (rwx------).
  4. If that user doesn't already have an « .ssh/authorized_keys » file, create one AND set its permissions to « 600 » (rw-------).
  5. Append the new key to that file; don't overwrite it(!).
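
A minimal sketch of those manual steps (using the same root@hostname.org placeholder as below):

 # From the local machine: copy the public key over (step 2)
 scp ~/.ssh/id_rsa.pub root@hostname.org:/tmp/

 # On the server, as the target user (steps 3 to 5)
 mkdir -p ~/.ssh && chmod 700 ~/.ssh
 touch ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys
 cat /tmp/id_rsa.pub >> ~/.ssh/authorized_keys   # append, never overwrite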

Or we can use ssh-copy-id:

 ssh-copy-id -i ~/.ssh/id_rsa.pub root@hostname.org

You should now be able to log in without having to type in a password.

Before going further, we need to install Fabric as it is used to provision the server and deploy Cabot.

pip install fabric

Fabric also requires some dependencies, which are described on the Fabric installation page.
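
On Debian, something along these lines is usually enough before running pip (the package list is an assumption, check the Fabric installation page):

 aptitude install build-essential python-dev python-pip   # typically needed to build Fabric's PyCrypto dependency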

Next step, clone the repository. I am using the deploy branch of the lincolnloop fork, which includes awesome Python packaging.

 git clone https://github.com/lincolnloop/cabot.git
 cd cabot
 git checkout deploy

Modify the configuration (with this fork, you might need to use development.env instead):

 cp conf/production.env.example conf/production.env
 vim conf/production.env

Before going further, check whether nodejs can be installed from the repositories: aptitude search node. If nodejs can't be found, you'll need to install it manually.
In that case, a few changes are needed before provisioning our server: edit bin/setup_dependencies.sh and remove lines 57 and 58 ('nodejs' and 'npm').
Then install nodejs manually; npm should come with it.
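
A minimal sketch of a manual install from source (the version is only an example, pick a current one):

 wget http://nodejs.org/dist/v0.10.29/node-v0.10.29.tar.gz
 tar xzf node-v0.10.29.tar.gz
 cd node-v0.10.29
 ./configure && make && make install   # installs both node and npm

With node in place, we can provision the server: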

fab provision -H root@your.server.hostname

Once it's done, we can deploy Cabot:

 fab deploy -H ubuntu@your.server.hostname

You should get an error at the end, because Cabot is trying to use Upstart but can't find it.
To run Cabot anyway, log in to your server as the ubuntu user.

 cd 2014-06-19-e662635

(Your directory will have a different name following the same year-month-day-whatever pattern.)

 foreman start

Congratulations, Cabot should now be running!

Things to do

I will certainly add a simple init.d script to my Cabot fork so that it can be run easily. I may also change a few other things, such as the ubuntu username.

Understanding how to install and run Cabot took me a few hours. I was thrown off by the quickstart talking about AWS or DigitalOcean servers. I also tried to install it manually but didn't manage to get it working, although I was close (I think. Or at least I hope ^^). Installing it on Debian added some difficulties, but nothing insurmountable. In conclusion, I must say that Cabot is worth the effort. It provides a great way to monitor your services with HTTP checks, and the ability to alert based on Graphite metrics is just priceless.

Magical Graphite

Or, « What I’ve learned about Graphite configuration ».

Last week, I worked on configuring Graphite and had to understand how it stores and aggregates data. So here are a few facts.

Graphite Retention

The way our data will be stored is described in /opt/graphite/conf/storage-schemas.conf. As an example:

[default]
 pattern = .*
 retentions = 1s:30m,1m:1d,5m:2y

This schema keeps one point per second for 30 minutes, one point per minute for a day, and one point every 5 minutes for two years.
It worked great when I was looking at data from the last 30 minutes.
But when I tried to display the last hour of metrics: nothing.
Drawing null as zero only gave me a horizontal line at the bottom of the graph.

The magic of aggregation

This behaviour comes from the file /opt/graphite/conf/storage-aggregation.conf where we find the following lines:

[99_default_avg]
 pattern = .*
 xFilesFactor = 0.5
 aggregationMethod = average

Our problem comes from xFilesFactor: by default, at least 50% of the points in an aggregation interval must be non-null for an aggregated value to be stored. Think about it.

So here, we have one metric per second for 30 minutes. If Graphite doesn't receive anything for a given second, that point is set to null. Fine, let's move forward.
For intervals longer than 30 minutes (and shorter than a day), Graphite gathers data based on the configured aggregation: it averages the points and sets the result to null if fewer than 50% of the values are usable (non-null).

In our case, Graphite tries to build one-minute points (1m:1d) from the 1-second points of the first retention rule (1s:30m). To understand why nothing is displayed, consider that Collectd is sending data to Graphite and that, on average, metrics arrive every 3 s. Over a one-minute interval we gather about 20 values, but Graphite considers 60 slots, 40 of them null. Only 33% (0.33) of the points are usable, which is below the 50% Graphite expects, so the averaged value is set to null.
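
The fix is simply to lower xFilesFactor in storage-aggregation.conf, for instance down to 0 so that a single non-null point is enough:

[99_default_avg]
 pattern = .*
 xFilesFactor = 0
 aggregationMethod = average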

The art of confusion

Now that we have updated our configuration, set xFilesFactor to 0 to be sure, and restarted carbon-cache, everything should work fine…

But that’s not the case; no change.

In fact, the previous configuration is still baked into the existing .wsp storage files. We can check this with whisper-info.py.

whisper-info.py /opt/graphite/storage/whisper/collectd/test-java01/cpu-0/cpu-user.wsp
 
 maxRetention: 63072000
 xFilesFactor: 0.5
 aggregationMethod: average
 fileSize: 2561812

Archive 0
 retention: 1800
 secondsPerPoint: 1
 points: 1800
 size: 21600
 offset: 52
Archive 1
 retention: 86400
 secondsPerPoint: 60
 points: 1440
 size: 17280
 offset: 21652
Archive 2
 retention: 63072000
 secondsPerPoint: 300
 points: 210240
 size: 2522880
 offset: 38932

See, we still have xFilesFactor: 0.5.
If you don't care about previous data, a simple solution is to delete the files so that the new parameters are used (rm -rf /opt/graphite/storage/whisper/collectd/). It may be a little overkill, but it's easy and fast.

The other solution is to use whisper-resize.py to enforce the new configuration:

 whisper-resize.py /opt/graphite/storage/whisper/collectd/test-java01/cpu-0/cpu-user.wsp 3s:30m 1m:1d 5m:2y --xFilesFactor=0.1
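
To apply this to every Whisper file at once, something like the following should do (paths taken from the example above):

 find /opt/graphite/storage/whisper/collectd -name '*.wsp' \
   -exec whisper-resize.py {} 3s:30m 1m:1d 5m:2y --xFilesFactor=0.1 \;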

The above works fine. There is also another way to write how many metrics Graphite keeps: the n:i format, which means we store a measure every n seconds and keep i points (computed as interval / n).

Example: 3s:30m
30m = 1800s
1800 / 3 = 600

3:600

So 3s:30m,1m:1d,5m:2y gives us 3:600 60:1440 300:210240 (matching the whisper-info output above, where a year counts as 365 days; with the mean Gregorian year quoted below you would get roughly 210380).

« An average Gregorian year is 365.2425 days = 52.1775 weeks = 8765.82 hours = 525949.2 minutes = 31556952 seconds (mean solar, not SI). » Wikipedia

Note

A thing to remember concerning storage-schemas.conf (taken from the Graphite documentation):

« Changing this file will not affect already-created .wsp files. Use whisper-resize.py to change those. »

Locating the Django sources

Here is how to find where the Django sources are located:

$ python
>>> import django
>>> django
<module 'django' from '/usr/lib/python2.7/dist-packages/django/__init__.pyc'>

So the path to Django is

/usr/lib/python2.7/dist-packages/django/

Another, quicker option:

python -c "
import sys
sys.path = sys.path[1:]
import django
print(django.__path__)"

Which gives us

['/usr/lib/python2.7/dist-packages/django']

Graphite: verifying data reception

A handy command to check what is going through a network interface on a given port. For example, to verify that packets are being sent from Riemann to a local Graphite (interface lo, port 2003):

ngrep port 2003 -d lo

Which gives the following output:

interface: lo (127.0.0.0/255.0.0.0)
filter: (ip or ip6) and ( port 2003 )
#
T 127.0.0.1:33936 -> 127.0.0.1:2003 [AP]
 test 1.4 1401805746. 
##
T 127.0.0.1:33935 -> 127.0.0.1:2003 [AP]
 test 1.0 1401805747.

The messages reach Graphite and use the correct format, so everything is fine on the sending side.

[POC] Puppet master-agent with Docker

A small proof of concept done last week around Puppet and Docker. The idea is to use Docker to easily deploy a Puppet master-agent environment, so as to be able to test deployment scripts or, more generally, to study and understand how Puppet works before moving to production use.

The project therefore creates two Docker containers, one holding the master and the other the agent. Once the master container is created, we can create the agent and link the two containers with Docker's -link parameter. This lets the two containers communicate with each other without having to work out their respective IPs by hand. Then, once the master has signed the agent's certificate, the configuration described in the site.pp file under the agent node can be applied to the agent.
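
Roughly, the startup looks like this (image and container names are only illustrative; the real ones come from the repository):

 # Start the master, then the agent linked to it
 docker run -d --name master vvision/puppet-master
 docker run -d --name agent --link master:puppet vvision/puppet-agent
 # From a shell inside the master container, sign the agent's certificate request (certname is illustrative)
 puppet cert sign agent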

node  'agent' {
  class { 'apache': }
}

For example, here we install Apache in the agent's container.

My various tests also allowed me to identify some limitations in Docker:

  • It is not possible to modify the /etc/hosts file inside a container.
  • It is not possible to modify the ulimit parameters inside a container.

The code is, of course, available under a free license on GitHub: vvision/docker-puppet-master-agent.