Saltstack highstate email alerts

The problem

Recently, we have begun to roll out Saltstack for our server infrastructure at my workplace. For those who aren’t familiar with Saltstack, it is a configuration management system in the same vein as Puppet, Chef, and Ansible. You describe to Saltstack how you want your servers to be configured, and Saltstack goes out and does it. Magic.

Given Saltstack’s god-like power over our servers, we need to be able to conveniently monitor its activity for failures and unexpected changes. At this early stage in our deployment, having simple email alerts when highstates fail or cause changes would suffice, but to my slight annoyance, Saltstack doesn’t have this capability out of the box.

To be fair, Saltstack does have a sophisticated system for handling returned state data, known as returners. Returners allow you to ship Saltstack state data to other systems such as MySQL, Redis, or Elasticsearch, where something else can monitor it for failures and changes. This is definitely the proper way to handle Saltstack state data, but frankly I haven’t yet had time to set up a separate system to do all this. Having email alerts handled by Saltstack itself is the most convenient approach at the moment.

But hold on. Saltstack already has an SMTP returner. Problem solved, right? Not quite, because this would send an email for all Saltstack state calls, not just the ones that fail or cause changes. Too noisy.

So what is an admin to do? Is it possible for Saltstack to send email alerts only when states fail or cause changes? It turns out that it is possible, thanks to a combination of Saltstack’s reactor and runner systems.

The solution

Saltstack’s reactor system allows you to monitor for various events and “react” to them. In this case, we’ll listen for Saltstack jobs to complete (using the reactor system), then have the salt-master examine the returned data for failures or changes and send an email if any are found (using the runner system).

First, we’ll need to add the following to the Saltstack master config (/etc/salt/master):

reactor:
  - 'salt/job/*/ret/*':
    - salt://reactor/email-on-failure.sls

runner_dirs:
  - /srv/salt/_runners

state_output: changes

The reactor section simply listens for any Saltstack minion to return data and then triggers the reactor/email-on-failure.sls state in your Saltstack state repository (/srv/salt). The runner_dirs section tells the Saltstack master where to find custom runners, which we’ll need later on. The state_output setting tells Saltstack to be verbose only for states that fail or cause changes, which keeps the emailed output readable.
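To get a feel for which events that reactor pattern catches, here is a small Python sketch. It assumes the reactor compares event tags against the pattern using shell-style globbing (the same rules as Python's fnmatch), and the job ID and minion name in the sample tags are made up for illustration.

```python
from fnmatch import fnmatch

# The tag pattern from the reactor section of the master config
PATTERN = "salt/job/*/ret/*"


def is_job_return(tag):
    """Return True if an event tag matches the job-return pattern."""
    return fnmatch(tag, PATTERN)


# Hypothetical event tags (the jid and minion ID are invented)
tags = [
    "salt/job/20150312101530123456/ret/web01",  # a minion returning job data
    "salt/job/20150312101530123456/new",        # a job being published
    "salt/auth",                                # a minion authenticating
]

for tag in tags:
    print(tag, is_job_return(tag))
```

Only the first tag matches, so the reactor state would fire once per minion return and ignore the other events.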

This is what the reactor/email-on-failure.sls state file looks like:

email-on-failure:
  runner.process_minion_data.email_errors:
    - smtp_server: 127.0.0.1
    - fromaddr: from@example.com
    - toaddrs: to@example.com
    - subject: "Salt master: minion failure or changes: id: {{ data['id'] }} jid: {{ data['jid'] }}"
    - data_str: {{ data|yaml_dquote }}

This state calls a custom runner, passing in email information as well as the entire returned data set.
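Since data_str arrives as one long string, the runner has to turn it back into a dictionary before it can inspect anything. The sketch below shows that round trip, assuming the string is a Python-literal rendering of the event dictionary. The sample event (minion ID, jid, and state ID) is invented for illustration, and parse_event_data is a hypothetical helper name. It uses ast.literal_eval, which only accepts literals, as a cautious way to do the parsing.

```python
import ast

# Hypothetical job-return event; real events carry more keys than this
sample = {
    "id": "web01",
    "jid": "20150312101530123456",
    "fun": "state.highstate",
    "success": True,
    "return": {
        "file_|-/etc/motd_|-/etc/motd_|-managed": {
            "result": True,
            "changes": {"diff": "..."},
            "comment": "File /etc/motd updated",
        },
    },
}


def parse_event_data(data_str):
    """Parse a string of Python literals back into a dict without running code."""
    return ast.literal_eval(data_str)


# repr() stands in for what the rendered template would hand to the runner
data_str = repr(sample)
parsed = parse_event_data(data_str)
print(parsed["id"], parsed["jid"])
```

Anything that is not a plain literal, such as a function call smuggled into the string, makes ast.literal_eval raise ValueError instead of being executed.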

The source for the custom runner (which you can place in /srv/salt/_runners/process_minion_data.py) is below:

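import ast
import subprocess
import salt.modules.smtp

def email_errors(fromaddr, toaddrs, subject, data_str, smtp_server):
    # data_str is the event dict rendered as a string; literal_eval parses
    # it without executing arbitrary code, unlike eval
    data = ast.literal_eval(data_str)
    error = False
    changes = False

    if isinstance(data['return'], dict):
        # State runs return a dict of per-state results
        for state, result in data['return'].items():
            if not result['result']:
                error = True
                break
            if result['changes']:
                changes = True
                break
    else:
        # Anything else: fall back to the job's overall success flag
        if not data['success']:
            error = True

    if error or changes:
        # Reuse the human-readable job output as the email body
        body = subprocess.check_output(
            ["salt-run", "jobs.lookup_jid", data['jid']],
            universal_newlines=True)
        salt.modules.smtp.send_msg(
            toaddrs,
            body,
            subject=subject,
            sender=fromaddr,
            server=smtp_server,
            use_ssl=True)

    return True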

The gist of this runner is that it examines the returned data for failures or changes, and only sends an email if it finds any. The email is sent using Saltstack’s built-in SMTP module. Note that the body of the email comes from a call to the jobs.lookup_jid runner, since that output is already formatted nicely for human consumption (it’s the same output format you would see if you ran a state on the command line).
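The decision itself is easy to try out without a Salt master. Here is a minimal sketch of just that check as a standalone function. The name classify_return is hypothetical, and unlike the runner it scans every state rather than stopping at the first hit, and it skips entries that don't look like state results. Both are my assumptions about what is convenient, not something Saltstack requires.

```python
def classify_return(data):
    """Return an (error, changes) pair for one job-return event dict."""
    ret = data.get("return")
    if not isinstance(ret, dict):
        # Non-state jobs: fall back to the overall success flag
        return (not data.get("success", True), False)

    error = False
    changes = False
    for result in ret.values():
        if not isinstance(result, dict):
            continue  # not a per-state result, nothing to inspect
        if not result.get("result", True):
            error = True
        if result.get("changes"):
            changes = True
    return (error, changes)


# Hypothetical return data: one failed state and one state that changed something
event = {
    "return": {
        "pkg_|-nginx_|-nginx_|-installed": {"result": False, "changes": {}},
        "file_|-/etc/motd_|-/etc/motd_|-managed": {
            "result": True,
            "changes": {"diff": "..."},
        },
    },
}
error, changes = classify_return(event)
print(error, changes)
```

An email would go out whenever either value is True, which matches the "error or changes" condition in the runner.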

Once all of the above is in place, you should be able to just restart the Saltstack master and you’ll get your nice email alerts. Hooray :D