Dec 01, 2011
 
We are in December, and for this month I suggest keeping an eye on this site: http://sysadvent.blogspot.com

This site is a sysadmin relative of the Perl Advent Calendar: one article for each day of December, ending with the 25th article. With the goals of sharing, openness, and mentoring, the authors aim to provide great articles about systems administration topics written by fellow sysadmins.

Today I present one of their articles from last year.

This article was written by Jordan Sissel (@jordansissel)

In past sysadvents, I’ve talked about babysitting services and showed how to use supervisord to achieve it. This year, Ubuntu started shipping its release with a new init system called Upstart that has babysitting built in, so let’s talk about that. I’ll be doing all of these examples on Ubuntu 10.04, but any upstart-using system should work.



For me, the two most important features of Upstart are babysitting and events. Upstart supports the simple runner scripts that daemontools, supervisord, and other similar-class tools support. It also lets you configure jobs to respond to arbitrary events.

Diving in, let’s take a look at the ssh server configuration Ubuntu ships for Upstart (I edited it for clarity). This file lives at /etc/init/ssh.conf:

description     "OpenSSH server"

# Start when we get the 'filesystem' event, presumably once the file
# systems are mounted. Stop when shutting down.
start on filesystem
stop on runlevel S

expect fork
respawn
respawn limit 10 5
umask 022
oom never

exec /usr/sbin/sshd

Some points:

  • respawn – tells Upstart to restart it if sshd ever stops abnormally (which means every exit except for those caused by you telling it to stop).
  • oom never – Gives hints to the Out-Of-Memory killer. In this case, we say never kill this process. This is super useful as a built-in feature.
  • exec /usr/sbin/sshd – no massive SysV init script, just one line saying what binary to run. Awesome!

Notice:

  • No poorly-written ‘status’ commands.
  • No poorly-written /bin/sh scripts
  • No confusing/misunderstood restart vs reload vs stop/start semantics.

The initctl(8) command is the main interface to upstart, but there are shorthand commands status, stop, start, and restart. Let’s query status:

% sudo initctl status ssh
ssh start/running, process 1141

# Or this works, too (/sbin/status is a symlink to /sbin/initctl):
% sudo status ssh 
ssh start/running, process 1141

# Stop the ssh server
% sudo initctl stop ssh
ssh stop/waiting

# And start it again
% sudo initctl start ssh 
ssh start/running, process 28919

Honestly, I’m less interested in how to be a user of upstart and more interested in running processes in upstart.

How about running nagios with upstart? Make /etc/init/nagios.conf:

description "Nagios"
start on filesystem
stop on runlevel S
respawn

# Run nagios
exec /usr/sbin/nagios3 /etc/nagios3/nagios.cfg

Let’s start it:

% sudo initctl start nagios
nagios start/running, process 1207
% sudo initctl start nagios
initctl: Job is already running: nagios

Most importantly, if something goes wrong and nagios crashes or otherwise dies, it should restart, right? Let’s see:

% sudo initctl status nagios
nagios start/running, process 4825
% sudo kill 4825            
% sudo initctl status nagios
nagios start/running, process 4904

Excellent.
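
The same kill-and-check test can be scripted. Here is a minimal sketch, assuming the "JOB start/running, process PID" status format shown above; status_pid is a hypothetical helper name, not something Upstart ships.

```sh
# status_pid: hypothetical helper (not part of Upstart). Reads 'initctl status'
# output on stdin and prints the PID, or nothing if no PID is reported.
status_pid() {
  sed -n 's/.* process \([0-9][0-9]*\).*/\1/p'
}

# Usage (needs root and a running job):
#   old=$(initctl status nagios | status_pid)
#   kill "$old"; sleep 1
#   new=$(initctl status nagios | status_pid)
#   [ -n "$new" ] && [ "$new" != "$old" ] && echo "respawned"
```

The one-second sleep is a guess at how long the respawn takes; adjust it for your job.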

Events

Upstart supports simple messages (events). You can create one with ‘initctl emit EVENT [KEY=VALUE] …’ and subscribe to it in your config with ‘start on …’ (and likewise ‘stop on …’). A very simple example:

# /etc/init/helloworld.conf
start on helloworld
exec env | logger -t helloworld

Now send the ‘helloworld’ message, but also set some parameters in that message.

% sudo initctl emit helloworld foo=bar baz=fizz

And look at the logger results (logger writes to syslog):

2010-12-19T11:03:29.000+00:00 ops helloworld: UPSTART_INSTANCE=
2010-12-19T11:03:29.000+00:00 ops helloworld: foo=bar
2010-12-19T11:03:29.000+00:00 ops helloworld: baz=fizz
2010-12-19T11:03:29.000+00:00 ops helloworld: UPSTART_JOB=helloworld
2010-12-19T11:03:29.000+00:00 ops helloworld: TERM=linux
2010-12-19T11:03:29.000+00:00 ops helloworld: PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
2010-12-19T11:03:29.000+00:00 ops helloworld: UPSTART_EVENTS=helloworld
2010-12-19T11:03:29.000+00:00 ops helloworld: PWD=/

You can also conditionally accept events with key/value settings, too. See the init(5) manpage for more details.
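
For instance, a job could react only when an event carries a particular parameter. This is a minimal sketch, assuming the ‘start on EVENT KEY=VALUE’ matching form described in init(5); the job name hellobar is made up for illustration.

```
# /etc/init/hellobar.conf (hypothetical job)
# Start only for 'helloworld' events that carry foo=bar.
start on helloworld foo=bar
exec env | logger -t hellobar
```

With this in place, ‘initctl emit helloworld foo=bar’ should start the job, while ‘initctl emit helloworld foo=other’ should not.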

Additionally, you can start jobs and pass parameters to the job with ‘start helloworld key1=value1 ...’

Problems

Upstart has issues.

First: Debugging it sucks. Why is your pre-start script failing? There’s no built-in way to capture the output and log it. Your best bet is to put ‘exec 2> /var/log/upstart.${UPSTART_JOB}.log’ (or something similar) at the top of the script. Otherwise, your only option for capturing output is the ‘console’ setting, which lets you send output to /dev/console, but that’s not useful.
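
As a rough illustration of the redirect trick, here is a small sh sketch. run_logged and LOG_DIR are hypothetical names for this example; UPSTART_JOB is the variable Upstart sets for the job, and the ‘manual’ fallback is only there so the function can be tried outside Upstart.

```sh
# run_logged: hypothetical helper. Runs a command with stderr (plus a 'set -x'
# trace) appended to a per-job log file. The body runs in a subshell, so the
# redirect and tracing do not leak into the caller.
run_logged() (
  job=${UPSTART_JOB:-manual}                     # set by Upstart inside a job
  log="${LOG_DIR:-/var/log}/upstart.${job}.log"  # LOG_DIR is an assumption
  exec 2>>"$log"                                 # stderr now goes to the log
  set -x                                         # trace commands into the log, too
  "$@"
)

# Example: run_logged /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
```

Inside a ‘pre-start script’ stanza you can skip the function and simply put the ‘exec 2>>…’ and ‘set -x’ lines at the top of the script body.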

Second: The common ‘graceful restart’ idiom (test the config, then restart) is hard to implement directly in Upstart. I tried one way: perform a config test in ‘pre-start’ and, on success, copy the config to a ‘good’ file and run on that. That doesn’t work well for things like Nagios that can have many config files:

# Set two variables for easier maintainability:
env CONFIG_FILE=/etc/nagios3/nagios.cfg
env NAGIOS=/usr/sbin/nagios3

pre-start script
  if $NAGIOS -v $CONFIG_FILE ; then
    # Copy to '.test_ok'
    cp $CONFIG_FILE ${CONFIG_FILE}.test_ok
  else
    echo "Config check failed, using old config."
  fi
end script

# Use the verified 'test_ok' config
exec $NAGIOS $CONFIG_FILE.test_ok

The above solution kind of sucks. The right way to implement graceful restart with Upstart is to run the ‘test’ yourself and only call ‘initctl restart nagios’ on success. That is, keep it external to Upstart.
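
A minimal sketch of that external approach might look like the following. graceful_restart and the INITCTL override are hypothetical names for this example, and the nagios paths in the usage comment are the ones from the config above.

```sh
# graceful_restart: hypothetical helper. Runs a config test command and only
# restarts the Upstart job if that test exits 0.
# Usage: graceful_restart JOB TEST_COMMAND [ARGS...]
graceful_restart() {
  job=$1
  shift
  if "$@"; then
    # INITCTL is an assumption for this sketch: set it to 'echo' for a dry run.
    ${INITCTL:-initctl} restart "$job"
  else
    echo "Config check failed; not restarting $job." >&2
    return 1
  fi
}

# Example (as root):
#   graceful_restart nagios /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
```

Because the test runs before Upstart is involved at all, a bad config leaves the running daemon untouched.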

Third: D-Bus (the message backend for Upstart) has very bad user documentation. The system seems to support access control, but I couldn’t find any docs on the subject. Upstart doesn’t seem to mention how, but you can see access control in action when you try to ‘start’ ssh as non-root:

initctl: Rejected send message, 1 matched rules; type="method_call",
sender=":1.328" (uid=1000 pid=29686 comm="initctl)
interface="com.ubuntu.Upstart0_6.Job" member="Start" error name="(unset)"
requested_reply=0 destination="com.ubuntu.Upstart" (uid=0 pid=1 comm="/sbin/init"))

So, there’s access control, but I’m not sure anyone knows how to use it.

Fourth: There’s no “died” or “exited” event to indicate that a process has exited unexpectedly, so you can’t have event-driven tasks that alert you when a process is flapping or otherwise notify you that it died.

Fifth: Again on the debugging problem, there’s no way to watch events passing along to upstart. strace doesn’t help very much:

% sudo strace -s1500 -p 1 |& grep com.ubuntu.Upstart
# output edited for sanity, I ran 'sudo initctl start ssh'
read(10, "BEGIN ... binary mess ... /com/ubuntu/Upstart ... GetJobByName ...ssh\0", 2048) = 127
...

Lastly, the system feels like it was built for desktops: lack of ‘exited’ event, confusing or missing access control, stopped state likely being lost across reboots, no slow-starts or backoff, little/no output on failures, etc.

Conclusion

Anyway, despite some problems, Upstart seems like a promising solution to the problem of babysitting your daemons. If nothing else, its best benefit is that it comes with Ubuntu 10.04 and beyond by default, so if you run an Ubuntu infrastructure, it’s worth learning.
