GNU sort traps with field separators

GNU sort is an excellent utility that is a mainstay of the linux command line. It has all kinds of tricks up its sleeves, including support for uniquifying records, stable sorts, files larger than memory, parallelisation, controlled memory usage, etc. Go read the man page for all the gory details.

It also supports sorting with field separators, but unfortunately this support has some nasty traps for the unwary. Hence this post.

First, GNU sort cannot do general sorts of CSV-style datasets, because it doesn't understand CSV-features like quoting rules, quote-escaping, separator-escaping, etc. If you have very simple CSV files that don't do any escaping and you can avoid quotes altogether (or always use them), you might be able to use GNU sort - but it can get difficult fast.

Here I'm only interested in very simple delimited files - no quotes or escaping at all. Even here, though, there are some nasty traps to watch out for.

Here's a super-simple example file with just two lines and three fields, called dsort.csv:

$ cat dsort.csv

If we do a vanilla sort on this file, we get the following (I'm also running it through md5sum to highlight when the output changes):

$ sort dsort.csv | tee /dev/stderr | md5sum
5efd74fa9bef453dd477ec9acb2cef5f  -

The longer line sorts before the shorter line because the '+' sign collates before the second comma in the short line - this is sorting on the whole line, not on the individual fields.

Okay, so if I want to do an individual field sort, I can just use the -t option, right? You would think so, but unfortunately:

$ sort -t, dsort.csv | tee /dev/stderr | md5sum
5efd74fa9bef453dd477ec9acb2cef5f  -

Huh? Why doesn't that sort the short line first, like we'd expect? Maybe it's not sorting on all the fields or something? Do I need to explicitly include all fields? Let's see:

$ sort -t, -k1,3 dsort.csv | tee /dev/stderr | md5sum
5efd74fa9bef453dd477ec9acb2cef5f  -

Huh? What the heck is going on here?

It turns out this unintuitive behaviour is because of the way sort interprets the -k option: -kM,N (where M != N) doesn't mean 'sort by field M, then field M+1,... then by field N', it means instead 'join all fields from M to N (with the field separator?), and sort by that'. Ugh!

So I just need to specify the fields individually? Unfortunately, even that's not enough:

$ sort -t, -k1 -k2 -k3 dsort.csv | tee /dev/stderr | md5sum
5efd74fa9bef453dd477ec9acb2cef5f  -

This is because the first option here, -k1, is interpreted as -k1,3: when no end field is given, the key runs to the last field on the line (here, field 3). Double-ugh!

So the takeaway is: if you want an individual-field sort you have to specify every field individually, AND you have to use -kN,N syntax, like so:

$ sort -t, -k1,1 -k2,2 -k3,3 dsort.csv | tee /dev/stderr | md5sum
493ce7ca60040fa184f1bf7db7758516  -

Yay, finally what we're after!
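To see the three behaviours side by side, here's a small sketch using a stand-in file. The file contents and the LC_ALL=C setting are assumptions made for this example (the C locale makes the byte-wise collation predictable); it is not the original dsort.csv.

```shell
# Stand-in data: a short line and a longer line that differ in field 2
tmp=$(mktemp)
printf 'a,b,c\na,b+c,d\n' > "$tmp"

# Whole-line sort: '+' (0x2B) collates before ',' (0x2C), so the longer line comes first
whole_line=$(LC_ALL=C sort "$tmp")

# -k1,3 builds a single key spanning fields 1 to 3, so the result is the same
joined_key=$(LC_ALL=C sort -t, -k1,3 "$tmp")

# One -kN,N per field: in field 2, "b" now sorts before "b+c"
per_field=$(LC_ALL=C sort -t, -k1,1 -k2,2 -k3,3 "$tmp")

printf 'whole line:\n%s\n\njoined key:\n%s\n\nper field:\n%s\n' \
  "$whole_line" "$joined_key" "$per_field"
rm -f "$tmp"
```

Only the last form puts the short line first.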

Also, unfortunately, there doesn't seem to be a generic way of specifying 'all fields' or 'up to the last field' or 'M-N' fields - you have to specify them all individually. It's verbose and ugly, but it works.
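If typing out every key gets tedious, one possible workaround is to generate the key list. This is just a sketch: sort_keys is a made-up helper name, and it assumes you already know how many fields your file has.

```shell
# Hypothetical helper: print "-k1,1 -k2,2 ... -kN,N" for a file with N fields
sort_keys() {
  n=$1
  i=1
  keys=
  while [ "$i" -le "$n" ]; do
    keys="$keys -k$i,$i"
    i=$((i + 1))
  done
  # Drop the leading space before printing
  printf '%s\n' "${keys# }"
}

# The unquoted $(...) is deliberate, so each -kN,N becomes a separate argument
printf 'a,b,c\na,b+c,d\n' | LC_ALL=C sort -t, $(sort_keys 3)
```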

And for some good news, you can use sort suffixes on those individual options (like n for numerics, r for reverse sorts, etc.) just fine.

Happy sorting!

Symlink redirects in nginx

Solved an interesting problem this week using nginx.

We have an internal nginx webserver for distributing datasets with dated filenames, like foobar-20190213.tar.gz. We also create a symlink called foobar-latest.tar.gz, that is updated to point to the latest dataset each time a new version is released. This allows users to just use a fixed url to grab the latest release, rather than having to scrape the page to figure out which version is the latest.

Which generally works well. However, one wrinkle is that when you download via the symlink you end up with a file named with the symlink filename (foobar-latest.tar.gz), rather than a dated one. For some use cases this is fine, but for others you actually want to know what version of the dataset you are using.

What would be ideal would be a way to tell nginx to handle symlinks differently from other files. Specifically, if the requested file is a symlink, look up the file the symlink points to and issue a redirect to request that file. So you'd request foobar-latest.tar.gz, but you'd then be redirected to foobar-20190213.tar.gz instead. This gets you the best of both worlds - a fixed url to request, but a dated filename delivered. (If you don't need dated filenames, of course, you just save to a fixed name of your choice.)

Nginx doesn't support this functionality directly, but it turns out it's pretty easy to configure - at least as long as your symlinks are strictly local (i.e. your target and your symlink both live in the same directory), and as long as you have the nginx embedded perl module included in your nginx install (the one from RHEL/CentOS EPEL does, for instance.)

Here's how:

1. Add a couple of helper directives in the http context (that's outside/as a sibling to your server section):

# Setup a variable with the dirname of the current uri
# cf.
map $uri $uri_dirname {
  ~^(?<capture>.*)/ $capture;
}

# Use the embedded perl module to return (relative) symlink target filenames
# cf.
perl_set $symlink_target_rel '
  sub {
    my $r = shift;
    my $filename = $r->filename;
    return "" if ! -l $filename;
    my $target = readlink($filename);
    $target =~ s!^.*/!!;    # strip path (if any)
    return $target;
  }
';

2. In a location section (or similar), just add a test on $symlink_target_rel and issue a redirect using the variables we defined previously:

location / {
  autoindex on;

  # Symlink redirects FTW!
  if ($symlink_target_rel != "") {
    # Note this assumes that your symlink and target are in the same directory
    return 301 $uri_dirname/$symlink_target_rel;
  }
}

Now when you make a request to a symlinked resource you get redirected instead to the target, but everything else is handled using the standard nginx pathways.
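If you want to sanity-check the logic without touching nginx, the same lookup can be sketched in plain shell. The scratch directory and the symlink_redirect function name are made up for this example; it mirrors the map and perl_set pair above (dirname of the uri plus the basename of the symlink target), not nginx itself.

```shell
# Hypothetical scratch directory standing in for the nginx document root
docroot=$(mktemp -d)
touch "$docroot/foobar-20190213.tar.gz"
ln -s foobar-20190213.tar.gz "$docroot/foobar-latest.tar.gz"

# Print the redirect path for a uri, or nothing if the file is not a symlink
symlink_redirect() {
  uri=$1
  file="$docroot$uri"
  [ -L "$file" ] || return 0
  target=$(readlink "$file")
  # ${uri%/*} is the uri dirname; ${target##*/} strips any path from the target
  printf '%s/%s\n' "${uri%/*}" "${target##*/}"
}

symlink_redirect /foobar-latest.tar.gz      # symlink: prints the dated path
symlink_redirect /foobar-20190213.tar.gz    # regular file: prints nothing
```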

$ curl -i -X HEAD
HTTP/1.1 301 Moved Permanently
Server: nginx/1.12.2
Date: Wed, 13 Feb 2019 05:23:11 GMT

incron tips and traps

(Updated April 2020: added new no. 7 after being newly bitten...)

incron is a useful little cron-like utility that lets you run arbitrary jobs (like cron), but instead of being triggered at certain times, your jobs are triggered by changes to files or directories.

It uses the linux kernel inotify facility (hence the name), and so it isn't cross-platform, but on linux it can be really useful for monitoring file changes or uploads, reporting or forwarding based on status files, simple synchronisation schemes, etc.

Again like cron, incron supports the notion of job 'tables' where commands are configured, and users can manage their own tables using an incrontab command, while root can manage multiple system tables.

So it's a really useful linux utility, but it's also fairly old (the last release, v0.5.10, is from 2012), doesn't appear to be under active development any more, and it has a few frustrating quirks that can make using it unnecessarily difficult.

So this post is intended to highlight a few of the 'gotchas' I've experienced using incron:

  1. You can't monitor recursively i.e. if you create a watch on a directory incron will only be triggered on events in that directory itself, not in any subdirectories below it. This isn't really an incron issue since it's a limitation of the underlying inotify mechanism, but it's definitely something you'll want to be aware of going in.

  2. The incron interface is enough like cron (incrontab -l, incrontab -e, man 5 incrontab, etc.) that you might think that all your nice crontab features are available. Unfortunately that's not the case - most significantly, you can't have comments in incron tables (incron will try and parse your comment lines and fail), and you can't set environment variables to be available for your commands. (This includes PATH, so you might need to explicitly set a PATH inside your incron scripts if you need non-standard locations. The default PATH is documented as /usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin.)

  3. That means that cron MAILTO support is not available, and in general there's no easy way of getting access to the stdout or stderr of your jobs. You can't even use shell redirects in your command to capture the output (e.g. echo $@/$# >> /tmp/incron.log doesn't work). If you're debugging, the best you can do is add a layer of indirection by using a wrapper script that does the redirection you need (e.g. echo $1 >> /tmp/incron.log 2>&1) and calling the wrapper script in your incrontab with the incron arguments (e.g. $@/$#). This all makes debugging misbehaving commands pretty painful. The main place to check if your commands are running is the cron log (/var/log/cron) on RHEL/CentOS, and syslog (/var/log/syslog) on Ubuntu/Debian.

  4. incron is also very picky about whitespace in your incrontab. If you put more than one space (or a tab) between the inotify masks and your command, you'll get an error in your cron log saying cannot exec process: No such file or directory, because incron will have included everything after the first space as part of your command e.g. (gavin) CMD ( echo /home/gavin/tmp/foo) (note the evil space before the echo).

  5. It's often difficult (and non-intuitive) to figure out what inotify events you want to trigger on in your incrontab masks. For instance, does 'IN_CREATE' get fired when you replace an existing file with a new version? What events are fired when you do a mv or a cp? If you're wanting to trigger on an incoming remote file copy, should you use 'IN_CREATE' or 'IN_CLOSE_WRITE'? In general, you don't want to guess, you actually want to test and see what events actually get fired on the operations you're interested in. The easiest way to do this is use inotifywait from the inotify-tools package, and run it using inotifywait -m <dir>, which will report to you all the inotify events that get triggered on that directory (hit <Ctrl-C> to exit).

  6. The "If you're wanting to trigger on an incoming remote file copy, should you use 'IN_CREATE' or 'IN_CLOSE_WRITE'?" above was a trick question - it turns out it depends how you're doing the copy! If you're just doing a simple copy in-place (e.g. with scp), then (assuming you want the completed file) you're going to want to trigger on 'IN_CLOSE_WRITE', since that's signalling all writing is complete and the full file will be available. If you're using a vanilla rsync, though, that's not going to work, as rsync does a clever write-to-a-hidden-file trick, and then moves the hidden file to the destination name atomically. So in that case you're going to want to trigger on 'IN_MOVED_TO', which will give you the destination filename once the rsync is completed. So again, make sure you test thoroughly before you deploy.

  7. Though cron works fine with symlinks to crontab files (in e.g. /etc/cron.d), incron doesn't support this in /etc/incron.d - symlinks just seem to be quietly ignored. (Maybe this is for security, but it's not documented, afaict.)
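For the wrapper-script approach in no. 3, something like the following is roughly what I mean. It's only a sketch: log_run and INCRON_LOG are names invented for this example, and the log path is just a suggestion.

```shell
# Hypothetical log location; set INCRON_LOG to put it somewhere else
INCRON_LOG=${INCRON_LOG:-/tmp/incron.log}

# Run a command, appending a timestamped header, its stdout and stderr,
# and its exit status to the log
log_run() {
  {
    echo "$(date '+%F %T') running: $*"
    "$@"
    echo "exit status: $?"
  } >> "$INCRON_LOG" 2>&1
}

# In a real wrapper, the path incron passes in (from $@/$# in the incrontab)
# arrives as $1; default to /tmp so the sketch runs on its own
log_run ls -ld "${1:-/tmp}"
```

Your incrontab entry then calls the wrapper with $@/$# as its argument, and everything the real command prints ends up in the log.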

Have I missed any? Any other nasties bitten you using incron?

MongoDB Database Upgrades

I've been doing a few upgrades of old standalone (not replica set) mongodb databases lately, taking them from 2.6, 3.0, or 3.2 up to 3.6. Here's my upgrade process on RHEL/CentOS, which has been working pretty smoothly (cf. the mongodb notes).

First, the WiredTiger storage engine (the default since mongodb 3.2) "strongly" recommends using the xfs filesystem on linux, rather than ext4. So the first thing to do is reorganise your disk to make sure you have an xfs filesystem available to hold your upgraded database. If you have the disk space, this may be reasonably straightforward; if you don't, it's a serious PITA.
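A quick way to check what filesystem a given path lives on is GNU stat in filesystem mode. This is a sketch only: fs_type is a made-up helper name, and /opt is simply the example location used in the steps that follow.

```shell
# Print the filesystem type holding a path (GNU stat, -f = filesystem mode)
fs_type() {
  stat -f -c %T "$1"
}

# /opt is the example location used in this post; adjust to suit
if [ "$(fs_type /opt 2>/dev/null)" = "xfs" ]; then
  echo "/opt is on xfs"
else
  echo "/opt is not on xfs (or does not exist)"
fi
```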

Once your filesystems are sorted, here's the upgrade procedure.

1. Take a full mongodump of all your databases

cd /data    # Any path with plenty of disk available
for DB in db1 db2 db3; do
  mongodump -d $DB -o mongodump-$DB-$(date +%Y%m%d)
done

2. Shut the current mongod down

systemctl stop mongod
# Save the current mongodb.conf for reference
mv /etc/mongodb.conf /etc/mongod.conf.rpmsave

3. Hide the current /var/lib/mongo directory to avoid getting confused later. Create your new mongo directory on the xfs filesystem you've prepared e.g. /opt.

cd /var/lib
mv mongo mongo-old
# Create new mongo directory on your (new?) xfs filesystem
mkdir /opt/mongo
chown mongod:mongod /opt/mongo

4. Upgrade to mongo v3.6

vi /etc/yum.repos.d/mongodb.repo

# Add the following section, and disable any other mongodb repos you might have
name=MongoDB 3.6 Repository

# Then do a yum update on the mongodb packages
yum update mongodb-org-{server,tools,shell}

5. Check/modify the new mongod.conf. See the mongodb documentation for all the details on the 3.6 config file options. In particular, dbPath should point to the new xfs-based mongo directory you created in (3) above.

vi /etc/mongod.conf

# Your 'storage' settings should look something like this:
storage:
  dbPath: /opt/mongo
  journal:
    enabled: true
  engine: "wiredTiger"
  wiredTiger:
    collectionConfig:
      blockCompressor: snappy
    indexConfig:
      prefixCompression: true

6. Restart mongod

systemctl daemon-reload
systemctl enable mongod
systemctl start mongod
systemctl status mongod

7. If all looks good, reload the mongodump data from (1):

cd /data
for DB in db1 db2 db3; do
  mongorestore --drop mongodump-$DB-$(date +%Y%m%d)
done

All done!
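One thing to watch: steps (1) and (7) both build the dump directory name from today's date, so if the restore happens on a later day than the dump, the names won't match. A possible tweak, sketched here with made-up names (dump_dir, STAMP), is to fix the date stamp once and reuse it. The loop below only prints the commands it would run.

```shell
# Hypothetical helper: build the dump directory name from a db name and a date stamp
dump_dir() {
  printf 'mongodump-%s-%s\n' "$1" "$2"
}

# Use the stamp from the day of the dump, e.g. STAMP=20190213; defaults to today
STAMP=${STAMP:-$(date +%Y%m%d)}

# Dry run: print the restore commands rather than executing them
for DB in db1 db2 db3; do
  echo mongorestore --drop "$(dump_dir "$DB" "$STAMP")"
done
```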

These are the basics anyway. This doesn't cover configuring access control on your new database, or wrangling SELinux permissions on your database directory, but if you're doing those currently you should be able to figure those out.

Setting up a file transfer host

Had to set up a new file transfer host recently, with the following requirements:

  • individual login accounts required (for customers, no anonymous access)
  • support for (secure) downloads, ideally via a browser (no special software required)
  • support for (secure) uploads, ideally via sftp (most of our customers are familiar with ftp)

Our target was RHEL/CentOS 7, but this should transfer to other linuxes pretty readily.

Here's the schema we ended up settling on, which seems to give us a good mix of security and flexibility.

  • use apache with HTTPS and PAM with local accounts, one per customer, and nologin shell accounts
  • users have their own groups (group=$USER), and also belong to the sftp group
  • we use the users group for internal company accounts, but NOT for customers
  • customer data directories live in /data
  • we use a 3-layer hierarchy for security: /data/chroot_$USER/$USER (accounts are created with a nologin shell)
  • the /data/chroot_$USER directory must be owned by root:$USER, with permissions 750, and is used for an sftp chroot directory (not writeable by the user)
  • the next-level /data/chroot_$USER/$USER directory should be owned by $USER:users, with permissions 2770 (where users is our internal company user group, so both the customer and our internal users can write here)
  • we also add an ACL to /data/chroot_$USER to allow the company-internal users group read/search access (but not write)

We just use openssh internal-sftp to provide sftp access, with the following config:

Subsystem sftp internal-sftp
Match Group sftp
  ChrootDirectory /data/chroot_%u
  X11Forwarding no
  AllowTcpForwarding no
  ForceCommand internal-sftp -d /%u

So we chroot sftp connections to /data/chroot_$USER and then (via the ForceCommand) chdir to /data/chroot_$USER/$USER, so they start off in the writeable part of their tree. (If they bother to pwd, they see that they're in /$USER, and they can chdir up a level, but there's nothing else there except their $USER directory, and they can't write to the chroot.)

Here's a slightly simplified version of the newuser script we use:

die() {
  echo $*
  exit 1
}

test -n "$1" || die "usage: $(basename $0) <username>"

USERNAME=$1

# Create the user and home directories
mkdir -p /data/chroot_$USERNAME/$USERNAME
useradd --user-group -G sftp -d /data/chroot_$USERNAME/$USERNAME -s /sbin/nologin $USERNAME

# Set home directory permissions
chown root:$USERNAME /data/chroot_$USERNAME
chmod 750 /data/chroot_$USERNAME
setfacl -m group:users:rx /data/chroot_$USERNAME
chown $USERNAME:users /data/chroot_$USERNAME/$USERNAME
chmod 2770 /data/chroot_$USERNAME/$USERNAME

# Set user password manually
passwd $USERNAME
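Since sshd refuses chroot logins when the chroot directory permissions are off, it's worth double-checking the modes after creating a user. Here's a rough sketch that does so against a scratch copy of the layout; check_mode and the customer name alice are invented for the example, and it relies on GNU stat.

```shell
# Report whether a path has the expected octal mode (GNU stat)
check_mode() {
  actual=$(stat -c %a "$1") || return 1
  if [ "$actual" = "$2" ]; then
    echo "ok: $1 is $actual"
  else
    echo "MISMATCH: $1 is $actual, expected $2"
    return 1
  fi
}

# Scratch copy of the layout, with "alice" as a made-up customer name
base=$(mktemp -d)
mkdir -p "$base/chroot_alice/alice"
chmod 750 "$base/chroot_alice"
chmod 2770 "$base/chroot_alice/alice"

check_mode "$base/chroot_alice" 750
check_mode "$base/chroot_alice/alice" 2770
```

On the real host you would point check_mode at /data/chroot_$USERNAME and /data/chroot_$USERNAME/$USERNAME instead.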

And we add an apache config file like the following to /etc/httpd/user.d:

<Directory /data/chroot_CUSTOMER/CUSTOMER>
Options +Indexes
Include "conf/auth.conf"
Require user CUSTOMER
</Directory>

(with CUSTOMER changed to the local username), and where conf/auth.conf has the authentication configuration against our local PAM users and allows internal company users access.

So far so good, but how do we restrict customers to their own /CUSTOMER tree?

That's pretty easy too - we just disallow customers from accessing our apache document root, and redirect them to a magic '/user' endpoint using an ErrorDocument 403 directive:

<Directory /var/www/html>
Options +Indexes +FollowSymLinks
Include "conf/auth.conf"
# Any user not in auth.conf, redirect to /user
ErrorDocument 403 "/user"
</Directory>

with /user defined as follows:

# Magic /user endpoint, redirecting to /$USERNAME
<Location /user>
Include "conf/auth.conf"
Require valid-user
RewriteEngine On
RewriteCond %{LA-U:REMOTE_USER} ^[a-z].*
RewriteRule ^\/(.*)$ /%{LA-U:REMOTE_USER}/ [R]
</Location>

The combination of these two says that any valid user NOT in auth.conf should be redirected to their own /CUSTOMER endpoint, so each customer user lands there, and can't get anywhere else.

Works well, no additional software is required over vanilla apache and openssh, and it still feels relatively simple, while meeting our security requirements.

The iconv slurp misfeature

Since I got bitten by this recently, let me blog a quick warning here: glibc iconv - a utility for character set conversions, like iso8859-1 or windows-1252 to utf-8 - has a nasty misfeature/bug: if you give it data on stdin it will slurp the entire file into memory before it does a single character conversion.

Which is fine if you're running small input files. If you're trying to convert a 10G file on a VPS with 2G of RAM, however ... not so good!

This looks to be a known issue, with patches submitted to fix it in August 2015, but I'm not sure if they've been merged, or into which version of glibc. Certainly RHEL/CentOS 7 (with glibc 2.17) and Ubuntu 14.04 (with glibc 2.19) are both affected.

Once you know about the issue, it's easy enough to work around - there's an iconv-chunks wrapper on github that breaks the input into chunks before feeding it to iconv, or you can do much the same thing using the lovely GNU parallel e.g.

gunzip -c monster.csv.gz | parallel --pipe -k iconv -f windows-1252 -t utf8

Nasty OOM avoided!
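If GNU parallel isn't installed, GNU split with --filter should be able to do a similar chunking job, since it feeds each chunk to a fresh command in order. Treat this as a sketch rather than something battle-tested: convert_chunked is a made-up name, the 10M chunk size is arbitrary, and it assumes a line-oriented file in a single-byte encoding like windows-1252.

```shell
# Convert stdin from windows-1252 to utf-8, at most ~10MB of whole lines at a time
convert_chunked() {
  split -C 10M --filter='iconv -f windows-1252 -t utf-8' -
}

# \351 is the windows-1252 byte for e-acute, so this prints "café" in utf-8
printf 'caf\351\n' | convert_chunked
```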

Checking for a Dell DRAC card on linux

Note to self: this seems to be the most reliable way of checking whether a Dell machine has a DRAC card installed:

sudo ipmitool sdr elist mcloc

If there is, you'll see some kind of DRAC card:

iDRAC6           | 00h | ok  |  7.1 | Dynamic MC @ 20h

If there isn't, you'll see only a base management controller:

BMC              | 00h | ok  |  7.1 | Dynamic MC @ 20h
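If you're checking a bunch of machines, the test is easy to script. A minimal sketch, where drac_status is a made-up name and the input is the sample line from above rather than live ipmitool output:

```shell
# Classify `ipmitool sdr elist mcloc` output read from stdin
drac_status() {
  if grep -q 'DRAC'; then
    echo "DRAC present"
  else
    echo "no DRAC found"
  fi
}

# Fed with the sample line from above; on a real machine pipe in the output of
#   sudo ipmitool sdr elist mcloc
echo 'iDRAC6           | 00h | ok  |  7.1 | Dynamic MC @ 20h' | drac_status
```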

You need ipmi set up for this (if you haven't already):

# on RHEL/CentOS etc.
yum install OpenIPMI
service ipmi start

Extract - some devops yang

If you're a modern sysadmin you've probably been sipping at the devops koolaid and trying out one or more of the current system configuration management tools like puppet or chef.

These tools are awesome - particularly for homogeneous large-scale deployments of identical nodes.

In practice in the enterprise, though, things get more messy. You can have legacy nodes that can't be puppetised due to their sensitivity and importance; or nodes that are sufficiently unusual that the payoff of putting them under configuration management doesn't justify the work; or just systems which you don't have full control over.

We've been using a simple tool called extract in these kinds of environments, which pulls a given set of files from remote hosts and stores them under version control in a set of local per-host trees.

You can think of it as the yang to puppet or chef's yin - instead of pushing configs onto remote nodes, it's about pulling configs off nodes, and storing them for tracking and change control.

We've been primarily using it in a RedHat/CentOS environment, so we use it in conjunction with rpm-find-changes, which identifies all the config files under /etc that have been changed from their deployment versions, or are custom files not belonging to a package.

Extract doesn't care where its list of files to extract comes from, so it should be easily customised for other environments.

It uses a simple extract.conf shell-variable-style config file, like this:

# Where extracted files are to be stored (in per-host trees)

# Hosts from which to extract (space separated)
EXTRACT_HOSTS=host1 host2 host3

# File containing list of files to extract (on the remote host, not locally)

Extract also allows arbitrary scripts to be called at the beginning (setup) and end (teardown) of a run, and before and/or after each host. Extract ships with some example shell scripts for loading ssh keys, and checking extracted changes into git or bzr. These hooks are also configured in the extract.conf config e.g.:

# Pre-process scripts
# PRE_EXTRACT_SETUP - run once only, before any extracts are done
# PRE_EXTRACT_HOST - run before each host extraction

# Post process scripts
# POST_EXTRACT_HOST - run after each host extraction
# POST_EXTRACT_TEARDOWN - run once only, after all extracts are completed

Extract is available on github, and packages for RHEL/CentOS 5 and 6 are available from my repository.

Feedback/pull requests always welcome.

OpenLDAP Tips and Tricks

Having spent too much of this week debugging problems around migrating ldap servers from RHEL5 to RHEL6, here are some miscellaneous notes to self:

  1. The service is named ldap on RHEL5, and slapd on RHEL6 e.g. you do service ldap start on RHEL5, but service slapd start on RHEL6

  2. On RHEL6, you want all of the following packages installed on your clients:

    yum install openldap-clients pam_ldap nss-pam-ldapd

  3. This seems to be the magic incantation that works for me (with real SSL certificates, though):

    authconfig --enableldap --enableldapauth \
      --ldapserver \
      --ldapbasedn="dc=example,dc=com" \
  4. Be aware that there are multiple ldap configuration files involved now. All of the following end up with ldap config entries in them and need to be checked:

    • /etc/openldap/ldap.conf
    • /etc/pam_ldap.conf
    • /etc/nslcd.conf
    • /etc/sssd/sssd.conf

    Note too that /etc/openldap/ldap.conf uses uppercased directives (e.g. URI) that get lowercased in the other files (URI -> uri). Additionally, some directives are confusingly renamed as well - e.g. TLS_CACERT in /etc/openldap/ldap.conf becomes tls_cacertfile in most of the others. :-(

  5. If you want to do SSL or TLS, you should know that the default behaviour is for ldap clients to verify certificates, and give misleading bind errors if they can't validate them. This means:

    • if you're using self-signed certificates, add TLS_REQCERT allow to /etc/openldap/ldap.conf on your clients, which means allow certificates the clients can't validate

    • if you're using CA-signed certificates, and want to verify them, add your CA PEM certificate to a directory of your choice (e.g. /etc/openldap/certs, or /etc/pki/tls/certs, for instance), and point to it using TLS_CACERT in /etc/openldap/ldap.conf, and tls_cacertfile in /etc/ldap.conf.

  6. RHEL6 uses a new-fangled /etc/openldap/slapd.d directory for the old /etc/openldap/slapd.conf config data, and the RHEL6 Migration Guide tells you how to convert from one to the other. But if you simply rename the default slapd.d directory, slapd will use the old-style slapd.conf file quite happily, which is much easier to read/modify/debug, at least while you're getting things working.

  7. If you run into problems on the server, there are lots of helpful utilities included with the openldap-servers package. Check out the manpages for slaptest(8), slapcat(8), slapacl(8), slapadd(8), etc.
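Given how many files are involved (no. 4 above), it can help to dump the uri and TLS settings from all of them in one go and eyeball them for mismatches. A rough sketch: ldap_tls_settings is a made-up helper, and the pattern is only a guess at the directive spellings worth showing (including the ldap_-prefixed style that sssd.conf uses).

```shell
# Print uri and tls_* settings, prefixed with the filename, from each readable file
ldap_tls_settings() {
  for f in "$@"; do
    [ -r "$f" ] || continue
    grep -iHE '^[[:space:]]*(ldap_)?(uri|tls_[a-z_]+)[[:space:]=]' "$f"
  done
  return 0
}

# Files that don't exist (or aren't readable) are silently skipped
ldap_tls_settings /etc/openldap/ldap.conf /etc/pam_ldap.conf \
  /etc/nslcd.conf /etc/sssd/sssd.conf
```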



rpm-find-changes is a little script I wrote a while ago for rpm-based systems (RedHat, CentOS, Mandriva, etc.). It finds files in a filesystem tree that are not owned by any rpm package (orphans), or are modified from the version distributed with their rpm. In other words, any file that has been introduced or changed from its distributed version.

It's intended to help identify candidates for backup, or just for tracking interesting changes. I run it nightly on /etc on most of my machines, producing a list of files that I copy off the machine (using another tool, which I'll blog about later) and store in a git repository.

I've also used it for tracking changes to critical configuration trees across multiple machines, to make sure everything is kept in sync, and to be able to track changes over time.

Available on github:

Hosttag - Tagging for Hosts

When you have more than a handful of hosts on your network, you need to start keeping track of what services are living where, what roles particular servers have, etc. This can be documentation-based (say on a wiki, or offline), or it can be implicit in a configuration management system. Old-school sysadmins often used dns TXT records for these kinds of notes, on the basis that it was easy to look them up from the command line from anywhere.

I've been experimenting with the idea of using lightweight tags attached to hostnames for this kind of data, and it's been working really nicely. Hosttag is just a couple of ruby command line utilities, one (hosttag or ht) for doing tag or host lookups, and one (htset/htdel) for doing adds and deletes. Both are network based, so you can do lookups from wherever you are, rather than having to go to somewhere centralised.

Hosttag uses a redis server to store the hostname-tag and tag-hostname mappings as redis sets, which makes queries lightning fast, and setup straightforward.

So let's see it in action (rpms available in my yum repo):

# Installation - first install redis somewhere, and setup a 'hosttag'
# dns alias to the redis host (or use the `-s <server>` option in
# the examples that follow). e.g. on CentOS:
$ yum install redis rubygem-redis

# Install hosttag as an rpm package (from my yum repo).
# Also requires/installs the redis rubygem.
$ yum install hosttag
# gem version coming soon (gem install hosttag)

# Setup some test data (sudo is required for setting and deleting)
# Usage: htset --tag <host> <tag1> <tag2> <tag3> ...
$ sudo htset --tag server1 dns dell ldap server centos centos5 i386 syd
$ sudo htset --tag server2 dns dell ldap server debian debian6 x86_64 mel
$ sudo htset --tag server3 hp nfs server centos centos6 x86_64 syd
$ sudo htset --tag lappy laptop ubuntu maverick i386 syd

# Now run some queries
# Query by tag
$ ht dns
server1 server2
$ ht i386
lappy server1

# Query by host
$ ht server2
debian debian6 dell dns ldap mel server x86_64

# Multiple arguments
$ ht --or centos debian
server1 server2 server3
$ ht --and dns ldap
server1 server2

# All hosts
$ ht --all
lappy server1 server2 server3
# All tags
$ ht --all-tags
centos centos5 centos6 debian debian6 dell dns hp i386 laptop ldap \
maverick mel nfs server syd ubuntu x86_64
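The --and and --or queries are just set intersection and union over the tag sets. To illustrate the idea without a redis server, here's a sketch using sorted files and standard tools. It is not how hosttag itself is implemented, and the scratch files are made up from the test data above.

```shell
# Two tag -> hosts sets, one member per line and sorted (as comm requires)
dir=$(mktemp -d)
printf 'server1\nserver3\n' > "$dir/centos"
printf 'server1\nserver2\n' > "$dir/dns"

# --and: hosts present in both sets (intersection)
and_hosts=$(comm -12 "$dir/centos" "$dir/dns")

# --or: hosts present in either set (union)
or_hosts=$(sort -u "$dir/centos" "$dir/dns")

echo "and:" $and_hosts
echo "or:" $or_hosts
```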

An obvious use case is to perform actions on multiple hosts using your ssh loop of choice e.g.

$ sshr $(ht centos) 'yum -y update'

Finally, a warning: hosttag doesn't have any security built in yet, so it should only be used on trusted networks.

Source code is on github - patches welcome :-).

Rebuild Inventory

Here's what I use to take a quick inventory of a machine before a rebuild, both to act as a reference during the rebuild itself, and in case something goes pear-shaped. The whole chunk after script up to exit is cut-and-pastable.

# as root, where you want your inventory file
script $(hostname).inventory
export PS1='\h:\w\$ '               # reset prompt to avoid ctrl chars
fdisk -l /dev/sd?                   # list partition tables
cat /proc/mdstat                    # list raid devices
pvs                                 # list lvm stuff
df -h                               # list mounts
ip addr                             # list network interfaces
ip route                            # list network routes
cat /etc/resolv.conf                # show resolv.conf
exit                                # finish the script session

# Cleanup control characters in the inventory
perl -i -pe 's/\r//g; s/\033\]\d+;//g; s/\033\[\d+m//g; s/\007/\//g' \
  $(hostname).inventory

# And then copy it somewhere else in case of problems ;-)
scp $(hostname).inventory somewhere:

Anything else useful I've missed?


Came across cronologger (blog post) recently (via Dean Wilson), which is a simple wrapper script you use around your cron(8) jobs, which captures any stdout and stderr output and logs it to a couchdb database, instead of the traditional behaviour of sending it to you as email.

It's a nice idea, particularly for jobs with important output where it would be nice to be able to look back in time more easily than by trawling through a noisy inbox, or for sites with lots of cron jobs where the sheer volume is difficult to handle usefully as email.

Cronologger comes with a simple web interface for displaying your cron jobs, but so far it's pretty rudimentary. I quickly realised that this was another place (cf. blosxom4nagios) where blosxom could be used to provide a pretty useful gui with very little work.

Thus: cronologue.

cronologue(1) is the wrapper, written in perl, which logs job records and stdout/stderr output via standard HTTP PUTs back to a designated apache server, as flat text files. Parameters can be used to control whether job records are always created, or only when there is output produced. There's also a --passthru mode in which stdout and stderr streams are still output, allowing both email and cronologue output to be produced.

On the server side a custom blosxom install is used to display the job records, which can be filtered by hostname or by date. There's also an RSS feed available.

Obligatory screenshot:

Cronologue GUI

Update: I should add that RPMs for CentOS5 (but which will probably work on most RPM-based distros) are available from my yum repository.

Exploring Riak

Been playing with Riak recently, which is one of the modern dynamo-derived nosql databases (the other main ones being Cassandra and Voldemort). We're evaluating it for use as a really large brackup datastore, the primary attraction being the near linear scalability available by adding (relatively cheap) new nodes to the cluster, and decent availability options in the face of node failures.

I've built riak packages for RHEL/CentOS 5, available at my repository, and added support for a riak 'target' to the latest version (1.10) of brackup (packages also available at my repo).

The first thing to figure out is the maximum number of nodes you expect your riak cluster to get to. This you use to size the ring_creation_size setting, which is the number of partitions the hash space is divided into. It must be a power of 2 (64, 128, 256, etc.), and the reason it's important is that it cannot be easily changed after the cluster has been created. The rule of thumb is that for performance you want at least 10 partitions per node/machine, so the default ring_creation_size of 64 is really only useful up to about 6 nodes. 128 scales to 10-12, 256 to 20-25, etc. For more info see the Riak Wiki.
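That rule of thumb is easy to turn into a quick calculation. A small sketch, where ring_size is a made-up helper that starts from the default of 64 and doubles until there are at least 10 partitions per node:

```shell
# Smallest power of two (starting from the default 64) that gives
# at least 10 partitions per node
ring_size() {
  nodes=$1
  size=64
  while [ "$size" -lt $((nodes * 10)) ]; do
    size=$((size * 2))
  done
  echo "$size"
}

for n in 6 12 25 50; do
  echo "$n nodes -> ring_creation_size $(ring_size "$n")"
done
```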

Here's the script I use for configuring a new node on CentOS. The main things to tweak here are the ring_creation_size you want (here I'm using 512, for a biggish cluster), and the interface to use to get the default ip address (here eth0, or you could just hardcode instead of $ip).

# Riak configuration script for CentOS/RHEL

# Install riak (and IO::Interface, for next)
yum -y install riak perl-IO-Interface

# To set app.config:web_ip to use primary ip, do:
perl -MIO::Interface::Simple -i \
  -pe "BEGIN { \$ip = IO::Interface::Simple->new(q/eth0/)->address; }
      s/127\.0\.0\.1/\$ip/" /etc/riak/app.config

# To add a ring_creation_size clause to app.config, do:
perl -i \
  -pe 's/^((\s*)%% riak_web_ip)/$2%% ring_creation_size is the no. of partitions to divide the hash
$2%% space into (default: 64).
$2\{ring_creation_size, 512\},

$1/' /etc/riak/app.config

# To set riak vm_args:name to hostname do:
perl -MSys::Hostname -i -pe 's/127\.0\.0\.1/hostname/e' /etc/riak/vm.args

# display (bits of) config files for checking
echo '********************'
echo /etc/riak/app.config
echo '********************'
head -n30 /etc/riak/app.config
echo '********************'
echo /etc/riak/vm.args
echo '********************'
cat /etc/riak/vm.args

Save this to a file called e.g. riak_configure, and then to configure a couple of nodes you do the following (note that NODE is any old internal hostname you use to ssh to the host in question, but FIRST_NODE needs to use the actual -name parameter defined in /etc/riak/vm.args on your first node):

# First node
cat riak_configure | ssh $NODE sh
ssh $NODE 'chkconfig riak on; service riak start'
# Run the following until ringready reports TRUE
ssh $NODE riak-admin ringready

# All nodes after the first
cat riak_configure | ssh $NODE sh
ssh $NODE "chkconfig riak on; service riak start && riak-admin join $FIRST_NODE"
# Run the following until ringready reports TRUE
ssh $NODE riak-admin ringready
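
If you'd rather not re-run ringready by hand, a small polling helper along these lines can do the waiting for you. This is just a sketch: retry_until is a hypothetical function name, and it simply looks for the string TRUE anywhere in the command output.

```shell
# retry_until PATTERN TRIES DELAY COMMAND...
# Re-run COMMAND until its output contains PATTERN, for at most TRIES attempts,
# sleeping DELAY seconds between attempts. Returns 0 on a match, 1 otherwise.
retry_until() {
    pattern=$1 tries=$2 delay=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        if "$@" 2>&1 | grep -q "$pattern"; then
            return 0
        fi
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Usage (not run here): poll every 5 seconds, for up to 30 attempts
#   retry_until TRUE 30 5 ssh $NODE riak-admin ringready
```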

That's it. You should now have a working riak cluster accessible on port 8098 on your cluster nodes.

Remote Rebuild, CentOS-style

Problem: you've got a remote server that's significantly hosed, either through a screwup somewhere or a power outage that did nasty things to your root filesystem or something. You have no available remote hands, and/or no boot media anyway.

Preconditions: You have another server you can access on the same network segment, and remote access to the broken server, either through a DRAC or iLO type card, or through some kind of serial console server (like a Cyclades/Avocent box).

Solution: in extremis, you can do a remote rebuild. Here's the simplest recipe I've come up with. I'm rebuilding using centos5-x86_64 version 5.5; adjust as necessary.

Note: dnsmasq, mrepo and syslinux are not core CentOS packages, so you need to enable the rpmforge repository to follow this recipe. This just involves:

rpm -Uvh rpmforge-release-0.5.1-1.el5.rf.x86_64.rpm

1. On your working box (which you're now going to press into service as a build server), install and configure dnsmasq to provide dhcp and tftp services:

# Install dnsmasq
yum install dnsmasq

# Add the following lines to the bottom of your /etc/dnsmasq.conf file
# Note that we don't use the following ip address, but the directive
# itself is required for dnsmasq to turn dhcp functionality on
# Here use the broken server's mac addr, hostname, and ip address
# Point the centos5x tag at the tftpboot environment you're going to setup
# And enable tftp
tftp-root = /tftpboot

# Then start up dnsmasq
service dnsmasq start

2. Install and configure mrepo to provide your CentOS build environment:

# Install mrepo and syslinux
yum install mrepo syslinux

# Setup a minimal /etc/mrepo.conf e.g.
cat > /etc/mrepo.conf <<'EOF'
[main]
srcdir = /var/mrepo
wwwdir = /var/www/mrepo
confdir = /etc/mrepo.conf.d
arch = x86_64
mailto =
smtp-server = localhost
pxelinux = /usr/lib/syslinux/pxelinux.0
tftpdir = /tftpboot

[centos5]
release = 5
arch = x86_64
metadata = repomd repoview
name = Centos-$release $arch
#iso = CentOS-$release.5-$arch-bin-DVD-?of2.iso
#iso = CentOS-$release.5-$arch-bin-?of8.iso
# (uncomment one of the iso lines above, either the DVD or the CD one)
EOF

# Download the set of DVD or CD ISOs for the CentOS version you want
# There are fewer DVD ISOs, but you need to use bittorrent to download
mkdir -p /var/mrepo/iso
cd /var/mrepo/iso

# Once your ISOs are available in /var/mrepo/iso, and the 'iso' line
# in /etc/mrepo.conf updated appropriately, run mrepo itself
mrepo -gvv

3. Finally, finish setting up your tftp environment. mrepo should have copied appropriate pxelinux.0, initrd.img, and vmlinuz files into your /tftpboot/centos5-x86_64 directory, so all you need to supply is an appropriate grub boot config:

cd /tftpboot/centos5-x86_64
mkdir -p pxelinux.cfg

# Setup a default grub config (adjust the serial/console and repo params as needed)
cat > pxelinux.cfg/default <<'EOF'
default linux
serial 0,9600n8
label linux
  root (nd)
  kernel vmlinuz
  append initrd=initrd.img console=ttyS0,9600 repo=
EOF

Now get your server to do a PXE boot (via a boot option or the bios or whatever), and hopefully your broken server will find your dhcp/tftp environment and boot up in install mode, and away you go.

If you have problems with the boot, try checking your /var/log/messages file on the boot server for hints.


Following on from my IPMI explorations, here's the next chapter in my getting-down-and-dirty-with-dell-hardware-on-linux adventures. This time I'm setting up Dell's OpenManage Server Administrator software, primarily in order to explore being able to configure bios settings from within the OS. As before, I'm running CentOS 5, but OMSA supports any of RHEL4, RHEL5, SLES9, and SLES10, and various versions of Fedora Core and OpenSUSE.

Here's what I did to get up and running:

# Configure the Dell OMSA repository
wget -O
# Review the script to make sure you trust it, and then run it
# OR, for CentOS5/RHEL5 x86_64 you can just install the following:
rpm -Uvh\

# Install base version of OMSA, without gui (install srvadmin-all for more)
yum install srvadmin-base

# One of the daemons requires /usr/bin/lockfile, so make sure you've got procmail installed
yum install procmail

# If you're running an x86_64 OS, there are a couple of additional 32-bit
#   libraries you need that aren't dependencies in the RPMs
yum install compat-libstdc++-33-3.2.3-61.i386 pam.i386

# Start OMSA daemons
for i in instsvcdrv dataeng dsm_om_shrsvc; do service $i start; done

# Finally, you can update your path by doing logout/login, or just run:
. /etc/profile.d/

Now to check whether you're actually functional you can try a few of the following (as root):

omconfig about
omreport about
omreport system -?
omreport chassis -?

omreport is the OMSA CLI reporting/query tool, and omconfig is the equivalent update tool. The main documentation for the current version of OMSA is here. I found the CLI User's Guide the most useful.

Here's a sampler of interesting things to try:

# Report system overview
omreport chassis

# Report system summary info (OS, CPUs, memory, PCIe slots, DRAC cards, NICs)
omreport system summary

# Report bios settings
omreport chassis biossetup

# Fan info
omreport chassis fans

# Temperature info
omreport chassis temps

# CPU info
omreport chassis processors

# Memory and memory slot info
omreport chassis memory

# Power supply info
omreport chassis pwrsupplies

# Detailed PCIe slot info
omreport chassis slots

# DRAC card info
omreport chassis remoteaccess

omconfig allows setting object attributes using a key=value syntax, which can get reasonably complex. See the CLI User's Guide above for details, but here are some examples of messing with various bios settings:

# See available attributes and settings
omconfig chassis biossetup -?

# Turn the AC Power Recovery setting to On
omconfig chassis biossetup attribute=acpwrrecovery setting=on

# Change the serial communications setting (on, with serial redirection via com1 or com2)
omconfig chassis biossetup attribute=serialcom setting=com1
omconfig chassis biossetup attribute=serialcom setting=com2

# Change the external serial connector
omconfig chassis biossetup attribute=extserial setting=com1
omconfig chassis biossetup attribute=extserial setting=rad

# Change the Console Redirect After Boot (crab) setting
omconfig chassis biossetup attribute=crab setting=enabled
omconfig chassis biossetup attribute=crab setting=disabled

# Change NIC settings (turn on PXE on NIC1)
omconfig chassis biossetup attribute=nic1 setting=enabledwithpxe

Finally, there are some interesting formatting options available in omreport, for use in scripting e.g.

# Custom delimiter format (default semicolon)
omreport chassis -fmt cdv

# XML format
omreport chassis -fmt xml

# To change the default cdv delimiter
omconfig preferences cdvformat -?
omconfig preferences cdvformat delimiter=pipe
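
As a sketch of the kind of scripting this enables, the following filters delimited output for anything that isn't reporting Ok. Note that the two-column SEVERITY;COMPONENT layout and the sample rows here are assumptions made up for illustration, and cdv_problems is a hypothetical name, so check what your own omreport actually produces before relying on it:

```shell
# cdv_problems: read semicolon-delimited "SEVERITY;COMPONENT" rows on stdin
# and print any component whose severity is not "Ok".
# (The two-column layout is an assumption, not taken from the OMSA docs.)
cdv_problems() {
    awk -F';' 'NF == 2 && $1 != "SEVERITY" && $1 != "Ok" { print $2 ": " $1 }'
}

# Made-up sample input, standing in for: omreport chassis -fmt cdv
printf '%s\n' \
    'SEVERITY;COMPONENT' \
    'Ok;Fans' \
    'Critical;Power Supplies' \
    'Ok;Temperatures' |
    cdv_problems
```

If you change the delimiter preference as above, the -F argument to awk needs to change to match.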

Backup Regimes with Brackup

After using brackup for a while you find you have a big list of backups sitting on your server, and start to think about cleaning up some of the older ones. The standard brackup tool for this is brackup-target, with its prune and gc (garbage collection) subcommands.

Typical usage is something like this:

# List the backups for a particular target on the server e.g.
brackup-target $TARGET list-backups
Backup File                      Backup Date                      Size (B)
-----------                      -----------                      --------
images-1262106544                Thu 31 Dec 2009 03:32:49          1263128
images-1260632447                Sun 13 Dec 2009 08:19:13          1168281
images-1250042378                Wed 25 Nov 2009 06:25:06           977464
images-1239323644                Mon 09 Nov 2009 00:30:34           846523
images-1239577352                Thu 29 Oct 2009 13:03:02           846523

# Decide how many backups you want to keep, and prune (delete) the rest
brackup-target --keep-backups 15 $TARGET prune

# Prune just removes the brackup files on the server, so now you need to
# run a garbage collect to delete any 'chunks' that are now orphaned
brackup-target --interactive $TARGET gc

This simple scheme - "keep the last N backups" - works pretty nicely for backups you do relatively infrequently. If you do more frequent backups, however, you might find yourself wanting to be able to implement more sophisticated retention policies. Traditional backup regimes often involve policies like this:

  • keep the last 2 weeks of daily backups
  • keep the last 8 weekly backups
  • keep monthly backups forever

It's not necessarily obvious how to do something like this with brackup, but it's actually pretty straightforward. The trick is to define multiple 'sources' in your brackup.conf, one for each backup 'level' you want to use. For instance, to implement the regime above, you might define the following:

# Daily backups
[SOURCE:images]
path = /data/images

# Weekly backups
[SOURCE:images-weekly]
path = /data/images

# Monthly backups
[SOURCE:images-monthly]
path = /data/images

You'd then use the images-monthly source once a month, the images-weekly source once a week, and the images source the rest of the time. Your list of backups would then look something like this:

Backup File                      Backup Date                      Size (B)
-----------                      -----------                      --------
images-1234567899                Sat 05 Dec 2009 03:32:49          1263128
images-1234567898                Fri 04 Dec 2009 03:19:13          1168281
images-1234567897                Thu 03 Dec 2009 03:19:13          1168281
images-1234567896                Wed 02 Dec 2009 03:19:13          1168281
images-monthly-1234567895        Tue 01 Dec 2009 03:19:13          1168281
images-1234567894                Mon 30 Nov 2009 03:19:13          1168281
images-weekly-1234567893         Sun 29 Nov 2009 03:19:13          1168281
images-1234567892                Sat 28 Nov 2009 03:25:06           977464
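
Choosing which source to use on a given day is easy to script. Here's a minimal sketch, assuming monthlies run on the 1st of the month and weeklies on Sundays (matching the listing above); pick_source is a hypothetical helper name, not part of brackup:

```shell
# pick_source BASE DAY_OF_MONTH DAY_OF_WEEK
# DAY_OF_MONTH is 01-31 (date +%d), DAY_OF_WEEK is 1-7 with 7 = Sunday (date +%u)
pick_source() {
    base=$1 dom=$2 dow=$3
    if [ "$dom" = "01" ]; then
        echo "$base-monthly"
    elif [ "$dow" = "7" ]; then
        echo "$base-weekly"
    else
        echo "$base"
    fi
}

# Show which source today's backup would use
pick_source images "$(date +%d)" "$(date +%u)"

# In a daily cron job you might then do something like (not run here):
#   brackup --from="$(pick_source images "$(date +%d)" "$(date +%u)")" --to=$TARGET
```

The monthly check comes first, so a Sunday that falls on the 1st produces a monthly backup and not a weekly one.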

And when you prune, you want to specify a --source argument, and specify separate --keep-backups settings for each level e.g. for the above:

# Keep 2 weeks worth of daily backups
brackup-target --source images --keep-backups 12 $TARGET prune

# Keep 8 weeks worth of weekly backups
brackup-target --source images-weekly --keep-backups 8 $TARGET prune

# Keep all monthly backups, so we don't prune them at all

# And then garbage collect as normal
brackup-target --interactive $TARGET gc

Anycast DNS

(Okay, brand new year - must be time to get back on the blogging wagon ...)

Linux Journal recently had a really good article by Philip Martin on Anycast DNS. It's well worth a read - I just want to point it out and record a cutdown version of how I've been setting it up recently.

As the super-quick intro, anycast is the idea of providing a network service at multiple points in a network, and then routing requests to the 'nearest' service provider for any particular client. There's a one-to-many relationship between an ip address and the hosts that are providing services on that address.

In the LJ article above, this means you provide a service on a /32 host address, and then use an interior dynamic routing protocol to advertise that address to your internal routers. If you're a non-cisco linux shop, that means using quagga/ospf.

The classic anycast service is dns, since it's stateless and benefits from the high availability and low latency of a distributed anycast service.

So here are my quick-and-dirty notes on setting up an anycast dns server on CentOS/RHEL using dnsmasq for dns, and quagga zebra/ospfd for the routing.

  1. First, setup your anycast ip address (e.g. on a random virtual loopback interface such as lo:0). On CentOS/RHEL, this means you want to setup a /etc/sysconfig/network-scripts/ifcfg-lo:0 file containing:

  2. Setup your dns server to listen to (at least) your anycast dns interface. With dnsmasq, I use an /etc/dnsmasq.conf config like:

  3. Use quagga's zebra/ospfd to advertise this host address to your internal routers. I use a completely vanilla zebra.conf, and an /etc/quagga/ospfd.conf config like:

    hostname myhost
    password mypassword
    log syslog
    router ospf
      ! Local segments (adjust for your network config and ospf areas)
      network area 0
      ! Anycast address redistribution
      redistribute connected metric-type 1
      distribute-list ANYCAST out connected
    access-list ANYCAST permit

That's it. Now (as root) start everything up:

ifup lo:0
for s in dnsmasq zebra ospfd; do
  service $s start
  chkconfig $s on
done
tail -50f /var/log/messages

And then check on your router that the anycast dns address is getting advertised and picked up by your router. If you're using cisco, you probably know how to do that; if you're using linux and quagga, the useful vtysh commands are:

show ip ospf interface <interface>
show ip ospf neighbor
show ip ospf database
show ip ospf route
show ip route

Brackup Tips and Tricks

Further to my earlier post, I've spent a good chunk of time implementing brackup over the last few weeks, both at home for my personal backups, and at $work on some really large trees. There are a few gotchas along the way, so thought I'd document some of them here.

Active Filesystems

First, as soon as you start trying to brackup trees of any size you find that brackup aborts if it finds a file has changed between the time it initially walks the tree and when it comes to back it up. On an active filesystem this can happen pretty quickly.

This is arguably reasonable behaviour on brackup's part, but it gets annoying pretty fast. The cleanest solution is to use some kind of filesystem snapshot to ensure you're backing up a consistent view of your data and a quiescent filesystem.

I'm using linux and LVM, so I'm using LVM snapshots for this, using something like:


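VG=VolGroup00
PART=export
# SIZE (the snapshot size passed to lvcreate -L) also needs setting to suit your volume
mkdir -p /${PART}_snap
lvcreate -L$SIZE --snapshot --permission r -n ${PART}_snap /dev/$VG/$PART && \
  mount -o ro /dev/$VG/${PART}_snap /${PART}_snap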

which snapshots /dev/VolGroup00/export to /dev/VolGroup00/export_snap, and mounts the snapshot read-only on /export_snap.

The reverse, post-backup, is similar:


VG=VolGroup00
PART=export
umount /${PART}_snap && \
  lvremove -f /dev/$VG/${PART}_snap

which unmounts the snapshot and then deletes it.

You can then do your backup using the /${PART}_snap tree instead of your original ${PART} one.
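
If you want the snapshot, backup and cleanup steps tied together, a wrapper along the following lines is one way to do it. This is only a sketch: run, snapshot_backup and DRY_RUN are names I've made up for illustration, the 1G snapshot size is an arbitrary example value, and with DRY_RUN=1 it only prints the commands it would execute.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# snapshot_backup VG PART SIZE COMMAND...
# Snapshot /dev/VG/PART, mount it read-only on /PART_snap, run COMMAND,
# then unmount and remove the snapshot again.
snapshot_backup() {
    vg=$1 part=$2 size=$3
    shift 3
    snap="${part}_snap"

    run mkdir -p "/$snap" || return 1
    run lvcreate -L"$size" --snapshot --permission r -n "$snap" "/dev/$vg/$part" || return 1
    if run mount -o ro "/dev/$vg/$snap" "/$snap"; then
        run "$@"
        status=$?
        run umount "/$snap"
    else
        status=1
    fi
    # Always try to drop the snapshot, even if the mount or backup failed
    run lvremove -f "/dev/$vg/$snap"
    return "$status"
}

# Dry run: prints the commands instead of executing them
DRY_RUN=1 snapshot_backup VolGroup00 export 1G brackup -v --from=SOURCE --to=TARGET
```

The point of wrapping it like this is that the snapshot gets removed even when the backup itself fails, so you don't end up with stale snapshots filling up your volume group.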

Brackup Digests

So snapshots work nicely. The next wrinkle is that by default brackup writes its digest cache file to the root of your source tree, which in this case is readonly. So you want to tell brackup to put that in the original tree, not the snapshot, which you do in your ~/.brackup.conf file e.g.

path = /export_snap/home
digestdb_file = /export/home/.brackup-digest.db
ignore = \.brackup-digest.db$

I've also added an explicit ignore rule for these digest files here. You don't really need to back these up as they're just caches, and they can get pretty large. Brackup automatically skips the digestdb_file for you, but it doesn't skip any others you might have, if for instance you're backing up the same tree to multiple targets.

Synching Backups Between Targets

Another nice hack you can do with brackup is sync backups on filesystem-based targets (that is, Target::Filesystem, Target::Ftp, and Target::Sftp) between systems. For instance, I did my initial home directory backup of about 10GB onto my laptop, and then carried my laptop into where my server is located, and then rsync-ed the backup from my laptop to the server. Much faster than copying 10GB of data over an ADSL line!

Similarly, at $work I'm doing brackups onto a local backup server on the LAN, and then rsyncing the brackup tree to an offsite server for disaster recovery purposes.

There are a few gotchas when doing this, though. One is that Target::Filesystem backups default to using colons in their chunk file names on Unix-like filesystems (for backwards-compatibility reasons), while Target::Ftp and Target::Sftp ones don't. The safest thing to do is just to turn off colons altogether on Filesystem targets:

type = Filesystem
path = /export/brackup/nox/home
no_filename_colons = 1

Second, brackup uses a local inventory database to avoid some remote filesystem checks and so improve performance. If you replicate a backup onto another target, you also need to make a copy of the inventory database so that brackup knows which chunks are already on your new target.

The inventory database defaults to $HOME/.brackup-target-TARGETNAME.invdb (see perldoc Brackup::InventoryDatabase), so something like the following is usually sufficient:

cp $HOME/.brackup-target-OLDTARGET.invdb $HOME/.brackup-target-NEWTARGET.invdb

Third, if you want to do a restore using a brackup file (the SOURCE-DATE.brackup output file brackup produces) from a different target, you typically need to make a copy and then update the header portion for the target type and host/path details of your new target. Assuming you do that and your new target has all the same chunks, though, restores work just fine.

Fun with brackup

I've been playing around with Brad Fitzpatrick's brackup for the last couple of weeks. It's a backup tool that "slices, dices, encrypts, and sprays across the net" - notably to Amazon S3, but also to filesystems (local or networked), FTP servers, or SSH/SFTP servers.

I'm using it to backup my home directories and all my image and music files both to a linux server I have available in a data centre (via SFTP) and to Amazon S3.

brackup's a bit rough around the edges and could do with some better documentation and some optimisation, but it's pretty useful as it stands. Here are a few notes and tips from my playing so far, to save others a bit of time.

Version: as I write the latest version on CPAN is 1.06, but that's pretty old - you really want to use the current subversion trunk instead. Installation is the standard perl module incantation e.g.

# Checkout from svn or whatever
cd brackup
perl Makefile.PL
make test
sudo make install

Basic usage is as follows:

# First-time through (on linux, in my case):
mkdir brackup
cd brackup
Error: Your config file needs tweaking. I put a commented-out template at:

# Edit the vanilla .brackup.conf that was created for you.
# You want to setup at least one SOURCE and one TARGET section initially,
# and probably try something smallish i.e. not your 50GB music collection!
# The Filesystem target is probably the best one to try out first.
# See 'perldoc Brackup::Root' and 'perldoc Brackup::Target' for examples
$EDITOR ~/.brackup.conf

# Now run your first backup changing SOURCE and TARGET below to the names
# you used in your .brackup.conf file
brackup -v --from=SOURCE --to=TARGET

# You can also do a dry run to see what brackup's going to do (undocumented)
brackup -v --from=SOURCE --to=TARGET --dry-run

If all goes well you should get some fairly verbose output about all the files in your SOURCE tree that are being backed up for you, and finally a brackup output file (typically named SOURCE-DATE.brackup) should be written to your current directory. You'll need this brackup file to do your restores, but it's also stored on the target along with your backup, so you can also retrieve it from there (using brackup-target, below) if your local copy gets lost, or if you need to restore to somewhere else.

Restores reference that SOURCE-DATE.brackup file you just created:

# Create somewhere to restore to
mkdir -p /tmp/brackup-restore/full

# Restore the full tree you just backed up
brackup-restore -v --from=SOURCE-DATE.brackup --to=/tmp/brackup-restore/full --full

# Or restore just a subset of the tree
brackup-restore -v --from=SOURCE-DATE.brackup --to=/tmp/brackup-restore --just=DIR
brackup-restore -v --from=SOURCE-DATE.brackup --to=/tmp/brackup-restore --just=FILE

You can also use the brackup-target utility to query a target for the backups it has available, and do various kinds of cleanup:

# List the backups available on the given target
brackup-target TARGET list_backups

# Get the brackup output file for a specific backup (to restore)
brackup-target TARGET get_backup BACKUPFILE

# Delete a brackup file on the target
brackup-target TARGET delete_backup BACKUPFILE

# Prune the target to the most recent N backup files
brackup-target --keep-backups 15 TARGET prune

# Remove backup chunks no longer referenced by any backup file
brackup-target TARGET gc

That should be enough to get you up and running with brackup - I'll cover some additional tips and tricks in a subsequent post.