GNU sort traps with field separators

GNU sort is an excellent utility that is a mainstay of the linux command line. It has all kinds of tricks up its sleeves, including support for uniquifying records, stable sorts, files larger than memory, parallelisation, controlled memory usage, etc. Go read the man page for all the gory details.

It also supports sorting with field separators, but unfortunately this support has some nasty traps for the unwary. Hence this post.

First, GNU sort cannot do general sorts of CSV-style datasets, because it doesn't understand CSV features like quoting rules, quote-escaping, separator-escaping, etc. If you have very simple CSV files that don't do any escaping and you can avoid quotes altogether (or always use them), you might be able to use GNU sort - but it can get difficult fast.

Here I'm only interested in very simple delimited files - no quotes or escaping at all. Even here, though, there are some nasty traps to watch out for.

Here's a super-simple example file with just two lines and three fields, called dsort.csv:

$ cat dsort.csv
a,b,c
a,b+c,c

If we do a vanilla sort on this file, we get the following (I'm also running it through md5sum to highlight when the output changes):

$ sort dsort.csv | tee /dev/stderr | md5sum
a,b+c,c
a,b,c
5efd74fa9bef453dd477ec9acb2cef5f  -

The longer line sorts before the shorter line because the '+' sign collates before the second comma in the short line - this is sorting on the whole line, not on the individual fields.

Okay, so if I want to do an individual field sort, I can just use the -t option, right? You would think so, but unfortunately:

$ sort -t, dsort.csv | tee /dev/stderr | md5sum
a,b+c,c
a,b,c
5efd74fa9bef453dd477ec9acb2cef5f  -

Huh? Why doesn't that sort the short line first, like we'd expect? Maybe it's not sorting on all the fields or something? Do I need to explicitly include all the fields? Let's see:

$ sort -t, -k1,3 dsort.csv | tee /dev/stderr | md5sum
a,b+c,c
a,b,c
5efd74fa9bef453dd477ec9acb2cef5f  -

Huh? What the heck is going on here?

It turns out this unintuitive behaviour is because of the way sort interprets the -k option - -kM,N (where M != N) doesn't mean 'sort by field M, then field M+1, ... then by field N'; it means 'join all fields from M to N (with the field separators?), and sort by that'. Ugh!

So I just need to specify the fields individually? Unfortunately, even that's not enough:

$ sort -t, -k1 -k2 -k3 dsort.csv | tee /dev/stderr | md5sum
a,b+c,c
a,b,c
5efd74fa9bef453dd477ec9acb2cef5f  -

This is because the first option here - -k1 - is interpreted as -k1,3, since the default 'end-field' is the last one (field 3 in this file). Double-ugh!

So the takeaway is: if you want an individual-field sort you have to specify every field individually, AND you have to use -kN,N syntax, like so:

$ sort -t, -k1,1 -k2,2 -k3,3 dsort.csv | tee /dev/stderr | md5sum
a,b,c
a,b+c,c
493ce7ca60040fa184f1bf7db7758516  -

Yay, finally what we're after!

Also, unfortunately, there doesn't seem to be a generic way of specifying 'all fields' or 'up to the last field' or 'M-N' fields - you have to specify them all individually. It's verbose and ugly, but it works.

And for some good news, you can use sort's modifier suffixes on those individual options (like n for numeric sorts, r for reverse sorts, etc.) just fine.
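
For example, a minimal sketch reverse-sorting on just the second field, using the same file:

$ sort -t, -k1,1 -k2,2r -k3,3 dsort.csv
a,b+c,c
a,b,c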

Happy sorting!

Simple dual upstream gateways in CentOS, revisited

Recently had to set up a few servers that needed dual upstream gateways, and used an ancient blog post I wrote 11 years ago (!) to get it all working. This time around I hit a gotcha that I hadn't noted in that post, and used a simpler method to define the rules, so this is an updated version of that post.

Situation: you have two upstream gateways (gw1 and gw2) on separate interfaces and subnets on your linux server. Your default route is via gw1 (so all outward traffic, and most incoming traffic goes via that), but you want to be able to use gw2 as an alternative ingress pathway, so that packets that have come in on gw2 go back out that interface.

(Everything below done as root, so sudo -i first if you need to.)

1) First, define a few variables to make things easier to modify/understand:

# The device/interface on the `gw2` subnet
GW2_DEV=eth1
# The ip address of our `gw2` router
GW2_ROUTER_ADDR=172.16.2.254
# Our local ip address on the `gw2` subnet i.e. $GW2_DEV's address
GW2_LOCAL_ADDR=172.16.2.10

2) The gotcha I hit was that 'strict reverse-path filtering' in the kernel will drop asymmetrically routed packets entirely, which will kill our response traffic. So the first thing to do is make sure reverse-path filtering is either turned off or set to 'loose' instead of 'strict':

# Check the rp_filter setting for $GW2_DEV
# A value of '0' means rp_filtering is off, '1' means 'strict', and '2' means 'loose'
$ cat /proc/sys/net/ipv4/conf/$GW2_DEV/rp_filter
1
# For our purposes values of either '0' or '2' will work. '2' is slightly
# more conservative, so we'll go with that.
echo 2 > /proc/sys/net/ipv4/conf/$GW2_DEV/rp_filter
$ cat /proc/sys/net/ipv4/conf/$GW2_DEV/rp_filter
2
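
Note that this setting won't survive a reboot - to make it persistent, something like the following in /etc/sysctl.conf should do the trick (using the real interface name, not the variable):

# /etc/sysctl.conf
net.ipv4.conf.eth1.rp_filter = 2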

3) Define an extra routing table called gw2 e.g.

$ cat /etc/iproute2/rt_tables
#
# reserved values
#
255     local
254     main
253     default
0       unspec
#
# local tables
#
102     gw2
#

4) Add a default route via gw2 (here 172.16.2.254) to the gw2 routing table:

$ echo "default table gw2 via $GW2_ROUTER_ADDR" > /etc/sysconfig/network-scripts/route-${GW2_DEV}
$ cat /etc/sysconfig/network-scripts/route-${GW2_DEV}
default table gw2 via 172.16.2.254

5) Add an iproute 'rule' saying that packets that come in on our $GW2_LOCAL_ADDR should use routing table gw2:

$ echo "from $GW2_LOCAL_ADDR table gw2" > /etc/sysconfig/network-scripts/rule-${GW2_DEV}
$ cat /etc/sysconfig/network-scripts/rule-${GW2_DEV}
from 172.16.2.10 table gw2

6) Take $GW2_DEV down and back up again, and test:

$ ifdown $GW2_DEV
$ ifup $GW2_DEV

# Test that incoming traffic works as expected e.g. on an external server
$ ssh -v server-via-gw2
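
You can also sanity-check the configuration directly on the server:

# Check our rule was installed, and that the gw2 table has its default route
ip rule show
ip route show table gw2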

incron tips and traps

(Updated April 2020: added new no. 7 after being newly bitten...)

incron is a useful little cron-like utility that lets you run arbitrary jobs (like cron), but instead of being triggered at certain times, your jobs are triggered by changes to files or directories.

It uses the linux kernel inotify facility (hence the name), and so it isn't cross-platform, but on linux it can be really useful for monitoring file changes or uploads, reporting or forwarding based on status files, simple synchronisation schemes, etc.

Again like cron, incron supports the notion of job 'tables' where commands are configured, and users can manage their own tables using an incrontab command, while root can manage multiple system tables.

So it's a really useful linux utility, but it's also fairly old (the last release, v0.5.10, is from 2012), doesn't appear to be under active development any more, and it has a few frustrating quirks that can make using it unnecessarily difficult.

So this post is intended to highlight a few of the 'gotchas' I've experienced using incron:

  1. You can't monitor recursively i.e. if you create a watch on a directory incron will only be triggered on events in that directory itself, not in any subdirectories below it. This isn't really an incron issue since it's a limitation of the underlying inotify mechanism, but it's definitely something you'll want to be aware of going in.

  2. The incron interface is enough like cron (incrontab -l, incrontab -e, man 5 incrontab, etc.) that you might think that all your nice crontab features are available. Unfortunately that's not the case - most significantly, you can't have comments in incron tables (incron will try and parse your comment lines and fail), and you can't set environment variables to be available for your commands. (This includes PATH, so you might need to explicitly set a PATH inside your incron scripts if you need non-standard locations. The default PATH is documented as /usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin.)

  3. That means that cron MAILTO support is not available, and in general there's no easy way of getting access to the stdout or stderr of your jobs. You can't even use shell redirects in your command to capture the output (e.g. echo $@/$# >> /tmp/incron.log doesn't work). If you're debugging, the best you can do is add a layer of indirection: use a wrapper script that does the redirection you need (e.g. echo $1 >> /tmp/incron.log 2>&1), and call the wrapper script in your incrontab with the incron arguments (e.g. debug.sh $@/$#) - see the sketch after this list. This all makes debugging misbehaving commands pretty painful. The main place to check if your commands are running is the cron log (/var/log/cron) on RHEL/CentOS, and syslog (/var/log/syslog) on Ubuntu/Debian.

  4. incron is also very picky about whitespace in your incrontab. If you put more than one space (or a tab) between the inotify masks and your command, you'll get an error in your cron log saying cannot exec process: No such file or directory, because incron will have included everything after the first space as part of your command e.g. (gavin) CMD ( echo /home/gavin/tmp/foo) (note the evil space before the echo).

  5. It's often difficult (and non-intuitive) to figure out what inotify events you want to trigger on in your incrontab masks. For instance, does 'IN_CREATE' get fired when you replace an existing file with a new version? What events are fired when you do a mv or a cp? If you're wanting to trigger on an incoming remote file copy, should you use 'IN_CREATE' or 'IN_CLOSE_WRITE'? In general, you don't want to guess - you want to test and see what events actually get fired on the operations you're interested in. The easiest way to do this is to use inotifywait from the inotify-tools package, running it as inotifywait -m <dir>, which will report all the inotify events that get triggered on that directory (hit <Ctrl-C> to exit).

  6. The "If you're wanting to trigger on an incoming remote file copy, should you use 'IN_CREATE' or 'IN_CLOSE_WRITE'?" above was a trick question - it turns out it depends how you're doing the copy! If you're just going a simple copy in-place (e.g. with scp), then (assuming you want the completed file) you're going to want to trigger on 'IN_CLOSE_WRITE', since that's signalling all writing is complete and the full file will be available. If you're using a vanilla rsync, though, that's not going to work, as rsync does a clever write-to-a-hidden-file trick, and then moves the hidden file to the destination name atomically. So in that case you're going to want to trigger on 'IN_MOVED_TO', which will give you the destination filename once the rsync is completed. So again, make sure you test thoroughly before you deploy.

  7. Though cron works fine with symlinks to crontab files (e.g. in /etc/cron.d), incron doesn't support this in /etc/incron.d - symlinks just seem to be quietly ignored. (Maybe this is for security, but it's not documented, afaict.)
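
For reference, here's a minimal sketch of the kind of debugging wrapper mentioned in no. 3 (the paths and log location are just examples):

#!/bin/sh
# /usr/local/bin/incron-debug.sh - log incron events somewhere visible
# Invoked from an incrontab entry like:
#   /data/dir IN_CLOSE_WRITE /usr/local/bin/incron-debug.sh $@/$#
echo "$(date '+%F %T') incron fired: $*" >> /tmp/incron.log 2>&1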

Have I missed any? Any other nasties bitten you using incron?

Linux Software RAID Mirror In-Place Upgrade

Ran out of space on an old CentOS 6.8 server in the weekend, and so had to upgrade the main data mirror from a pair of Hitachi 2TB HDDs to a pair of 4TB WD Reds I had lying around.

The volume was using mdadm, aka Linux Software RAID, as a simple mirror (RAID1), with LVM volumes on top of the mirror. The safest upgrade path is to build a new mirror on the new disks and sync the data across, but there weren't any free SATA ports on the motherboard, so instead I opted to do an in-place upgrade. I hadn't done this for a good few years, and hit a couple of wrinkles on the way, so here are the notes from the journey.

Below, the physical disk partitions are /dev/sdb1 and /dev/sdd1, the mirror is /dev/md1, and the LVM volume group is extra.

1. Backup your data (or check you have known good rock-solid backups in place), because this is a complex process with plenty that could go wrong. You want an exit strategy.

2. Break the mirror, failing and removing one of the old disks

mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm --manage /dev/md1 --remove /dev/sdb1

3. Shutdown the server, remove the disk you've just failed, and insert your replacement. Boot up again.

4. Since these are brand new disks, we need to partition them. And since these are 4TB disks, we need to use parted rather than the older fdisk.

parted /dev/sdb
print
mklabel gpt
# Create a partition, skipping the 1st MB at beginning and end
mkpart primary 1 -1
unit s
print
# Not sure if this flag is required, but whatever
set 1 raid on
quit

5. Then add the new partition back into the mirror. Although this is much bigger, it will just sync up at the old size, which is what we want for now.

mdadm --manage /dev/md1 --add /dev/sdb1
# This will take a few hours to resync, so let's keep an eye on progress
watch -n5 cat /proc/mdstat

6. Once all resynched, rinse and repeat with the other disk - fail and remove /dev/sdd1, shutdown and swap the new disk in, boot up again, partition the new disk, and add the new partition into the mirror.

7. Once all resynched again, you'll be back where you started - a nice stable mirror of your original size, but with shiny new hardware underneath. Now we can grow the mirror to take advantage of all this new space we've got.

mdadm --grow /dev/md1 --size=max
mdadm: component size of /dev/md1 has been set to 2147479552K

Oops! That size doesn't look right - that's 2TB, but these are 4TB disks! It turns out there's a 2TB limit on mdadm metadata version 0.90, which this mirror is using, as documented at https://raid.wiki.kernel.org/index.php/RAID_superblock_formats#The_version-0.90_Superblock_Format.

mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Thu Aug 26 21:03:47 2010
    Raid Level : raid1
    Array Size : 2147483648 (2048.00 GiB 2199.02 GB)
  Used Dev Size : -1
  Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Nov 27 11:49:44 2017
          State : clean 
Active Devices : 2
Working Devices : 2
Failed Devices : 0
  Spare Devices : 0

          UUID : f76c75fb:7506bc25:dab805d9:e8e5d879
        Events : 0.1438

    Number   Major   Minor   RaidDevice State
      0       8       17        0      active sync   /dev/sdb1
      1       8       49        1      active sync   /dev/sdd1

Unfortunately, mdadm doesn't support upgrading the metadata version. But there is a workaround documented on that wiki page, so let's try that:

mdadm --detail /dev/md1
# (as above)
# Stop/remove the mirror
mdadm --stop /dev/md1
mdadm: Cannot get exclusive access to /dev/md1:Perhaps a running process, mounted filesystem or active volume group?
# Okay, deactivate our volume group first
vgchange --activate n extra
# Retry stop
mdadm --stop /dev/md1
mdadm: stopped /dev/md1
# Recreate the mirror with 1.0 metadata (you can't go to 1.1 or 1.2, because they're located differently)
# Note that you should specify all your parameters in case the defaults have changed
mdadm --create /dev/md1 -l1 -n2 --metadata=1.0 --assume-clean --size=2147483648 /dev/sdb1 /dev/sdd1

That outputs:

mdadm: /dev/sdb1 appears to be part of a raid array:
      level=raid1 devices=2 ctime=Thu Aug 26 21:03:47 2010
mdadm: /dev/sdd1 appears to be part of a raid array:
      level=raid1 devices=2 ctime=Thu Aug 26 21:03:47 2010
mdadm: largest drive (/dev/sdb1) exceeds size (2147483648K) by more than 1%
Continue creating array? y
mdadm: array /dev/md1 started.

Success! Now let's reactivate that volume group again:

vgchange --activate y extra
3 logical volume(s) in volume group "extra" now active

Another wrinkle is that recreating the mirror will have changed the array UUID, so we need to update the old UUID in /etc/mdadm.conf:

# Double-check metadata version, and record volume UUID
mdadm --detail /dev/md1
# Update the /dev/md1 entry UUID in /etc/mdadm.conf
$EDITOR /etc/mdadm.conf

So now, let's try that mdadm --grow command again:

mdadm --grow /dev/md1 --size=max
mdadm: component size of /dev/md1 has been set to 3907016564K
# Much better! This will take a while to synch up again now:
watch -n5 cat /proc/mdstat

8. (You can wait for this to finish resynching first, but it's optional.) Now we need to let LVM know that the physical volume underneath it has changed size:

# Check our starting point
pvdisplay /dev/md1
--- Physical volume ---
PV Name               /dev/md1
VG Name               extra
PV Size               1.82 TiB / not usable 14.50 MiB
Allocatable           yes 
PE Size               64.00 MiB
Total PE              29808
Free PE               1072
Allocated PE          28736
PV UUID               mzLeMW-USCr-WmkC-552k-FqNk-96N0-bPh8ip
# Resize the LVM physical volume
pvresize /dev/md1
Read-only locking type set. Write locks are prohibited.
Can't get lock for system
Cannot process volume group system
Read-only locking type set. Write locks are prohibited.
Can't get lock for extra
Cannot process volume group extra
Read-only locking type set. Write locks are prohibited.
Can't get lock for #orphans_lvm1
Cannot process standalone physical volumes
Read-only locking type set. Write locks are prohibited.
Can't get lock for #orphans_pool
Cannot process standalone physical volumes
Read-only locking type set. Write locks are prohibited.
Can't get lock for #orphans_lvm2
Cannot process standalone physical volumes
Read-only locking type set. Write locks are prohibited.
Can't get lock for system
Cannot process volume group system
Read-only locking type set. Write locks are prohibited.
Can't get lock for extra
Cannot process volume group extra
Read-only locking type set. Write locks are prohibited.
Can't get lock for #orphans_lvm1
Cannot process standalone physical volumes
Read-only locking type set. Write locks are prohibited.
Can't get lock for #orphans_pool
Cannot process standalone physical volumes
Read-only locking type set. Write locks are prohibited.
Can't get lock for #orphans_lvm2
Cannot process standalone physical volumes
Failed to find physical volume "/dev/md1".
0 physical volume(s) resized / 0 physical volume(s) not resized

Oops - that doesn't look good. But it turns out it's just a weird locking type default. If we tell pvresize it can use local filesystem write locks we should be good (cf. /etc/lvm/lvm.conf):

# Let's try that again...
pvresize --config 'global {locking_type=1}' /dev/md1
Physical volume "/dev/md1" changed
1 physical volume(s) resized / 0 physical volume(s) not resized
# Double-check the PV Size
pvdisplay /dev/md1
--- Physical volume ---
PV Name               /dev/md1
VG Name               extra
PV Size               3.64 TiB / not usable 21.68 MiB
Allocatable           yes 
PE Size               64.00 MiB
Total PE              59616
Free PE               30880
Allocated PE          28736
PV UUID               mzLeMW-USCr-WmkC-552k-FqNk-96N0-bPh8ip

Success!

Finally, you can now resize your logical volumes using lvresize as you usually would.
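
For instance, a minimal sketch - assuming a logical volume called data in our extra volume group, with an ext4 filesystem on it:

# Grow the LV into the newly available space, then grow the filesystem
lvresize -l +100%FREE /dev/extra/data
resize2fs /dev/extra/data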

The iconv slurp misfeature

Since I got bitten by this recently, let me blog a quick warning here: glibc iconv - a utility for character set conversions, like iso8859-1 or windows-1252 to utf-8 - has a nasty misfeature/bug: if you give it data on stdin it will slurp the entire file into memory before it does a single character conversion.

Which is fine if you're running small input files. If you're trying to convert a 10G file on a VPS with 2G of RAM, however ... not so good!

This looks to be a known issue, with patches submitted to fix it in August 2015, but I'm not sure if they've been merged, or into which version of glibc. Certainly RHEL/CentOS 7 (with glibc 2.17) and Ubuntu 14.04 (with glibc 2.19) are both affected.

Once you know about the issue, it's easy enough to workaround - there's an iconv-chunks wrapper on github that breaks the input into chunks before feeding it to iconv, or you can do much the same thing using the lovely GNU parallel e.g.

gunzip -c monster.csv.gz | parallel --pipe -k iconv -f windows-1252 -t utf8

Nasty OOM avoided!

ThinkPad X250 on CentOS 7

Wow, almost a year since the last post. Definitely time to reboot the blog.

Got to replace my aging ThinkPad X201 with a lovely shiny new ThinkPad X250 over the weekend. Specs are:

  • CPU: Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz
  • RAM: 16GB PC3-12800 DDR3L SDRAM 1600MHz SODIMM
  • Disk: 256GB SSD (swapped out for existing Samsung SSD)
  • Display: 12.5" 1920x1080 IPS display, 400nit, non-touch
  • Graphics: Intel Graphics 5500
  • Wireless: Intel 7265 AC/B/G/N Dual Band Wireless and Bluetooth 4.0
  • Batteries: 1 3-cell internal, 1 6-cell hot-swappable

A very nice piece of kit!

Just wanted to document what works and what doesn't (so far) on my standard OS, CentOS 7, with RH kernel 3.10.0-229.11.1. I had to install the following additional packages:

  • iwl7265-firmware (for wireless support)
  • acpid (for the media buttons)

Working so far:

  • media buttons (Fn + F1/mute, F2/softer, F3/louder - see below for configuration)
  • wifi button (Fn + F8 - worked out of the box)
  • keyboard backlight (Fn + space, out of the box)
  • sleep/resume (out of the box)
  • touchpad hard buttons (see below)
  • touchpad soft buttons (out of the box)

Not working / unconfigured so far:

  • brightness buttons (Fn + F5/F6)
  • fingerprint reader (supposedly works with fprintd)

Not working / no ACPI codes:

  • mute microphone button (Fn + F4)
  • application buttons (Fn + F9-F12)

Uncertain/not tested yet:

  • switch video mode (Fn + F7)

To get the touchpad working I needed to use the "evdev" driver rather than the "Synaptics" one - I added the following as /etc/X11/xorg.conf.d/90-evdev.conf:

Section "InputClass"
  Identifier "Touchpad/TrackPoint"
  MatchProduct "PS/2 Synaptics TouchPad"
  MatchDriver "evdev"
  Option "EmulateWheel" "1"
  Option "EmulateWheelButton" "2"
  Option "Emulate3Buttons" "0"
  Option "XAxisMapping" "6 7"
  Option "YAxisMapping" "4 5"
EndSection

This gives me 3 working hard buttons above the touchpad, including middle-mouse-button for paste.

To get fonts scaling properly I needed to add a monitor section as /etc/X11/xorg.conf.d/50-monitor.conf, specifically for the DisplaySize:

Section "Monitor"
  Identifier    "Monitor0"
  VendorName    "Lenovo ThinkPad"
  ModelName     "X250"
  DisplaySize   276 155
  Option        "DPMS"
EndSection

and also set the dpi properly in my ~/.Xdefaults:

*.dpi: 177

This fixes font size nicely in Firefox/Chrome and terminals for me.

I also found my mouse movement was too slow, which I fixed with:

xinput set-prop 11 "Device Accel Constant Deceleration" 0.7

(which I put in my old-school ~/.xsession file).
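
The device id (11 here) will vary between machines - something like this should find the right one:

# List input devices to find the touchpad/trackpoint id
xinput list | grep -i -e touchpad -e trackpoint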

Finally, getting the media keys working involved installing acpid and setting up the appropriate magic in 3 files in /etc/acpi/events:

# /etc/acpi/events/volumedown
event=button/volumedown
action=/etc/acpi/actions/volume.sh down

# /etc/acpi/events/volumeup
event=button/volumeup
action=/etc/acpi/actions/volume.sh up

# /etc/acpi/events/volumemute
event=button/mute
action=/etc/acpi/actions/volume.sh mute

Those files capture the ACPI events and handle them via a custom script in /etc/acpi/actions/volume.sh, which uses amixer from alsa-utils. Volume control worked just fine, but muting was a real pain to get working correctly due to what seems like a bug in amixer - amixer -c1 sset Master playback toggle doesn't toggle correctly - it mutes fine, but then doesn't unmute all the channels it mutes!

I worked around it by figuring out the specific channels that sset Master was muting, and then handling them individually, but it's definitely not as clean:

#!/bin/sh
#
# /etc/acpi/actions/volume.sh (must be executable)
#

PATH=/usr/bin

die() {
  echo $*
  exit 1
}
usage() {
  die "usage: $(basename $0) up|down|mute"
}

test -n "$1" || usage

ACTION=$1
shift

case $ACTION in
  up)
    amixer -q -c1 -M sset Master 5%+ unmute
    ;;
  down)
    amixer -q -c1 -M sset Master 5%- unmute
    ;;
  mute)
    # Ideally the next command should work, but it doesn't unmute correctly
#   amixer -q -c1 sset Master playback toggle

    # Manual version for ThinkPad X250 channels
    # If adapting for another machine, 'amixer -C$DEV contents' is your friend (NOT 'scontents'!)
    SOUND_IS_OFF=$(amixer -c1 cget iface=MIXER,name='Master Playback Switch' | grep 'values=off')
    if [ -n "$SOUND_IS_OFF" ]; then
      amixer -q -c1 cset iface=MIXER,name='Master Playback Switch'    on
      amixer -q -c1 cset iface=MIXER,name='Headphone Playback Switch' on
      amixer -q -c1 cset iface=MIXER,name='Speaker Playback Switch'   on
    else
      amixer -q -c1 cset iface=MIXER,name='Master Playback Switch'    off
      amixer -q -c1 cset iface=MIXER,name='Headphone Playback Switch' off
      amixer -q -c1 cset iface=MIXER,name='Speaker Playback Switch'   off
    fi
    ;;
  *)
    usage
    ;;
esac

So in short, really pleased with the X250 so far - the screen is lovely, battery life seems great, I'm enjoying the keyboard, and most things have Just Worked or have been pretty easily configurable with CentOS. Happy camper!

Fujitsu ScanSnap 1300i on RHEL/CentOS

Just picked up a shiny new Fujitsu ScanSnap 1300i ADF scanner to get more serious about less paper.

I chose the 1300i on the basis of the nice small form factor, and that SANE reports it having 'good' support with current SANE backends. I'd also been able to find success stories of other linux users getting the similar S1300 working okay.

Here's my experience getting the S1300i up and running on CentOS 6.

I had the following packages already installed on my CentOS 6 workstation, so I didn't need to install any new software:

  • sane-backends
  • sane-backends-libs
  • sane-frontends
  • xsane
  • gscan2pdf (from rpmforge)
  • gocr (from rpmforge)
  • tesseract (from my repo)

I plugged the S1300i in (via the dual USB cables instead of the power supply - nice!), turned it on (by opening the top cover) and then ran sudo sane-find-scanner. All good:

found USB scanner (vendor=0x04c5 [FUJITSU], product=0x128d [ScanSnap S1300i]) at libusb:001:013
# Your USB scanner was (probably) detected. It may or may not be supported by
# SANE. Try scanimage -L and read the backend's manpage.

Ran sudo scanimage -L - no scanner found.

I downloaded the S1300 firmware Luuk had provided in his post and installed it into /usr/share/sane/epjitsu, and then updated /etc/sane.d/epjitsu.conf to reference it:

# Fujitsu S1300i
firmware /usr/share/sane/epjitsu/1300_0C26.nal
usb 0x04c5 0x128d

Ran sudo scanimage -L - still no scanner found. Hmmm.

Rebooted into windows, downloaded the Fujitsu ScanSnap Manager package and installed it. Grubbed around in C:/Windows and found 4 firmware packages, including one for the 1300i.

Copied the firmware onto another box, and rebooted back into linux. Copied the 4 new firmware files into /usr/share/sane/epjitsu and updated /etc/sane.d/epjitsu.conf to try the 1300i firmware:

# Fujitsu S1300i
firmware /usr/share/sane/epjitsu/1300i_0D12.nal
usb 0x04c5 0x128d

Closed and re-opened the S1300i (i.e. restarted it, just to be sure), and retried sudo scanimage -L. And lo, this time the scanner whirrs briefly and ... victory!

$ sudo scanimage -L
device 'epjitsu:libusb:001:014' is a FUJITSU ScanSnap S1300i scanner

I start gscan2pdf to try some scanning goodness. Eeerk: "No devices found". Hmmm. How about sudo gscan2pdf? Ahah, success - "FUJITSU ScanSnap S1300i" shows up in the Device dropdown.

I exit, and google how to deal with the permissions problem. Looks like the usb device gets created by udev as root:root 0664, and I need 'rw' permissions for scanning:

$ ls -l /dev/bus/usb/001/014
crw-rw-r--. 1 root root 189, 13 Sep 20 20:50 /dev/bus/usb/001/014

The fix is to add a scanner group and a local udev rule to use that group when creating the device path:

# Add a scanner group (analogous to the existing lp, cdrom, tape, dialout groups)
$ sudo groupadd -r scanner
# Add myself to the scanner group
$ sudo usermod -aG scanner gavin
# Add a udev local rule for the S1300i
$ sudo vim /etc/udev/rules.d/99-local.rules
# Added:
# Fujitsu ScanSnap S1300i
ATTRS{idVendor}=="04c5", ATTRS{idProduct}=="128d", MODE="0664", GROUP="scanner", ENV{libsane_matched}="yes"

Then log out and back in to pick up the change in groups, and close and re-open the S1300i. If all is well, I'm now in the scanner group and can control the scanner sans sudo:

# Check I'm in the scanner group now
$ id
uid=900(gavin) gid=100(users) groups=100(users),10(wheel),487(scanner)
# Check I can scanimage without sudo
$ scanimage -L
device 'epjitsu:libusb:001:016' is a FUJITSU ScanSnap S1300i scanner
# Confirm the permissions on the udev path (adjusted to match the new libusb path)
$ ls -l /dev/bus/usb/001/016
crw-rw-r--. 1 root scanner 189, 15 Sep 20 21:30 /dev/bus/usb/001/016
# Success!

Try gscan2pdf again, and this time it works fine without sudo!

And so far gscan2pdf 1.2.5 seems to work pretty nicely. It handles both simplex and duplex scans, and both the cleanup phase (using unpaper) and the OCR phase (with either gocr or tesseract) work without problems. tesseract seems to perform markedly better than gocr so far, as seems pretty typical.

So thus far I'm a pretty happy purchaser. On to a paperless searchable future!

Checking for a Dell DRAC card on linux

Note to self: this seems to be the most reliable way of checking whether a Dell machine has a DRAC card installed:

sudo ipmitool sdr elist mcloc

If there is, you'll see some kind of DRAC card:

iDRAC6           | 00h | ok  |  7.1 | Dynamic MC @ 20h

If there isn't, you'll see only a base management controller:

BMC              | 00h | ok  |  7.1 | Dynamic MC @ 20h

You need ipmi setup for this (if you haven't already):

# on RHEL/CentOS etc.
yum install OpenIPMI
service ipmi start

Yum Error Performing Checksum

Ok, this has bitten me enough times now that I'm going to blog it so I don't forget it again.

Symptom: you're doing a yum update on a centos5 or rhel5 box, using rpms from a repository on a centos6 or rhel6 server (or anywhere else with a more modern createrepo available), and you get errors like this:

http://example.com/repodata/filelists.sqlite.bz2: [Errno -3] Error performing checksum
http://example.com/repodata/primary.sqlite.bz2: [Errno -3] Error performing checksum

What this really means is that yum is too stupid to calculate the sha256 checksum correctly (and also too stupid to give you a sensible error message like "Sorry, primary.sqlite.bz2 is using a sha256 checksum, but I don't know how to calculate that").

The fix is simple:

yum install python-hashlib

from either rpmforge or epel, which makes the necessary libraries available for yum to calculate the new checksums correctly. Sorted.
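
Alternatively, if you control the repository server, you can avoid the problem at the source by generating the repo metadata with the older sha1 checksums, which el5-era yum does understand e.g. (path is illustrative):

# 'sha' here means sha1, the old default checksum type
createrepo -s sha /var/www/html/repo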

A Low Tech Pub-Sub Pattern

I've been enjoying playing around with ZeroMQ lately, and exploring some of the ways it changes the way you approach system architecture.

One of the revelations for me has been how powerful the pub-sub (Publish-Subscribe) pattern is. An architecture that makes it straightforward for multiple consumers to process a given piece of data promotes lots of small simple consumers, each performing a single task, instead of a complex monolithic processor.

This is both simpler and more complex, since you end up with more pieces, but each piece is radically simpler. It's also more flexible and more scalable, since you can move components around individually, and it allows greater language and library flexibility, since you can write individual components in completely different languages.

What's also interesting is that the benefits of this pattern don't necessarily require an advanced toolkit like ZeroMQ, particularly for low-volume applications. Here's a sketch of a low-tech pub-sub pattern that uses files as the pub-sub inflection point, and incron, the 'inotify cron' daemon, as our dispatcher.

Recipe:

  1. Install incron, the inotify cron daemon, to monitor our data directory for changes. On RHEL/CentOS this is available from the rpmforge or EPEL repositories: yum install incron.

  2. Capture data to files in our data directory in some useful format e.g. json, yaml, text, whatever.

  3. Setup an incrontab entry for each consumer monitoring CREATE operations on our data directory e.g.

    /data/directory IN_CREATE /path/to/consumer1 $@/$#
    /data/directory IN_CREATE /path/to/consumer2 $@/$#
    /data/directory IN_CREATE /path/to/consumer3 $@/$#
    

    The $@/$# magic passes the full file path to your consumer - see man 5 incrontab for details and further options.

Done. Working pub-sub with minimal moving parts.
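
And for illustration, a consumer can be as simple as a script that takes the file path as its only argument - a minimal sketch (names and paths are hypothetical):

#!/bin/sh
# /path/to/consumer1 - archive each new data file as it arrives
FILE=$1
test -n "$FILE" || exit 1
cp "$FILE" /data/archive/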

AoE on RHEL/CentOS

Updated 2012-07-24: packages updated to aoe-80 and aoetools-34, respectively.


I'm a big fan of Coraid and their relatively low-cost storage units. I've been using them for 5+ years now, and they've always been pretty well engineered, reliable, and performant.

They talk ATA-over-Ethernet (AoE), which is a very simple non-routable protocol for transmitting ATA commands directly via Ethernet frames, without the overhead of higher level layers like IP and TCP. So they're a lighter protocol than something like iSCSI, and so theoretically higher performance.

One issue with them on linux is that the in-kernel 'aoe' driver is typically pretty old. Coraid's latest aoe driver is version 78, for instance, while the RHEL6 kernel (2.6.32) comes with aoe v47, and the RHEL5 kernel (2.6.18) comes with aoe v22. So updating to the latest version is highly recommended, but also a bit of a pain, because if you do it manually it has to be recompiled for each new kernel update.

The modern way to handle this is to use a kernel-ABI tracking kmod, which gives you a driver that will work across multiple kernel updates for a given EL generation, without having to recompile each time.

So I've created a kmod-aoe package that seems to work nicely here, which you can install from my yum repository. The kmod depends on the 'aoetools' package, which supplies the command line utilities for managing your AoE devices.

There's an init script in the aoetools package that loads the kernel module, activates any configured LVM volume groups, and mounts any filesystems. All configuration is done via /etc/sysconfig/aoe.
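
Once everything is installed, basic manual usage looks something like this (the shelf/slot numbers and mount point are examples):

# Load the driver, discover AoE devices on the network, and list them
modprobe aoe
aoe-discover
aoe-stat
# Devices appear as e.g. /dev/etherd/e1.1 (shelf 1, slot 1)
mount /dev/etherd/e1.1p1 /mnt/data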

OpenLDAP Tips and Tricks

Having spent too much of this week debugging problems around migrating ldap servers from RHEL5 to RHEL6, here are some miscellaneous notes to self:

  1. The service is named ldap on RHEL5, and slapd on RHEL6 e.g. you do service ldap start on RHEL5, but service slapd start on RHEL6

  2. On RHEL6, you want all of the following packages installed on your clients:

    yum install openldap-clients pam_ldap nss-pam-ldapd
    
  3. This seems to be the magic incantation that works for me (with real SSL certificates, though):

    authconfig --enableldap --enableldapauth \
      --ldapserver ldap.example.com \
      --ldapbasedn="dc=example,dc=com" \
      --update
    
  4. Be aware that there are multiple ldap configuration files involved now. All of the following end up with ldap config entries in them and need to be checked:

    • /etc/openldap/ldap.conf
    • /etc/pam_ldap.conf
    • /etc/nslcd.conf
    • /etc/sssd/sssd.conf

    Note too that /etc/openldap/ldap.conf uses uppercased directives (e.g. URI) that get lowercased in the other files (URI -> uri). Additionally, some directives are confusingly renamed as well - e.g. TLS_CACERT in /etc/openldap/ldap.conf becomes tls_cacertfile in most of the others. :-(

  5. If you want to do SSL or TLS, you should know that the default behaviour is for ldap clients to verify certificates, and give misleading bind errors if they can't validate them. This means:

    • if you're using self-signed certificates, add TLS_REQCERT allow to /etc/openldap/ldap.conf on your clients, which means allow certificates the clients can't validate

    • if you're using CA-signed certificates, and want to verify them, add your CA PEM certificate to a directory of your choice (e.g. /etc/openldap/certs, or /etc/pki/tls/certs, for instance), and point to it using TLS_CACERT in /etc/openldap/ldap.conf, and tls_cacertfile in /etc/ldap.conf.

  6. RHEL6 uses a new-fangled /etc/openldap/slapd.d directory for the old /etc/openldap/slapd.conf config data, and the RHEL6 Migration Guide tells you how to convert from one to the other. But if you simply rename the default slapd.d directory, slapd will use the old-style slapd.conf file quite happily, which is much easier to read/modify/debug, at least while you're getting things working.

  7. If you run into problems on the server, there are lots of helpful utilities included with the openldap-servers package. Check out the manpages for slaptest(8), slapcat(8), slapacl(8), slapadd(8), etc.
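
And a quick client-side sanity check that's useful when debugging (adjust the host and base DN to suit):

# Anonymous search over TLS
ldapsearch -x -ZZ -H ldap://ldap.example.com -b "dc=example,dc=com" '(uid=someuser)'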

rpm-find-changes

rpm-find-changes is a little script I wrote a while ago for rpm-based systems (RedHat, CentOS, Mandriva, etc.). It finds files in a filesystem tree that are not owned by any rpm package (orphans), or are modified from the version distributed with their rpm. In other words, any file that has been introduced or changed from its distributed version.

It's intended to help identify candidates for backup, or just for tracking interesting changes. I run it nightly on /etc on most of my machines, producing a list of files that I copy off the machine (using another tool, which I'll blog about later) and store in a git repository.

I've also used it for tracking changes to critical configuration trees across multiple machines, to make sure everything is kept in sync, and to be able to track changes over time.

Available on github: https://github.com/gavincarr/rpm-find-changes

Google Hangout on CentOS 6

Kudos to Google for providing linux plugins for their Google+ Hangouts (a multi-way video chat system), for both debian-based and rpm-based systems. The library requirements don't seem to be documented anywhere though, so here's the magic incantation required for installation on CentOS6 x86_64:

yum install libstdc++.i686 gtk2.i686 \
  libXrandr.i686 libXcomposite.i686 libXfixes.i686 \
  pulseaudio-libs.i686 alsa-lib.i686

Poor Man's NTP

One of today's annoyances was a third-party complaining about clock skew on a server at their site that they're testing against. No, they don't have a local ntp server, and no, they couldn't allow us to connect out to a designated ntp server externally. All we have is an ssh forward in.

They wanted me to manually set the clock on the server whenever they noticed it was out of synch! Real professionals.

I thought about tunneling an ntp stream out, but that requires udp-to-tcp fudging at each end, or using ssh's Tunnel facility, which requires root at both ends.

In the end, I settled for the low-tech approach - a once-a-day cron job that resets the remote clock from our local one. Ugly, but good enough:

ssh root@server date --utc $(date --utc "+%m%d%H%M%Y.%S")
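
If you put this in a crontab, remember that % is special to cron (it means newline) and needs escaping e.g. something like this in /etc/cron.d (timing arbitrary):

# /etc/cron.d/poor-mans-ntp
15 3 * * * root ssh root@server date --utc $(date --utc "+\%m\%d\%H\%M\%Y.\%S")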

RHEL6 GDM Sessions Workaround

Update: ilaiho has provided a better solution in the comments, which is to install the xorg-x11-xinit-session package, which adds a "User script" session option. This will invoke your (executable) ~/.xsession or ~/.Xclients configs, if selected, and works well, so I'd recommend you go that route instead of using this patch now.


The GDM Greeter in RHEL6 seems to have lost the ability to select 'session types' (or window managers), which apparently means you're stuck using Gnome, even if you have other better options installed. One workaround is to install KDM instead, and set DISPLAYMANAGER=KDE in your /etc/sysconfig/desktop config, as KDM does still support selectable session types.

Since I've become a big fan of tiling window managers in general, and ion in particular, this was pretty annoying, so I wasted a few hours today working through the /etc/X11 scripts and figuring out how they hung together on RHEL6.

So for any other gnome-haters out there who don't want to have to go to KDM, here's a patch to /etc/X11/xinit/Xsession that ignores the default 'gnome-session' set by GDM, which allows proper window manager selection either by user .xsession or .Xclients files, or by the /etc/sysconfig/desktop DISPLAY setting.

diff --git a/xinit/Xsession b/xinit/Xsession
index e12e0ee..ab94d28 100755
--- a/xinit/Xsession
+++ b/xinit/Xsession
@@ -30,6 +30,14 @@ SWITCHDESKPATH=/usr/share/switchdesk
 # Xsession and xinitrc scripts which has been factored out to avoid duplication
 . /etc/X11/xinit/xinitrc-common

+# RHEL6 GDM doesn't seem to support selectable sessions, and always requests a
+# gnome-session. So we unset this default here, to allow things like user
+# .xsession or .Xclients files to be checked, and /etc/sysconfig/desktop
+# settings (via /etc/X11/xinit/Xclients) honoured.
+if [ -n "$GDMSESSION" -a $# -eq 1 -a "$1" = gnome-session ]; then
+  shift
+fi
+
 # This Xsession.d implementation, is intended to obsolte and replace the
 # various mechanisms present in the 'case' statement which follows, and to
 # eventually be able to easily remove all hard coded window manager specific

Apply as root:

cd /etc/X11
patch -p1 < /tmp/xsession.patch

Rebuild Inventory

Here's what I use to take a quick inventory of a machine before a rebuild, both to act as a reference during the rebuild itself, and in case something goes pear-shaped. The whole chunk after script up to exit is cut-and-pastable.

# as root, where you want your inventory file
script $(hostname).inventory
export PS1='\h:\w\$ '               # reset prompt to avoid ctrl chars
fdisk -l /dev/sd?                   # list partition tables
cat /proc/mdstat                    # list raid devices
pvs                                 # list lvm stuff
vgs
lvs
df -h                               # list mounts
ip addr                             # list network interfaces
ip route                            # list network routes
cat /etc/resolv.conf                # show resolv.conf
exit

# Cleanup control characters in the inventory
perl -i -pe 's/\r//g; s/\033\]\d+;//g; s/\033\[\d+m//g; s/\007/\//g' \
  $(hostname).inventory

# And then copy it somewhere else in case of problems ;-)
scp $(hostname).inventory somewhere:

Anything else useful I've missed?

Remote Rebuild, CentOS-style

Problem: you've got a remote server that's significantly hosed, either through a screwup somewhere or a power outage that did nasty things to your root filesystem or something. You have no available remote hands, and/or no boot media anyway.

Preconditions: You have another server you can access on the same network segment, and remote access to the broken server, either through a DRAC or iLO type card, or through some kind of serial console server (like a Cyclades/Avocent box).

Solution: in extremis, you can do a remote rebuild. Here's the simplest recipe I've come up with. I'm rebuilding using centos5-x86_64 version 5.5; adjust as necessary.

Note: dnsmasq, mrepo and syslinux are not core CentOS packages, so you need to enable the rpmforge repository to follow this recipe. This just involves:

wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.1-1.el5.rf.x86_64.rpm
rpm -Uvh rpmforge-release-0.5.1-1.el5.rf.x86_64.rpm

1. On your working box (which you're now going to press into service as a build server), install and configure dnsmasq to provide dhcp and tftp services:

# Install dnsmasq
yum install dnsmasq

# Add the following lines to the bottom of your /etc/dnsmasq.conf file
# Note that we don't use the following ip address, but the directive
# itself is required for dnsmasq to turn dhcp functionality on
dhcp-range=ignore,192.168.1.99,192.168.1.99
# Here use the broken server's mac addr, hostname, and ip address
dhcp-host=00:d0:68:09:19:80,broken.example.com,192.168.1.5,net:centos5x
# Point the centos5x tag at the tftpboot environment you're going to setup
dhcp-boot=net:centos5x,/centos5x-x86_64/pxelinux.0
# And enable tftp
enable-tftp
tftp-root = /tftpboot
#log-dhcp

# Then start up dnsmasq
service dnsmasq start

2. Install and configure mrepo to provide your CentOS build environment:

# Install mrepo and syslinux
yum install mrepo syslinux

# Setup a minimal /etc/mrepo.conf e.g.
cat > /etc/mrepo.conf
[main]
srcdir = /var/mrepo
wwwdir = /var/www/mrepo
confdir = /etc/mrepo.conf.d
arch = x86_64
mailto = root@example.com
smtp-server = localhost
pxelinux = /usr/lib/syslinux/pxelinux.0
tftpdir = /tftpboot

[centos5]
release = 5
arch = x86_64
metadata = repomd repoview
name = Centos-$release $arch
#iso = CentOS-$release.5-$arch-bin-DVD-?of2.iso
#iso = CentOS-$release.5-$arch-bin-?of8.iso
^D
# (uncomment one of the iso lines above, either the DVD or the CD one)

# Download the set of DVD or CD ISOs for the CentOS version you want
# There are fewer DVD ISOs, but you need to use bittorrent to download
mkdir -p /var/mrepo/iso
cd /var/mrepo/iso
elinks http://isoredirect.centos.org/centos/5.5/isos/x86_64/

# Once your ISOs are available in /var/mrepo/iso, and the 'iso' line
# in /etc/mrepo.conf updated appropriately, run mrepo itself
mrepo -gvv

3. Finally, finish setting up your tftp environment. mrepo should have copied appropriate pxelinux.0, initrd.img, and vmlinuz files into your /tftpboot/centos5-x86_64 directory, so all you need to supply is an appropriate pxelinux boot config:

cd /tftpboot/centos5-x86_64
ls
mkdir -p pxelinux.cfg

# Setup a default pxelinux config (adjust the serial/console and repo params as needed)
cat > pxelinux.cfg/default
default linux
serial 0,9600n8
label linux
  root (nd)
  kernel vmlinuz
  append initrd=initrd.img console=ttyS0,9600 repo=http://192.168.1.1/mrepo/centos5-x86_64
^D

Now get your server to do a PXE boot (via a boot option or the bios or whatever), and hopefully your broken server will find your dhcp/tftp environment and boot up in install mode, and away you go.

If you have problems with the boot, try checking your /var/log/messages file on the boot server for hints.

Dell OMSA

Following on from my IPMI explorations, here's the next chapter in my getting-down-and-dirty-with-dell-hardware-on-linux adventures. This time I'm setting up Dell's OpenManage Server Administrator software, primarily in order to explore being able to configure bios settings from within the OS. As before, I'm running CentOS 5, but OMSA supports any of RHEL4, RHEL5, SLES9, and SLES10, and various versions of Fedora Core and OpenSUSE.

Here's what I did to get up and running:

# Configure the Dell OMSA repository
wget -O bootstrap.sh http://linux.dell.com/repo/hardware/latest/bootstrap.cgi
# Review the script to make sure you trust it, and then run it
sh bootstrap.sh
# OR, for CentOS5/RHEL5 x86_64 you can just install the following:
rpm -Uvh http://linux.dell.com/repo/hardware/latest/platform_independent/rh50_64/prereq/\
dell-omsa-repository-2-5.noarch.rpm

# Install base version of OMSA, without gui (install srvadmin-all for more)
yum install srvadmin-base

# One of the daemons requires /usr/bin/lockfile, so make sure you've got procmail installed
yum install procmail

# If you're running an x86_64 OS, there are a couple of additional 32-bit
#   libraries you need that aren't dependencies in the RPMs
yum install compat-libstdc++-33-3.2.3-61.i386 pam.i386

# Start OMSA daemons
for i in instsvcdrv dataeng dsm_om_shrsvc; do service $i start; done

# Finally, you can update your path by doing logout/login, or just run:
. /etc/profile.d/srvadmin-path.sh

Now to check whether you're actually functional you can try a few of the following (as root):

omconfig about
omreport about
omreport system -?
omreport chassis -?

omreport is the OMSA CLI reporting/query tool, and omconfig is the equivalent update tool. The main documentation for the current version of OMSA is here. I found the CLI User's Guide the most useful.

Here's a sampler of interesting things to try:

# Report system overview
omreport chassis

# Report system summary info (OS, CPUs, memory, PCIe slots, DRAC cards, NICs)
omreport system summary

# Report bios settings
omreport chassis biossetup

# Fan info
omreport chassis fans

# Temperature info
omreport chassis temps

# CPU info
omreport chassis processors

# Memory and memory slot info
omreport chassis memory

# Power supply info
omreport chassis pwrsupplies

# Detailed PCIe slot info
omreport chassis slots

# DRAC card info
omreport chassis remoteaccess

omconfig allows setting object attributes using a key=value syntax, which can get reasonably complex. See the CLI User's Guide above for details, but here are some examples of messing with various bios settings:

# See available attributes and settings
omconfig chassis biossetup -?

# Turn the AC Power Recovery setting to On
omconfig chassis biossetup attribute=acpwrrecovery setting=on

# Change the serial communications setting (on with serial redirection via)
omconfig chassis biossetup attribute=serialcom setting=com1
omconfig chassis biossetup attribute=serialcom setting=com2

# Change the external serial connector
omconfig chassis biossetup attribute=extserial setting=com1
omconfig chassis biossetup attribute=extserial setting=rad

# Change the Console Redirect After Boot (crab) setting
omconfig chassis biossetup attribute=crab setting=enabled
omconfig chassis biossetup attribute=crab setting=disabled

# Change NIC settings (turn on PXE on NIC1)
omconfig chassis biossetup attribute=nic1 setting=enabledwithpxe

Finally, there are some interesting formatting options available to omreport, useful for scripting e.g.

# Custom delimiter format (default semicolon)
omreport chassis -fmt cdv

# XML format
omreport chassis -fmt xml

# To change the default cdv delimiter
omconfig preferences cdvformat -?
omconfig preferences cdvformat delimiter=pipe

IPMI on CentOS/RHEL

Spent a few days deep in the bowels of a couple of datacentres last week, and realised I didn't know enough about Dell's DRAC baseboard management controllers to use them properly. In particular, I didn't know how to mess with the drac settings from within the OS. So spent some of today researching that.

Turns out there are a couple of routes to do this. You can use the Dell native tools (e.g. racadm) included in Dell's OMSA product, or you can use vendor-neutral IPMI, which is well-supported by Dell DRACs. I went with the latter as it's more cross-platform, and the tools come native with CentOS, instead of having to setup Dell's OMSA repositories. The Dell-native tools may give you more functionality, but for what I wanted to do IPMI seems to work just fine.

So installation is just:

yum install OpenIPMI OpenIPMI-tools
chkconfig ipmi on
service ipmi start

and then from the local machine you can use ipmitool to access and manipulate all kinds of useful stuff:

# IPMI commands
ipmitool help
man ipmitool

# To check firmware version
ipmitool mc info
# To reset the management controller
ipmitool mc reset [ warm | cold ]

# Show field-replaceable-unit details
ipmitool fru print

# Show sensor output
ipmitool sdr list
ipmitool sdr type list
ipmitool sdr type Temperature
ipmitool sdr type Fan
ipmitool sdr type 'Power Supply'

# Chassis commands
ipmitool chassis status
ipmitool chassis identify [<interval>]   # turn on front panel identify light (default 15s)
ipmitool [chassis] power soft            # initiate a soft-shutdown via acpi
ipmitool [chassis] power cycle           # issue a hard power off, wait 1s, power on
ipmitool [chassis] power off             # issue a hard power off
ipmitool [chassis] power on              # issue a hard power on
ipmitool [chassis] power reset           # issue a hard reset

# Modify boot device for next reboot
ipmitool chassis bootdev pxe
ipmitool chassis bootdev cdrom
ipmitool chassis bootdev bios

# Logging
ipmitool sel info
ipmitool sel list
ipmitool sel elist                       # extended list (see manpage)
ipmitool sel clear

For remote access, you need to setup user and network settings, either at boot time on the DRAC card itself, or from the OS via ipmitool:

# Display/reset password for default root user (userid '2')
ipmitool user list 1
ipmitool user set password 2 <new_password>

# Display/configure lan settings
ipmitool lan print 1
ipmitool lan set 1 ipsrc [ static | dhcp ]
ipmitool lan set 1 ipaddr 192.168.1.101
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 192.168.1.254

Once this is configured you should be able to connect using the 'lan' interface to ipmitool, like this:

ipmitool -I lan -U root -H 192.168.1.101 chassis status

which will prompt you for your ipmi root password, or you can do the following:

echo <new_password> > ~/.racpasswd
chmod 600 ~/.racpasswd

and then use that password file instead of manually entering it each time:

ipmitool -I lan -U root -f ~/.racpasswd -H 192.168.1.101 chassis status

I'm using an 'ipmi' alias that looks like this:

alias ipmi='ipmitool -I lan -U root -f ~/.racpasswd -H'

# which then allows you to do the much shorter:
ipmi 192.168.1.101 chassis status
# OR
ipmi <hostname> chassis status

Finally, if you configure serial console redirection in the bios as follows:

Serial Communication -> Serial Communication:       On with Console Redirection via COM2
Serial Communication -> External Serial Connector:  COM2
Serial Communication -> Redirection After Boot:     Disabled

then you can setup standard serial access in grub.conf and inittab on com2/ttyS1 and get serial console access via IPMI serial-over-lan using the 'lanplus' interface:

ipmitool -I lanplus -U root -f ~/.racpasswd -H 192.168.1.101 sol activate

which I typically use via a shell function:

# ipmi serial-over-lan function
isol() {
   if [ -n "$1" ]; then
       ipmitool -I lanplus -U root -f ~/.racpasswd -H $1 sol activate
   else
       echo "usage: sol <sol_ip>"
   fi
}

# used like:
isol 192.168.1.101
isol <hostname>
