Wed 09 Apr 2008
I've been playing around with SixApart's
TheSchwartz for the last few days.
TheSchwartz is a lightweight, reliable job queue, typically used for
handling relatively high-latency jobs that you don't want to handle
from a web process, e.g. sending out emails or placing orders
into some external system. Basically, interacting with anything
which might be down or slow, or which you don't really need right away.
Actually, TheSchwartz is a job queue library rather than a job queue
system, so some assembly is required. Like most Danga/SixApart
software, it's lightweight, performant, and well-designed, but also
pretty light on documentation. If you're not comfortable reading the
(perl) source, it might be a challenging environment to set up.
Notes from the last few days:
Don't use the version on CPAN; get the latest code from subversion
instead. At the moment the CPAN version is 1.04, but current svn is
at 1.07, and has some significant additional functionality.
Conceptually TheSchwartz is very simple - jobs with opaque
function names and arguments are inserted into a database
for workers with a particular 'ability'; workers periodically
check the database for jobs matching the abilities they have,
and grab and execute them. Jobs that succeed are marked
completed and removed from the queue; jobs that fail are
logged and left on the queue to be retried after some time
period up to a configurable number of retries.
TheSchwartz has two kinds of clients - those that submit
jobs, and workers that perform jobs. Both sides are considered
clients (of the job database), which is confusing if you're
thinking in terms of client-server interaction.
There are three main classes to deal with: TheSchwartz,
which is the main client functionality class;
TheSchwartz::Job, which models the jobs that are submitted
to the job queue; and TheSchwartz::Worker, which is a
role-type class modelling a particular ability that a worker
is able to perform.
New worker abilities are defined by subclassing
TheSchwartz::Worker and defining your new functionality in
a work() method. work() is called as a class method with the job
object from the queue as its only argument and does its stuff, marking the
job as completed or failed after processing. A useful real
example worker is TheSchwartz::Worker::SendEmail (also by
Brad Fitzpatrick, and available on CPAN) for sending emails from
TheSchwartz.
Depending on your application, it may make sense for workers
to just have a single ability, or for them to have multiple
abilities and service more than one type of job. In the latter
case, TheSchwartz tries to pick abilities it hasn't used recently
whenever it can, to avoid certain kinds of jobs getting starved.
You can also subclass TheSchwartz itself to modify the standard
functionality, and I've found that useful where I've wanted more
visibility of what workers are doing than you get out of the box.
You don't appear at this point to be able to subclass
TheSchwartz::Job however - TheSchwartz always uses this as the
class when autovivifying jobs for workers.
There are a bunch of other features I haven't played with yet,
including job priorities, the ability to coalesce jobs into
groups to be processed together, and the ability to delay jobs
until a certain time.
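To make the worker and client notes above a bit more concrete, here is a minimal sketch of one worker ability plus both kinds of client. The package name MyApp::Worker::Echo, the 'message' argument, and the DSN and credentials are all made up for illustration, and it assumes the TheSchwartz schema has already been loaded into that database. The calls themselves (insert, can_do, work_until_done, arg, completed, failed, and the max_retries/retry_delay hooks) are from TheSchwartz's own documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A worker ability. The package name doubles as the job's function name.
package MyApp::Worker::Echo;        # hypothetical name
use base 'TheSchwartz::Worker';

# Optional hooks that TheSchwartz consults when a job fails.
sub max_retries { 3 }               # retry a failed job up to 3 times
sub retry_delay { 60 }              # wait 60 seconds before each retry

sub work {
    my ($class, $job) = @_;
    my $args = $job->arg;           # whatever the submitting client passed in
    unless (ref $args eq 'HASH' && defined $args->{message}) {
        $job->failed('no message supplied');
        return;
    }
    print "echo: $args->{message}\n";
    $job->completed;                # mark done and remove from the queue
}

package main;

# Only touch the database when explicitly asked to.
my $mode = shift(@ARGV) || '';
if ($mode eq 'submit' || $mode eq 'work') {
    require TheSchwartz;
    my $client = TheSchwartz->new(
        databases => [{
            dsn  => 'dbi:mysql:theschwartz',   # assumed DSN
            user => 'schwartz',                # assumed credentials
            pass => 'secret',
        }],
    );
    if ($mode eq 'submit') {
        # Submitting client: queue a job for the Echo ability.
        $client->insert('MyApp::Worker::Echo', { message => 'hello' });
    }
    else {
        # Worker client: declare abilities, then drain the queue.
        $client->can_do('MyApp::Worker::Echo');
        $client->work_until_done;
    }
}
```

Run it with 'submit' to queue a job and with 'work' to process whatever is queued. A long-running worker would more likely call $client->work, which loops and polls for new jobs, rather than work_until_done.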
I've actually been using it to set up a job queue system for a cluster,
which is a slightly different application than it was intended for,
but so far it's been working really well.
I still feel like I'm only just getting to grips with the breadth
of things it could be used for though - more experimentation required.
I'd be interested in hearing of examples of what people are using it
for as well.
Recommended.
Thu 27 Mar 2008
I wasted 15 minutes the other day trying to remember how to do this,
so here it is for the future: to find out if and when a perl module
got added to the core, you want Richard Clamp's excellent
Module::CoreList.
Recent versions have a 'corelist' frontend command, so I typically
use that e.g.
$ corelist File::Basename
File::Basename was first released with perl 5
$ corelist warnings
warnings was first released with perl 5.006
$ corelist /^File::Spec/
File::Spec was first released with perl 5.00405
File::Spec::Cygwin was first released with perl 5.006002
File::Spec::Epoc was first released with perl 5.006001
File::Spec::Functions was first released with perl 5.00504
File::Spec::Mac was first released with perl 5.00405
File::Spec::OS2 was first released with perl 5.00405
File::Spec::Unix was first released with perl 5.00405
File::Spec::VMS was first released with perl 5.00405
File::Spec::Win32 was first released with perl 5.00405
$ corelist URI::Escape
URI::Escape was not in CORE (or so I think)
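The same lookup is available from code via Module::CoreList's first_release method, which is handy inside a build or packaging script. A minimal sketch (the first_core_release wrapper is just an illustrative helper name, not part of the module):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Module::CoreList;

# Return the perl version a module first shipped with, or undef if the
# module has never been in core.
sub first_core_release {
    my ($module) = @_;
    return Module::CoreList->first_release($module);
}

for my $module (qw(File::Basename warnings URI::Escape)) {
    my $version = first_core_release($module);
    if (defined $version) {
        print "$module was first released with perl $version\n";
    }
    else {
        print "$module is not in core\n";
    }
}
```

The answers are only as current as the installed Module::CoreList, so an old copy won't know about modules added to core in later perl releases.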
Mon 17 Mar 2008
Saw this post fly past in the twitter stream today:
http://linuxshellaccount.blogspot.com/2008/03/perl-directory-permissions-difference.html.
It's a script by Mike Golvach to do something like a diff -r, but also
showing differences in permissions and ownership, rather than just content.
I've written a CPAN module to do stuff like this - File::DirCompare -
so thought I'd check how straightforward this would be using
File::DirCompare:
#!/usr/bin/perl

use strict;
use File::Basename;
use File::DirCompare;
use File::Compare qw(compare);
use File::stat;

die "Usage: " . basename($0) . " dir1 dir2\n" unless @ARGV == 2;

my ($dir1, $dir2) = @ARGV;

# ignore_cmp => 1 calls the sub for every entry, not just differing ones
File::DirCompare->compare($dir1, $dir2, sub {
    my ($a, $b) = @_;
    if (! $b) {
        printf "Only in %s: %s\n", dirname($a), basename($a);
    } elsif (! $a) {
        printf "Only in %s: %s\n", dirname($b), basename($b);
    } else {
        my $stata = stat $a;
        my $statb = stat $b;

        # Return unless different
        return unless compare($a, $b) != 0 ||
            $stata->mode != $statb->mode ||
            $stata->uid  != $statb->uid  ||
            $stata->gid  != $statb->gid;

        # Report
        printf "%04o %s %s %s\t\t%04o %s %s %s\n",
            $stata->mode & 07777, basename($a),
            (getpwuid($stata->uid))[0], (getgrgid($stata->gid))[0],
            $statb->mode & 07777, basename($b),
            (getpwuid($statb->uid))[0], (getgrgid($statb->gid))[0];
    }
}, { ignore_cmp => 1 });
So this reports all entries that are different in content or permissions or
ownership e.g. given a tree like this (slightly modified from Mike's
example):
$ ls -lR scripts1 scripts2
scripts1:
total 28
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script1
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script1.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script3
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script3.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:49 script4
scripts2:
total 28
-rw-r--r-- 1 gavin users 0 Mar 17 16:41 script1
-rw-r--r-- 1 gavin users 0 Mar 17 16:41 script1.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2.bak
-rwxr-xr-x 1 gavin gavin 0 Mar 17 16:41 script3*
-rwxr-xr-x 1 gavin gavin 0 Mar 17 16:41 script3.bak*
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:49 script5
it will give output like the following:
$ ./pdiff2 scripts1 scripts2
0644 script1 gavin gavin 0644 script1 gavin users
0644 script1.bak gavin gavin 0644 script1.bak gavin users
0644 script3 gavin gavin 0755 script3 gavin gavin
0644 script3.bak gavin gavin 0755 script3.bak gavin gavin
Only in scripts1: script4
Only in scripts2: script5
This obviously has dependencies that Mike's version doesn't have, but it
comes out much shorter and clearer, I think. It also doesn't fork and parse
an external ls, so it should be more portable and less fragile. I should
probably be caching the getpwuid lookups too, but that would have made it
5 lines longer. ;-)
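For the record, the caching mentioned above could look something like the sketch below. The user_name and group_name helpers are illustrative names, not part of the script above; they memoise the lookups in a pair of hashes and fall back to the numeric id when there is no matching passwd or group entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::stat;

# Caches of uid => user name and gid => group name
my (%user_name, %group_name);

# Look up a user name, hitting getpwuid only once per uid
sub user_name {
    my ($uid) = @_;
    $user_name{$uid} = (getpwuid($uid))[0] unless exists $user_name{$uid};
    return defined $user_name{$uid} ? $user_name{$uid} : $uid;
}

# Look up a group name, hitting getgrgid only once per gid
sub group_name {
    my ($gid) = @_;
    $group_name{$gid} = (getgrgid($gid))[0] unless exists $group_name{$gid};
    return defined $group_name{$gid} ? $group_name{$gid} : $gid;
}

# Example: owner and group of the current directory
my $st = stat('.') or die "Cannot stat current directory: $!\n";
printf "%s %s\n", user_name($st->uid), group_name($st->gid);
```

In the report printf, user_name($stata->uid) and group_name($stata->gid) would then replace the direct getpwuid and getgrgid calls.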