Parallel Processing Perl Modules

Needed to parallelise some processing in perl the last few days, and did a quick survey of some of the parallel processing modules on CPAN, of which there is the normal bewildering diversity.

As usual, it depends exactly what you're trying to do. In my case I just needed to be able to fork a bunch of processes off, have them process some data, and hand the results back to the parent.

So here are my notes on a random selection of the available modules. The example each time is basically a parallel version of the following map:

my %out = map { $_ ** 2 } 1 .. 50;

Parallel::ForkManager

Object oriented wrapper around 'fork'. Supports parent callbacks. Passing data back to parent uses files, and feels a little bit clunky. Dependencies: none.

use Parallel::ForkManager 0.7.6;

my @num = 1 .. 50;

my $pm = Parallel::ForkManager->new(5);

my %out;
$pm->run_on_finish(sub {    # must be declared before first 'start'
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    $out{ $data->[0] } = $data->[1];
});

for my $num (@num) {
    $pm->start and next;   # Parent nexts

    # Child
    my $sq = $num ** 2;

    $pm->finish(0, [ $num, $sq ]);   # Child exits
}
$pm->wait_all_children;

[Version 0.7.9]

Parallel::Iterator

Basically a parallel version of 'map'. Dependencies: none.

use Parallel::Iterator qw(iterate);

my @num = 1 .. 50;

my $it = iterate( sub {
    # sub is a closure, return outputs
    my ($id, $num) = @_;
    return $num ** 2;
}, \@num );

my %out = ();
while (my ($num, $square) = $it->()) {
  $out{$num} = $square;
}

[Version 1.00]

Parallel::Loops

Provides parallel versions of 'foreach' and 'while'. It uses 'tie' to allow shared data structures between the parent and children. Dependencies: Parallel::ForkManager.

use Parallel::Loops;

my @num = 1 .. 50;

my $pl = Parallel::Loops->new(5);

my %out;
$pl->share(\%out);

$pl->foreach( \@num, sub {
    my $num = $_;           # note this uses $_, not @_
    $out{$num} = $num ** 2;
});

You can also return values from the subroutine like Iterator, avoiding the explicit 'share':

my %out = $pl->foreach( \@num, sub {
    my $num = $_;           # note this uses $_, not @_
    return ( $num, $num ** 2 );
});

[Version 0.03]

Proc::Fork

Provides an interesting perlish forking interface using blocks. No built-in support for returning data from children, but provides examples using pipes. Dependencies: Exporter::Tidy.

use Proc::Fork;
use IO::Pipe;
use Storable qw(freeze thaw);

my @num = 1 .. 50;
my @children;

for my $num (@num) {
    my $pipe = IO::Pipe->new;

    run_fork{ child {
        # Child
        $pipe->writer;
        print $pipe freeze([ $num, $num ** 2 ]);
        exit;
    } };

    # Parent
    $pipe->reader;
    push @children, $pipe;
}

my %out;
for my $pipe (@children) {
    my $entry = thaw( <$pipe> );
    $out{ $entry->[0] } = $entry->[1];
}

[Version 0.71]

Parallel::Prefork

Like Parallel::ForkManager, but adds better signal handling. Doesn't seem to provide built-in support for returning data from children. Dependencies: Proc::Wait3.

[Version 0.08]

Parallel::Forker

More complex module, loosely based on ForkManager (?). Includes better signal handling, and supports scheduling and dependencies between different groups of subprocesses. Doesn't appear to provide built-in support for passing data back from children.

[Version 1.232]

blog comments powered by Disqus