Parallelising With Perl

Did a talk at the Sydney Perl Mongers group on Tuesday night, called "Parallelising with Perl", covering AnyEvent, MCE, and GNU Parallel.


Parallel Processing Perl Modules

Needed to parallelise some processing in perl the last few days, and did a quick survey of some of the parallel processing modules on CPAN, of which there is the normal bewildering diversity.

As usual, it depends exactly what you're trying to do. In my case I just needed to be able to fork a bunch of processes off, have them process some data, and hand the results back to the parent.

So here are my notes on a random selection of the available modules. The example each time is basically a parallel version of the following map:

my %out = map { $_ ** 2 } 1 .. 50;


Object oriented wrapper around 'fork'. Supports parent callbacks. Passing data back to parent uses files, and feels a little bit clunky. Dependencies: none.

use Parallel::ForkManager 0.7.6;

my @num = 1 .. 50;

my $pm = Parallel::ForkManager->new(5);

my %out;
$pm->run_on_finish(sub {    # must be declared before first 'start'
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    $out{ $data->[0] } = $data->[1];

for my $num (@num) {
    $pm->start and next;   # Parent nexts

    # Child
    my $sq = $num ** 2;

    $pm->finish(0, [ $num, $sq ]);   # Child exits

[Version 0.7.9]


Basically a parallel version of 'map'. Dependencies: none.

use Parallel::Iterator qw(iterate);

my @num = 1 .. 50;

my $it = iterate( sub {
    # sub is a closure, return outputs
    my ($id, $num) = @_;
    return $num ** 2;
}, \@num );

my %out = ();
while (my ($num, $square) = $it->()) {
  $out{$num} = $square;

[Version 1.00]


Provides parallel versions of 'foreach' and 'while'. It uses 'tie' to allow shared data structures between the parent and children. Dependencies: Parallel::ForkManager.

use Parallel::Loops;

my @num = 1 .. 50;

my $pl = Parallel::Loops->new(5);

my %out;

$pl->foreach( \@num, sub {
    my $num = $_;           # note this uses $_, not @_
    $out{$num} = $num ** 2;

You can also return values from the subroutine like Iterator, avoiding the explicit 'share':

my %out = $pl->foreach( \@num, sub {
    my $num = $_;           # note this uses $_, not @_
    return ( $num, $num ** 2 );

[Version 0.03]


Provides an interesting perlish forking interface using blocks. No built-in support for returning data from children, but provides examples using pipes. Dependencies: Exporter::Tidy.

use Proc::Fork;
use IO::Pipe;
use Storable qw(freeze thaw);

my @num = 1 .. 50;
my @children;

for my $num (@num) {
    my $pipe = IO::Pipe->new;

    run_fork{ child {
        # Child
        print $pipe freeze([ $num, $num ** 2 ]);
    } };

    # Parent
    push @children, $pipe;

my %out;
for my $pipe (@children) {
    my $entry = thaw( <$pipe> );
    $out{ $entry->[0] } = $entry->[1];

[Version 0.71]


Like Parallel::ForkManager, but adds better signal handling. Doesn't seem to provide built-in support for returning data from children. Dependencies: Proc::Wait3.

[Version 0.08]


More complex module, loosely based on ForkManager (?). Includes better signal handling, and supports scheduling and dependencies between different groups of subprocesses. Doesn't appear to provide built-in support for passing data back from children.

[Version 1.232]