Exploring Riak

Been playing with Riak recently, which is one of the modern dynamo-derived nosql databases (the other main ones being Cassandra and Voldemort). We're evaluating it for use as a really large brackup datastore, the primary attraction being the near linear scalability available by adding (relatively cheap) new nodes to the cluster, and decent availability options in the face of node failures.

I've built riak packages for RHEL/CentOS 5, available at my repository, and added support for a riak 'target' to the latest version (1.10) of brackup (packages also available at my repo).

The first thing to figure out is the maximum number of nodes you expect your riak cluster to get to. This you use to size the ring_creation_size setting, which is the number of partitions the hash space is divided into. It must be a power of 2 (64, 128, 256, etc.), and the reason it's important is that it cannot be easily changed after the cluster has been created. The rule of thumb is that for performance you want at least 10 partitions per node/machine, so the default ring_creation_size of 64 is really only useful up to about 6 nodes. 128 scales to 10-12, 256 to 20-25, etc. For more info see the Riak Wiki.

Here's the script I use for configuring a new node on CentOS. The main things to tweak here are the ring_creation_size you want (here I'm using 512, for a biggish cluster), and the interface to use to get the default ip address (here eth0, or you could just hardcode 0.0.0.0 instead of $ip).

#!/bin/sh
# Riak configuration script for CentOS/RHEL

# Install riak (and IO::Interface, for next)
yum -y install riak perl-IO-Interface

# To set app.config:web_ip to use primary ip, do:
perl -MIO::Interface::Simple -i \
  -pe "BEGIN { \$ip = IO::Interface::Simple->new(q/eth0/)->address; }
      s/127\.0\.0\.1/\$ip/" /etc/riak/app.config

# To add a ring_creation_size clause to app.config, do:
perl -i \
  -pe 's/^((\s*)%% riak_web_ip)/$2%% ring_creation_size is the no. of partitions to divide the hash
$2%% space into (default: 64).
$2\{ring_creation_size, 512\},

$1/' /etc/riak/app.config

# To set riak vm_args:name to hostname do:
perl -MSys::Hostname -i -pe 's/127\.0\.0\.1/hostname/e' /etc/riak/vm.args

# display (bits of) config files for checking
echo
echo '********************'
echo /etc/riak/app.config
echo '********************'
head -n30 /etc/riak/app.config
echo
echo '********************'
echo /etc/riak/vm.args
echo '********************'
cat /etc/riak/vm.args

Save this to a file called e.g. riak_configure, and then to configure a couple of nodes you do the following (note that NODE is any old internal hostname you use to ssh to the host in question, but FIRST_NODE needs to use the actual -name parameter defined in /etc/riak/vm.args on your first node):

# First node
NODE=node1
cat riak_configure | ssh $NODE sh
ssh $NODE 'chkconfig riak on; service riak start'
# Run the following until ringready reports TRUE
ssh $NODE riak-admin ringready

# All nodes after the first
FIRST_NODE=riak@node1.example.com
NODE=node2
cat riak_configure | ssh $NODE sh
ssh $NODE "chkconfig riak on; service riak start && riak-admin join $FIRST_NODE"
# Run the following until ringready reports TRUE
ssh $NODE riak-admin ringready

That's it. You should now have a working riak cluster accessible on port 8098 on your cluster nodes.