Jul 07 2012
 

Sometimes it is useful to automatically sync files over the network between 2 or more computers. Maybe you want to keep some configuration files aligned on different servers, or maybe you have a cluster of web servers and you want to keep their document roots aligned so your customers will always see the same result.

You could do this with a network filesystem like NFS, GlusterFS or the Coda file system.
But why make things complicated when you could simply keep the local filesystems in sync?

In a former article I talked about Unison for this kind of job, and it works. The limit of Unison is that it syncs just 2 nodes with each other, so if you have more nodes you need a different solution like the one I present today: csync2.




Csync2 is a tool for asynchronous file synchronization in clusters. It can be used to keep files on multiple hosts in a cluster in sync. Csync2 can handle complex setups with much more than just 2 hosts, handle file deletions and can detect conflicts.

It is a good solution for HA-clusters, HPC-clusters, COWs and server farms.

Why not just use rsync?

This is a common question. rsync is a good tool, and I think every Linux administrator has used it sooner or later, so you might think it is enough to handle the sync between filesystems.

So let's say that you have just 3 servers that must keep the filesystem /www/ in sync between them. The web server on each of these 3 nodes can write there, so you don't have a master server.

You could run an rsync cron job on every server that syncs the local content with the other 2 nodes, but this setup has some problems: it creates quite some traffic and keeps the nodes rather busy as well. After all, rsync checks whether every file exists on the other node, compares the contents, size or last modification date, and builds a list of files to be transferred based on that. And every time it needs to connect to each node. This is fine for occasional updates, less fine for more frequent ones or for a large number of files.

Csync2 keeps a little database (SQLite by default) which contains the state of each file. This means that whenever it gets invoked, it first updates the database, and only connects to the other nodes if any files were added, modified or deleted. That is a massive win in the number of connections it needs to make to the nodes, as most of the time there won't be any new files. It is also a lot faster at checking than rsync.
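To make the idea concrete, here is a minimal shell sketch of "check the local state first, contact the peers only if something changed". It is only an illustration and not how csync2 is implemented: the function names and the flat checksum file standing in for the database are my own assumptions.

```shell
#!/bin/sh
# Illustration only: a flat checksum file plays the role of csync2's database.

# snapshot DIR: print one "checksum  path" line per regular file, sorted by path.
snapshot() {
    find "$1" -type f -exec sha256sum {} + | sort -k 2
}

# changed_since DIR STATEFILE: succeed if DIR differs from the snapshot
# stored in STATEFILE (or if there is no snapshot yet), then refresh it.
changed_since() {
    dir=$1
    state=$2
    new=$(mktemp) || return 2
    snapshot "$dir" > "$new"
    if [ -f "$state" ] && cmp -s "$new" "$state"; then
        rm -f "$new"
        return 1    # nothing changed: no need to contact any peer
    fi
    mv "$new" "$state"
    return 0        # something changed: now it is worth connecting
}
```

A cron job built on this would only start a transfer when changed_since succeeds, which is the same reason csync2 stays cheap when nothing has happened.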

Naturally, the more nodes you have, the more you gain from using csync2.

Installation and configuration

The installation should be easy on most Linux distributions. csync2 is included in the repositories of Debian, Ubuntu, Fedora and Gentoo, and is also available in external repositories for CentOS and Red Hat Enterprise Linux, so in general an install with your package manager should be enough.

As a good starting point for the configuration I suggest reading the LINBIT paper about csync2. It will give you all the information you need to manage and configure csync2.

But let's see now what to do once you have the package installed on your nodes. In these examples I'll use the paths of a Debian distribution; if you have a different distribution they could change slightly.

1) Pre-shared Keys

Authentication in Csync2 is performed using the IP addresses and pre-shared keys. Each synchronization group (a group of hosts that have one or more files in sync) in the config file must have exactly one key record specifying the file containing the pre-shared key for this group. It is recommended to use a separate key for each synchronization group and to place a key file only on those hosts which actually are members of the corresponding synchronization group.

The key file can be generated with the following command on your first node:

csync2 -k /etc/csync2.key
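Since the key is a shared secret, it makes sense to create it only once and keep it readable by root alone. The following is a small sketch of that idea; the function name ensure_key and the CSYNC2 override variable are hypothetical helpers of mine, not part of csync2.

```shell
#!/bin/sh
# ensure_key KEYFILE: create the pre-shared key only if it is missing or
# empty, then restrict it to its owner.
# CSYNC2 is a hypothetical override (handy for trying this without csync2
# installed); it defaults to the real binary.
ensure_key() {
    keyfile=$1
    if [ ! -s "$keyfile" ]; then
        "${CSYNC2:-csync2}" -k "$keyfile" || return 1
    fi
    chmod 600 "$keyfile"
}
```

Called as root with /etc/csync2.key as the argument, it leaves an existing key untouched, so running it again does not invalidate the copy already placed on the other nodes.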

2) SSL certificate
Next you need to create an SSL certificate for the local Csync2 server. On your first node run these commands:

openssl genrsa -out /etc/csync2_ssl_key.pem 1024
openssl req -batch -new -key /etc/csync2_ssl_key.pem -out /etc/csync2_ssl_cert.csr
openssl x509 -req -days 3600 -in /etc/csync2_ssl_cert.csr -signkey /etc/csync2_ssl_key.pem -out /etc/csync2_ssl_cert.pem
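Before copying the files around, it can be worth checking that the certificate really belongs to the key and when it expires. This is an optional sketch using standard openssl subcommands; the two function names are my own.

```shell
#!/bin/sh
# cert_summary CERT: print subject and validity window of a PEM certificate.
cert_summary() {
    openssl x509 -in "$1" -noout -subject -dates
}

# cert_matches_key CERT KEY: succeed if the public key embedded in CERT
# is the same as the public key derived from the private key KEY.
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
    key_pub=$(openssl pkey -in "$2" -pubout) || return 1
    [ "$cert_pub" = "$key_pub" ]
}
```

With the files from this step you would pass /etc/csync2_ssl_cert.pem as the certificate and /etc/csync2_ssl_key.pem as the key.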

3) Csync2 configuration file

On your first node create the file /etc/csync2.conf. In this example I want to keep just 1 directory in sync between 2 servers (node1 and node2):

group mycluster
{
        host node1;
        host node2;
 
        key /etc/csync2.key;
 
        include /www/htdocs;
        exclude *~ .*;
}

Host lists are specified using the host keyword. You can either specify the hosts in a whitespace-separated list or use an extra host statement for each host. The hostnames used here must be the local hostnames of the cluster nodes.
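If you have more than a couple of nodes, you may prefer to generate the group block instead of typing it. The sketch below prints a group in the whitespace-separated form described above; gen_group is a hypothetical helper, and the include and exclude lines simply repeat the ones from my example.

```shell
#!/bin/sh
# gen_group NAME KEYFILE DIR HOST...: print a csync2 group definition with
# a single "host" statement listing all hosts separated by spaces.
gen_group() {
    name=$1
    keyfile=$2
    dir=$3
    shift 3
    printf 'group %s\n{\n' "$name"
    printf '        host %s;\n' "$*"
    printf '        key %s;\n' "$keyfile"
    printf '        include %s;\n' "$dir"
    printf '        exclude *~ .*;\n'
    printf '}\n'
}
```

For example, calling it with the arguments mycluster /etc/csync2.key /www/htdocs node1 node2 node3 prints a block you could review and then save as /etc/csync2.conf.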

4) Now copy all the files from the first node (node1) to the other node with:

scp /etc/csync2* node2:/etc/

Then restart inetd (or xinetd if you use it) on both nodes with the command:

 /etc/init.d/openbsd-inetd restart
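If the nodes cannot reach each other after the restart, a first thing to check is whether inetd knows about csync2 at all. The sketch below assumes csync2's default TCP port 30865 and an inetd.conf line that starts with "csync2 stream tcp"; both are assumptions about your package, and the function names are mine.

```shell
#!/bin/sh
# Assumption: csync2 listens on its default TCP port 30865 and is started
# by inetd through a line beginning with "csync2 stream tcp".

# has_service_entry [FILE]: succeed if FILE maps csync2 to 30865/tcp.
has_service_entry() {
    grep -Eq '^csync2[[:space:]]+30865/tcp' "${1:-/etc/services}"
}

# has_inetd_entry [FILE]: succeed if FILE has an uncommented csync2 line.
has_inetd_entry() {
    grep -Eq '^csync2[[:space:]]+stream[[:space:]]+tcp' "${1:-/etc/inetd.conf}"
}
```

Without arguments the two functions look at /etc/services and /etc/inetd.conf; with xinetd the configuration lives elsewhere, so only the first check applies.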

5) First Sync

Start the synchronization first on node1, then on node2. After this you can set up a cron job to do a periodic sync.

csync2 -xv

If you get conflicts or errors, use the -f option on the node that holds the right version of the file, then run csync2 -x again.

This setup is enough to keep 2 nodes and 1 directory in sync. For the periodic sync, put something like this in the crontab of both nodes:

*/2 * * * * csync2 -x
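With a 2-minute interval, a slow run could still be going when cron starts the next one. One way to avoid overlapping runs is to wrap the command in flock from util-linux. This is a sketch of that pattern and not something csync2 requires; run_locked and the lock file path are my own choices.

```shell
#!/bin/sh
# run_locked LOCKFILE CMD...: run CMD while holding an exclusive lock on
# LOCKFILE. With -n, flock gives up at once (non-zero exit) if another
# process already holds the lock, so the run is skipped instead of queued.
run_locked() {
    lock=$1
    shift
    flock -n "$lock" "$@"
}
```

In a crontab you would not need the function at all: putting flock -n /var/lock/csync2.lock in front of csync2 -x on the cron line has the same effect.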


Actions following a sync

Each synchronization group may have any number of action sections. These action sections are used to specify shell commands which should be executed after a file that matches any of the specified patterns is synchronized. The exec statement is used to specify the command which should be executed. Note that if multiple files matching the pattern are synced in one run, this command will only be executed once.

The special token %% in the command string is substituted with the list of files which triggered the command execution.

Example:

group g1 {
  host node1 node2;                          # hosts list
  key /etc/csync2.key_g1;                  # pre-shared key
 
  include /etc/xinetd.d;
 
  action {                                 
    pattern /etc/xinetd.d;
    exec "/etc/init.d/xinetd restart";
    logfile "/var/log/csync2_action.log";
  }
}

In this example, every time a file under the path /etc/xinetd.d is changed, we run the command /etc/init.d/xinetd restart.
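The %% token is useful when the command should know which files arrived. As a sketch, the function below could be the body of a small hook script that receives the file list as its arguments; the name log_synced, the script path and the log format are all hypothetical.

```shell
#!/bin/sh
# log_synced LOGFILE FILE...: append one timestamped line per synced file.
# Intended to be called from a hook script that gets the files csync2
# substituted for %% as its arguments.
log_synced() {
    log=$1
    shift
    for f in "$@"; do
        printf '%s synced %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$f" >> "$log"
    done
}
```

If you saved this in a script such as /usr/local/bin/on-sync.sh (a made-up path) that calls log_synced with a log file and "$@", the action section would use an exec line with that script followed by %%.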

Common tasks of csync2

These are some common options and tasks that you can use from the command line:

Synchronize

csync2 -x

Force the local file to be newer (has to be followed by csync2 -x for synchronization):

csync2 -f filename

Test if everything is in sync with all peers.

csync2 -T

As -T, but print the unified diffs.

csync2 -TT

Verbose flag for all commands: -v, e.g.

csync2 -xv

Dry-run flag for all commands: -d, e.g.

csync2 -xvd
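Since -f has to be followed by -x, the two steps are easy to combine when you need to resolve several conflicts at once. The following is a sketch under that assumption; force_and_sync and the CSYNC2 override variable are hypothetical helpers, and only the -f and -x options shown above are used.

```shell
#!/bin/sh
# force_and_sync FILE...: mark each FILE as the winning local copy with -f,
# then run a single -x pass to push the result to the peers.
# CSYNC2 is a hypothetical override that defaults to the real binary.
force_and_sync() {
    for f in "$@"; do
        "${CSYNC2:-csync2}" -f "$f" || return 1
    done
    "${CSYNC2:-csync2}" -x
}
```

Run it on the node that holds the version you want to keep, passing the conflicting files as arguments.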

Conclusions

Csync2 is a great tool if you want to keep filesystems in sync asynchronously. There are many other options, like declaring a host as slave only or choosing whether to use SSL for the connections between the nodes.

  6 Responses to “Csync2 a filesystem syncronization tool for Linux”

  1. Interesting tool, but instead of using cron, could one use kernel events (Dropbox style)?

  2. Good job. Thanks for the succinct overview!

  3. Sir, this tutorial is quite good.
    Can you tell me what changes in the csync2 configuration file if there are more than two hosts?
    I mean, if we have 4 hosts and want to sync host1 and host2 with each other and host3 and host4 with each other, all as part of one group.

  4. nice job…..
