You may need to replicate data between two computers that are part of a cluster. In addition to “software” replication mechanisms such as rsync, you can use a product that is stable and included in the standard kernel: DRBD.
DRBD (Distributed Replicated Block Device) is a distributed storage system for the GNU/Linux platform. It consists of a kernel module, several userspace management applications and some shell scripts and is normally used on high availability (HA) clusters. DRBD bears similarities to RAID 1, except that it runs over a network.
The name DRBD refers both to the software (kernel module and associated userspace tools) and to the specific logical block devices managed by that software. The terms “DRBD device” and “DRBD block device” are also often used for the latter.
It is free software released under the terms of the GNU General Public License version 2.
How it works
DRBD refers to block devices designed as a building block to form high availability (HA) clusters. This is done by mirroring a whole block device via an assigned network.
In the illustration above, the two orange boxes represent two servers that form an HA cluster. The boxes contain the usual components of a Linux kernel: file system, buffer cache, disk scheduler, disk drivers, TCP/IP stack and network interface card (NIC) driver. The black arrows illustrate the flow of data between these components.
The orange arrows show the flow of data, as DRBD mirrors the data of a highly available service from the active node of the HA cluster to the standby node of the HA cluster.
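The mirroring idea described above can be sketched in a few lines of Python. This is only a toy model, not DRBD's actual implementation: the class names `BlockDevice` and `MirroredDevice` are hypothetical, the "network" is a plain method call, and the 4096-byte block size is an assumption chosen for illustration.

```python
class BlockDevice:
    """Toy in-memory block device made of fixed-size blocks (illustrative only)."""

    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        # Every block starts out zero-filled.
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = bytes(data)

    def read(self, index):
        return self.blocks[index]


class MirroredDevice:
    """Hypothetical stand-in for a replicated device on the active node."""

    def __init__(self, local, peer):
        self.local = local  # disk on the active node
        self.peer = peer    # disk on the standby node (reached over a network in reality)

    def write(self, index, data):
        # Every write goes to the local disk and is also sent to the peer,
        # so both nodes end up holding the same block.
        self.local.write(index, data)
        self.peer.write(index, data)

    def read(self, index):
        # Reads are served from the local disk only.
        return self.local.read(index)


active_disk = BlockDevice(num_blocks=8)
standby_disk = BlockDevice(num_blocks=8)
device = MirroredDevice(active_disk, standby_disk)

device.write(3, b"x" * 4096)

# The standby copy now matches the active copy, block for block.
print(standby_disk.read(3) == active_disk.read(3))
```

If the active node in this sketch disappeared, `standby_disk` would still hold every block that had been written, which is the property an HA cluster relies on when it fails over.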
Advantages over shared cluster storage
Conventional computer cluster systems typically use some sort of shared storage for data being used by cluster resources. This approach has a number of disadvantages, which DRBD may help offset:
Shared storage resources usually introduce a single point of failure in the cluster setup: while each of the cluster nodes may fail without causing service interruption, storage failure almost inevitably causes service downtime. With DRBD, this issue does not arise because the cluster resource data is replicated rather than shared.
Shared storage resources are particularly sensitive to split brain situations, where both cluster nodes are still alive, but lose all network connectivity between them. In such a scenario, each cluster node will assume that it is the only surviving node in the cluster, and take over all cluster resources. This may lead to potentially disastrous results when both nodes, for example, mount and write to file systems concurrently. Cluster administrators must thus carefully implement node fencing policies to avoid this. DRBD substantially mitigates this problem by keeping two replicated sets of data instead of one shared set.
Shared storage resources must typically be addressed over a SAN or NAS, which creates some overhead in read I/O. In DRBD that overhead is greatly reduced as all read operations are carried out locally.
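The point about read overhead can be illustrated with a small counting model. This is a sketch under simplifying assumptions, not a measurement: `SharedStorageClient`, `ReplicatedNode` and `run_workload` are hypothetical names, and every network access is counted as one operation regardless of its real cost.

```python
class SharedStorageClient:
    """Toy node that reaches its data over a SAN or NAS (illustrative only)."""

    def __init__(self):
        self.remote_store = {}
        self.network_ops = 0

    def write(self, key, value):
        self.network_ops += 1  # the write travels to the shared storage
        self.remote_store[key] = value

    def read(self, key):
        self.network_ops += 1  # the read also travels over the network
        return self.remote_store[key]


class ReplicatedNode:
    """Toy node holding a local copy that is mirrored to a peer."""

    def __init__(self):
        self.local_store = {}
        self.peer_store = {}
        self.network_ops = 0

    def write(self, key, value):
        self.local_store[key] = value
        self.network_ops += 1  # the write is replicated to the peer
        self.peer_store[key] = value

    def read(self, key):
        return self.local_store[key]  # served locally, no network access


def run_workload(node, reads=10):
    """Write one value, read it back `reads` times, return the network op count."""
    node.write("block-0", b"payload")
    for _ in range(reads):
        node.read("block-0")
    return node.network_ops


print(run_workload(SharedStorageClient()))  # prints 11
print(run_workload(ReplicatedNode()))       # prints 1
```

In this model the replicated node pays a network cost only for writes, so the gap widens as the workload becomes more read-heavy.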
Inclusion in Linux kernel
DRBD’s authors originally submitted the software to the Linux kernel community in July 2007 for possible inclusion in the “vanilla” (standard, unmodified) Linux kernel. After a lengthy review and several discussions, Linus Torvalds agreed to make DRBD part of the official Linux kernel. DRBD was merged on 8 December 2009, during the “merge window” for Linux kernel version 2.6.33.