DRBD, raid-1 through the net

Nov 27, 2010

You may need to replicate data between two computers that are part of a cluster. In addition to “software” replication mechanisms such as rsync, you can use a product that is stable and included in the standard kernel: DRBD.

DRBD (Distributed Replicated Block Device) is a distributed storage system for the GNU/Linux platform. It consists of a kernel module, several userspace management applications and some shell scripts and is normally used on high availability (HA) clusters. DRBD bears similarities to RAID 1, except that it runs over a network.

DRBD refers to both the software (kernel module and associated userspace tools), and also to specific logical block devices managed by the software. DRBD device and DRBD block device are also often used for the latter.

It is free software released under the terms of the GNU General Public License version 2.

How it works

DRBD refers to block devices designed as a building block to form high availability (HA) clusters. This is done by mirroring a whole block device via an assigned network.

In the illustration above, the two orange boxes represent two servers that form an HA cluster. The boxes contain the usual components of a Linux kernel: file system, buffer cache, disk scheduler, disk drivers, TCP/IP stack and network interface card (NIC) driver. The black arrows illustrate the flow of data between these components.

The orange arrows show the flow of data, as DRBD mirrors the data of a highly available service from the active node of the HA cluster to the standby node of the HA cluster.
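To make the data flow concrete, here is a minimal Python sketch of the mirroring idea. It is a toy model, not DRBD's code or API: the class names (BlockDevice, MirroredDevice) are invented for illustration, the "network" is just a direct object reference, and it assumes fully synchronous replication, where a write is only considered done once both copies are updated.

```python
# Toy model of DRBD-style mirroring; not DRBD's actual code or API.
# All class and method names here are hypothetical.

BLOCK_SIZE = 4096


class BlockDevice:
    """An in-memory stand-in for a disk: a fixed number of fixed-size blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write(self, index, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data

    def read(self, index):
        return self.blocks[index]


class MirroredDevice:
    """Sits between the file system and the local disk on the active node."""

    def __init__(self, local, peer):
        self.local = local
        # Assumption: the peer is reached directly here; a real setup
        # would ship the block over the network to the standby node.
        self.peer = peer

    def write(self, index, data):
        # Write locally, then send the same block to the standby node.
        # In this sketch the write returns only after both copies are updated.
        self.local.write(index, data)
        self.peer.write(index, data)

    def read(self, index):
        # Reads are served from the local disk only.
        return self.local.read(index)


active_disk = BlockDevice(num_blocks=8)
standby_disk = BlockDevice(num_blocks=8)
drbd0 = MirroredDevice(active_disk, standby_disk)

payload = b"x" * BLOCK_SIZE
drbd0.write(3, payload)
```

After the write, block 3 holds the same bytes on both disks, while a read through drbd0 only touches the active node's disk.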

Advantages over shared cluster storage

Conventional computer cluster systems typically use some sort of shared storage for data being used by cluster resources. This approach has a number of disadvantages, which DRBD may help offset:

Shared storage resources usually introduce a single point of failure in the cluster setup — while each of the cluster nodes may fail without causing service interruption, storage failure almost inevitably causes service downtime. In DRBD, no such issues exist as the cluster resource data is replicated rather than shared.

Shared storage resources are particularly sensitive to split brain situations, where both cluster nodes are still alive, but lose all network connectivity between them. In such a scenario, each cluster node will assume that it is the only surviving node in the cluster, and take over all cluster resources. This may lead to potentially disastrous results when both nodes, for example, mount and write to file systems concurrently. Cluster administrators must thus carefully implement node fencing policies to avoid this. DRBD substantially mitigates this problem by keeping two replicated sets of data instead of one shared set.

Shared storage resources must typically be addressed over a SAN or NAS, which creates some overhead in read I/O. In DRBD that overhead is greatly reduced as all read operations are carried out locally.
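The split brain point can be illustrated with another small Python sketch. This is only a toy: the function names are hypothetical and it is not how DRBD itself detects or resolves divergence. It simply shows that when each node keeps a full copy, the copies can be compared after the link comes back and one node can be elected as the survivor.

```python
# Toy illustration of split brain with two replicas; names are hypothetical
# and this is not how DRBD itself detects or resolves divergence.

def diverged_blocks(replica_a, replica_b):
    """Return the indexes of blocks whose contents differ between replicas."""
    return [i for i, (a, b) in enumerate(zip(replica_a, replica_b)) if a != b]


def resync(survivor, victim):
    """Overwrite every differing block on the victim with the survivor's copy."""
    changed = diverged_blocks(survivor, victim)
    for i in changed:
        victim[i] = survivor[i]
    return changed


# Both nodes start with identical data.
node_a = [b"boot", b"conf", b"data", b"logs"]
node_b = list(node_a)

# The link fails; each node believes it is alone and accepts writes.
node_a[2] = b"data written on A"
node_b[3] = b"logs written on B"

print(diverged_blocks(node_a, node_b))  # [2, 3]

# An administrator elects node A as the survivor; B's changes are discarded.
resync(node_a, node_b)
```

Note that the writes made on the losing node are thrown away, which is why fencing policies and backups still matter.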

Inclusion in Linux kernel

DRBD’s authors originally submitted the software to the Linux kernel community in July 2007, for possible future inclusion of DRBD into the “vanilla” (standard, without modifications) Linux kernel. After a long time of review and several discussions, Linus Torvalds finally agreed to have DRBD as part of the official Linux kernel. DRBD got merged on 8 December 2009 during the “merge window” for Linux kernel version 2.6.33.

4 Responses to “DRBD, raid-1 through the net”

Netritious says:

Tuesday November 30th, 2010 at 04:26 AM

My experience with DRBD is horrible. It works great when it works, but when DRBD bombs (survivor-survivor/victim-victim scenarios) it bombs HARD. Every two to three months in production DRBD would crash making the clustered FS (ext3) unavailable. The first time it happened it took me several hours to identify the problem and resolve it. After the third time I canned it and switched to batch+cron+rsync. Works perfectly fine, and if I turn off one machine the other is still working. Not nearly 99.999% uptime, but close enough without the headaches of DRBD involved.

- linuxari says:
  
  Wednesday December 1st, 2010 at 10:24 PM
  
  I’ve used 4 systems with DRBD for many years and my experience has been much better. The worst thing that happened to me was a survivor/survivor scenario, but in that case we just elected 1 of the 2 (we did some rsync in dry-run mode to check for the differences between the 2 file systems) and wiped out machine number 2. It worked fine.
  
  It’s a good solution that can save you in some scenarios, not all for sure 😉
  A backup solution for disasters is always recommended, of course.
  
Notes from : Nagios World Conference Europe « Linux « Technology « Theory Report says:

Saturday May 14th, 2011 at 12:13 PM

[…] done a good display of DRBD (i’ve done an introduction of this software here), though i’ve unequivocally not […]
PC says:

Wednesday June 1st, 2011 at 11:50 PM

My experience is better too. My cluster supports 300 users 24×7. The cluster has been running for 8 years. But you have to know the product…
