Currently we are building a fairly rock-solid high-availability cluster for a client. This has the “usual” ingredients: two locations, two NetApps, two clusters of three VMware ESX servers and a bunch of virtual machines running on top of the ESX servers. Also included in the mix is a VDI (now called VMware View) virtual desktop infrastructure for running virtual Windows XP clients.

This is all managed by SRM (Site Recovery Manager), and it is almost working. But that is another story.

What got me thinking is the following.

Last week I did a consultancy job where they had built a failover cluster using DRBD. With DRBD you get a disk device, /dev/drbd0, which is transparently replicated. The device file can be used like any other: fdisk, mkfs and mount all work as expected.
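For example, a minimal sketch (assuming a resource named r0 in drbd.conf backing /dev/drbd0):

# Bring the resource up and promote this node to primary
drbdadm up r0
drbdadm -- --overwrite-data-of-peer primary r0   # first promotion only
# From here on /dev/drbd0 behaves like an ordinary block device
mkfs.ext3 /dev/drbd0
mount /dev/drbd0 /mnt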

Now throw KVM into the mix…

The virtual machine images must be stored on the DRBD device. Suppose we have two servers called master and slave; the KVM processes run on master. In a failover situation the following needs to happen:

  • If master is still available, kill all KVM processes;
  • if master is still available, demote the DRBD device to secondary or disable it altogether;
  • on slave, make the DRBD device primary, so that it becomes available in read-write mode (if you skip this you get “Wrong medium type” errors; see the check below);
  • on slave, start the KVM processes again.
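A quick way to verify the roles before mounting, again assuming the resource is called r0:

cat /proc/drbd       # shows the roles, e.g. ro:Primary/Secondary
drbdadm state r0     # prints the local/peer role of the resource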

It would be even cooler if the virtual machines could actually be copied over while they are still running, but I don’t know if that would be possible.
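For what it’s worth, QEMU/KVM does ship a live migration mechanism in its monitor, though it assumes both hosts can reach the guest’s disk, which a plain primary/secondary DRBD setup does not give you. A rough sketch (host name and image path are just examples):

# On slave: start the same guest, waiting for the incoming state
kvm -hda /srv/vm/guest.img -m 512 -incoming tcp:0:4444
# On master: in the QEMU monitor of the running guest
migrate tcp:slave:4444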

Shared storage would be possible by letting one virtual machine export another DRBD device (via NFS, Samba or iSCSI).
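Sketched out for NFS, that storage VM would mount the second DRBD device and export the filesystem on it (device, path and network are made up):

# on the storage VM
mount /dev/drbd1 /srv/shared
# /etc/exports
/srv/shared  192.168.0.0/24(rw,sync,no_root_squash)
# apply the export list
exportfs -ra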

So my site recovery manager script (SRM script) will be something along these lines:

#!/bin/bash

# When doing a failover, call this on the old site
# (if it is still available):  srm stop
# On the other site, call it like this:  srm start

# "r0" is the DRBD resource name from drbd.conf; it backs /dev/drbd0.
RESOURCE=r0

case "$1" in
stop)
    # Stop all KVM guests first, so nothing holds the device open,
    # then demote the DRBD resource.
    /etc/init.d/kvm stop
    drbdadm secondary $RESOURCE
    ;;
start)
    # Force-promote the DRBD resource so the device becomes writable,
    # then start the KVM guests. --overwrite-data-of-peer discards
    # whatever the (unreachable) peer has, so use it with care.
    drbdadm -- --overwrite-data-of-peer primary $RESOURCE
    /etc/init.d/kvm start
    ;;
*)
    echo "Usage: $0 {start|stop}" >&2
    exit 1
    ;;
esac
exit 0

Is it really that simple?