How To Configure Zenoss for High-Availability Using DRBD and Heartbeat (Zenoss 2.x/3.x)

IMPORTANT NOTE: As of December 2013, Zenoss recommends that all new Resource Manager installs, and any Resource Manager upgrades that also upgrade the host operating system to RHEL/CentOS 6.4 or later, follow the procedure outlined in How to Configure Zenoss Resource Manager for use with Red Hat Cluster Suite instead of the procedure documented here.


Prerequisites

  • root or sudo access to the servers
  • access to repositories to obtain the DRBD and Heartbeat components
  • additional prerequisites as described in the procedure below

Applies To

  • RPM based installs on CentOS or RHEL only
  • Zenoss 2.x
  • Zenoss 3.x


Zenoss can be set up in a highly available (active/passive) configuration fairly easily using common components of the Linux-HA project, including Heartbeat and DRBD. DRBD (Distributed Replicated Block Device) keeps a constantly synchronized copy of the MySQL and ZODB data on both servers. The Heartbeat service brings up the resources on the slave (passive) node when the master fails.



Certain assumptions have been made to make this article as widely applicable as possible. The following conventions are used throughout the article, and should be replaced with your local settings. These instructions are targeted primarily to CentOS 5, but with some minor modifications can be applied to most Linux distributions:

  • Hostnames
    • primary node: hostnameA
    • secondary node: hostnameB
  • IP Addresses
    • primary node:
    • secondary node:
    • shared cluster:
  • Physical Block Devices
    • /opt/zenoss: /dev/sda2
    • /opt/zenoss/perf: /dev/sda5


You must have the following in place before you can configure the system for high availability. All steps must be performed on both servers destined to be in the highly available cluster.

File System Layout

Three separate file systems are required for the best-performing setup.

  • / - at least 8GB - stores operating system files and will not be replicated.
  • /opt/zenoss - at least 50GB - stores most Zenoss files and will be replicated.
  • /opt/zenoss/perf - at least 10GB - stores Zenoss performance data and will be replicated.

After booting up for the first time, unmount the latter two file systems because they will be repurposed for replication. Use the following commands:

umount /opt/zenoss/perf
umount /opt/zenoss

Remove the entries for these two file systems from the /etc/fstab file so they are not automatically remounted at boot. The entries to remove look similar to the following:

LABEL=/opt/zenoss /opt/zenoss   ext3  defaults  1 2
LABEL=/opt/zenoss/perf  /opt/zenoss/perf ext3 defaults 1 2
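
To confirm that the entries are really gone, a small check along the following lines may help. This is a minimal sketch: the function name zenoss_fstab_entries and its optional path argument are illustrative and are not part of Zenoss or DRBD.

```shell
#!/bin/sh
# List any active (uncommented) fstab entries whose mount point is
# /opt/zenoss or /opt/zenoss/perf. Prints nothing when none remain.
# The function name and the optional path argument are illustrative.
zenoss_fstab_entries() {
    fstab="${1:-/etc/fstab}"
    awk '$1 !~ /^#/ && ($2 == "/opt/zenoss" || $2 == "/opt/zenoss/perf")' "$fstab"
}

# Usage:
#   zenoss_fstab_entries             # checks /etc/fstab
#   zenoss_fstab_entries /tmp/copy   # checks a copy of the file
```

Empty output means neither file system will be mounted automatically at boot.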

Disk Replication with DRBD

You must install the drbd and kmod-drbd packages to enable disk replication for the cluster. The names of these packages can differ from one Linux distribution to another, but the following will work to install them on CentOS 5.1.

yum install drbd kmod-drbd

After successfully running the command, configure DRBD to replicate the file systems.

NOTE: The format below is a shorthand notation available only in DRBD 8.2.1 and later. If you are running a version prior to 8.2.1, see the DRBD user's guide for the proper format. The guide can be found at:

Replace the existing contents of the /etc/drbd.conf file with the following.

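    global {
      usage-count no;
    }

    common {
      protocol C;

      disk {
        on-io-error detach;
      }

      net {
        max-buffers 2048;
        unplug-watermark 2048;
      }

      syncer {
        rate 700000K;
        al-extents 1801;
      }
    }

    resource zenhome {
      device /dev/drbd0;
      disk /dev/sda2;
      meta-disk internal;
      on hostnameA {
        # address statement for this node (IP:port) goes here
      }
      on hostnameB {
        # address statement for this node (IP:port) goes here
      }
    }

    resource zenperf {
      device /dev/drbd1;
      disk /dev/sda5;
      meta-disk internal;
      on hostnameA {
        # address statement for this node (IP:port) goes here
      }
      on hostnameB {
        # address statement for this node (IP:port) goes here
      }
    }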
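
DRBD matches each "on" section against the output of uname -n, so a mismatch between the names in the file and the real host names is a common cause of startup failures. The following is a minimal sketch of a check for that. The function names are hypothetical, and the parsing assumes each section header is written on one line as "on <host> {", as in the file above.

```shell
#!/bin/sh
# Print the unique host names used in "on <host> {" section headers.
drbd_conf_hosts() {
    awk '$1 == "on" && $3 == "{" { print $2 }' "$1" | sort -u
}

# Succeed if the given host name (default: this machine's `uname -n`)
# appears in an "on" section of the config file.
drbd_conf_has_host() {
    conf="$1"
    host="${2:-$(uname -n)}"
    drbd_conf_hosts "$conf" | grep -qxF -- "$host"
}

# Usage:
#   drbd_conf_has_host /etc/drbd.conf || echo "no section for $(uname -n)"
```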

When DRBD is successfully configured, initialize the replicated file systems with the following commands:

dd if=/dev/zero bs=1M count=1 of=/dev/sda2
dd if=/dev/zero bs=1M count=1 of=/dev/sda5
drbdadm create-md zenhome
drbdadm create-md zenperf
service drbd start
drbdadm -- -o primary zenhome
drbdadm -- -o primary zenperf
mkfs.ext3 /dev/drbd0
mkfs.ext3 /dev/drbd1

You can now mount these two replication file systems with the following commands:

mount /dev/drbd0 /opt/zenoss
mkdir /opt/zenoss/perf
mount /dev/drbd1 /opt/zenoss/perf
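
Before continuing, it can be worth confirming that each mount point is backed by the intended DRBD device and not by the original partition. The sketch below reads a mounts table in the /proc/mounts format. The function name is_mounted_from and its third argument are illustrative.

```shell
#!/bin/sh
# Succeed if DEVICE is mounted on MOUNTPOINT according to a mounts table
# in /proc/mounts format (device in column 1, mount point in column 2).
is_mounted_from() {
    dev="$1"
    mnt="$2"
    table="${3:-/proc/mounts}"
    awk -v d="$dev" -v m="$mnt" \
        '$1 == d && $2 == m { found = 1 } END { exit !found }' "$table"
}

# Usage:
#   is_mounted_from /dev/drbd0 /opt/zenoss      || echo "zenhome not mounted"
#   is_mounted_from /dev/drbd1 /opt/zenoss/perf || echo "zenperf not mounted"
```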

MySQL Setup

NOTE: If you are using a stand-alone MySQL server, you can skip this section because each host will be configured individually to use the "off-box" MySQL server.

Verify the MySQL server package is installed on the system.

yum install mysql-server

For MySQL to be replicated, you only need to move its data directory onto the /opt/zenoss file system and leave a symbolic link in the original location. You can do this by performing the following:

service mysqld stop

mv /var/lib/mysql /opt/zenoss

ln -s /opt/zenoss/mysql /var/lib/mysql

service mysqld start
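
If MySQL fails to start after the move, the symbolic link is the first thing to check. The following is a minimal sketch of such a check. The function name mysql_datadir_linked is hypothetical, and its defaults are the paths used in this article.

```shell
#!/bin/sh
# Succeed if LINK is a symbolic link whose target is exactly TARGET.
# Defaults match the paths used in this article.
mysql_datadir_linked() {
    link="${1:-/var/lib/mysql}"
    target="${2:-/opt/zenoss/mysql}"
    [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]
}

# Usage:
#   mysql_datadir_linked || echo "/var/lib/mysql is not linked to /opt/zenoss/mysql"
```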

Zenoss Setup

Everything is now in place for the Zenoss installation. Perform a normal Zenoss installation according to the regular instructions.

Post Installation steps

After the installation is complete, reset some permissions that were affected by the file system setup:

chown zenoss:zenoss -R /opt/zenoss/perf

Ensure that the MySQL data files are not pushed down to remote collectors. You can skip this step if you do not use remote collectors.

  1. Navigate to: $ZENHOME/ZenPacks/ZenPacks.zenoss.DistributedCollector-VERSION.egg/ZenPacks/zenoss/DistributedCollector/conf
  2. Add the following exclusion to exfiles:
    - mysql

Heartbeat Setup

To set up Heartbeat to manage your resources, you must first install the package. The mechanism to install the package differs depending on your Linux distribution, but the following will work on CentOS 5.1.

yum install heartbeat

You might need to register the heartbeat service with chkconfig if this was not done automatically during installation.

chkconfig --add heartbeat

Configure heartbeat to specify your resources so it can properly manage them.

Create /etc/ha.d/ with the following contents.

# Node hostnames
node hostnameA
node hostnameB

# IP addresses of nodes
ucast eth0
ucast eth0

# Enable logging
use_logd yes
debug 1

# Don't fail back to the primary node when it comes back up
# NOTE: Set this to "on" if you want Zenoss to automatically migrate back to
# the primary server when it comes back up.
auto_failback off

To secure communication between the cluster nodes, create /etc/ha.d/authkeys with the following contents.

auth 1
1 sha1 MySecretClusterPassword

Heartbeat requires that this file have restrictive permissions set on it. Run the following command to set the proper permissions.

chmod 600 /etc/ha.d/authkeys
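
MySecretClusterPassword above is only a placeholder, and both nodes must use the same authkeys file. One way to produce a less guessable secret is sketched below. The function name make_authkeys is hypothetical, and the sketch assumes the head, sha1sum, and cut utilities are available, as they are on a typical CentOS system.

```shell
#!/bin/sh
# Print authkeys content that uses a random 40-character hex string
# as the sha1 secret.
make_authkeys() {
    secret=$(head -c 32 /dev/urandom | sha1sum | cut -d' ' -f1)
    printf 'auth 1\n1 sha1 %s\n' "$secret"
}

# Usage (as root, on one node only; then copy the file to the other node):
#   ( umask 077; make_authkeys > /etc/ha.d/authkeys )
```

Running the function under umask 077 creates the file with mode 600 from the start, so the secret is never readable by other users.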

Create the /etc/ha.d/haresources file with the following contents:

hostnameA \
drbddisk::zenhome \
Filesystem::/dev/drbd0::/opt/zenoss::ext3::defaults \
drbddisk::zenperf \
Filesystem::/dev/drbd1::/opt/zenoss/perf::ext3::noatime,data=writeback \
IPaddr:: \
mysqld \
zenoss

Preparing for Cluster Startup

With the cluster fully configured, you must shut down the resources to prepare to prime the master, then start the cluster for the first time. Issue the following commands:

service zenoss stop
service mysqld stop
umount /opt/zenoss/perf
umount /opt/zenoss
drbdadm secondary zenhome
drbdadm secondary zenperf
service heartbeat stop

Starting the Cluster

These instructions apply only to the primary cluster node unless otherwise noted. They only need to be performed to start the cluster for the first time. After that, the heartbeat daemon will manage the resources even in the case of node reboots.

Run the following commands on the primary node to make it the authoritative source for the replicated file systems:

drbdadm -- --overwrite-data-of-peer primary zenhome
drbdadm -- --overwrite-data-of-peer primary zenperf

Run the following command on the primary node to start heartbeat and start managing the shared resources.

service heartbeat start

After you confirm that Zenoss is up and running on the primary node, run the same command on the secondary node to have it join the cluster. Your cluster is now up and running. The secondary node will take over in the event of a failure on the primary node.

Usage & Operation

Migrating Resources

The best way to manually migrate Zenoss to the currently inactive cluster node is to stop heartbeat on the active node. Run the following command as the root user on the active node:

service heartbeat stop

If you have auto_failback set to off in your /etc/ha.d/, immediately start the heartbeat service on this node after you confirm that Zenoss is running on the other node.

If you have auto_failback set to on, start the heartbeat service again when you want Zenoss to be migrated back to this node.

Checking the Cluster Status

Several commands let you check the status of the cluster and of the nodes and resources that make it up.

To check on the status of the DRBD replicated file systems, run the following command:

service drbd status

On the primary node of an active cluster, expect to see results similar to the following from this command. The important columns are:

  • cs = Connection State
  • st = State
  • ds = Data State

m:res cs st ds p mounted fstype
0:zenhome Connected Primary/Secondary UpToDate/UpToDate C /opt/zenoss ext3
1:zenperf Connected Primary/Secondary UpToDate/UpToDate C /opt/zenoss/perf ext3
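
For scripted monitoring, the same table can be checked automatically. The following is a minimal sketch that assumes the column layout shown above (connection state in column 2, data state in column 4). The function name drbd_all_healthy is hypothetical.

```shell
#!/bin/sh
# Read `service drbd status` output on stdin. Succeed only if at least
# one resource line is present and every resource line reports
# cs=Connected and ds=UpToDate/UpToDate. Resource lines are recognised
# by a leading "<minor>:<name>" field; the header line is ignored.
drbd_all_healthy() {
    awk '$1 ~ /^[0-9]+:/ {
             n++
             if ($2 != "Connected" || $4 != "UpToDate/UpToDate") bad++
         }
         END { exit ((n > 0 && bad == 0) ? 0 : 1) }'
}

# Usage:
#   service drbd status | drbd_all_healthy || echo "DRBD needs attention"
```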


You can run a similar command to check on the general health of the heartbeat service.

   service heartbeat status

Use the cl_status tool to get more detailed information about the current state of the cluster. The following are usage examples:

    [root@hostnameA ~]# cl_status hbstatus
    Heartbeat is running on this machine.
    [root@hostnameA ~]# cl_status listnodes
    [root@hostnameA ~]# cl_status nodestatus hostnameB
    [root@hostnameA ~]# cl_status nodestatus hostnameA
    [root@hostnameA ~]# cl_status rscstatus


Troubleshooting

This section outlines some common failure modes and the steps required to correct them.

DRBD Split-Brain

It is possible for the replicated file systems to get into a state where neither node can determine which one has the authoritative source of data. This state is known as split-brain. To resolve the problem, choose the node with the older, invalid data and run the following commands on it:

    drbdadm secondary zenhome
    drbdadm -- --discard-my-data connect zenhome
    drbdadm secondary zenperf
    drbdadm -- --discard-my-data connect zenperf

After running the commands on the node with older data, run the following commands on the node with the newer, valid data:

    drbdadm connect zenhome
    drbdadm connect zenperf
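
Because --discard-my-data is destructive when run on the wrong node, it may be safer to print the commands for review before running them. The helpers below are a minimal sketch of that idea. The function names are hypothetical, and the sketch only prints commands without executing anything.

```shell
#!/bin/sh
# Print (do not run) the commands for the node whose data is discarded.
split_brain_victim_cmds() {
    for res in "$@"; do
        printf 'drbdadm secondary %s\n' "$res"
        printf 'drbdadm -- --discard-my-data connect %s\n' "$res"
    done
}

# Print (do not run) the commands for the node whose data is kept.
split_brain_survivor_cmds() {
    for res in "$@"; do
        printf 'drbdadm connect %s\n' "$res"
    done
}

# Usage:
#   split_brain_victim_cmds zenhome zenperf     # on the node with older data
#   split_brain_survivor_cmds zenhome zenperf   # on the node with newer data
```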

MySQL Database does not start after a failover

If the MySQL Database does not start after performing an HA failover, log files from /var/log/mysqld.log might show the following:

    090825 12:11:08 InnoDB: Starting shutdown...
    090825 12:11:11 InnoDB: Shutdown completed; log sequence number 0 440451886
    090825 12:11:11 [Note] /usr/libexec/mysqld: Shutdown complete

    090825 12:11:11 mysqld ended

    100330 22:52:21 mysqld started
    InnoDB: Error: log file ./ib_logfile0 is of different size 0 524288000 bytes
    InnoDB: than specified in the .cnf file 0 5242880 bytes!
    100330 22:52:21 [Note] /usr/libexec/mysqld: ready for connections.
    Version: '5.0.45' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
    100330 22:52:50 [ERROR] /usr/libexec/mysqld: Incorrect information in file: './events/heartbeat.frm'

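The error shows that the existing ib_logfile0 is 524288000 bytes, while the my.cnf file on this node specifies 5242880 bytes. As noted in the MySQL forum, message 247923, open the my.cnf file on the system and add the following line to the [mysqld] section so that the configured size matches the existing log files: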

    innodb_log_file_size = 524288000
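
If the existing log files have a different size on your system, the value must match that size instead of the one shown here. The following is a minimal sketch that derives the line from the file itself. The function name innodb_log_size_line is hypothetical, and the default path assumes the data directory location used in this article.

```shell
#!/bin/sh
# Print an innodb_log_file_size line matching the size in bytes of an
# existing InnoDB log file. Returns non-zero if the file cannot be read.
innodb_log_size_line() {
    logfile="${1:-/var/lib/mysql/ib_logfile0}"
    size=$(wc -c < "$logfile") || return 1
    printf 'innodb_log_file_size = %d\n' "$size"
}

# Usage:
#   innodb_log_size_line    # prints the line to add to my.cnf
```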

Updating the hub host to point at a floating IP or hostname

As the zenoss user, start the zendmd Python interpreter from the command line and enter the following:

>>> dmd.Monitors.Hub.localhost.hostname = "MY_FQDN OR IP"
>>> dmd.Monitors.Hub.localhost._isLocalHost = False
>>> commit()
