IMPORTANT NOTE: As of December 2013, Zenoss recommends that all new Resource Manager installs, and any Resource Manager upgrades that involve an upgrade of the host operating system to RHEL or CentOS 6.4 or later, follow the procedure outlined in How to Configure Zenoss Resource Manager for use with Red Hat Cluster Suite instead of the procedure documented here.
Prerequisites
- root or sudo access to the servers
- access to repositories to obtain the DRBD and Heartbeat components
- additional prerequisites as described in the procedure below
Applies To
- RPM based installs on CentOS or RHEL only
- Zenoss 2.x
- Zenoss 3.x
Summary
Zenoss can be set up in a highly available (active/passive) configuration fairly easily by using common components of the Linux-HA project, including Heartbeat and DRBD. DRBD (Distributed Replicated Block Device) provides the two servers with a constantly synchronized copy of the MySQL and ZODB data. The Heartbeat service handles bringing the slave (passive) node up when the master fails.
Procedure
Conventions
Certain assumptions have been made to make this article as widely applicable as possible. The following conventions are used throughout the article, and should be replaced with your local settings. These instructions are targeted primarily to CentOS 5, but with some minor modifications can be applied to most Linux distributions:
- Hostnames
- primary node: hostnameA
- secondary node: hostnameB
- IP Addresses
- primary node: 10.10.10.5
- secondary node: 10.10.10.6
- shared cluster: 10.10.10.4
- Physical Block Devices
- /opt/zenoss: /dev/sda2
- /opt/zenoss/perf: /dev/sda5
Prerequisites
You must have the following in place before you can configure the system for high availability. All steps must be performed on both servers destined to be in the highly available cluster.
File System Layout
Three separate file systems are required for the best-performing setup.
- / - at least 8GB - stores operating system files and will not be replicated.
- /opt/zenoss - at least 50GB - stores most Zenoss files and will be replicated.
- /opt/zenoss/perf - at least 10GB - stores Zenoss performance data and will be replicated.
After booting up for the first time, unmount the latter two file systems because they will be repurposed for replication. Use the following commands:
umount /opt/zenoss/perf
umount /opt/zenoss
Remove the following two entries from the /etc/fstab file so the file systems are not automatically remounted at boot.
LABEL=/opt/zenoss /opt/zenoss ext3 defaults 1 2
LABEL=/opt/zenoss/perf /opt/zenoss/perf ext3 defaults 1 2
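If you prefer to comment the entries out rather than delete them, the following sed commands are a minimal sketch assuming the LABEL-based entries shown above; adjust the patterns to match your actual fstab:
# Comment out the replicated file systems so they are not mounted at boot
sed -i 's|^LABEL=/opt/zenoss/perf |#&|' /etc/fstab
sed -i 's|^LABEL=/opt/zenoss |#&|' /etc/fstab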
Disk Replication with DRBD
You must install the drbd and kmod-drbd packages to enable disk replication for the cluster. The names of these packages can differ from one Linux distribution to another, but the following will work to install them on CentOS 5.1.
yum install drbd kmod-drbd
After successfully running the command, configure DRBD to replicate the file systems.
NOTE: The format below is a shorthand notation available only in DRBD 8.2.1 and later. If you are running a version prior to 8.2.1, see the DRBD user's guide for the proper format. The guide can be found at:
http://www.drbd.org/users-guide/s-configure-resource.html
Replace the existing contents of the /etc/drbd.conf file with the following.
global {
    usage-count no;
}

common {
    protocol C;
    disk {
        on-io-error detach;
        no-disk-flushes;
        no-md-flushes;
    }
    net {
        max-buffers 2048;
        unplug-watermark 2048;
    }
    syncer {
        rate 700000K;
        al-extents 1801;
    }
}

resource zenhome {
    device /dev/drbd0;
    disk /dev/sda2;
    meta-disk internal;
    on hostnameA { address 10.10.10.5:7789; }
    on hostnameB { address 10.10.10.6:7789; }
}

resource zenperf {
    device /dev/drbd1;
    disk /dev/sda5;
    meta-disk internal;
    on hostnameA { address 10.10.10.5:7788; }
    on hostnameB { address 10.10.10.6:7788; }
}
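Before initializing the devices, you can optionally verify that the file parses cleanly. This is an extra check, not part of the original procedure; drbdadm dump re-reads /etc/drbd.conf and prints the resources it found, so syntax errors surface here rather than during initialization:
# Parse /etc/drbd.conf and print the resulting resource definitions
drbdadm dump all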
When DRBD is successfully configured, initialize the replicated file systems with the following commands:
dd if=/dev/zero bs=1M count=1 of=/dev/sda2
dd if=/dev/zero bs=1M count=1 of=/dev/sda5
sync
drbdadm create-md zenhome
drbdadm create-md zenperf
service drbd start
drbdadm -- -o primary zenhome
drbdadm -- -o primary zenperf
mkfs.ext3 /dev/drbd0
mkfs.ext3 /dev/drbd1
You can now mount these two replicated file systems with the following commands:
mount /dev/drbd0 /opt/zenoss
mkdir /opt/zenoss/perf
mount /dev/drbd1 /opt/zenoss/perf
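At this point you can optionally confirm that both resources are replicating and are mounted where expected; this is an extra verification step, assuming the layout above:
# Show DRBD connection state, roles, and sync progress for both resources
cat /proc/drbd
# Confirm the replicated devices are mounted on the Zenoss directories
df -h /opt/zenoss /opt/zenoss/perf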
MySQL Setup
NOTE: If you are using a stand-alone MySQL server, you can skip this section because each host will be configured individually to use the "off-box" MySQL server.
Verify the MySQL server package is installed on the system.
yum install mysql-server
To have the MySQL data replicated, you only need to move it onto the /opt/zenoss file system. Do this with the following commands:
service mysqld stop
mv /var/lib/mysql /opt/zenoss
ln -s /opt/zenoss/mysql /var/lib/mysql
service mysqld start
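To confirm that MySQL is now writing to the replicated file system, an optional check (not part of the original procedure):
# Should show the symlink /var/lib/mysql -> /opt/zenoss/mysql
ls -ld /var/lib/mysql
# The MySQL data files should now live under /opt/zenoss/mysql
ls /opt/zenoss/mysql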
Zenoss Setup
Everything is now in place for the Zenoss installation. Perform a normal Zenoss installation according to the regular instructions.
Post Installation steps
After the installation is complete, reset some permissions that were affected by the file system setup:
chown zenoss:zenoss -R /opt/zenoss/perf
Ensure that the MySQL data files are not pushed down to remote collectors (you can skip this step if you do not use remote collectors; a command sketch follows this list):
- Navigate to: $ZENHOME/ZenPacks/ZenPacks.zenoss.DistributedCollector-VERSION.egg/ZenPacks/zenoss/DistributedCollector/conf
- Add the following exclusion to exfiles:
- mysql
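As a sketch of the step above, run as the zenoss user with VERSION replaced by your installed DistributedCollector version:
# Append the exclusion so the mysql data directory is not pushed to remote collectors
echo "mysql" >> $ZENHOME/ZenPacks/ZenPacks.zenoss.DistributedCollector-VERSION.egg/ZenPacks/zenoss/DistributedCollector/conf/exfiles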
Heartbeat Setup
To set up heartbeat to manage your resources, you must first install the package. The mechanism to install the package differs depending on your Linux distribution, but the following will work on CentOS 5.1.
yum install heartbeat
You might need to register the heartbeat init script as a service if this was not done automatically.
chkconfig --add heartbeat
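To confirm the init script is registered, an optional check:
# List the runlevels in which the heartbeat init script is enabled
chkconfig --list heartbeat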
Configure heartbeat to specify your resources so it can properly manage them.
Create /etc/ha.d/ha.cf with the following contents.
# Node hostnames
node hostnameA
node hostnameB

# IP addresses of nodes
ucast eth0 10.10.10.5
ucast eth0 10.10.10.6

# Enable logging
use_logd yes
debug 1

# Don't fail back to the primary node when it comes back up
# NOTE: Set this to "on" if you want Zenoss to automatically migrate back to
# the primary server when it comes back up.
auto_failback off
To secure communication between the cluster nodes, create /etc/ha.d/authkeys with the following contents.
auth 1
1 sha1 MySecretClusterPassword
Heartbeat requires that this file have restrictive permissions set on it. Run the following command to set the proper permissions.
chmod 600 /etc/ha.d/authkeys
Create the /etc/ha.d/haresources file with the following contents:
hostnameA \
    drbddisk::zenhome \
    Filesystem::/dev/drbd0::/opt/zenoss::ext3::defaults \
    drbddisk::zenperf \
    Filesystem::/dev/drbd1::/opt/zenoss/perf::ext3::noatime,data=writeback \
    IPaddr::10.10.10.4/24 \
    mysqld \
    zenoss
Preparing for Cluster Startup
With the cluster fully configured, shut down the resources in preparation for priming the master and starting the cluster for the first time. Issue the following commands:
service zenoss stop
service mysqld stop
umount /opt/zenoss/perf
umount /opt/zenoss
drbdadm secondary zenhome
drbdadm secondary zenperf
service heartbeat stop
Starting the Cluster
These instructions apply only to the primary cluster node unless otherwise noted. They only need to be performed to start the cluster for the first time. After that, the heartbeat daemon will manage the resources, even across node reboots.
Run the following commands on the primary node to make it the authoritative source for the replicated file systems:
drbdadm -- --overwrite-data-of-peer primary zenhome
drbdadm -- --overwrite-data-of-peer primary zenperf
Run the following command on the primary node to start heartbeat and start managing the shared resources.
service heartbeat start
After you confirm that Zenoss is up and running on the primary node, run the same command on the secondary node to have it join the cluster. Your cluster is now up and running, and the secondary node will take over in the event of a failure on the primary node.
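As an optional check that the shared cluster address and Zenoss came up on the primary node (assuming the eth0 interface and addresses used in this article's conventions):
# The shared cluster IP should now be bound on the primary node
ip addr show eth0 | grep 10.10.10.4
# The Zenoss daemons should report as running
service zenoss status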
Usage & Operation
Migrating Resources
The best way to manually migrate Zenoss to the currently inactive cluster node is to stop heartbeat on the active node. Run the following command as the root user on the active node:
service heartbeat stop
If you have auto_failback set to off in your /etc/ha.d/ha.cf, immediately start the heartbeat service on this node after you confirm that Zenoss is running on the other node.
If you have auto_failback set to on, start the heartbeat service again when you want Zenoss to be migrated back to this node.
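A sketch of a complete manual migration and fail-back with auto_failback set to off, following the steps described above and using the hostnames from this article's conventions:
# On the currently active node (hostnameA): release the resources
service heartbeat stop
# On the other node (hostnameB): wait for takeover, then confirm Zenoss is running
service zenoss status
# Back on hostnameA: start heartbeat again so it rejoins the cluster as the standby
service heartbeat start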
Checking the Cluster Status
Several commands enable you to check the status of your cluster and of the nodes and resources that make it up.
To check on the status of the DRBD replicated file systems, run the following command:
service drbd status
On the primary node of an active cluster, you should see results similar to the following. The important columns are:
- cs = Connection State
- st = State (local/peer roles)
- ds = Disk State
m:res      cs         st                 ds                 p  mounted           fstype
0:zenhome  Connected  Primary/Secondary  UpToDate/UpToDate  C  /opt/zenoss       ext3
1:zenperf  Connected  Primary/Secondary  UpToDate/UpToDate  C  /opt/zenoss/perf  ext3
You can run a similar command to check on the general health of the heartbeat service.
service heartbeat status
Use the cl_status tool to get more detailed information about the current state of the cluster. The following are usage examples:
[root@hostnameA ~]# cl_status hbstatus
Heartbeat is running on this machine.
[root@hostnameA ~]# cl_status listnodes
hostnameB
hostnameA
[root@hostnameA ~]# cl_status nodestatus hostnameB
dead
[root@hostnameA ~]# cl_status nodestatus hostnameA
active
[root@hostnameA ~]# cl_status rscstatus
all
Troubleshooting
This section outlines some common failure modes and the steps required to correct them.
DRBD Split-Brain
It is possible for the replicated file systems to get into a state where neither node can determine which one has the authoritative source of data. This state is known as split-brain. To resolve the problem, choose the node with the older, invalid data and run the following commands on it:
drbdadm secondary zenhome
drbdadm -- --discard-my-data connect zenhome
drbdadm secondary zenperf
drbdadm -- --discard-my-data connect zenperf
After running the commands on the node with older data, run the following commands on the node with the newer, valid data:
drbdadm connect zenhome
drbdadm connect zenperf
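Once both nodes are reconnected, the resources resynchronize automatically. As an optional check, both resources should report Connected and UpToDate/UpToDate when the resync finishes:
# Verify connection and disk states for zenhome and zenperf
service drbd status
cat /proc/drbd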
MySQL Database does not start after a failover
If the MySQL database does not start after performing an HA failover, the log file /var/log/mysqld.log might show entries like the following:
090825 12:11:08  InnoDB: Starting shutdown...
090825 12:11:11  InnoDB: Shutdown completed; log sequence number 0 440451886
090825 12:11:11 [Note] /usr/libexec/mysqld: Shutdown complete
090825 12:11:11  mysqld ended
100330 22:52:21  mysqld started
InnoDB: Error: log file ./ib_logfile0 is of different size 0 524288000 bytes
InnoDB: than specified in the .cnf file 0 5242880 bytes!
100330 22:52:21 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.45'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
100330 22:52:50 [ERROR] /usr/libexec/mysqld: Incorrect information in file: './events/heartbeat.frm'
As noted in the MySQL forum, message 247923, open the my.cnf file on the system and add the following line:
innodb_log_file_size = 524288000
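The setting belongs in the [mysqld] section of my.cnf (typically /etc/my.cnf on CentOS); the snippet below is only illustrative of the placement:
[mysqld]
# Match the on-disk InnoDB log file size reported in the error message
innodb_log_file_size = 524288000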
Updating the hub host to point at a floating IP or hostname
As the zenoss user, enter the zendmd Python interpreter at a command line:
$ zendmd
>>> dmd.Monitors.Hub.localhost.hostname = "MY_FQDN OR IP"
>>> dmd.Monitors.Hub.localhost._isLocalHost = False
>>> commit()