How to troubleshoot Resource Manager 5.x services that fail to start – Zenoss

Applies To

Zenoss Resource Manager 5.x

Summary

If a service in Zenoss Resource Manager does not start or does not perform correctly, begin troubleshooting by consulting the available logs to pinpoint the problem.

Note: A variety of troubleshooting procedures are presented in this document; they are not necessarily intended to be followed in sequence.

Procedures

The following procedures describe how to retrieve logs from the host system, from individual service containers, or from both.

Host System Journal

Consult the host system's journal to begin troubleshooting:

Become root or a user with root authority on the system hosting the problematic service:
```
sudo su
```
Issue the following command to follow the serviced journal as new entries arrive:
```
journalctl -u serviced -f -o cat
```
Search through the journal output for items that relate to the failing service. These can provide clues for how to proceed.
Stop the journal output:
```
Ctrl+C
```
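
When the live journal scrolls too quickly to read, it can help to capture a bounded window of entries and filter it for the failing service. The following is a minimal sketch, not a documented Zenoss procedure: the function name filter_service_lines and the one-hour window are illustrative assumptions, and zenhub stands in for whichever service is failing.

```shell
# Hypothetical helper: keep only the lines from stdin that mention the
# given service name (case-insensitive), prefixed with their line number.
filter_service_lines() {
    grep -n -i -- "$1"
}

# Example usage on the host (run as root); assumes the last hour is relevant:
#   journalctl -u serviced --since "1 hour ago" --no-pager -o cat \
#       | filter_service_lines zenhub
```

Because the filter reads from standard input, it can also be applied to a journal excerpt that was saved to a file earlier.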

Investigating Services

The way to investigate an ailing service depends on how the service behaves:

Service is not running or fails to start: attempt a restart and review the Control Center logs
Service is running but failing: attach to the container to view the log in real time, or search the Kibana logs in Control Center

Scenario 1: Service is not running (review Control Center logs)

Log in to the Control Center.
Navigate to the Zenoss.resmgr page:
Under Applications, click Zenoss.resmgr.
Scroll down to the Services section.
Search for services that are not running. These are indicated by a grey icon with a dash (-) inside.

If you discover a service that is not running, try to start it by clicking the start icon.
Search for failed health checks near the service in the Control Center UI. They are indicated by red circles with exclamation marks (!) inside.

If you find a failed health check, perform the following:
1. Hover the mouse over the failed service (zenperfsnmp in this example) to display an informational pop-up.
2. If the service is failing because of a dependency on another service, begin troubleshooting with the 'root' service that is causing other services to fail. In this example the root service that failed is zenhub.
3. Scroll to find the zenhub service name.
4. Hover the mouse over the zenhub service to display an informational pop-up.
5. Click the zenhub service name to display the page for the service.
6. Scroll down to Instances.
7. Click the Log link under Instances to view or download the log.
8. Click Download.
9. Open or Save (download) the file for analysis.
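
Once the log has been downloaded, a quick scan for high-severity entries is often a useful first pass. The sketch below assumes that the log marks problems with the keywords ERROR or CRITICAL and that Python tracebacks begin with a line starting with Traceback; the function name find_log_errors and the file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: print lines (with line numbers) that look like
# errors in a downloaded service log. The keywords are assumptions about
# the log format, so adjust the pattern to match what the log contains.
find_log_errors() {
    grep -n -E 'ERROR|CRITICAL|^Traceback' -- "$1"
}

# Example usage; zenhub.log is a placeholder for the downloaded file:
#   find_log_errors zenhub.log
```

The line numbers in the output make it easier to jump to the surrounding context when the file is opened in an editor.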

Scenario 2: Service is running but failing (attach to containers to view logs)

Attach to the container of the failed service and view the logs.
For example, if you are troubleshooting a particular zenhub instance:

Become root or a user with root authority:
```
sudo su -
```
Attach to the service:
```
serviced service attach zenhub/0
```
Become the zenoss user:
```
su - zenoss
```
Use the tail command to output the log and view what happens when the service is restarted:
1. View the event log for the service. For example, output the last 10 lines of zenhub.log, the log for the zenhub service, and keep following the file as new entries are written:
```
tail -n 10 -f $ZENHOME/log/zenhub.log
```
2. Attempt to start the service, either through the Control Center UI or via command line. For example:
3. Watch the log output from the tailed file for clues.
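
The tail-and-restart sequence above can be sketched as follows. This is a sketch under stated assumptions, not the documented procedure: service_log_path is a hypothetical helper, the log is assumed to live at $ZENHOME/log/<service>.log as in the zenhub example, and the serviced service start invocation is assumed to be run from a separate terminal on the host rather than inside the container.

```shell
# Hypothetical helper: build the log path for a service, following the
# $ZENHOME/log/<service>.log pattern used for zenhub above.
service_log_path() {
    printf '%s/log/%s.log\n' "${ZENHOME:?ZENHOME is not set}" "$1"
}

# Inside the container, as the zenoss user: show the last 10 lines and
# keep following the file so new entries appear as they are written.
#   tail -n 10 -f "$(service_log_path zenhub)"
#
# In a second terminal on the host (assumed invocation):
#   serviced service start zenhub
```

Keeping the path logic in one place makes it simple to repeat the same steps for another service by changing only the service name.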

Administrators can also use Kibana to search through logs for the troubled service.