- Analytics & Optimization 4.2.2
Administrators may encounter a condition where the Analytics server fails to start or complete the importing of performance data from one or more Zenoss collectors. Before describing the steps necessary for administrators to troubleshoot this condition, it may be instructive to review the steps the Zenoss software must complete to extract, transform, and load (“ETL”) performance data from Resource Manager into the Analytics and Optimization server.
On a predefined schedule (every 8 hours by default), the ZenPerfETL daemon on a collector will extract the data from the collector’s RRD files and transform them into CSV format. The CSV data is then posted (via HTTP or ssh) to the /opt/zenoss_analytics/work/Catalina/localhost/reporting directory of the Analytics server to await loading into the Analytics data warehouse by the Analytics server.
Performance batches will typically go through the following states:
|UNSTARTED||The batch has been scheduled, but not picked up by the ZenPerfETL daemon on the appropriate extractor (collector) yet. Note that by default, extractors will not start batches until 15 minutes after the batch extract end window time.|
|EXTRACTING||The extractor is currently extracting the batch (i.e., pulling information from the RRD files to create a CSV file).|
|STAGING||The extract completed successfully and data has been sent to the Analytics server for subsequent load into the data warehouse. Note that if the extract process was successful, but no data was extracted, this stage is skipped. The extractor will set the batch state directly to COMPLETED. Note: as of Analytics 4.2.3, there is no way to tell through the UI or database tables that a batch is currently in the process of being loaded (i.e., there is no “LOADING” batch state). This issue is anticipated to be addressed in version 4.3.|
|COMPLETED||The data was successfully loaded into the data warehouse (or there was no data to load).|
The zenoss_analytics service on the Analytics server checks for new data at the interval defined in the Analytics data warehouse (reporting) database, which is hosted by the ZenDS (MySQL) instance on the Analytics database server. The value is specified in the QRTZ_SIMPLE_TRIGGERS.REPEAT_INTERVAL column where TRIGGER_NAME = 'PerformanceTrigger'. This check interval is typically every two seconds (note that the database value is in milliseconds). When the Analytics server finds new CSV files available for importation, it will load the data into the reporting database.
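To confirm the interval actually configured on your system, a statement along the following lines could be run against the reporting database. This is only a sketch: the table, column, and trigger names come from the paragraph above, while the interval_query function name and the repeat_interval_seconds alias are illustrative. The function simply prints the statement so it can be pasted at the ZenDS prompt.

```shell
# Print the statement that reads the load-check interval.
# Table, column, and trigger names are taken from the text above;
# the function name and the column alias are illustrative only.
interval_query() {
    cat <<'SQL'
SELECT TRIGGER_NAME,
       REPEAT_INTERVAL,
       REPEAT_INTERVAL / 1000 AS repeat_interval_seconds
FROM QRTZ_SIMPLE_TRIGGERS
WHERE TRIGGER_NAME = 'PerformanceTrigger';
SQL
}

# Hypothetical usage, assuming zends passes its arguments and standard
# input through to the underlying MySQL client (not confirmed here):
#   interval_query | zends -u root reporting
```

Dividing by 1000 converts the stored millisecond value to seconds, so a default installation would be expected to report a value of about two seconds.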
Begin the troubleshooting process by logging in to the Analytics server command line.
- Switch to the zenoss user:
# su - zenoss
- Launch the ZenDS CLI:
$ zends -u root
- Change to the reporting database:
use reporting;
- Run the following SQL command to show the current import batches that are in the UNSTARTED, EXTRACTING, or STAGING state which will reveal batches that may not have started, are being processed, or are awaiting load:
SELECT me.extractor_name, me.extractor_fqdn, mb.batch_state, mb.batch_begin, mb.batch_end, NOW() FROM meta_batch AS mb JOIN meta_extractor AS me ON me.extractor_key = mb.extractor_key WHERE mb.batch_state IN ('UNSTARTED', 'STAGING', 'EXTRACTING') AND me.extractor_type = 'PERFORMANCE' ORDER BY mb.batch_begin, me.extractor_name, mb.batch_state;
If some batches appear in the results as UNSTARTED and the time shown in the batch_end column is more than 15 minutes earlier than the current time on the database server (shown in the NOW() column), there may be problems with the zenperfetl daemon on the collector that should have picked up the batch for extract.
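The comparison between batch_end and NOW() can be sketched as a small shell helper. It treats an UNSTARTED batch as overdue when its batch_end lies more than the grace period (15 minutes by default) before the current database time. The function name batch_is_overdue is illustrative, and the sketch assumes GNU date and timestamps in the "YYYY-MM-DD HH:MM:SS" form that MySQL normally prints.

```shell
# Succeeds (exit 0) when batch_end is more than the grace period before now.
# Arguments: batch_end, now, optional grace period in minutes (default 15).
# Assumes GNU date; both timestamps are parsed as UTC so that only their
# difference matters. Returns 2 if a timestamp cannot be parsed.
batch_is_overdue() {
    batch_end="$1"
    now="$2"
    grace_minutes="${3:-15}"
    end_epoch=$(date -u -d "$batch_end" +%s) || return 2
    now_epoch=$(date -u -d "$now" +%s) || return 2
    [ $(( now_epoch - end_epoch )) -gt $(( grace_minutes * 60 )) ]
}
```

For example, `batch_is_overdue "2014-03-01 08:00:00" "2014-03-01 08:20:00" && echo overdue` prints "overdue", because the batch window ended 20 minutes before the current time.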
- To investigate, connect to the command line of the collector in question. The collector's host name is shown in the extractor_fqdn field of the ZenDS output in step 4.
- Once logged on to the collector, switch to the zenoss user:
# su - zenoss
- Change to the log directory:
$ cd /opt/zenoss/log/extractor_name/
where extractor_name is the collector ID as listed in the output from step 4 above.
- Check the time the zenperfetl log file was most recently updated:
$ ls -l zenperfetl.log
If the timestamp on the file is older than the last expected ETL, then the zenperfetl daemon may not be operating correctly. By default the ETL process runs every 8 hours, so a timestamp older than 8 hours would likely indicate that the zenperfetl process has not run on schedule.
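The same freshness check can be scripted rather than read from the ls output. The following is a minimal sketch, assuming GNU find; the function name log_is_stale is illustrative, and the 480-minute default simply mirrors the 8-hour ETL schedule described above.

```shell
# Succeeds (exit 0) when the file was last modified more than
# max_age_minutes ago (default 480 minutes, i.e. 8 hours).
# Returns 2 if the file does not exist, 1 if it is recent enough.
log_is_stale() {
    log_file="$1"
    max_age_minutes="${2:-480}"
    [ -e "$log_file" ] || return 2
    # find prints the path only when the modification time is old enough.
    [ -n "$(find "$log_file" -mmin +"$max_age_minutes")" ]
}
```

From the collector's log directory, `log_is_stale zenperfetl.log && echo "zenperfetl.log has not been updated in over 8 hours"` would flag a log that has gone quiet. If you have changed the ETL schedule, pass the corresponding number of minutes as the second argument.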
- If the timestamp is out of date, determine whether zenperfetl is running:
$ ps -ef | grep zenperfetl
- If the daemon is not running, start it:
$ zenperfetl start
- You may want to monitor the zenperfetl log file to be sure the daemon is running correctly. The following command will output the contents of the log file as it grows until you press Ctrl+C:
$ tail -f zenperfetl.log
If zenperfetl is operating correctly, you should see log entries indicating that it is picking up batches for extract and subsequently uploading files to the Analytics server. To confirm this, you may wish to rerun the SQL query listed in step 4 above and check that the number of UNSTARTED batches for the collector in question has begun to decline.
- Alternatively, you can get a summary of all of the batches at particular states using the following query at the ZenDS prompt:
SELECT batch_state, COUNT(*)
FROM meta_batch
GROUP BY batch_state;
Once batches have been extracted successfully, the Analytics server should start to load them. Their batch_state should change from STAGING to COMPLETED, at which point they no longer appear in the results of the query in step 4.
- If this is not the case, you will need to troubleshoot the load process. Running tail on the /opt/zenoss_analytics/logs/zenoss_analytics.out log on the Analytics server will show whether batches are being picked up for load.
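Alongside the log, it may help to see how many files are waiting in the staging directory on the Analytics server. The sketch below counts regular files under that directory; the function name count_staged_files is illustrative, and the default path is the one given earlier in this article. Whether files are removed from the directory once loaded is an assumption that this article does not confirm, so treat the count as a hint rather than proof.

```shell
# Print the number of regular files under the staging directory.
# The default path is the upload location named earlier in this article.
# Returns 2 if the directory does not exist.
count_staged_files() {
    staging_dir="${1:-/opt/zenoss_analytics/work/Catalina/localhost/reporting}"
    [ -d "$staging_dir" ] || return 2
    # tr strips the padding that some wc implementations add.
    find "$staging_dir" -type f | wc -l | tr -d ' '
}
```

Running `count_staged_files` a few minutes apart and comparing the results may indicate whether the backlog is shrinking, assuming loaded files do leave the directory.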
- If you need to investigate the state of particular processes further, you can set the “pager” feature at the ZenDS command line to search for a particular term (in this case “query”, so you can identify batches currently in process), and then run a command to show active processes. To do so, enter the following at the ZenDS command line:
pager grep -i query; show full processlist;
Running this command in the absence of activity in zenoss_analytics.out is currently the only definitive way to see whether data is actually in the process of loading. You should see processes related to data load listed (mentioning LOAD, stg_fct, or insert ignore/replace into SQL statements).
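If the process list is long, the load-related entries can be isolated with a narrower filter. The following sketch matches the terms mentioned above (LOAD, stg_fct, insert ignore, replace into) without regard to case. The function name filter_load_activity is illustrative, and the pattern is deliberately broad, so it may also match unrelated statements that happen to contain the word "load".

```shell
# Keep only lines that look like data-load statements.
# Reads process list text on standard input and writes matches to
# standard output; exits non-zero when nothing matches.
filter_load_activity() {
    grep -iE 'load|stg_fct|insert ignore|replace into'
}
```

For example, if you have saved the process list output to a file (here hypothetically named processlist.txt), `filter_load_activity < processlist.txt` prints only the candidate load statements.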
- After examining your results, be sure to turn the pager feature off by entering the following:
nopager
If no load activity is found via this method, additional debugging assistance may be necessary from Zenoss Support.