Applies To
- Zenoss 4.x
- Zenoss 3.x
Pre-Requisites
- A working Zenoss installation
Summary
By default, Zenoss generates an event when the value of a monitored data point crosses a defined boundary (in Zenoss, these boundaries are called thresholds).
When events are generated, they appear in the event console.
Zenoss administrators might want to add additional thresholds or edit existing ones by modifying monitoring templates.
The following scenarios explain how to configure a threshold to generate an event so administrators can become aware when a particular threshold is breached. For example, when CPU utilization on particular devices exceeds a maximum value that signals heavy use, administrators may wish to be notified via a Zenoss event. This document provides four different examples of how to establish or edit CPU utilization thresholds for device classes.
About Performance Monitoring and Templates
In Zenoss, monitoring templates define how performance is monitored and what data is collected. In general, monitoring templates consist of:
- Data point(s) - one or more data points for collection.
- Threshold(s) - one or more minimum and/or maximum values for collected data points; an event is created when a collected value crosses the defined boundary.
- Graph Definition(s) - one or more graphical depictions of the collected data points over a period of time for specific devices or device classes.
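As a rough illustration of how these three parts relate, the following Python sketch models a template as plain data. The class and field names are hypothetical and are not Zenoss object names; the sample values are borrowed from Scenario 1 later in this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Threshold:
    """Hypothetical stand-in for a threshold on one or more data points."""
    name: str
    data_points: List[str]
    min_value: Optional[float] = None  # event if a value drops below this
    max_value: Optional[float] = None  # event if a value rises above this


@dataclass
class MonitoringTemplate:
    """Hypothetical stand-in for a monitoring template."""
    name: str
    data_points: List[str] = field(default_factory=list)
    thresholds: List[Threshold] = field(default_factory=list)
    graph_definitions: List[str] = field(default_factory=list)


# Sample values taken from Scenario 1 (default Server/Linux template).
linux_template = MonitoringTemplate(
    name="Device",
    data_points=["ssCpuIdle_ssCpuIdle"],
    thresholds=[
        Threshold("low CPU idle", ["ssCpuIdle_ssCpuIdle"], min_value=2),
    ],
    graph_definitions=["CPU Utilization"],
)
```

The point of the sketch is only that a threshold refers to data points collected by the same template, and that a graph definition is a separate item that may or may not display the threshold.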
Monitoring templates vary by device class to accommodate each class' unique monitoring requirements. They specify how and what information Zenoss collects. A monitoring template must be associated with or bound to a specific device or device class for the template to be used. Zenoss includes a set of pre-defined bindings between templates and device classes by default. There are, however, a variety of options for customizing template bindings and/or creating entirely new templates. Those options include:
- Editing the default template without changing the default template bindings. This effects a 'global' change that applies to all devices in a device class.
- Changing the existing template bindings such that existing templates are bound to classes other than the defaults.
- Creating new device classes and binding new templates (or edited versions of default templates) to them.
- Creating device-specific clones of default templates (a process known as "overriding" in Zenoss), then customizing them.
When editing templates, it is important to decide whether you want to alter a monitoring template for all devices in a class, to make the change for an individual device (overriding the template), or to make the change only for a subset of devices in a class (best accomplished by creating a new child class, overriding the template to the child class, then editing the new template). If you edit a template for a class, the changes affect all devices in that class. Ensure that you make changes to and apply or associate your template at the intended level. Be aware that changes made at the wrong level can result in high event rates or unintentional exclusion of devices from monitoring of one or more data points.
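The choice of level can be pictured as a lookup that starts at the device and walks up through its device classes, using the first template of a given name that it finds. The sketch below assumes that lookup order and uses simplified, hypothetical location paths and device names (web01, db01, web02); it is a mental model, not Zenoss code.

```python
from typing import Dict, List, Optional, Set


def lookup_order(device_class: str, device_id: str) -> List[str]:
    """Return the locations to search, most specific first."""
    parts = [p for p in device_class.strip("/").split("/") if p]
    # A device-level override is checked before any class-level template.
    order = ["/" + "/".join(parts) + "/devices/" + device_id]
    while parts:
        order.append("/" + "/".join(parts))
        parts.pop()  # move up to the parent class
    order.append("/")
    return order


def resolve_template(name: str, device_class: str, device_id: str,
                     defined: Dict[str, Set[str]]) -> Optional[str]:
    """Return the location whose copy of the template would apply."""
    for location in lookup_order(device_class, device_id):
        if name in defined.get(location, set()):
            return location
    return None


# Hypothetical setup: a class-level template plus one device override.
defined = {
    "/Server/Linux": {"Device"},
    "/Server/Linux/devices/web01": {"Device"},
}

overridden = resolve_template("Device", "/Server/Linux", "web01", defined)
inherited = resolve_template("Device", "/Server/Linux", "db01", defined)
child_class = resolve_template("Device", "/Server/Linux/Web", "web02", defined)
```

Under these assumptions, web01 uses its own local copy, while db01 and any device in a child class without its own override fall back to the /Server/Linux copy. This is why an edit made at the class level reaches every device that has no more specific copy.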
The following scenarios help illustrate the creation of thresholds.
Procedures
Example Scenario 1:
Audit the Default Server/Linux Device Class Monitoring Template (As An Example To Guide You In Creating Your Own Thresholds)
The default Server/Linux template polls various data points on Linux servers using SNMP. It includes a threshold that generates an event when the CPU idle percentage of any server in the class drops below 2%. The template uses the ssCpuIdle data point because ssCpuIdle reflects the percentage of time the entire CPU is idle, rather than the utilization of any particular CPU core (which would be less helpful). To audit the template as a guide for creating your own threshold, complete the following steps (you can also follow them to edit the default value of 2% idle):
View Or Edit an Existing Threshold
- Navigate to Advanced → Monitoring Templates → Device → Server/Linux.
- Select the threshold (low CPU idle) in the Thresholds list.
- Click the action wheel just above to open the Edit Threshold dialog and view the details.
- From the Data Points list verify ssCpuIdle_ssCpuIdle is located in the right-hand (selected) column. If not:
- Use the → arrow to move it into the right-hand (selected) column.
- If desired, edit the value for the minimum threshold in the Minimum Value field. The default setting is 2 (this means 2% CPU idle time or 98% busy). For example, if you wish to receive an event when a Linux server's CPU utilization exceeds 90%, use a value of 10 (the inactive processor percentage value).
- Verify or change the default event Severity setting.
- Click [SAVE] to save and apply your settings. This edits the template and applies it to all devices in the class.
When your defined threshold is reached for a device, an event is generated. This event has the severity level you specified when the threshold was created or edited. This event is for any device with an idle value that drops below your specified minimum setting.
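Because the threshold is expressed in idle percent while most people think in utilization percent, the conversion is worth spelling out. The helper names below are hypothetical, and the strict "less than" comparison is an assumption based on the wording "drops below"; this is a sketch of the arithmetic, not of Zenoss internals.

```python
def min_idle_for_max_utilization(max_utilization_pct: float) -> float:
    """Convert a 'CPU busier than X%' rule into a minimum idle percentage."""
    if not 0 <= max_utilization_pct <= 100:
        raise ValueError("utilization must be between 0 and 100")
    return 100 - max_utilization_pct


def breaches_minimum(idle_pct: float, minimum_idle_pct: float) -> bool:
    """Assumed rule: an event is raised when idle drops below the minimum."""
    return idle_pct < minimum_idle_pct


# Alert when utilization exceeds 90%: enter 10 in the Minimum Value field.
minimum = min_idle_for_max_utilization(90)

# A server that is 8% idle (92% busy) would breach that threshold;
# a server that is 25% idle (75% busy) would not.
busy_server_breaches = breaches_minimum(8, minimum)
calm_server_breaches = breaches_minimum(25, minimum)
```

The same arithmetic gives the default: a 98% utilization limit corresponds to the minimum value of 2.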
Optional: Add the Threshold to the Graph
To add the threshold to the existing CPU Utilization graph, perform the following:
- In the Graph Definitions pane, select CPU Utilization.
- Click the action wheel and choose Manage Graph Points to display the dialog.
- Click the [+] (plus sign) and choose Threshold from the drop-down list to display the Add Threshold dialog.
- From the drop-down list choose the low CPU idle threshold.
- Click [SUBMIT].
- Click [SAVE].
To see your threshold in the CPU Utilization graph:
- Select Infrastructure → Devices → Server → Linux.
- Select a device from the Device list in the right pane.
- Click Graphs in the left side-bar.
- View the graphs in the Performance Graphs pane.
Example Scenario 2:
Edit The Server/SSH/Linux Template to Add a Threshold
The Server/SSH/Linux template monitors the CPU Idle value by default. The ssCpuIdle value reflects the percentage of time each server's CPU is inactive.
Create a Threshold
To create a threshold using ssCpuIdle, perform the following:
- Navigate to Advanced → Monitoring Templates → Device → Server/SSH/Linux.
- Click the [+] (plus sign) in the top-right Thresholds field to generate a new threshold.
- Choose MinMaxThreshold, name it Low CPU idle and click [ADD].
- Select the new threshold (Low CPU idle) in the Thresholds list.
- Click the action wheel just above to open the Edit Threshold dialog and view the details.
- From the Data Points list select CPU_ssCpuIdle and use the → arrow to move it into the right-hand (selected) column.
- Enter a value for the minimum threshold in the Minimum Value field.
For example, to set a threshold for when the CPU usage exceeds 90%, enter a value of 10 (the inactive processor percentage value). For a threshold of 98%, use a value of 2.
- Verify or change the default event Severity setting.
- Click [SAVE] to save and apply your settings. This edits the template and applies it to all devices in the /Devices/Server/SSH/Linux class.
When your defined threshold is reached for a device, an event is generated. This event has the severity level you specified when the threshold was created or edited.
Optional: Add the Threshold to the Graph
To add the new threshold to a graph, perform the following:
- In the Graph Definitions pane, select CPU Utilization.
- Click the action wheel and choose Manage Graph Points to display the dialog.
- Click the [+] (plus sign) and choose Threshold from the drop-down list to display the Add Threshold dialog.
- From the drop-down list choose the low CPU idle threshold you created.
- Click [SUBMIT].
- Click [SAVE].
To see your threshold in the CPU Utilization graph:
- Select Infrastructure → Devices → Server → SSH → Linux.
- Select a device from the Device list in the right pane.
- Click Graphs in the left side-bar.
- View the graphs in the Performance Graphs pane.
Example Scenario 3:
Edit the Device /Server/Windows/WMI Monitoring Template to Add a Threshold
In the Device → /Server/Windows/WMI monitoring template, the polled counter for the CPU returns the percentage the CPU is used instead of the percent CPU idle value described in Scenarios #1 and #2 for Linux servers.
Create a Threshold
- Navigate to Advanced → Monitoring Templates → Device_WMI → Server/Windows.
- Click the [+] (plus sign) in the top-right Thresholds field to generate a new threshold.
- Choose MinMaxThreshold, name it CPU_Utilization_High and click [ADD].
- Select the threshold (CPU_Utilization_High) in the Thresholds list.
- Click the action wheel just above to open the Edit Threshold dialog and view the details.
- From the Data Points list select ProcessorTotalProcesstime and use the → arrow to move it into the right-hand (selected) column.
- Enter the maximum CPU usage threshold value in the Maximum Value field. For example, to set a threshold to generate an event when the CPU usage exceeds 90%, enter the value 90.
- Specify /Perf/CPU in the Event Class field.
- Verify or change the default event Severity setting.
- Click [SAVE] to save and apply your settings. This edits the template and applies it to all devices in the class.
When your defined threshold is reached for a device, an event is generated. This event has the severity level you specified when the threshold was created or edited. This event is for any device with a CPU usage value that breaches your specified maximum setting.
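The difference from the Linux scenarios is only which bound is set: here the data point already reports utilization, so the maximum is used instead of the minimum. The following sketch shows a min/max check that returns an event-like record. The function name, the dictionary keys, the comparison rules, and the "Warning" placeholder severity are all assumptions made for illustration; only the /Perf/CPU event class and the value 90 come from the steps above.

```python
from typing import Optional


def evaluate_threshold(value: float,
                       minimum: Optional[float] = None,
                       maximum: Optional[float] = None,
                       event_class: str = "/Perf/CPU",
                       severity: str = "Warning") -> Optional[dict]:
    """Return an event-like dict if value is outside the bounds, else None."""
    if minimum is not None and value < minimum:
        summary = f"value {value} is below minimum {minimum}"
    elif maximum is not None and value > maximum:
        summary = f"value {value} exceeds maximum {maximum}"
    else:
        return None  # within bounds: no event
    return {"eventClass": event_class, "severity": severity, "summary": summary}


# Windows-style check: utilization with a maximum of 90.
event = evaluate_threshold(95, maximum=90)
no_event = evaluate_threshold(85, maximum=90)

# Linux-style check from Scenarios 1 and 2: idle with a minimum of 10.
idle_event = evaluate_threshold(8, minimum=10)
```

Leaving the unused bound as None mirrors leaving the corresponding field empty in the Edit Threshold dialog.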
Optional: Add the Threshold to the Graph
To add the new threshold to a graph, perform the following:
- In the Graph Definitions pane, select CPU Utilization.
- Click the action wheel and choose Manage Graph Points to display the dialog.
- Click the [+] (plus sign) and choose Threshold from the drop-down list to display the Add Threshold dialog.
- From the drop-down list choose the CPU_Utilization_High threshold you created.
- Click [SUBMIT].
- Click [SAVE].
To see your threshold in the CPU Utilization graph:
- Select Infrastructure → Devices → Server → Windows → WMI.
- Select a device from the Device list in the right pane.
- Click Graphs in the left side-bar.
- View the graphs in the Performance Graphs pane.
Example Scenario 4:
Create a Template Clone for Individual Device Use
Although you can edit and save changes to (overwrite) default templates, best practice dictates leaving the original template in pristine condition and creating a clone of it to manipulate and apply instead. There are two broad approaches that can be taken. If the change is desired for more than one device, a new device class can be created. The default template can be overridden to the new class, then edited. If the change is desired only for a single device, then the template can be overridden directly to the device, and the edits made thereafter. The override process creates a local clone of the original template, leaving the original intact and unchanged. This scenario explains how to clone a template and apply it to an individual device, then edit its existing threshold to specify a new value.
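The key property of an override is that the local clone and the original are independent after the copy. A minimal Python sketch of that idea, using a plain dictionary as a hypothetical stand-in for a template and a deep copy as a stand-in for the override operation:

```python
import copy

# Hypothetical stand-in for the class-level template.
class_template = {
    "name": "Device",
    "thresholds": {"low CPU idle": {"min": 2}},
}


def override_template(template: dict) -> dict:
    """Return an independent local clone, leaving the original untouched."""
    return copy.deepcopy(template)


# Override the template for one device, then edit only the clone.
local_template = override_template(class_template)
local_template["thresholds"]["low CPU idle"]["min"] = 10

original_min = class_template["thresholds"]["low CPU idle"]["min"]
local_min = local_template["thresholds"]["low CPU idle"]["min"]
```

After the edit, the class-level threshold still holds its default of 2 while the device-level clone holds 10, which is the behavior the override process is meant to give you.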
Create a Template Clone
- Select Infrastructure → Devices.
- Select the device from the Device List to display the device information.
Note that the currently bound template is shown below, circled in green.
- Click the Action Wheel at the bottom of the Devices (left) pane.
- Select the Override Template Here option.
- From the Override Templates pop-up window, click the ↓ to display the device template list.
- Click to select the device template name.
- Click Submit.
The new (cloned) template is displayed in the left pane with the annotation Locally Defined, shown in the following figure, circled in red.
Create a Threshold
To create or edit a threshold in the new template, complete the following:
- Select the template by clicking on its name in the left pane.
- Select the threshold (low CPU idle) in the Thresholds list.
- Click the action wheel just above to open the Edit Threshold dialog and view the details.
- From the Data Points list verify ssCpuIdle_ssCpuIdle is located in the right-hand (selected) column. If not:
- Use the → arrow to move it into the right-hand (selected) column.
- Enter a new value for the minimum threshold in the Minimum Value field. The default setting is 2 (this means 2% CPU idle time or 98% busy). For example, to generate an event when CPU usage exceeds 90%, use a value of 10 (the inactive processor percentage value).
- Verify or change the default event Severity setting.
- Click [SAVE] to save and apply your settings.
Optional: Add the Threshold to the Graph
To verify the threshold is added to a graph, perform the following:
- In the Graph Definitions pane, select CPU Utilization.
- Click the action wheel and choose Manage Graph Points to display the dialog.
- Click the [+] (plus sign) and choose Threshold from the drop-down list to display the Add Threshold dialog.
- From the drop-down list choose the low CPU idle threshold.
- Click [SUBMIT].
- Click [SAVE].
To see your threshold in the CPU Utilization graph:
- Select Infrastructure → Devices → Server → Linux.
- Select a device from the Device list in the right pane.
- Click Graphs in the left side-bar.
- View the graphs in the Performance Graphs pane.