1. About This Guide
Welcome to the OpenNMS Horizon Administrators Guide. This documentation provides information and procedures on setup, configuration, and use of the OpenNMS Horizon platform. Using a task-based approach, chapters appear in a recommended order for working with OpenNMS Horizon:
-
Opt in or out of usage statistics collection (required during first login).
-
Create users and security roles.
-
Provision your system.
1.1. Audience
This guide is suitable for administrative users and those who will use OpenNMS Horizon to monitor their network.
1.2. Related Documentation
Installation Guide: how to install OpenNMS Horizon
Developers Guide: information and procedures on developing for the OpenNMS Horizon project
OpenNMS 101: a series of video training tutorials that build on each other to get you up and running with OpenNMS Horizon
OpenNMS 102: a series of stand-alone video tutorials on OpenNMS features
OpenNMS Helm: a guide to OpenNMS Helm, an application for creating flexible dashboards to interact with data stored by OpenNMS
Architecture for Learning Enabled Correlation (ALEC): guide to this framework for logically grouping related faults (alarms) into higher level objects (situations) with OpenNMS.
1.3. Typographical Conventions
This guide uses the following typographical conventions:
Convention | Meaning |
---|---|
bold |
Indicates UI elements to click or select in a procedure, and the names of UI elements like dialogs or icons. |
italics |
Introduces a defined or special word. Also used for the titles of publications. |
|
Anything you must type or enter, and the names for code-related elements (classes, methods, commands). |
1.4. Need Help?
-
join the OpenNMS Discussion chat
-
join our community on Discourse
-
contact sales@opennms.com to purchase customer support
2. Data Choices
The first time a user with the Admin role logs into the system, a prompt appears for permission to allow the Data Choices module to collect and publish anonymous usage statistics to https://stats.opennms.org.
The OpenNMS Group uses this information to help determine product usage, and improve the OpenNMS Horizon software.
Click Show me what is being sent to see what information is being collected. Statistics collection and publication happen only once an admin user opts in.
When enabled, the following anonymous statistics are collected and published on system startup and every 24 hours after:
-
System ID (a randomly generated UUID)
-
OpenNMS Horizon Release
-
OpenNMS Horizon Version
-
OS Architecture
-
OS Name
-
OS Version
-
Number of alarms in the alarms table
-
Number of events in the events table
-
Number of IP interfaces in the ipinterface table
-
Number of nodes in the node table
-
Number of nodes, grouped by System OID
You can enable or disable usage statistics collection at any time by choosing admin > Configure OpenNMS > Additional Tools > Data Choices and then choosing Opt-in or Opt-out in the UI.
3. User Management
Managing users involves the following tasks:
3.1. First-Time Login and Data Choices
Access the OpenNMS Horizon web application at http://<ip-or-fqdn-of-your-server>:8980/opennms.
The default user login is admin with the password admin.
The first time you log in, we prompt for permission to allow the Data Choices module to collect and publish anonymous usage statistics to https://stats.opennms.org.
The OpenNMS Group uses this information to help determine product usage and to improve the OpenNMS Horizon software.
Click Show me what is being sent to see what information we collect. Statistics collection and publication happen only if an admin user opts in.
Admin users can enable or disable usage statistics collection at any time by logging into the UI, clicking the gear icon, selecting Data Choices in the Additional Tools area, and clicking Opt-in or Opt-out.
Data Collection
When enabled, the Data Choices module collects the following anonymous statistics and publishes them on system startup and every 24 hours after:
-
System ID (a randomly generated universally unique identifier (UUID))
-
OpenNMS Horizon Release
-
OpenNMS Horizon Version
-
OS Architecture
-
OS Name
-
OS Version
-
Number of alarms in the alarms table
-
Number of events in the events table
-
Number of IP interfaces in the ipinterface table
-
Number of nodes in the node table
-
Number of nodes, grouped by System OID
3.1.1. Admin User Setup
After logging in for the first time, make sure to change the default admin user password to a secure one:
-
Click the gear icon in the top right.
-
Choose Configure OpenNMS → Configure Users, Groups and On-Call roles and select Configure Users.
-
Click Modify beside the admin user.
-
In the User Password area, click Reset Password, update the password and click OK.
-
Click Finish at the bottom of the Modify User screen to save changes.
We recommend not using the default admin user, but instead creating specific users with the admin role and/or other permissions.
This helps to keep track of who has performed tasks such as clearing alarms or creating notifications.
Do not delete the default admin and rtc users. The Real-Time Console on the start page uses the rtc user for the communication needed to calculate node and service availability.
3.2. User Creation and Configuration
Only a user with admin privileges can create users and assign security roles to them.
We recommend creating a new user with admin privileges instead of using the default admin (see Admin User Setup).
Ideally, each user account corresponds to a person, to help track who performs tasks in your OpenNMS Horizon system. Assigning different security roles to each user helps restrict what tasks the user can perform.
In addition to local users, you can configure external authentication services including LDAP / LDAPS, RADIUS, and SSO. Configuration specifics for these services are outside the scope of this documentation.
Do not delete the default admin and rtc users. The Real-Time Console on the start page uses the rtc user for the communication needed to calculate node and service availability.
3.2.1. Creating a User
-
Log in as a user with administrative permissions.
-
Click the gear icon in the top right.
-
Choose Configure OpenNMS → Configure Users, Groups and On-Call roles and select Configure Users.
-
Click Add new user, specify a user ID, password, and password confirmation, and click OK.
-
Optional: add user information in the appropriate fields.
-
Optional: assign user permissions.
By default, a new user can acknowledge and work with alarms and notifications, but cannot access the Configure OpenNMS administration menu. Add the ROLE_ADMIN role to create a new admin.
-
Optional: specify where to send messages to the user in the notification information area.
-
Optional: set a schedule for when a user should receive notifications.
-
Click Finish to save changes.
3.2.2. Create User Duty Schedule
A duty schedule specifies the days and times a user (or group) receives notifications, on a per-week basis. This feature allows you to customize a schedule based on your team’s hours of operation. Schedules are additive: a user could have a regular work schedule, and a second schedule for days or weeks when they are on call.
If OpenNMS Horizon needs to notify an individual user, but that user is not on duty at the time, it will never send the notification to that user.
Notifications sent to users in groups are different:
-
group on duty at time of notification – all users also on duty receive notification
-
group on duty, no member users on duty – notification is queued and sent to the next user who comes on duty
-
off-duty group – notification never sent
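The delivery rules above can be summarized in a short Python sketch. The function names, arguments, and return labels are hypothetical and only illustrate the decision logic described in this section; they are not part of OpenNMS Horizon.

```python
def individual_outcome(user_on_duty):
    """Rule for a notification addressed to a single user."""
    return "send" if user_on_duty else "never sent"


def group_outcome(group_on_duty, members_on_duty):
    """Rule for a notification addressed to a group.

    members_on_duty is the list of group members who are on duty
    at the time of the notification (hypothetical input).
    """
    if not group_on_duty:
        # Off-duty group: the notification is never sent.
        return "never sent"
    if members_on_duty:
        # Group on duty: every member who is also on duty is notified.
        return "send to: " + ", ".join(members_on_duty)
    # Group on duty but no member on duty: wait for the next user on duty.
    return "queued for next on-duty member"


print(individual_outcome(False))
print(group_outcome(True, ["alice", "bob"]))
print(group_outcome(True, []))
print(group_outcome(False, ["alice"]))
```

The sketch shows the main difference between the two cases: a notification for an off-duty individual is dropped, while a notification for an on-duty group with no on-duty members is held until someone comes on duty.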
To add a duty schedule for a user (or group), follow these steps:
-
Log in as a user with administrative permissions.
-
Click the gear icon in the top right.
-
Choose Configure OpenNMS → Configure Users, Groups and On-Call roles and select Configure Users (Configure Groups).
-
Choose the user (or group) you want to modify.
-
In the Duty Schedule area, select the number of schedules you want to add from the drop-down and click Add Schedule.
-
Specify the days and times during which you want the user (or group) to receive notifications.
-
Click Finish.
3.2.3. Assigning User Permissions
Create user permissions by assigning security roles. These roles regulate access to the web UI and the REST API to exchange monitoring and inventory information. In a distributed installation the Minion instance requires the ROLE_MINION permission to interact with OpenNMS Horizon.
Available security roles (those with an asterisk are the most commonly used):
Security Role Name | Description |
---|---|
ROLE_ADMIN* |
Permissions to create, read, update, and delete in the web UI and the REST API. |
ROLE_ASSET_EDITOR |
Permissions only to update the asset records from nodes. |
ROLE_DASHBOARD |
Allow user access only to the dashboard. |
ROLE_DELEGATE |
Allow actions (such as acknowledging an alarm) to be performed on behalf of another user. |
ROLE_FLOW_MANAGER |
Allow user to edit flow classifications. |
ROLE_JMX |
Allow retrieving JMX metrics, but not executing MBeans of the OpenNMS Horizon JVM, even if they just return simple values. |
ROLE_MINION |
Minimum required permissions for a Minion to operate. |
ROLE_MOBILE |
Allow user to use OpenNMS COMPASS mobile application to acknowledge alarms and notifications via the REST API. |
ROLE_PROVISION |
Allow user to use the provisioning system and configure SNMP in OpenNMS Horizon to access management information from devices. |
ROLE_READONLY* |
User limited to reading information in the web UI; unable to change alarm states or notifications. |
ROLE_REPORT_DESIGNER |
Permissions to manage reports in the web UI and REST API. |
ROLE_REST |
Allow users to interact with the entire OpenNMS Horizon REST API. |
ROLE_RTC* |
Exchange information with the OpenNMS Horizon Real-Time Console for availability calculations. |
ROLE_USER* |
Default permissions for a new user to interact with the web UI: can escalate and acknowledge alarms and notifications. |
To assign a security role to a user:
-
Log in as a user with administrative permissions.
-
Click the gear icon in the top right.
-
Choose Configure OpenNMS → Configure Users, Groups and On-Call roles and select Configure Users.
-
Click the modify icon next to the user you want to update.
-
Select the role from Available Roles in the Security Roles section.
-
Click Add to assign the security role to the user.
-
Click Finish to apply the changes.
-
Log out and log in to apply the new security role settings.
3.2.4. Creating Custom Security Roles
To create a custom security role you need to define the name and specify the security permissions.
-
Create a file called $OPENNMS_HOME/etc/security-roles.properties.
-
Add a property called roles, and for its value, a comma-separated list of the custom security roles, for example:
roles=operator,stage
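To double-check the file before restarting, a small Python sketch like the following can list the role names a roles line declares. The helper name parse_custom_roles is hypothetical, and the parsing is deliberately simplified: it handles plain key=value lines and comments, not the full Java properties syntax.

```python
def parse_custom_roles(properties_text):
    """Return the role names listed in the 'roles' property, if any."""
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "roles":
            # Split the comma-separated value and drop empty entries.
            return [role.strip() for role in value.split(",") if role.strip()]
    return []


print(parse_custom_roles("roles=operator,stage"))
```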
The new custom security roles will appear in the web UI.
To define the permissions associated with a custom security role, manually update the Spring Security application context in:
/opt/opennms/jetty-webapps/opennms/WEB-INF/applicationContext-spring-security.xml
3.3. Groups
A group is a collection of users. Organizing users into groups helps with notifications and allows you to assign a set of users to on-call roles to build more complex notification workflows.
3.3.1. Creating a User Group
-
Log in as a user with administrative permissions.
-
Click the gear icon in the top right.
-
Choose Configure OpenNMS → Configure Users, Groups and On-Call roles and select Configure Groups.
-
Specify a group name and description and click OK.
-
Add users to the group by selecting them from the Available Users column and using the arrows to move them to the Currently in Group column.
-
(Optional) Assign categories of responsibility to the group, such as Routers, Switches, Servers, etc.
-
(Optional) Create a duty schedule.
-
Click Finish.
Users receive notifications in the order in which they appear in the group.
If you delete a user group, no one receives notification that the group has been deleted. If the group is associated with a schedule, that schedule will no longer exist, and users associated with that group will no longer receive notifications previously specified in the schedule.
The on-call roles feature allows you to assign a predefined duty schedule to an existing group of users. A common use case is to have system engineers in on-call rotations with a given schedule.
Each on-call role includes a user designated as a supervisor, who receives notifications when no one is on duty to receive OpenNMS Horizon notifications.
The supervisor must have admin privileges.
3.4. Assigning a Group to an On-Call Role
Before assigning a group to an on-call role, you must create a group.
-
Log in as a user with administrative permissions.
-
Click the gear icon in the top right.
-
Choose Configure OpenNMS → Configure Users, Groups and On-Call roles and select Configure On-Call Roles.
-
Click Add New On-Call Role and specify a name, supervisor, group and description.
-
Click Save.
-
In the calendar, click the plus (+) icon on the day for which you want to create a schedule.
-
Specify the user, date, and time the user should be on call and click Save.
-
Repeat for other days and users.
-
Click Done to apply the changes.
3.5. User Maintenance
User maintenance describes additional tasks and information related to users.
3.5.1. Passwords
To reset a user's password as an administrator:
-
Log in as a user with administrative permissions.
-
Click the gear icon in the top right.
-
Choose Configure OpenNMS → Configure Users, Groups and On-Call roles and select Configure Users.
-
Click the Modify icon next to an existing user and select Reset Password.
-
Type a new Password, Confirm Password, and click OK.
-
Click Finish.
To change your own password:
-
Log in with your user name and old password.
-
Choose Change Password from the drop-down below your login name.
-
Specify your current password then set the new password and confirm it.
-
Click Submit.
-
Log out and log in with your new password.
3.5.2. Deleting Users and Groups
-
Log in as a user with administrative permissions.
-
Click the gear icon in the top right.
-
Choose Configure OpenNMS → Configure Users, Groups and On-Call roles and select Configure Users (Configure Groups).
-
Click the trash bin icon beside the user (or group) you want to delete.
-
Confirm delete request with OK.
When you delete a group, no one receives notification that the group has been deleted. Be aware that deleting a group or user also removes any schedules associated with that group or user, meaning they will not receive notifications specified as part of a schedule.
3.5.3. Advanced Configuration
OpenNMS Horizon persists the user, password, and other detail descriptions in the users.xml file.
3.6. Web UI Pre-Authentication
It is possible to configure OpenNMS Horizon to run behind a proxy that provides authentication, and then pass the pre-authenticated user to the OpenNMS Horizon webapp using a header.
Define the pre-authentication configuration in $OPENNMS_HOME/jetty-webapps/opennms/WEB-INF/spring-security.d/header-preauth.xml. This file is automatically included in the Spring Security context, but is not enabled by default.
DO NOT configure OpenNMS Horizon this way unless you are certain the web UI is accessible only to the proxy and not to end users. Otherwise, malicious attackers can craft queries that include the pre-authentication header and get full control of the web UI and REST APIs.
3.6.1. Enabling Pre-Authentication
Edit the header-preauth.xml file, and set the enabled property:
<beans:property name="enabled" value="true" />
3.6.2. Configuring Pre-Authentication
You can also set the following properties to change the behavior of the pre-authentication plugin:
Property | Description | Default |
---|---|---|
|
Whether the pre-authentication plugin is active. |
|
|
If true, disallow login if the header is not set or the user does not exist. If false, fall through to other mechanisms (basic auth, form login, etc.) |
|
|
The HTTP header that will specify the user to authenticate as. |
|
|
A comma-separated list of additional credentials (roles) the user should have. |
4. Administrative Webinterface
4.1. Surveillance View
When networks are large and contain devices of different priority, it is useful to show at a glance how the "whole system" is working. The surveillance view aims to do that. By using categories, you can define a matrix that aggregates monitoring results. Imagine you have 10 servers with 10 internet connections and 5 PCs with DSL lines:
 | Servers | Internet Connections |
---|---|---|
Super important | 1 of 10 | 0 of 10 |
Slightly important | 0 of 10 | 0 of 10 |
Vanity | 4 of 10 | 0 of 10 |
The idea is to give somebody an at-a-glance hint about where the trouble is. The matrix display allows significantly higher aggregation than a simple list. In addition, the surveillance view shows nodes rather than services, an important detail when you look at categories. At a glance, you want to know how many of your servers have an issue rather than how many services in this category have an issue.
The visual indication for outages in the surveillance view cells is defined as follows:
-
No services down: green as normal
-
One (1) service down: yellow as warning
-
More than one (1) service down: red as critical
This Surveillance View model also builds the foundation of the Dashboard View.
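The cell color rule listed above can be written as a small Python function. This is only an illustration of the three thresholds named in the list; the function name cell_color is hypothetical and does not correspond to OpenNMS Horizon code.

```python
def cell_color(services_down):
    """Map the number of services down in a cell to its display color."""
    if services_down < 0:
        raise ValueError("services_down must not be negative")
    if services_down == 0:
        return "green"   # normal
    if services_down == 1:
        return "yellow"  # warning
    return "red"         # critical


for count in (0, 1, 4):
    print(count, cell_color(count))
```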
4.1.1. Default Surveillance View Configuration
Surveillance Views are defined in the surveillance-views.xml file.
This file resides in the OpenNMS Horizon etc directory.
This file can be modified in a text editor and is reread every time the Surveillance View page is loaded. Thus, changes to this file do not require OpenNMS Horizon to be restarted.
The default configuration looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<surveillance-view-configuration
  xmlns:this="http://www.opennms.org/xsd/config/surveillance-views"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.opennms.org/xsd/config/surveillance-views http://www.opennms.org/xsd/config/surveillance-views.xsd"
  default-view="default" >
  <views >
    <view name="default" refresh-seconds="300" >
      <rows>
        <row-def label="Routers" >
          <category name="Routers"/>
        </row-def>
        <row-def label="Switches" >
          <category name="Switches" />
        </row-def>
        <row-def label="Servers" >
          <category name="Servers" />
        </row-def>
      </rows>
      <columns>
        <column-def label="PROD" >
          <category name="Production" />
        </column-def>
        <column-def label="TEST" >
          <category name="Test" />
        </column-def>
        <column-def label="DEV" >
          <category name="Development" />
        </column-def>
      </columns>
    </view>
  </views>
</surveillance-view-configuration>
Note that the old report-category attribute is deprecated and no longer supported.
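As a quick way to inspect a surveillance view definition outside the web UI, the following Python sketch lists the row and column categories of each view. It uses only the standard library. The embedded sample is a shortened version of the default configuration without the namespace attributes, and summarize_views is a hypothetical helper, not part of OpenNMS Horizon.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the default configuration shown above.
SAMPLE = """
<surveillance-view-configuration default-view="default">
  <views>
    <view name="default" refresh-seconds="300">
      <rows>
        <row-def label="Routers"><category name="Routers"/></row-def>
        <row-def label="Servers"><category name="Servers"/></row-def>
      </rows>
      <columns>
        <column-def label="PROD"><category name="Production"/></column-def>
        <column-def label="TEST"><category name="Test"/></column-def>
      </columns>
    </view>
  </views>
</surveillance-view-configuration>
"""


def summarize_views(xml_text):
    """Return (default view name, {view name: {"rows": ..., "columns": ...}})."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for view in root.iter("view"):
        # Map each row/column label to the category names it aggregates.
        rows = {
            row.get("label"): [c.get("name") for c in row.findall("category")]
            for row in view.findall("rows/row-def")
        }
        columns = {
            col.get("label"): [c.get("name") for c in col.findall("category")]
            for col in view.findall("columns/column-def")
        }
        summary[view.get("name")] = {"rows": rows, "columns": columns}
    return root.get("default-view"), summary


default_view, views = summarize_views(SAMPLE)
print("default view:", default_view)
for name, definition in views.items():
    print(name, definition)
```

To run it against a real installation, read the contents of surveillance-views.xml from the etc directory as bytes and pass them to ET.fromstring in place of the sample.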
4.1.2. Configuring Surveillance Views
You can also modify the Surveillance View configuration using the Surveillance View Configurations editor on the OpenNMS Horizon Admin page.
This page gives an overview of the configured Surveillance Views and lets you edit, remove, or preview each defined Surveillance View. You can also select the default Surveillance View using the checkbox in the DEFAULT column.
When editing a Surveillance View, you must define the view’s title and the time in seconds between successive refreshes. The dialog lists the defined rows on the left side and the defined columns on the right side. Besides adding new entries, you can modify or delete existing entries, and change the position of an entry using the up/down buttons.
Editing a row or column definition requires choosing a unique label for the entry and at least one OpenNMS Horizon category. When finished, click Save to persist your modified configuration or Cancel to close the dialog.
4.1.3. Categorizing Nodes
To categorize nodes in the Surveillance View, choose a node and click Edit beside Surveillance Category Memberships. Based on your Surveillance View, choose two categories that represent a column and a row, for example, Servers and Test, then click Add.
4.1.4. Creating Views for Users and Groups
You can use user and group names for Surveillance Views. When the Surveillance View page is invoked, the following criteria select the Surveillance View to display. The first matching item wins:
-
Surveillance View name equal to the user name used to log in to OpenNMS Horizon.
-
Surveillance View name equal to the user’s assigned OpenNMS Horizon group name.
-
Surveillance View name equal to the default-view attribute in the surveillance-views.xml configuration file.
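The "first match wins" order above can be sketched in Python as follows. The function select_view and its arguments are hypothetical. The sketch also assumes a user may belong to several groups and checks them in the order given, which the text above does not specify.

```python
def select_view(view_names, user_name, group_names, default_view):
    """Pick the Surveillance View name for a logged-in user."""
    # 1. A view named after the user wins.
    if user_name in view_names:
        return user_name
    # 2. Otherwise, a view named after one of the user's groups.
    for group in group_names:
        if group in view_names:
            return group
    # 3. Otherwise, fall back to the default-view attribute.
    return default_view


views = {"default", "drv4doe", "netops"}
print(select_view(views, "drv4doe", ["netops"], "default"))
print(select_view(views, "alice", ["netops"], "default"))
print(select_view(views, "bob", [], "default"))
```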
4.2. Dashboard
In a Network Operations Center (NOC), an overview of issues in the network is important and is often provided by Dashboards. Large networks have people (Operators) with different responsibilities, and the Dashboard should show only information for a given monitoring context. Network or server operators need to customize or filter information on the Dashboard. A Dashboard as an at-a-glance overview is also often used as an entry point for more detailed diagnosis through the information provided by the monitoring system. The Surveillance View lets you reduce the visible information by selecting rows, columns, and cells to quickly limit the amount of information to navigate through.
4.2.1. Components
The Dashboard is built with five components:
-
Surveillance View: Models a monitoring context for the Dashboard.
-
Alarms: Shows unacknowledged Alarms that an Operator should escalate.
-
Notifications: Shows outstanding and unacknowledged notifications sent to Engineers.
-
Node Status: Shows all ongoing network Outages.
-
Resource Graph Viewer: Shows performance time series reports for performance diagnosis.
The following screenshot shows a configured Dashboard and the information displayed in its components.
The following sections describe the information shown in each component. All components other than the Surveillance View display information based on the Surveillance View.
Surveillance View
The Surveillance View has multiple functions.
-
Models the monitoring context and shows service and node Outages in a compact matrix view.
-
Limits the amount of information in the Dashboard by selecting rows, columns, and cells.
You can select columns, rows, single cells, or all entries in a Surveillance View. Refer to the Surveillance View section for details on how to configure Surveillance Views.
Alarms
The Alarms component gives an overview of all unacknowledged Alarms with a severity higher than Normal(1). Acknowledged Alarms are removed from the responsibility of the Operator. The following information is shown:
-
Node: Node label of the node the Alarm is associated with
-
Severity: Severity of the Alarm
-
UEI: Shows the UEI of the Alarm
-
Count: Number of Alarms deduplicated by the reduction key of the Alarm
-
Last Time: Time for the last occurrence of the Alarm
-
Log Msg: The log message from the Event which is the source for this Alarm. It is specified in the event configuration file in <logmsg />
The Alarms component shows the most recent Alarms and allows the user to scroll through the last 100 Alarms.
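The Count and Last Time columns follow from the way events with the same reduction key collapse into a single Alarm. The following Python sketch illustrates that idea only; the dictionary fields, the sample reduction keys, and the function reduce_to_alarms are hypothetical and do not reflect the OpenNMS Horizon data model.

```python
def reduce_to_alarms(events):
    """Collapse events that share a reduction key into one alarm entry."""
    alarms = {}
    for event in events:
        alarm = alarms.setdefault(
            event["reduction_key"],
            {"count": 0, "last_time": event["time"], "log_msg": event["log_msg"]},
        )
        # Every matching event increases the alarm counter.
        alarm["count"] += 1
        # Keep the time and log message of the most recent occurrence.
        if event["time"] >= alarm["last_time"]:
            alarm["last_time"] = event["time"]
            alarm["log_msg"] = event["log_msg"]
    return alarms


events = [
    {"reduction_key": "nodeDown:1", "time": 10, "log_msg": "Node 1 is down"},
    {"reduction_key": "nodeDown:2", "time": 15, "log_msg": "Node 2 is down"},
    {"reduction_key": "nodeDown:1", "time": 20, "log_msg": "Node 1 is down"},
]
for key, alarm in reduce_to_alarms(events).items():
    print(key, alarm["count"], alarm["last_time"])
```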
Notifications
Notifications inform people on a duty schedule and prompt immediate action to fix or reconfigure systems. In OpenNMS Horizon you can acknowledge notifications to show who is working on a specific issue. The Dashboard should show outstanding notifications in the NOC to provide an overview and allow for intervention. The following information is shown:
-
Node: Label of the monitored node the notification is associated with
-
Service: Name of the service the notification is associated with
-
Message: Message of the notification
-
Sent Time: Time when the notification was sent
-
Responder: Name of the user who acknowledged the notification
-
Response Time: Time when the user acknowledged the notification
The Notifications component shows the most recent unacknowledged notifications and allows the user to scroll through the last 100 Notifications.
Node Status
An acknowledged Alarm does not necessarily mean the outage is resolved. To give an overview of ongoing Outages in the network, the Dashboard shows an outage list in the Node Status component.
-
Node: Label of the monitored node with ongoing outages.
-
Current Outages: Number of services on the node with outages and total number of monitored services. For example, "3 of 3" means that 3 of 3 services are affected.
-
24 Hour Availability: Availability of all services provided by the node, calculated over the last 24 hours.
Resource Graph Viewer
To give a quick entry point for diagnosing performance issues, the Resource Graph Viewer lets you navigate to time series data reports that are filtered in the context of the Surveillance View.
It lets you navigate sequentially through resource graphs provided by nodes filtered by the Surveillance View context and selection, and shows one graph report at a time.
4.2.2. Advanced configuration
The Surveillance View component lets you model multiple views for different monitoring contexts. You can create special views, for example for network operators or server operators. The Dashboard shows only one configured Surveillance View. To let different users use a Surveillance View that fits their requirements, you can map a logged-in user to a given Surveillance View used in the Dashboard.
The nodes selected from the Surveillance View are also subject to the User Restriction Filter. If you have a group of users who should see just a subset of nodes, the Surveillance View filters out nodes that are not related to the assigned user group.
The Dashboard is designed to focus, and therefore also restrict, a user’s view to devices of interest. To do this, a role can be assigned to a user that restricts them to viewing only the Dashboard.
Using the Dashboard role
The following example illustrates how to use the Dashboard role.
For instance, the user drv4doe is assigned the dashboard role.
When logging in as drv4doe, the user is taken directly to the Dashboard page and is presented with a custom Dashboard based on the drv4doe Surveillance View definition.
Step 1: Create a User
The following example assigns a Dashboard to the user "drv4doe" (a router and switch jockey) and prevents the user from navigating to any other link in the OpenNMS Horizon WebUI.
Figure: drv4doe using the OpenNMS Horizon WebUI
Step 2: Change Security Roles
Now, add the ROLE_DASHBOARD role to the user through the WebUI or by manually editing the users.xml file in the /opt/opennms/etc directory for the user drv4doe.
Figure: drv4doe using the OpenNMS Horizon WebUI
<user>
  <user-id>drv4doe</user-id>
  <full-name>Dashboard User</full-name>
  <password salt="true">6FOip6hgZsUwDhdzdPUVV5UhkSxdbZTlq8M5LXWG5586eDPa7BFizirjXEfV/srK</password>
  <role>ROLE_DASHBOARD</role>
</user>
Step 3: Define Surveillance View
Edit the $OPENNMS_HOME/etc/surveillance-views.xml file to add a definition for the user drv4doe, which you created in step 1.
<?xml version="1.0" encoding="UTF-8"?>
<surveillance-view-configuration
  xmlns:this="http://www.opennms.org/xsd/config/surveillance-views"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.opennms.org/xsd/config/surveillance-views http://www.opennms.org/xsd/config/surveillance-views.xsd"
  default-view="default" >
  <views >
    <view name="drv4doe" refresh-seconds="300" >
      <rows>
        <row-def label="Servers" >
          <category name="Servers"/>
        </row-def>
      </rows>
      <columns>
        <column-def label="PROD" >
          <category name="Production" />
        </column-def>
        <column-def label="TEST" >
          <category name="Test" />
        </column-def>
      </columns>
    </view>
    <!-- default view here -->
    <view name="default" refresh-seconds="300" >
      <rows>
        <row-def label="Routers" >
          <category name="Routers"/>
        </row-def>
        <row-def label="Switches" >
          <category name="Switches" />
        </row-def>
        <row-def label="Servers" >
          <category name="Servers" />
        </row-def>
      </rows>
      <columns>
        <column-def label="PROD" >
          <category name="Production" />
        </column-def>
        <column-def label="TEST" >
          <category name="Test" />
        </column-def>
        <column-def label="DEV" >
          <category name="Development" />
        </column-def>
      </columns>
    </view>
  </views>
</surveillance-view-configuration>
This configuration and proper assignment of node categories will produce a default Dashboard for all users other than drv4doe.
You can hide the upper navigation on any page by appending ?quiet=true to the end of the OpenNMS Horizon URL.
This is very handy when using the dashboard on a large monitor or TV screen for office-wide viewing.
However, when logging in as drv4doe, the user is taken directly to the Dashboard page and is presented with a Dashboard based on the custom Surveillance View definition.
The drv4doe user is not allowed to navigate to URLs other than the dashboard.jsp URL.
Doing so results in an Access Denied error.
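When generating links for a wall display, the quiet=true parameter mentioned in this section can be appended programmatically. The following Python sketch uses only the standard library; the host name localhost is an assumption for illustration, and with_quiet is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_quiet(url):
    """Return the URL with quiet=true added to its query string."""
    parts = urlsplit(url)
    # Preserve existing query parameters and add (or overwrite) quiet.
    query = dict(parse_qsl(parts.query))
    query["quiet"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


# The host name is a placeholder; use the address of your own server.
print(with_quiet("http://localhost:8980/opennms/dashboard.jsp"))
```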
Anonymous dashboards
You can modify the configuration files for the security framework to give access to one or more dashboards without logging in.
At the end you’ll be able to point a browser at a special URL like http://…/opennms/dashboard1 or http://…/opennms/dashboard2 and see a dashboard without any authentication.
First, configure surveillance views and create dashboard users as above.
For example, make two dashboards and two users called dashboard1 and dashboard2.
Test that you can log in as each of the new users and see the correct dashboard.
Now create some aliases you can use to distinguish between dashboards.
In /opt/opennms/jetty-webapps/opennms/WEB-INF, edit web.xml.
Just before the first <servlet-mapping> tag, add the following servlet entries:
<servlet>
  <servlet-name>dashboard1</servlet-name>
  <jsp-file>/dashboard.jsp</jsp-file>
</servlet>
<servlet>
  <servlet-name>dashboard2</servlet-name>
  <jsp-file>/dashboard.jsp</jsp-file>
</servlet>
Just before the first <error-page> tag, add the following servlet-mapping entries:
<servlet-mapping>
  <servlet-name>dashboard1</servlet-name>
  <url-pattern>/dashboard1</url-pattern>
</servlet-mapping>
<servlet-mapping>
  <servlet-name>dashboard2</servlet-name>
  <url-pattern>/dashboard2</url-pattern>
</servlet-mapping>
After the last <filter-mapping> tag, add the following filter-mapping entries:
<filter-mapping>
  <filter-name>AddRefreshHeader-120</filter-name>
  <url-pattern>/dashboard.jsp</url-pattern>
</filter-mapping>
<filter-mapping>
  <filter-name>AddRefreshHeader-120</filter-name>
  <url-pattern>/dashboard1</url-pattern>
</filter-mapping>
<filter-mapping>
  <filter-name>AddRefreshHeader-120</filter-name>
  <url-pattern>/dashboard2</url-pattern>
</filter-mapping>
Next, edit applicationContext-acegi-security.xml to enable anonymous authentication for the /dashboard1 and /dashboard2 aliases.
Near the top of the file, find <bean id="filterChainProxy" …>.
Below the entry for /rss.jsp*, add an entry for each of the dashboard aliases:
<bean id="filterChainProxy" class="org.acegisecurity.util.FilterChainProxy">
<property name="filterInvocationDefinitionSource">
<value>
CONVERT_URL_TO_LOWERCASE_BEFORE_COMPARISON
PATTERN_TYPE_APACHE_ANT
/rss.jsp*=httpSessionContextIntegrationFilter,logoutFilter,authenticationProcessingFilter,basicProcessingFilter,securityContextHolderAwareRequestFilter,anonymousProcessingFilter,basicExceptionTranslationFilter,filterInvocationInterceptor
/dashboard1*=httpSessionContextIntegrationFilter,logoutFilter,securityContextHolderAwareRequestFilter,dash1AnonymousProcessingFilter,filterInvocationInterceptor
/dashboard2*=httpSessionContextIntegrationFilter,logoutFilter,securityContextHolderAwareRequestFilter,dash2AnonymousProcessingFilter,filterInvocationInterceptor
/**=httpSessionContextIntegrationFilter,logoutFilter,authenticationProcessingFilter,basicProcessingFilter,securityContextHolderAwareRequestFilter,anonymousProcessingFilter,exceptionTranslationFilter,filterInvocationInterceptor
...
About halfway through the file, look for <bean id="filterInvocationInterceptor" …>. Below the entry for /dashboard.jsp, add an entry for each of the aliases:
<bean id="filterInvocationInterceptor" class="org.acegisecurity.intercept.web.FilterSecurityInterceptor">
...
/frontpage.htm=ROLE_USER,ROLE_DASHBOARD
/dashboard.jsp=ROLE_USER,ROLE_DASHBOARD
/dashboard1=ROLE_USER,ROLE_DASHBOARD
/dashboard2=ROLE_USER,ROLE_DASHBOARD
/gwt.js=ROLE_USER,ROLE_DASHBOARD
...
Finally, near the bottom of the file, add a new instance of AnonymousProcessingFilter for each alias.
<!-- Set the anonymous username to dashboard1 so the dashboard page
can match it to a surveillance view of the same name. -->
<bean id="dash1AnonymousProcessingFilter" class="org.acegisecurity.providers.anonymous.AnonymousProcessingFilter">
<property name="key"><value>foobar</value></property>
<property name="userAttribute"><value>dashboard1,ROLE_DASHBOARD</value></property>
</bean>
<bean id="dash2AnonymousProcessingFilter" class="org.acegisecurity.providers.anonymous.AnonymousProcessingFilter">
<property name="key"><value>foobar</value></property>
<property name="userAttribute"><value>dashboard2,ROLE_DASHBOARD</value></property>
</bean>
Restart OpenNMS Horizon. You should now be able to open a dashboard at http://…/opennms/dashboard1 without logging in.
There is no way to switch dashboards without closing the browser (or deleting the JSESSIONID session cookie).
If you accidentally click a link that requires full user privileges (e.g. Node List), you are presented with a login form. Once you reach the login form, you cannot return to the dashboard without restarting the browser. If this bothers you, you can set ROLE_USER in addition to ROLE_DASHBOARD in your userAttribute property. However, this gives anonymous browsers full user access.
4.3. Grafana Dashboard Box
Grafana provides API keys that give third-party applications such as OpenNMS Horizon access to its API. The Grafana Dashboard Box on the start page shows dashboards related to OpenNMS Horizon. To show only the relevant dashboards, tag them in Grafana and configure that tag; if no tag is provided, all dashboards from Grafana are shown.
This feature is disabled by default and is configured through opennms.properties. Note that this feature works with the Grafana API v2.5.0.
Name | Type | Description | Default |
---|---|---|---|
|
Boolean |
This setting controls whether a grafana box showing the
available dashboards is placed on the landing page. The two
valid options for this are |
|
|
String |
If the box is enabled you also need to specify hostname of the Grafana server |
|
|
Integer |
The port of the Grafana server ReST API |
|
|
String |
The Grafana base path to be used |
|
|
String |
The API key is needed for the ReST calls to work |
|
|
String |
When a tag is specified only dashboards with this given tag will be displayed. When no tag is given all dashboards will be displayed |
|
|
String |
The protocol for the ReST call can also be specified |
|
|
Integer |
Timeout in milliseconds for getting information from the Grafana server |
|
|
Integer |
Socket timeout |
|
|
Integer |
Maximum number of entries to be displayed (0 for unlimited) |
|
If Grafana is behind a proxy, make sure that the host configured in org.opennms.grafanaBox.hostname is reachable. This host name is used to generate links to the Grafana dashboards.
The process for generating a Grafana API key is described in the Grafana HTTP API documentation. Copy the API key to opennms.properties as org.opennms.grafanaBox.apiKey.
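To check the host, port, tag and API key before enabling the box, it can help to build the kind of ReST call the settings describe. The sketch below uses the dashboard search endpoint (/api/search) and Bearer token header from the Grafana HTTP API. Whether OpenNMS Horizon issues exactly this request is an assumption, as are the function name, the example host and the default port 3000.

```python
from urllib.parse import urlencode
from urllib.request import Request


def grafana_search_request(hostname, api_key, port=3000, protocol="http",
                           base_path="", tag=""):
    """Build (but do not send) a dashboard search request for the Grafana HTTP API."""
    url = f"{protocol}://{hostname}:{port}{base_path}/api/search"
    if tag:
        # Only dashboards carrying this tag are returned.
        url += "?" + urlencode({"tag": tag})
    return Request(url, headers={"Authorization": f"Bearer {api_key}"})


# Hypothetical host and key, for illustration only.
request = grafana_search_request("grafana.example.org", "example-api-key", tag="opennms")
print(request.full_url)
```

Passing the request to urllib.request.urlopen with a timeout would send it; a JSON list of dashboards in the response indicates that the key and host name are usable.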
4.4. Operator Board
In a network operations center (NOC), the Ops Board can be used to visualize monitoring information. The monitoring information for various use cases is arranged in configurable Dashlets. To address different user groups, it is possible to create multiple Ops Boards.
There are two visualization components to display Dashlets:
-
Ops Panel: Shows multiple Dashlets on one screen, e.g. on a NOC operator's workstation
-
Ops Board: Shows one Dashlet at a time in rotation, e.g. for a screen wall in a NOC
4.4.1. Configuration
Administration permissions are required to create and configure Ops Boards. The configuration section is in the admin area of OpenNMS Horizon and is named Ops Board Config Web Ui.
The following screenshot describes how to create or modify Ops Boards.
-
Create a new Ops Board to organize and arrange different Dashlets
-
The name to identify the Ops Board
-
Add a Dashlet to show OpenNMS Horizon monitoring information
-
Show a preview of the whole Ops Board
-
List of available Dashlets
-
Priority for this Dashlet in Ops Board rotation, lower priority means it will be displayed more often
-
Duration in seconds for this Dashlet in the Ops Board rotation
-
Change Priority if the Dashlet is in alert state; this is optional and may not be available in all Dashlets
-
Change Duration if the Dashlet is in alert state; this is optional and may not be available in all Dashlets
-
Configuration properties for this Dashlet
-
Remove this Dashlet from the Ops Board
-
Order Dashlets for the rotation on the Ops Board and the tile view in the Ops Panel
-
Show a preview for the whole Ops Board
The configured Ops Board can be used by navigating in the main menu to Dashboard → Ops Board.
4.4.2. Dashlets
Visualization of information is implemented in Dashlets. This section describes the different Dashlets with all available configuration parameters.
To filter the information shown, a Dashlet can be configured with a generic Criteria Builder.
Alarm Details
The Alarm Details Dashlet shows a table with alarms and some detailed information.
Field | Description |
---|---|
Alarm ID | OpenNMS Horizon ID for the alarm |
Severity | Alarm severity (Cleared, Indeterminate, Normal, Warning, Minor, Major, Critical) |
Node label | Node label of the node where the alarm occurred |
Alarm count | Alarm count based on reduction key for deduplication |
Last Event Time | Last time the alarm occurred |
Log Message | Reason and detailed log message of the alarm |
The Alarm Details Dashlet can be configured with the following parameters.
Boost support |
|
Configuration |
Alarms
This Alarms Dashlet shows a table with a short alarm description.
Field | Description |
---|---|
Time | Absolute time since the alarm appeared |
Node label | Node label of the node where the alarm occurred |
UEI | OpenNMS Horizon Unique Event Identifier for this alarm |
The Alarms Dashlet can be configured with the following parameters.
Boost support |
|
Configuration |
Charts
This Dashlet displays an existing Chart.
Boost support |
false |
|
Name of the existing chart to display |
|
Rescale the image to fill display width |
|
Rescale the image to fill display height |
Grafana
This Dashlet shows a Grafana Dashboard for a given time range.
The Grafana Dashboard Box configuration defined in the opennms.properties
file is used to access the Grafana instance.
Boost support |
false |
|
Title of the Grafana dashboard to be displayed |
|
URI to the Grafana Dashboard to be displayed |
|
Start of time range |
|
End of time range |
Image
This Dashlet displays an image by a given URL.
Boost support |
false |
|
URL with the location of the image to show in this Dashlet |
|
Rescale the image to fill display width |
|
Rescale the image to fill display height |
KSC
This Dashlet shows an existing KSC report. The view is exactly the same as the KSC report was built, including order, columns, and time spans.
Boost support |
false |
|
Name of the KSC report to show in this Dashlet |
Map
This Dashlet displays the geographical map.
Boost support |
false |
|
Predefined search for a subset of nodes shown in the geographical map in this Dashlet |
RRD
This Dashlet shows one or multiple RRD graphs. It is possible to arrange and order the RRD graphs in multiple columns and rows. All RRD graphs are normalized with a given width and height.
Boost support |
false |
|
Number of columns within the Dashlet |
|
Number of rows within the Dashlet |
|
Import RRD graphs from an existing KSC report and re-arrange them. |
|
Generic width for all RRD graphs in this Dashlet |
|
Generic height for all RRD graphs in this Dashlet |
|
Number of the given |
|
Minute, Hour, Day, Week, Month and Year for all RRD graphs |
RTC
This Dashlet shows the configured SLA categories from the OpenNMS Horizon start page.
Boost support |
false |
|
- |
Summary
This Dashlet shows a trend of incoming alarms in a given time frame.
Boost support |
|
|
Time slot in seconds to evaluate the trend for alarms by severity and UEI. |
Surveillance
This Dashlet shows a given Surveillance View.
Boost support |
false |
|
Name of the configured Surveillance View |
Topology
This Dashlet shows a Topology Map. The Topology Map can be configured with the following parameters.
Boost support |
false |
|
Which node(s) are in focus for the topology |
|
Which topology should be displayed, e.g. Linkd, VMware |
|
Set the zoom level for the topology |
URL
This Dashlet shows the content of a web page or another web application, e.g. another monitoring system, at a given URL.
Boost support |
false |
|
Optional password if basic authentication is required |
|
URL to the web application or web page |
|
Optional username if basic authentication is required |
4.4.3. Boosting Dashlet
Boosting describes the behavior of a Dashlet that shows critical monitoring information: the Dashlet can raise its priority in the Ops Board rotation to indicate a problem. This behavior is configured with the parameters Boost Priority and Boost Duration. These two configuration parameters affect the Dashlet's behavior in the Ops Board rotation.
-
Boost Priority: Absolute priority of the Dashlet with critical monitoring information.
-
Boost Duration: Absolute duration in seconds of the Dashlet with critical monitoring information.
4.4.4. Criteria Builder
The Criteria Builder is a generic component for filtering the information of a Dashlet. Some Dashlets use this component to filter the information shown for a certain use case. It is possible to combine multiple Criteria to display just a subset of information in a given Dashlet.
Restriction | Property | Value 1 | Value 2 | Description |
---|---|---|---|---|
|
- |
- |
- |
ascending order |
|
- |
- |
- |
descending order |
|
database attribute |
String |
String |
Subset of data between value 1 and value 2 |
|
database attribute |
String |
- |
Select all data which contains a given text string in a given database attribute |
|
database attribute |
- |
- |
Select a single instance |
|
database attribute |
String |
- |
Select data where attribute equals ( |
|
database attribute |
String |
- |
Select data where attribute is greater equals than ( |
|
database attribute |
String |
- |
Select data where attribute is greater than ( |
|
database attribute |
String |
- |
unknown |
|
database attribute |
String |
- |
unknown |
|
database attribute |
String |
- |
Select data where attribute matches an given IPLIKE expression |
|
database attribute |
- |
- |
Select data where attribute is null |
|
database attribute |
- |
- |
Select data where attribute is not null |
|
database attribute |
- |
- |
Select data where attribute is not null |
|
database attribute |
String |
- |
Select data where attribute is less equals than ( |
|
database attribute |
String |
- |
Select data where attribute is less than ( |
|
database attribute |
String |
- |
Select data where attribute is less equals than ( |
|
database attribute |
String |
- |
Select data where attribute is like a given text value similar to SQL |
|
- |
Integer |
- |
Limit the result set by a given number |
|
database attribute |
String |
- |
Select data where attribute is not equals ( |
|
database attribute |
String |
- |
unknown difference between |
|
database attribute |
- |
- |
Order the result set by a given attribute |
For date values, absolute values can be specified in ISO format, e.g. 2019-06-20T20:45:15.123-05:00. Relative times can be specified as +seconds and -seconds.
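To make the two date notations concrete, the following sketch shows one way such values could be interpreted. It assumes that a relative value is an offset in seconds from the current time; how the Criteria Builder resolves these values internally is not described here, and the function name resolve_date_value is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def resolve_date_value(value, now=None):
    """Turn an absolute ISO timestamp or a relative +seconds/-seconds value into a datetime."""
    if now is None:
        now = datetime.now(timezone.utc)
    if value.startswith(("+", "-")):
        # Relative notation: signed number of seconds from "now" (assumed reference point).
        return now + timedelta(seconds=int(value))
    # Absolute notation: ISO format including milliseconds and UTC offset.
    return datetime.fromisoformat(value)


reference = datetime(2019, 6, 20, 12, 0, tzinfo=timezone.utc)
one_hour_ago = resolve_date_value("-3600", reference)
absolute = resolve_date_value("2019-06-20T20:45:15.123-05:00")
print(one_hour_ago.isoformat(), absolute.isoformat())
```

With this reading, a between restriction with the values -3600 and +0 would select the last hour.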
4.5. JMX Configuration Generator
OpenNMS Horizon implements the JMX protocol to collect long-term performance data for Java applications. A huge variety of metrics is available, and administrators have to select which information should be collected. The JMX Configuration Generator tool is built to help generate valid, complex JMX data collection configurations and RRD graph definitions for OpenNMS Horizon.
This tool is available as a CLI and as a web-based version.
4.5.1. Web based utility
Complex JMX data collection configurations can be generated from a web based tool. It collects all available MBean Attributes or Composite Data Attributes from a JMX enabled Java application.
The workflow of the tool is:
-
Connect via JMX or JMXMP to an MBean Server provided by a Java application
-
Retrieve all MBean and Composite Data from the application
-
Select specific MBeans and Composite Data objects which should be collected by OpenNMS Horizon
-
Generate the JMX Collectd configuration file and RRD graph definitions for OpenNMS Horizon as a downloadable archive
The following connection settings are supported:
-
Ability to connect to MBean Server with RMI based JMX
-
Authentication credentials for JMX connection
-
Optional: JMXMP connection
The web-based configuration tool is available in the administration section of the OpenNMS Horizon web application under Admin → JMX Configuration Generator.
Configure JMX Connection
First, configure the connection to the MBean Server of the Java application.
-
Service name: The name of the service to bind the JMX data collection for Collectd
-
Host: IP address or FQDN of the MBean Server from which MBeans and Composite Data are loaded into the generation tool
-
Port: Port to connect to the MBean Server
-
Authentication: Enable / Disable authentication for JMX connection with username and password
-
Skip non-number values: Skip attributes with non-number values
-
JMXMP: Enable / Disable JMX Messaging Protocol instead of using JMX over RMI
Clicking the arrow ( > ) retrieves the MBeans and Composite Data using the given connection settings. The data is loaded into the MBeans Configuration screen, where you can select metrics for the data collection configuration.
Select MBeans and Composite
The MBeans Configuration section is used to assign the MBean and Composite Data attributes to RRD domain specific data types and data source names.
The left sidebar shows the tree with the JMX Domain, MBeans and Composite Data hierarchy retrieved from the MBean Server. To select or deselect all attributes use Mouse right click → select/deselect.
The right panel shows the MBean Attributes with the RRD-specific mapping and lets you select or deselect specific MBean Attributes or Composite Data Attributes for the data collection configuration.
-
MBean Name or Composite Alias: Identifies the MBean or the Composite Data object
-
Selected: Enable/Disable the MBean attribute or Composite Member to be included in the data collection configuration
-
Name: Name of the MBean attribute or Composite Member
-
Alias: the data source name for persisting measurements in RRD or JRobin file
-
Type: Gauge or Counter data type for persisting measurements in RRD or JRobin file
The MBean Name, Composite Alias and Name are validated against special characters. Alias values must not be longer than 19 characters and must be unique within the data collection configuration.
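The two Alias rules (at most 19 characters, unique within the configuration) can be checked outside the web tool as well. The following is a minimal sketch of such a check, not the tool's own validation code; the function name and the sample aliases are made up for illustration.

```python
from collections import Counter

MAX_ALIAS_LENGTH = 19  # limit stated for RRD/JRobin data source names


def alias_problems(aliases):
    """Return {alias: [issues]} for aliases that are too long or used more than once."""
    counts = Counter(aliases)
    problems = {}
    for alias, count in counts.items():
        issues = []
        if len(alias) > MAX_ALIAS_LENGTH:
            issues.append("too long")
        if count > 1:
            issues.append("duplicate")
        if issues:
            problems[alias] = issues
    return problems


sample = ["HeapMemoryUsage.used", "LoadedClassCnt", "LoadedClassCnt", "ThreadCount"]
print(alias_problems(sample))
```

Here HeapMemoryUsage.used is flagged because it has 20 characters, and LoadedClassCnt because it appears twice.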
Download and include configuration
The last step is generating the following configuration files for OpenNMS Horizon:
-
collectd-configuration.xml: Generated sample configuration assigned to a service with a matching data collection group
-
jmx-datacollection-config.xml: Generated JMX data collection configuration with the selected MBeans and Composite Data
-
snmp-graph.properties: Generated default RRD graph definition files for all selected metrics
The content of the configuration files can be copied and pasted or downloaded as a ZIP archive.
If the content of a configuration file exceeds 2,500 lines, the files can only be downloaded as a ZIP archive.
4.5.2. CLI based utility
The command line (CLI) based tool is not installed by default. It is available as a Debian and RPM package in the official repositories.
Installation
yum install opennms-jmx-config-generator
apt-get install opennms-jmx-config-generator
Building from source requires the Java 8 Development Kit and Apache Maven. The mvn binary has to be in the PATH environment variable.
After cloning the repository, enter the source folder and compile an executable JAR.
cd opennms/features/jmx-config-generator
mvn package
Inside the newly created target folder, a file named jmxconfiggenerator-<VERSION>-onejar.jar is present. This file can be invoked by:
java -jar target/jmxconfiggenerator-27.0.4-onejar.jar
Usage
After installing the JMX Config Generator, the tool’s wrapper script is located in the ${OPENNMS_HOME}/bin directory.
$ cd /path/to/opennms/bin
$ ./jmx-config-generator
When invoked without parameters, the usage and help information is printed.
The JMX Config Generator uses sub-commands for the different configuration generation tasks. Each of these sub-commands provides different options and parameters. The command line tool accepts the following sub-commands.
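Sub-command | Description |
---|---|
query | Queries an MBean Server for certain MBeans and attributes. |
generate-conf | Generates a valid jmx-datacollection-config.xml file. |
generate-graph | Generates an RRD graph definition file with matching graph definitions for a given jmx-datacollection-config.xml file. |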
The following global options are available in each of the sub-commands of the tool:
Option/Argument | Description | Default |
---|---|---|
|
Show help and usage information. |
false |
|
Enables verbose mode for debugging purposes. |
false |
Sub-command: query
This sub-command is used to query an MBean Server for its available MBean objects.
The following example queries the server myserver with the credentials myusername/mypassword on port 7199 for MBean objects in the java.lang domain.
./jmx-config-generator query --host myserver --username myusername --password mypassword --port 7199 "java.lang:*"
java.lang:type=ClassLoading
description: Information on the management interface of the MBean
class name: sun.management.ClassLoadingImpl
attributes: (5/5)
TotalLoadedClassCount
id: java.lang:type=ClassLoading:TotalLoadedClassCount
description: TotalLoadedClassCount
type: long
isReadable: true
isWritable: false
isIs: false
LoadedClassCount
id: java.lang:type=ClassLoading:LoadedClassCount
description: LoadedClassCount
type: int
isReadable: true
isWritable: false
isIs: false
<output omitted>
The following command line options are available for the query sub-command.
Option/Argument | Description | Default |
---|---|---|
|
A filter criteria to query the MBean Server for.
The format is |
- |
|
Hostname or IP address of the remote JMX host. |
- |
|
Only show the ids of the attributes. |
false |
|
Set |
- |
|
Include attribute values. |
false |
|
Use JMXMP and not JMX over RMI. |
false |
|
Password for JMX authentication. |
- |
|
Port of JMX service. |
- |
|
Only lists the available domains. |
true |
|
Includes MBeans, even if they do not have attributes.
Either due to the |
false |
|
Custom connection URL |
- |
|
Username for JMX authentication. |
- |
|
Show help and usage information. |
false |
|
Enables verbose mode for debugging purposes. |
false |
Sub-command: generate-conf
This sub-command can be used to generate a valid jmx-datacollection-config.xml for a given set of MBean objects queried from an MBean Server.
The following example generates a configuration file myconfig.xml for MBean objects in the java.lang domain of the server myserver on port 7199 with the credentials myusername/mypassword.
You have to define either a URL or a hostname and port to connect to a JMX server.
jmx-config-generator generate-conf --host myserver --username myusername --password mypassword --port 7199 "java.lang:*" --output myconfig.xml
Dictionary entries loaded: '18'
The following options are available for the generate-conf sub-command.
Option/Argument | Description | Default |
---|---|---|
|
A list of attribute Ids to be included for the generation of the configuration file. |
- |
|
Path to a dictionary file for replacing attribute names and part of MBean attributes. The file should have for each line a replacement, e.g. Auxillary:Auxil. |
- |
|
Hostname or IP address of JMX host. |
- |
|
Use JMXMP and not JMX over RMI. |
false |
|
Output filename to write generated |
- |
|
Password for JMX authentication. |
- |
|
Port of JMX service |
- |
|
Prints the used dictionary to STDOUT.
May be used with |
false |
|
The Service Name used as JMX data collection name. |
anyservice |
|
Skip default JavaVM Beans. |
false |
|
Skip attributes with non-number values |
false |
|
Custom connection URL |
- |
|
Username for JMX authentication |
- |
|
Show help and usage information. |
false |
|
Enables verbose mode for debugging purposes. |
false |
The option --skipDefaultVM makes it possible to ignore the MBeans provided as standard by the JVM and to create configurations only for the MBeans provided by the Java application itself. This is particularly useful if an optimized configuration for the JVM already exists. If the --skipDefaultVM option is not set, the generated configuration includes the MBeans of the JVM and the MBeans of the Java application.
Check the file for alias names with more than 19 characters. These errors are marked with NAME_CRASH_AS_19_CHAR_VALUE.
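Checking a large generated file by eye is error-prone, so a short script can list the lines that carry the marker. This is a sketch only: the sample lines are invented, and the exact position of NAME_CRASH_AS_19_CHAR_VALUE within a generated attribute is an assumption.

```python
MARKER = "NAME_CRASH_AS_19_CHAR_VALUE"


def find_alias_errors(config_text):
    """Return (line number, line) pairs for every line that contains the marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(config_text.splitlines(), start=1)
        if MARKER in line
    ]


# Invented sample content; a real run would read myconfig.xml instead.
sample_config = """<mbean name="example">
  <attrib name="LoadedClassCount" alias="LoadedClassCnt" type="gauge"/>
  <attrib name="SomeVeryLongAttributeName" alias="NAME_CRASH_AS_19_CHAR_VALUE" type="gauge"/>
</mbean>"""

for number, line in find_alias_errors(sample_config):
    print(f"line {number}: {line}")
```

For a real file, pass the text read from the generated configuration (for example with pathlib.Path("myconfig.xml").read_text()) and shorten each reported alias by hand.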
Sub-command: generate-graph
This sub-command generates an RRD graph definition file for a given configuration file.
The following example generates a graph definition file mygraph.properties using the configuration in file myconfig.xml.
./jmx-config-generator generate-graph --input myconfig.xml --output mygraph.properties
reports=java.lang.ClassLoading.MBeanReport, \
java.lang.ClassLoading.0TotalLoadeClassCnt.AttributeReport, \
java.lang.ClassLoading.0LoadedClassCnt.AttributeReport, \
java.lang.ClassLoading.0UnloadedClassCnt.AttributeReport, \
java.lang.Compilation.MBeanReport, \
<output omitted>
The following options are available for this sub-command.
Option/Argument | Description | Default |
---|---|---|
|
Configuration file to use as input to generate the graph properties file |
- |
|
Output filename for the generated graph properties file. |
- |
|
Prints the default template. |
false |
|
Template file using Apache Velocity template engine to be used to generate the graph properties. |
- |
|
Show help and usage information. |
false |
|
Enables verbose mode for debugging purposes. |
false |
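The generate-conf and generate-graph sub-commands are typically run one after the other, with the output of the first used as input of the second. The following sketch chains them from a script using only options shown in the examples above. The helper names are hypothetical, and the relative path to the wrapper script assumes the script is started from the ${OPENNMS_HOME}/bin directory.

```python
import subprocess

GENERATOR = "./jmx-config-generator"  # wrapper script, assumed to be in the current directory


def generate_conf_command(host, port, username, password, mbean_filter, output):
    """Argument list for generating a data collection configuration."""
    return [GENERATOR, "generate-conf", "--host", host, "--username", username,
            "--password", password, "--port", str(port), mbean_filter,
            "--output", output]


def generate_graph_command(config, output, template=None):
    """Argument list for generating graph definitions from a configuration file."""
    command = [GENERATOR, "generate-graph", "--input", config, "--output", output]
    if template:
        command += ["--template", template]
    return command


def run_all(commands):
    """Run the commands in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


commands = [
    generate_conf_command("myserver", 7199, "myusername", "mypassword",
                          "java.lang:*", "myconfig.xml"),
    generate_graph_command("myconfig.xml", "mygraph.properties"),
]
for command in commands:
    print(" ".join(command))
```

Calling run_all(commands) executes both steps. Passing arguments as a list avoids shell quoting problems with filters such as java.lang:*.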
Graph Templates
The JMX Config Generator uses a template file to generate the graphs. It is possible to use a user-defined template. The option --template followed by a file lets the JMX Config Generator use the external template file as the basis for graph generation.
The following example illustrates how a custom template mytemplate.vm is used to generate the graph definition file mygraph.properties using the configuration in file myconfig.xml.
./jmx-config-generator generate-graph --input myconfig.xml --output mygraph.properties --template mytemplate.vm
The template file has to be an Apache Velocity template. The following sample represents the template that is used by default:
reports=#foreach( $report in $reportsList )
${report.id}#if( $foreach.hasNext ), \
#end
#end
#foreach( $report in $reportsBody )
#[[###########################################]]#
#[[##]]# $report.id
#[[###########################################]]#
report.${report.id}.name=${report.name}
report.${report.id}.columns=${report.graphResources}
report.${report.id}.type=interfaceSnmp
report.${report.id}.command=--title="${report.title}" \
--vertical-label="${report.verticalLabel}" \
#foreach($graph in $report.graphs )
DEF:${graph.id}={rrd${foreach.count}}:${graph.resourceName}:AVERAGE \
AREA:${graph.id}#${graph.coloreB} \
LINE2:${graph.id}#${graph.coloreA}:"${graph.description}" \
GPRINT:${graph.id}:AVERAGE:" Avg \\: %8.2lf %s" \
GPRINT:${graph.id}:MIN:" Min \\: %8.2lf %s" \
GPRINT:${graph.id}:MAX:" Max \\: %8.2lf %s\\n" \
#end
#end
The JMX Config Generator generates different types of graphs from the jmx-datacollection-config.xml. The different types are listed below:
Type | Description |
---|---|
AttributeReport | For each attribute of any MBean a graph will be generated. Composite attributes will be ignored. |
MBeanReport | For each MBean a combined graph with all attributes of the MBean is generated. Composite attributes will be ignored. |
CompositeReport | For each composite attribute of every MBean a graph is generated. |
CompositeAttributeReport | For each composite member of every MBean a combined graph with all composite attributes is generated. |
4.6. Heatmap
The Heatmap can be used either to display unacknowledged alarms or to display ongoing outages of nodes. Each of these visualizations can be applied to categories, foreign sources or services of nodes. The size of an entity is calculated by counting the services inside the entity. Thus, a node with fewer services appears in a smaller box than a node with more services.
The feature is disabled by default and is configured through opennms.properties.
Name | Type | Description | Default |
---|---|---|---|
|
String |
There exist two options for using the heatmap: |
|
|
String |
This option defines which Heatmap is displayed by default.
Valid options are |
|
|
String |
The following option is used to filter for categories to be
displayed in the Heatmap. This option uses the Java regular
expression syntax. The default is |
|
|
String |
The following option is used to filter for foreign sources
to be displayed in the Heatmap. This option uses the Java
regular expression syntax. The default is |
|
|
String |
The following option is used to filter for services to be
displayed in the Heatmap. This option uses the Java regular
expression syntax. The default is |
|
|
Boolean |
This option configures whether only unacknowledged alarms will be taken into account when generating the alarm-based version of the Heatmap. |
|
|
String |
You can also place the Heatmap on the landing page by
setting this option to |
|
You can use negative lookahead expressions to exclude categories you do not want displayed in the Heatmap. For example, the expression ^(?!XY).* filters out entities with names starting with XY.
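The effect of such a filter can be tried out before changing opennms.properties. The sketch below uses Python's re module, which handles this particular negative lookahead the same way as Java regular expressions. The category names are invented, and the use of a full match (like Java's String.matches) is an assumption about how the filter is applied.

```python
import re

# Same expression as in the text: match every name that does not start with "XY".
category_filter = re.compile(r"^(?!XY).*")

categories = ["Routers", "XY-Lab", "Servers", "XYZ"]  # invented example names
shown = [name for name in categories if category_filter.fullmatch(name)]
hidden = [name for name in categories if not category_filter.fullmatch(name)]

print("shown:", shown)
print("hidden:", hidden)
```

Note that XYZ is hidden as well, because the lookahead only inspects the first two characters of the name.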
4.7. Trend
The Trend feature displays small inline charts of database-based statistics. These charts are accessible in the Status menu of the OpenNMS web application. It is also possible to display these charts on the OpenNMS landing page. To do so, alter the org.opennms.web.console.centerUrl property to also include the entry /trend/trend-box.htm.
These charts are defined and configured in the trend-configuration.xml file in your OpenNMS etc directory.
The following sample defines a Trend chart for displaying nodes with ongoing outages.
<trend-definition name="nodes">
<title>Nodes</title> (1)
<subtitle>w/ Outages</subtitle> (2)
<visible>true</visible> (3)
<icon>fa-fire</icon> (4)
<trend-attributes> (5)
<trend-attribute key="sparkWidth" value="100%"/>
<trend-attribute key="sparkHeight" value="35"/>
<trend-attribute key="sparkChartRangeMin" value="0"/>
<trend-attribute key="sparkLineColor" value="white"/>
<trend-attribute key="sparkLineWidth" value="1.5"/>
<trend-attribute key="sparkFillColor" value="#88BB55"/>
<trend-attribute key="sparkSpotColor" value="white"/>
<trend-attribute key="sparkMinSpotColor" value="white"/>
<trend-attribute key="sparkMaxSpotColor" value="white"/>
<trend-attribute key="sparkSpotRadius" value="3"/>
<trend-attribute key="sparkHighlightSpotColor" value="white"/>
<trend-attribute key="sparkHighlightLineColor" value="white"/>
</trend-attributes>
<descriptionLink>outage/list.htm?outtype=current</descriptionLink> (6)
<description>${intValue[23]} NODES WITH OUTAGE(S)</description> (7)
<query> (8)
<![CDATA[
select (
select
count(distinct nodeid)
from
outages o, events e
where
e.eventid = o.svclosteventid
and iflostservice < E
and (ifregainedservice is null
or ifregainedservice > E)
) from (
select
now() - interval '1 hour' * (O + 1) AS S,
now() - interval '1 hour' * O as E
from
generate_series(0, 23) as O
) I order by S;
]]>
</query>
</trend-definition>
1 | title of the Trend chart, see below for supported variable substitutions |
2 | subtitle of the Trend chart, see below for supported variable substitutions |
3 | defines whether the chart is visible by default |
4 | icon for the chart, see Icons for viable options |
5 | options for inline chart, see jQuery Sparklines for viable options |
6 | the description link |
7 | the description text, see below for supported variable substitutions |
8 | the SQL statement for querying the chart’s values |
Don’t forget to limit the SQL query’s return values!
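The sample query returns exactly 24 rows because generate_series(0, 23) produces one row per hour, and the rows are ordered by the start of each hour. The following sketch reproduces only that bucketing logic, without touching the database, to show why the last row (index 23, as used in ${intValue[23]}) is the most recent hour. That the array index follows the row order of the query is an assumption based on the sample.

```python
from datetime import datetime, timedelta


def hourly_buckets(now, hours=24):
    """Mirror the inner SELECT: one (S, E) pair per hour, ordered by S."""
    buckets = [
        (now - timedelta(hours=offset + 1), now - timedelta(hours=offset))
        for offset in range(hours)  # same values as generate_series(0, 23)
    ]
    return sorted(buckets)  # "order by S"


now = datetime(2024, 1, 2, 12, 0)
buckets = hourly_buckets(now)
print("oldest bucket:", buckets[0])
print("newest bucket:", buckets[23])
```

A query without such a bounded series would return an unbounded number of values, which is what the warning above is about.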
It is possible to use values or aggregated values in the title, subtitle and description fields. The following table describes the available variable substitutions.
Name | Type | Description |
---|---|---|
|
Integer |
integer maximum value |
|
Double |
maximum value |
|
Integer |
integer minimum value |
|
Double |
minimum value |
|
Integer |
integer average value |
|
Double |
average value |
|
Integer |
integer sum of values |
|
Double |
sum of values |
|
Integer |
array of integer result values for the given SQL query |
|
Double |
array of result values for the given SQL query |
|
Integer |
array of integer value changes for the given SQL query |
|
Double |
array of value changes for the given SQL query |
|
Integer |
last integer value |
|
Double |
last value |
|
Integer |
last integer value change |
|
Double |
last value change |
You can also display a single graph in your JSP files by including the file /trend/single-trend-box.jsp and specifying the name parameter.
<jsp:include page="/trend/single-trend-box.jsp" flush="false">
<jsp:param name="name" value="example"/>
</jsp:include>
5. Service Assurance
This section covers the basics of how OpenNMS Horizon tests whether a service or device is available and measures its latency.
In OpenNMS Horizon this task is provided by a Service Monitor framework. The main component is Pollerd which provides the following functionality:
-
Track the status of a management resource or an application for availability calculations
-
Measure response times for service quality
-
Correlate node and interface outages based on a Critical Service
The following image shows the model and representation of availability and response time.
This information is based on Service Monitors which are scheduled and executed by Pollerd. A Service can have any arbitrary name and is associated with a Service Monitor. For example, we can define two Services with the name HTTP and HTTP-8080, both are associated with the HTTP Service Monitor but use a different TCP port configuration parameter. The following figure shows how Pollerd interacts with other components in OpenNMS and applications or agents to be monitored.
The availability is calculated over the last 24 hours and is shown in the Surveillance Views, SLA Categories and the Node Detail Page. Response times are displayed as Resource Graphs of the IP Interface on the Node Detail Page. Configuration parameters of the Service Monitor can be seen in the Service Page by clicking on the Service Name on the Node Detail Page. The status of a Service can be Up or Down.
The Service Page also includes timestamps indicating the last time at which the service was polled and found to be Up (Last Good) or Down (Last Fail). These fields can be used to validate that Pollerd is polling the services as expected.
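As an illustration of an availability figure over the last 24 hours, the sketch below computes the percentage of the window in which a single service was not in an outage. It is a simplified model under stated assumptions (one service, non-overlapping outages, outages clipped to the window) and not necessarily the exact calculation OpenNMS Horizon performs.

```python
from datetime import datetime, timedelta


def availability_percent(outages, now, window=timedelta(hours=24)):
    """outages: list of (lost, regained) datetimes; regained is None while still down."""
    window_start = now - window
    downtime = timedelta()
    for lost, regained in outages:
        # Clip each outage to the evaluation window.
        begin = max(lost, window_start)
        end = min(regained or now, now)
        if end > begin:
            downtime += end - begin
    return 100.0 * (1 - downtime / window)


now = datetime(2024, 1, 2, 12, 0)
outages = [
    (now - timedelta(hours=30), now - timedelta(hours=23)),  # only 1 hour falls in the window
    (now - timedelta(hours=2), None),                        # ongoing outage, 2 hours so far
]
print(availability_percent(outages, now))
```

Three hours of downtime in a 24-hour window give 87.5 percent in this model.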
When a Service Monitor detects an outage, Pollerd sends an Event which is used to create an Alarm. Events can also be used to generate Notifications for on-call network or server administrators. The following images shows the interaction of Pollerd in OpenNMS Horizon.
Pollerd can generate the following Events in OpenNMS Horizon:
Event name | Description |
---|---|
|
Critical Services are still up, just this service is lost. |
|
Service came back up |
|
Critical Service on an IP interface is down or all services are down. |
|
Critical Service on that interface came back up again |
|
All critical services on all IP interfaces are down from node. The whole host is unreachable over the network. |
|
Some of the Critical Services came back online. |
The behavior to generate interfaceDown and nodeDown events is described in the Critical Service section.
This assumes that node-outage processing is enabled. |
5.1. Pollerd Configuration
File | Description |
---|---|
|
Configuration file for monitors and global daemon configuration |
|
Log file for all monitors and the global Pollerd |
|
RRD graph definitions for service response time measurements |
|
Event definitions for Pollerd, i.e. nodeLostService, interfaceDown or nodeDown |
To change the behavior for service monitoring, modify poller-configuration.xml.
The configuration file is structured in the following parts:
-
Global daemon config: Define the size of the used Thread Pool to run Service Monitors in parallel. Define and configure the Critical Service for Node Event Correlation.
-
Polling packages: Package to allow grouping of configuration parameters for Service Monitors.
-
Downtime Model: Configure how Pollerd retests a service after an Outage is detected.
-
Monitor service association: Based on the name of the service, the implementation for the application or network management protocol is assigned.
<poller-configuration threads="30" (1)
pathOutageEnabled="false" (2)
serviceUnresponsiveEnabled="false"> (3)
1 | Size of the Thread Pool to run Service Monitors in parallel. |
2 | Enable or Disable Path Outage functionality based on a Critical Node in a network path. |
3 | If a service is unresponsive, a serviceUnresponsive event is generated instead of an outage. This keeps the Downtime Model from retesting the service after 30 seconds and helps prevent false alarms. |
Configuration changes are applied by restarting OpenNMS and Pollerd. It is also possible to send an Event to Pollerd to reload the configuration. An Event can be sent on the CLI or the Web User Interface.
cd $OPENNMS_HOME/bin
./send-event.pl uei.opennms.org/internal/reloadDaemonConfig --parm 'daemonName Pollerd'
5.1.1. Metadata DSL
The Metadata DSL (domain-specific language) lets you use dynamic configuration in parameter values by interpolating metadata into the parameter.
The syntax allows for the use of patterns in an expression, whereby the metadata is replaced with a corresponding value during the collection process.
During evaluation of an expression, the following scopes are available:
-
Node metadata
-
Interface metadata
-
Service metadata
5.2. Critical Service
Monitoring services on an IP network can be resource expensive, especially in cases where many of these services are not available. When a service is offline, or unreachable, the monitoring system spends most of its time waiting for retries and timeouts.
In order to improve efficiency, OpenNMS Horizon deems all services on an interface to be Down if the critical service is Down. By default OpenNMS Horizon uses ICMP as the critical service.
The following image shows how a Critical Service is used to generate these events.
-
(1) Critical services are all Up on the Node and just a nodeLostService is sent.
-
(2) The Critical service on one of many IP interfaces is Down and interfaceDown is sent. All other services are not tested and no events are sent; the services are assumed to be unreachable.
-
(3) All Critical services on the Node are Down and just a nodeDown is sent. All other services on the other IP Interfaces are not tested and no events are sent; these services are assumed to be unreachable.
The Critical Service is used to correlate outages from Services to a nodeDown or interfaceDown event.
It is a global configuration of Pollerd defined in poller-configuration.xml.
The OpenNMS Horizon default configuration enables this behavior.
<poller-configuration threads="30"
pathOutageEnabled="false"
serviceUnresponsiveEnabled="false">
<node-outage status="on" (1)
pollAllIfNoCriticalServiceDefined="true"> (2)
<critical-service name="ICMP" /> (3)
</node-outage>
1 | Enable Node Outage correlation based on a Critical Service |
2 | Optional: In case of nodes without a Critical Service this option controls the behavior.
If set to true then all services will be polled.
If set to false then the first service in the package that exists on the node will be polled until service is restored, and then polling will resume for all services. |
3 | Define Critical Service for Node Outage correlation |
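The correlation described above can be sketched in a few lines of Python. This is only an illustrative model, not the Pollerd implementation: the `correlate` function and its input layout are hypothetical, and it assumes that every IP interface carries the Critical Service.

```python
from typing import Dict, List, Optional, Tuple

CRITICAL_SERVICE = "ICMP"  # matches <critical-service name="ICMP" />

def correlate(node: Dict[str, Dict[str, bool]]) -> List[Tuple[str, Optional[str]]]:
    """Map service states to events.

    `node` maps an IP address to {service name: is_up}. Hypothetical layout;
    every interface is assumed to have the Critical Service.
    """
    down_ifaces = [ip for ip, svcs in node.items() if not svcs[CRITICAL_SERVICE]]
    # Case (3): the Critical Service is down on every interface.
    if node and len(down_ifaces) == len(node):
        return [("nodeDown", None)]
    events: List[Tuple[str, Optional[str]]] = []
    for ip, svcs in node.items():
        if ip in down_ifaces:
            # Case (2): one event for the interface, other services are skipped.
            events.append(("interfaceDown", ip))
            continue
        # Case (1): Critical Service is up, report each lost service.
        for name, up in svcs.items():
            if not up:
                events.append(("nodeLostService", f"{ip}/{name}"))
    return events

print(correlate({"10.0.0.1": {"ICMP": False, "HTTP": False},
                 "10.0.0.2": {"ICMP": True, "SSH": True}}))
```

The point of the sketch is that a single Down result for the Critical Service replaces many per-service events with one interface-level or node-level event.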
5.3. Downtime Model
By default the monitoring interval for a service is 5 minutes. To also detect short service outages, caused for example by automatic network rerouting, the downtime model can be used. On a detected service outage, the interval is reduced to 30 seconds for 5 minutes. If the service comes back within 5 minutes, a shorter outage is documented and the impact on service availability can be less than 5 minutes. This behavior is called the Downtime Model and is configurable.
In figure Outages and Downtime Model there are two outages. The first is a short outage: the service was detected as up again after 90 seconds. The second outage is still unresolved: the monitor did not detect an available service during the first 5 minutes (10 polls at 30-second intervals), so the scheduler changed the polling interval back to 5 minutes.
<downtime interval="30000" begin="0" end="300000" /><!-- 30s, 0, 5m -->(1)
<downtime interval="300000" begin="300000" end="43200000" /><!-- 5m, 5m, 12h -->(2)
<downtime interval="600000" begin="43200000" end="432000000" /><!-- 10m, 12h, 5d -->(3)
<downtime interval="3600000" begin="432000000" delete="never"/><!-- 1h, 5d -->(4)
1 | from 0 seconds after an outage is detected until 5 minutes, the polling interval will be set to 30 seconds |
2 | after 5 minutes of an ongoing outage until 12 hours, the polling interval will be set to 5 minutes |
3 | after 12 hours of an ongoing outage until 5 days, the polling interval will be set to 10 minutes |
4 | after 5 days of an ongoing outage, the service will be polled only once an hour and will not be deleted |
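To make the interval selection concrete, the following Python sketch looks up the polling interval for a given outage age using the four downtime entries shown above. The function name is hypothetical, and treating `begin` as inclusive and `end` as exclusive is an assumption, not documented Pollerd behavior.

```python
from typing import Optional

# (interval_ms, begin_ms, end_ms) copied from the downtime elements above.
# end_ms is None for the last, open-ended entry.
DOWNTIME_MODEL = [
    (30_000, 0, 300_000),               # 30 s polling for the first 5 min
    (300_000, 300_000, 43_200_000),     # 5 min polling until 12 h
    (600_000, 43_200_000, 432_000_000), # 10 min polling until 5 d
    (3_600_000, 432_000_000, None),     # 1 h polling afterwards
]

def polling_interval_ms(outage_age_ms: int) -> Optional[int]:
    """Return the polling interval for an outage of the given age in ms."""
    for interval, begin, end in DOWNTIME_MODEL:
        # Assumption: begin is inclusive, end is exclusive.
        if outage_age_ms >= begin and (end is None or outage_age_ms < end):
            return interval
    return None  # no entry covers this age (for example a negative value)

for age in (90_000, 600_000, 86_400_000, 500_000_000):
    print(age, polling_interval_ms(age))
```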
The last downtime interval can have a delete attribute, which lets you influence the service lifecycle.
It defines what happens if a service doesn’t come back online after 5 days.
The following values can be used for the delete attribute:
Value | description |
---|---|
|
services will never be deleted automatically |
|
only managed services will be deleted |
|
managed and unmanaged services will be deleted |
not set |
if |
5.4. Path Outages
An outage of a central network component can cause a lot of node outages.
Path Outages can be used to suppress Notifications based on how Nodes depend on each other in the network; these dependencies are defined in a Critical Path.
The Critical Path needs to be configured from the network perspective of the monitoring system.
By default the Path Outage feature is disabled and has to be enabled in poller-configuration.xml.
The following image shows an example network topology.
From the perspective of the monitoring system, a Router named default-gw-01 is on the Critical Path to reach two networks. If Router default-gw-01 is down, none of the nodes in the two networks behind it can be reached, so they all appear unreachable as well. In this case an administrator would want just one notification for default-gw-01 and none for the Nodes behind it. Building this configuration in OpenNMS Horizon requires the following information:
-
Parent Foreign Source: The Foreign Source where the parent node is defined.
-
Parent Foreign ID: The Foreign ID of the parent Node on which this node depends.
-
The IP Interface selected as Primary is used as Critical IP
In this example we have created all Nodes in a Provisioning Requisition named Network-ACME, and we use the Node Label as the Foreign ID.
In the Web UI go to Admin → Configure OpenNMS → Manage Provisioning Requisitions → Edit the Requisition → Edit the Node → Path Outage to configure the network path by setting the Parent Foreign Source, Parent Foreign ID and Provisioned Node.
Parent Foreign Source | Parent Foreign ID | Provisioned Node |
---|---|---|
not defined |
not defined |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The IP Interface which is set to Primary is selected as the Critical IP. In this example it is important the IP interface on default-gw-01 in the network 192.168.1.0/24 is set as Primary interface. The IP interface in the network 172.23.42.0/24 on default-gw-02 is set as Primary interface. |
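The idea of suppressing notifications along a Critical Path can be illustrated with a small Python sketch. It is a simplified model only: the parent map below is a hypothetical topology (only the names default-gw-01 and default-gw-02 come from the example), and walking the whole parent chain is an assumption rather than a description of how Pollerd evaluates the path.

```python
from typing import Dict, Optional, Set

# Hypothetical topology: child node -> parent node on the Critical Path.
PARENTS: Dict[str, Optional[str]] = {
    "default-gw-01": None,
    "default-gw-02": "default-gw-01",
    "node-01": "default-gw-01",
    "node-03": "default-gw-02",
}

def nodes_to_notify(down: Set[str], parents: Dict[str, Optional[str]]) -> Set[str]:
    """Keep only the down nodes that have no down ancestor on their path."""
    def has_down_ancestor(node: str) -> bool:
        seen = set()
        parent = parents.get(node)
        while parent is not None and parent not in seen:
            if parent in down:
                return True
            seen.add(parent)  # guards against accidental cycles
            parent = parents.get(parent)
        return False
    return {n for n in down if not has_down_ancestor(n)}

print(nodes_to_notify({"default-gw-01", "default-gw-02", "node-03"}, PARENTS))
```

With default-gw-01 down, only that router remains in the result, which corresponds to the single notification an administrator would expect.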
5.5. Poller Packages
To define more complex monitoring configurations, you can group Service configurations into Polling Packages. They let you assign different Service Configurations to Nodes. To assign a Polling Package to nodes, use the Rules/Filters syntax. Each Polling Package can have its own Downtime Model configuration.
Multiple packages can be configured, and an interface can exist in more than one package. This gives great flexibility to how the service levels will be determined for a given device.
<package name="example1">(1)
<filter>IPADDR != '0.0.0.0'</filter>(2)
<include-range begin="1.1.1.1" end="254.254.254.254" />(3)
<include-range begin="::1" end="ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff" />(3)
1 | Unique name of the polling package. |
2 | Filter can be based on IP address, categories or asset attributes of Nodes based on Rules/Filters. The filter is evaluated first and is required. This package is used for all IP Interfaces which don’t have 0.0.0.0 as an assigned IP address. |
3 | Specifies the range of IP Interfaces (IPv4 or IPv6) to which the configuration of Services is applied. |
Instead of the include-range it is possible to add one or more specific IP Interfaces with:
<specific>192.168.1.59</specific>
It is also possible to exclude IP Interfaces with:
<exclude-range begin="192.168.0.100" end="192.168.0.104"/>
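As a rough illustration of how include-range, specific, and exclude-range interact, the following Python sketch checks whether an address falls into a package. The helper names are hypothetical, the filter expression is not modeled, and giving exclude-range precedence over the other elements is an assumption.

```python
import ipaddress
from typing import Iterable, Tuple

Range = Tuple[str, str]

def in_range(addr: str, begin: str, end: str) -> bool:
    """True if addr lies between begin and end (inclusive), same IP version."""
    ip = ipaddress.ip_address(addr)
    lo, hi = ipaddress.ip_address(begin), ipaddress.ip_address(end)
    # Check versions first: IPv4 and IPv6 addresses cannot be ordered.
    return ip.version == lo.version == hi.version and lo <= ip <= hi

def matches_package(addr: str,
                    include_ranges: Iterable[Range],
                    exclude_ranges: Iterable[Range] = (),
                    specifics: Iterable[str] = ()) -> bool:
    # Assumption: an exclude-range always wins.
    if any(in_range(addr, b, e) for b, e in exclude_ranges):
        return False
    if addr in specifics:
        return True
    return any(in_range(addr, b, e) for b, e in include_ranges)

INCLUDES = [("1.1.1.1", "254.254.254.254"),
            ("::1", "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff")]
EXCLUDES = [("192.168.0.100", "192.168.0.104")]

for candidate in ("192.168.0.99", "192.168.0.102", "2001:db8::1"):
    print(candidate, matches_package(candidate, INCLUDES, EXCLUDES))
```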
5.5.1. Response Time Configuration
The definition of Polling Packages lets you configure similar services with different polling intervals. All response time measurements are persisted in RRD files and require a definition. Each Polling Package contains an RRD definition:
<package name="example1">
<filter>IPADDR != '0.0.0.0'</filter>
<include-range begin="1.1.1.1" end="254.254.254.254" />
<include-range begin="::1" end="ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff" />
<rrd step="300">(1)
<rra>RRA:AVERAGE:0.5:1:2016</rra>(2)
<rra>RRA:AVERAGE:0.5:12:1488</rra>(3)
<rra>RRA:AVERAGE:0.5:288:366</rra>(4)
<rra>RRA:MAX:0.5:288:366</rra>(5)
<rra>RRA:MIN:0.5:288:366</rra>(6)
</rrd>
1 | The polling interval for all services in this Polling Package is reflected in the step size of 300 seconds. All services in this package have to be polled at a 5 min interval, otherwise response time measurements are not persisted correctly. |
2 | 1 step size is persisted 2016 times: 1 * 5 min * 2016 = 7 d, 5 min accuracy for 7 d. |
3 | 12 steps average persisted 1488 times: 12 * 5 min * 1488 = 62 d, aggregated to 60 min for 62 d. |
4 | 288 steps average persisted 366 times: 288 * 5 min * 366 = 366 d, aggregated to 24 h for 366 d. |
5 | 288 steps maximum from 24 h persisted for 366 d. |
6 | 288 steps minimum from 24 h persisted for 366 d. |
The RRD configuration and the service polling interval have to be aligned. Otherwise the persisted response time data is not displayed correctly in the response time graph. |
If the polling interval is changed afterwards, existing RRD files need to be recreated with the new definitions. |
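The retention arithmetic in the callouts above can be reproduced with a short Python sketch. It only parses the RRA strings from the example; the `retention` helper is a hypothetical name introduced for illustration.

```python
from typing import Tuple

STEP_SECONDS = 300  # <rrd step="300">

RRAS = [
    "RRA:AVERAGE:0.5:1:2016",
    "RRA:AVERAGE:0.5:12:1488",
    "RRA:AVERAGE:0.5:288:366",
]

def retention(rra: str, step_seconds: int) -> Tuple[str, int, float]:
    """Return (consolidation function, resolution in seconds, retention in days)."""
    _, cf, _xff, steps, rows = rra.split(":")
    resolution = int(steps) * step_seconds         # seconds covered by one row
    days = resolution * int(rows) / 86_400         # rows * resolution, in days
    return cf, resolution, days

for rra in RRAS:
    print(rra, retention(rra, STEP_SECONDS))
```

Changing `STEP_SECONDS` shows why an existing RRD file no longer fits after the polling interval changes: the same row counts then cover a different time span.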
5.5.2. Overlapping Services
Because multiple Polling Packages can be specified, the same Service, such as ICMP, can be used multiple times.
The order in which Polling Packages are defined in poller-configuration.xml is important when IP Interfaces match multiple Polling Packages with the same Service configuration.
The following example shows which configuration is applied for a specific service:
<package name="less-specific">
<filter>IPADDR != '0.0.0.0'</filter>
<include-range begin="1.1.1.1" end="254.254.254.254" />
<include-range begin="::1" end="ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff" />
<rrd step="300">(1)
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<service name="ICMP" interval="300000" user-defined="false" status="on">(2)
<parameter key="retry" value="5" />(3)
<parameter key="timeout" value="10000" />(4)
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response" />
<parameter key="rrd-base-name" value="icmp" />
<parameter key="ds-name" value="icmp" />
</service>
<downtime interval="30000" begin="0" end="300000" />
<downtime interval="300000" begin="300000" end="43200000" />
<downtime interval="600000" begin="43200000" end="432000000" />
</package>
<package name="more-specific">
<filter>IPADDR != '0.0.0.0'</filter>
<include-range begin="192.168.1.1" end="192.168.1.254" />
<include-range begin="2600::1" end="2600::ffff" />
<rrd step="30">(1)
<rra>RRA:AVERAGE:0.5:1:20160</rra>
<rra>RRA:AVERAGE:0.5:12:14880</rra>
<rra>RRA:AVERAGE:0.5:288:3660</rra>
<rra>RRA:MAX:0.5:288:3660</rra>
<rra>RRA:MIN:0.5:288:3660</rra>
</rrd>
<service name="ICMP" interval="30000" user-defined="false" status="on">(2)
<parameter key="retry" value="2" />(3)
<parameter key="timeout" value="3000" />(4)
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response" />
<parameter key="rrd-base-name" value="icmp" />
<parameter key="ds-name" value="icmp" />
</service>
<downtime interval="10000" begin="0" end="300000" />
<downtime interval="300000" begin="300000" end="43200000" />
<downtime interval="600000" begin="43200000" end="432000000" />
</package>
1 | Polling interval in the packages are 300 seconds and 30 seconds |
2 | Different polling interval for the service ICMP |
3 | Different retry settings for the service ICMP |
4 | Different timeout settings for the service ICMP |
The last Polling Package that matches the service is applied. This can be used to define a less specific catch-all filter for a default configuration. A more specific Polling Package can be used to overwrite the default setting. In the example above all IP Interfaces in 192.168.1.0/24 or 2600::/64 will be monitored with ICMP with different polling, retry and timeout settings.
Which Polling Packages are applied to the IP Interface and Service can be found in the Web User Interface. The IP Interface and Service page show which Polling Package and Service configuration is applied for this specific service.
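The "last matching package wins" rule can be sketched as follows. This Python model is an illustration under simplifying assumptions: it only covers the IPv4 ranges and the ICMP values from the example above, ignores the filter element, and uses a hypothetical `effective_config` helper.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Packages in the order they appear in the example configuration.
PACKAGES: List[Dict] = [
    {"name": "less-specific",
     "ranges": [("1.1.1.1", "254.254.254.254")],
     "services": {"ICMP": {"interval": 300000, "retry": 5, "timeout": 10000}}},
    {"name": "more-specific",
     "ranges": [("192.168.1.1", "192.168.1.254")],
     "services": {"ICMP": {"interval": 30000, "retry": 2, "timeout": 3000}}},
]

def in_ranges(addr: str, ranges: List[Tuple[str, str]]) -> bool:
    ip = ipaddress.IPv4Address(addr)
    return any(ipaddress.IPv4Address(b) <= ip <= ipaddress.IPv4Address(e)
               for b, e in ranges)

def effective_config(addr: str, service: str,
                     packages: List[Dict] = PACKAGES) -> Optional[Tuple[str, Dict]]:
    """Return (package name, service config) of the last matching package."""
    chosen = None
    for pkg in packages:  # iterate in file order; later matches overwrite
        if service in pkg["services"] and in_ranges(addr, pkg["ranges"]):
            chosen = (pkg["name"], pkg["services"][service])
    return chosen

print(effective_config("192.168.1.10", "ICMP"))
print(effective_config("10.0.0.1", "ICMP"))
```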
5.5.3. Service Patterns
Usually, the Poller used to monitor a Service is found by matching the poller’s name with the service name.
In addition, a matching poller can be found if an additional pattern element is specified for the poller.
If so, the poller is also used for all services matching the RegEx pattern.
The RegEx pattern can specify named capture groups. There can be multiple capture groups inside a pattern, but each must have a unique name. Note that the RegEx must be escaped or wrapped in a CDATA tag inside the configuration XML to make it a valid property.
If a poller is matched using its pattern, the parts of the service name that match the capture groups of the pattern are available as parameters to the Metadata DSL, using the context pattern and the capture group name as key.
Examples:
<pattern><![CDATA[^HTTP-(?<vhost>.*)$]]></pattern>
-
Matches all services whose names start with HTTP- followed by a host name. If the service is called HTTP-www.example.com, the Metadata DSL expression ${pattern:vhost} resolves to www.example.com.
<pattern><![CDATA[^HTTP-(?<vhost>.*?):(?<port>[0-9]+)$]]></pattern>
-
Matches all services whose names start with HTTP- followed by a hostname and a port. Two variables (${pattern:vhost} and ${pattern:port}) are available for use in the poller parameters.
The service pattern mechanism can be used whenever there are multiple instances of a service on the same interface. By giving each instance a distinct service name, the services are identifiable, but there is no need to add a poller definition per service. Common use cases for such services are HTTP Virtual Hosts, where multiple web applications run on the same web server, or BGP session monitoring, where each router has multiple neighbours.
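A pattern can be tried out against service names before it goes into the configuration. The Python sketch below does this for the second example pattern. Note two assumptions: Python writes named groups as `(?P<name>...)` where the Java syntax used in the configuration is `(?<name>...)`, and the `interpolate` helper is a hypothetical stand-in that only mimics `${pattern:...}` expressions, not the real Metadata DSL.

```python
import re
from typing import Dict, Optional

# Python spelling of ^HTTP-(?<vhost>.*?):(?<port>[0-9]+)$
PATTERN = re.compile(r"^HTTP-(?P<vhost>.*?):(?P<port>[0-9]+)$")

def pattern_context(service_name: str) -> Optional[Dict[str, str]]:
    """Return the named capture groups, or None if the name does not match."""
    match = PATTERN.match(service_name)
    return match.groupdict() if match else None

def interpolate(value: str, context: Dict[str, str]) -> str:
    """Replace ${pattern:key} with context[key]; unknown keys stay untouched."""
    return re.sub(r"\$\{pattern:(\w+)\}",
                  lambda m: context.get(m.group(1), m.group(0)),
                  value)

ctx = pattern_context("HTTP-www.example.com:8080")
print(ctx)
print(interpolate("http://${pattern:vhost}:${pattern:port}/", ctx))
```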
5.5.4. Test Services Manually
For troubleshooting it is possible to run a test via the Karaf Shell:
ssh -p 8101 admin@localhost
Once in the shell, you can show the command’s help as follows:
opennms> opennms:poll --help
DESCRIPTION
opennms:poll
Used to invoke a monitor against a host at a specified location
SYNTAX
opennms:poll [options] host [attributes]
ARGUMENTS
host
Hostname or IP Address of the system to poll
(required)
attributes
Monitor specific attributes in key=value form
OPTIONS
--help
Display this help message
-l, --location
Location
(defaults to Default)
-s, --system-id
System ID
-t, --ttl
Time to live
-P, --package
Poller Package
-S, --service
Service name
-n, --node-id
Node Id for Service
-c, --class
Monitor Class
The following example runs the ICMP monitor on a specific IP Interface.
opennms> opennms:poll -S ICMP -P example1 10.23.42.1
The output is verbose, which allows debugging of Monitor configurations. The important output lines are:
Package: example1 (1)
Service: ICMP (2)
Monitor: org.opennms.netmgt.poller.monitors.IcmpMonitor (3)
Parameter ds-name: icmp (4)
Parameter retry: 2 (5)
Parameter rrd-base-name: icmp (4)
Parameter rrd-repository: /opt/opennms/share/rrd/response (4)
Parameter timeout: 3000 (5)
Service is Up on 192.168.31.100 using org.opennms.netmgt.poller.monitors.IcmpMonitor: (6)
response-time: 407,0000 (7)
1 | Service and Package of this test |
2 | Applied Service configuration from Polling Package for this test |
3 | Service Monitor used for this test |
4 | RRD configuration for response time measurement |
5 | Retry and timeout settings for this test |
6 | Polling result for the service polled against the IP address |
7 | Response time |
5.5.5. Test filters on Karaf Shell
Filters are ubiquitous in OpenNMS configurations that use the <filter> syntax. The Karaf Shell can be used to verify filters. For more info, refer to Filters.
ssh -p 8101 admin@localhost
Once in the shell, print command help as follows
opennms> opennms:filter --help
DESCRIPTION
opennms:filter
Enumerates nodes/interfaces that match a give filter
SYNTAX
opennms:filter filterRule
ARGUMENTS
filterRule
A filter Rule
For example, run a filter rule that matches a location:
opennms:filter "location='MINION'"
Output is displayed as follows
nodeId=2 nodeLabel=00000000-0000-0000-0000-000000ddba11 location=MINION
IpAddresses:
127.0.0.1
As another example, run a filter that matches a node location and a given IP address range. Refer to IPLIKE for more info on using IPLIKE syntax.
opennms:filter "location='Default' & (IPADDR IPLIKE 172.*.*.*)"
Output is displayed as follows
nodeId=3 nodeLabel=label1 location=Default
IpAddresses:
172.10.154.1
172.20.12.12
172.20.2.14
172.01.134.1
172.20.11.15
172.40.12.18
nodeId=5 nodeLabel=label2 location=Default
IpAddresses:
172.17.0.111
nodeId=6 nodeLabel=label3 location=Default
IpAddresses:
172.20.12.22
172.17.0.123
Node info displayed will have nodeId, nodeLabel, location and optional fields like foreignId, foreignSource, categories when they exist. |
5.6. Service monitors
To support several specific applications and management agents, Pollerd executes Service Monitors. This section describes all built-in Service Monitors that are available and can be configured to allow complex monitoring. For information on how these can be extended, see the Development Guide of the OpenNMS documentation.
5.6.1. Common Configuration Parameters
Application- or device-specific Monitors are based on a generic API which provides common configuration parameters. These minimal configuration parameters are available in all Monitors and describe the behavior for timeouts, retries, etc.
Parameter | Description | Required | Default value |
---|---|---|---|
|
Number of attempts to test a Service to be up or down. |
optional |
|
|
Timeout for the isReachable method, in milliseconds. |
optional |
|
|
Invert the up/down behavior of the monitor |
optional |
|
If the Monitor uses the SNMP protocol, the default timeout and retry values are taken from the SNMP configuration (snmp-config.xml). |
Minion Configuration Parameters
When nodes are configured with a non-default location, the associated Service Monitors are executed on a Minion configured with that same location. If there are many Minions at a given location, the Service Monitor may be executed on any of the Minions that are currently available. Users can choose to execute a Service Monitor on a specific Minion, by specifying the System ID of the Minion. This mechanism is used for monitoring the Minions individually.
The following parameters can be used to override this behavior and control where the Service Monitors are executed.
Parameter | Description | Required | Default value |
---|---|---|---|
|
Specify the location at which the Service Monitor should be executed. |
optional |
(The location of the associated node) |
|
Specify the System ID on which the Service Monitor should be executed |
optional |
(None) |
|
Use the foreign id of the associated node as the System ID |
optional |
|
When specifying a System ID the location should also be set to the corresponding location for that system. |
5.6.2. Using Placeholders in Parameters
Some monitor parameters support placeholder substitution.
You can reference some node, interface, and asset record properties by enclosing them in { and }.
The supported properties are:
-
nodeId
-
nodeLabel
-
foreignSource
-
foreignId
-
ipAddr (or ipAddress)
-
all node asset record fields (e.g. username, password)
Parameters that support placeholder substitution are marked 'Yes' in the 'Placeholder substitution' column of the Configuration and Usage section of the monitor documentation.
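The effect of placeholder substitution can be pictured with a minimal Python sketch. It is not the OpenNMS implementation: the `substitute` helper is hypothetical, the property values are made up, and leaving unknown placeholders unchanged is an assumption.

```python
import re
from typing import Dict

def substitute(value: str, props: Dict[str, object]) -> str:
    """Replace {name} with props[name]; unknown names are left as written."""
    return re.sub(r"\{(\w+)\}",
                  lambda m: str(props.get(m.group(1), m.group(0))),
                  value)

# Hypothetical property values for one node and interface.
props = {"nodeId": 42, "nodeLabel": "web-01", "ipAddr": "192.0.2.10"}

print(substitute("https://{ipAddr}/status?node={nodeId}", props))
print(substitute("{nodeLabel} / {unknown}", props))
```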
5.6.3. ActiveMQMonitor
This monitor tests the availability of an ActiveMQ Broker. The service is considered available if a successful connection is made.
Monitor facts
Class Name |
|
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The ActiveMQ Broker URL to connect to. |
required |
|
|
The user name used to login to the ActiveMQ broker. |
optional |
|
|
The password used to authenticate the user on the ActiveMQ broker. |
optional |
|
|
A boolean to enable using the nodelabel when connecting to the ActiveMQ broker. |
optional |
|
|
A boolean to enable creating a JMS Session when connecting to the ActiveMQ broker. |
optional |
|
|
The client ID to use when connecting to the ActiveMQ broker. |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
An example of how to configure the monitor in poller-configuration.xml:
<parameter key="broker-url" value="failover://auto+ssl://192.168.1.1:61616/"/>
<parameter key="use-nodelabel" value="true"/>
5.6.4. AvailabilityMonitor
This monitor tests reachability of a node by using the isReachable method of the InetAddress java class. The service is considered available if isReachable returns true. See Oracle’s documentation for more details.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
This monitor implements the Common Configuration Parameters.
Examples
<service name="AVAIL" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="5000"/>
</service>
<monitor service="AVAIL" class-name="org.opennms.netmgt.poller.monitors.AvailabilityMonitor"/>
IcmpMonitor vs AvailabilityMonitor
This monitor was developed at a time when the IcmpMonitor wasn’t remote enabled, to circumvent this limitation. Now, with the JNA ICMP implementation, the IcmpMonitor is remote enabled under most configurations and this monitor shouldn’t be needed.
5.6.5. BgpSessionMonitor
This monitor checks if a BGP-Session to a peering partner (peer-ip) is functional. To monitor the BGP-Session, the RFC1269 SNMP MIB is used, and the status of the session is tested using the following OIDs:
BGP_PEER_STATE_OID = .1.3.6.1.2.1.15.3.1.2.<peer-ip>
BGP_PEER_ADMIN_STATE_OID = .1.3.6.1.2.1.15.3.1.3.<peer-ip>
BGP_PEER_REMOTEAS_OID = .1.3.6.1.2.1.15.3.1.9.<peer-ip>
BGP_PEER_LAST_ERROR_OID = .1.3.6.1.2.1.15.3.1.14.<peer-ip>
BGP_PEER_FSM_EST_TIME_OID = .1.3.6.1.2.1.15.3.1.16.<peer-ip>
The <peer-ip> is the far end IP address of the BGP session end point.
An SNMP get request for BGP_PEER_STATE_OID returns a result between 1 and 6.
The service states for OpenNMS Horizon are mapped as follows:
Result | State description | Monitor state in OpenNMS Horizon |
---|---|---|
|
Idle |
DOWN |
|
Connect |
DOWN |
|
Active |
DOWN |
|
OpenSent |
DOWN |
|
OpenConfirm |
DOWN |
|
Established |
UP |
Monitor facts
Class Name |
|
Remote Enabled |
false |
The mapping follows the description of the BGP Finite State Machine in RFC1771.
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
IP address of the far end BGP peer session |
required |
|
This monitor implements the Common Configuration Parameters.
Examples
To monitor the session state Established it is necessary to add a service to your poller configuration in '$OPENNMS_HOME/etc/poller-configuration.xml', for example:
<!-- Example configuration poller-configuration.xml -->
<service name="BGP-Peer-99.99.99.99-AS65423" interval="300000"
user-defined="false" status="on">
<parameter key="retry" value="2" />
<parameter key="timeout" value="3000" />
<parameter key="port" value="161" />
<parameter key="bgpPeerIp" value="99.99.99.99" />
</service>
<monitor service="BGP-Peer-99.99.99.99-AS65423" class-name="org.opennms.netmgt.poller.monitors.BgpSessionMonitor" />
Error code mapping
The BGP_PEER_LAST_ERROR_OID gives an error in HEX code. To make it human readable, a code mapping table is implemented:
Error code | Error Message |
---|---|
|
Message Header Error |
|
Message Header Error - Connection Not Synchronized |
|
Message Header Error - Bad Message Length |
|
Message Header Error - Bad Message Type |
|
OPEN Message Error |
|
OPEN Message Error - Unsupported Version Number |
|
OPEN Message Error - Bad Peer AS |
|
OPEN Message Error - Bad BGP Identifier |
|
OPEN Message Error - Unsupported Optional Parameter |
|
OPEN Message Error (deprecated) |
|
OPEN Message Error - Unacceptable Hold Time |
|
UPDATE Message Error |
|
UPDATE Message Error - Malformed Attribute List |
|
UPDATE Message Error - Unrecognized Well-known Attribute |
|
UPDATE Message Error - Missing Well-known Attribute |
|
UPDATE Message Error - Attribute Flags Error |
|
UPDATE Message Error - Attribute Length Error |
|
UPDATE Message Error - Invalid ORIGIN Attribute |
|
UPDATE Message Error (deprecated) |
|
UPDATE Message Error - Invalid NEXT_HOP Attribute |
|
UPDATE Message Error - Optional Attribute Error |
|
UPDATE Message Error - Invalid Network Field |
|
UPDATE Message Error - Malformed AS_PATH |
|
Hold Timer Expired |
|
Finite State Machine Error |
|
Cease |
|
Cease - Maximum Number of Prefixes Reached |
|
Cease - Administrative Shutdown |
|
Cease - Peer De-configured |
|
Cease - Administrative Reset |
|
Cease - Connection Rejected |
|
Cease - Other Configuration Change |
|
Cease - Connection Collision Resolution |
|
Cease - Out of Resources |
Instead of the HEX code, the error message is displayed in the service down log message. For additional information, the log message also contains:
-
BGP-Peer Adminstate
-
BGP-Peer Remote AS
-
BGP-Peer established time in seconds
Debugging
If you have problems detecting or monitoring the BGP Session, you can use the following command to figure out where the problem comes from.
snmpwalk -v 2c -c <myCommunity> <myRouter2Monitor> .1.3.6.1.2.1.15.3.1.2.99.99.99.99
Replace 99.99.99.99 with your BGP-Peer IP.
The result should be an Integer between 1 and 6.
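To interpret the integer returned by that query, the following Python sketch builds the peer state OID and maps the result to the monitor state described earlier in this section. The state numbering follows the bgpPeerState values of the BGP MIB; the helper names are hypothetical, and no SNMP request is actually sent.

```python
BGP_PEER_STATE_OID = ".1.3.6.1.2.1.15.3.1.2"

# bgpPeerState values as defined in the BGP MIB.
BGP_STATES = {
    1: "Idle",
    2: "Connect",
    3: "Active",
    4: "OpenSent",
    5: "OpenConfirm",
    6: "Established",
}

def peer_state_oid(peer_ip: str) -> str:
    """Append the peer IP to the base OID, as in the snmpwalk example."""
    return f"{BGP_PEER_STATE_OID}.{peer_ip}"

def monitor_state(result: int) -> str:
    """Only Established (6) is UP; every other valid state is DOWN."""
    if result not in BGP_STATES:
        raise ValueError(f"unexpected bgpPeerState value: {result}")
    return "UP" if BGP_STATES[result] == "Established" else "DOWN"

print(peer_state_oid("99.99.99.99"))
for value, name in BGP_STATES.items():
    print(value, name, monitor_state(value))
```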
5.6.6. BSFMonitor
This monitor runs a Bean Scripting Framework (BSF) compatible script to determine the status of a service. Users can write scripts to perform highly custom service checks. This monitor is not optimised for scale. It’s intended for a small number of custom checks or prototyping of monitors.
BSFMonitor vs SystemExecuteMonitor
The BSFMonitor avoids the overhead of fork(2) that is used by the SystemExecuteMonitor. BSFMonitor also grants access to a selection of OpenNMS Horizon internal methods and classes that can be used in the script.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Path to the script file. |
required |
|
|
The BSF Engine to run the script in different languages like |
required |
|
|
one of |
optional |
|
|
The BSF language class, like |
optional |
file-name extension is interpreted by default |
|
comma-separated list |
optional |
|
This monitor implements the Common Configuration Parameters.
Variable | Type | Description |
---|---|---|
|
Map<String, Object> |
The map contains all various parameters passed to the monitor
from the service definition it the |
|
String |
The IP address that is currently being polled. |
|
int |
The Node ID of the node the |
|
String |
The Node Label of the node the |
|
String |
The name of the service that is being polled. |
|
BSFMonitor |
The instance of the BSFMonitor object calling the script. Useful for logging via its log(String sev, String fmt, Object... args) method. |
|
HashMap<String, String> |
The script is expected to put its results into this object.
The status indication should be set into the entry with key |
|
LinkedHashMap<String, Number> |
The script is expected to put one or more response times into this object. |
Additionally, every parameter added to the service definition in poller-configuration.xml is available as a String object in the script.
The key attribute of the parameter represents the name of the String object and the value attribute represents the value of the String object.
Please keep in mind, that these parameters are also accessible via the map bean. |
Avoid non-character names for parameters to avoid problems in the script languages. |
Response Codes
The script has to provide a status code that represents the status of the associated service. The following status codes are defined:
Code | Description |
---|---|
OK |
Service is available |
UNK |
Service status unknown |
UNR |
Service is unresponsive |
NOK |
Service is unavailable |
Response time tracking
By default the BSFMonitor tracks the whole time the script file consumes as the response time. If the response time should be persisted, add the following parameters:
poller-configuration.xml
<!-- where in the filesystem response times are stored -->
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response" />
<!-- name of the rrd file -->
<parameter key="rrd-base-name" value="minimalbshbase" />
<!-- name of the data source in the rrd file -->
<!-- by default "response-time" is used as ds-name -->
<parameter key="ds-name" value="myResponseTime" />
It is also possible to return one or many response times directly from the script.
To add custom response times or override the default one, add entries to the times object.
The entries are keyed with a String that names the datasource and have as values a number that represents the response time.
To override the default response time datasource, add an entry named response-time to times.
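The relationship between the default response time and the entries of the times object can be modelled in a few lines of plain Python. This is only an illustrative sketch: `effective_times` and its arguments are hypothetical names, and the merge rule is an assumption drawn from the description above, not taken from the BSFMonitor source.

```python
from collections import OrderedDict


def effective_times(measured_ms, script_times):
    """Return the datasources that would be stored for one poll.

    measured_ms  -- total script run time (the default "response-time")
    script_times -- entries the script put into the `times` bean
    """
    merged = OrderedDict()
    merged["response-time"] = measured_ms   # default datasource
    for name, value in script_times.items():
        merged[name] = value                # an entry with the same key overrides the default
    return merged
```

With an empty `script_times` only the measured run time remains; an entry keyed `response-time` replaces it, and any other key becomes an additional datasource.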
Timeout and Retry
The BSFMonitor does not perform any timeout or retry processing on its own. If retry and/or timeout behavior is required, it has to be implemented in the script itself.
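Because retries and timeouts are left to the script, a script typically wraps its check in its own loop. The following is a minimal sketch in plain Python; `poll_with_retries`, `check`, and the default values are hypothetical and are not part of the BSFMonitor API.

```python
import time


def poll_with_retries(check, retries=2, timeout_s=3.0, clock=time.time):
    """Call check() up to retries + 1 times and return a status code.

    check     -- hypothetical callable returning True when the service answers
    retries   -- additional attempts after the first one
    timeout_s -- an attempt that takes longer than this counts as failed
    """
    for _ in range(retries + 1):
        started = clock()
        try:
            ok = check()
        except Exception:
            ok = False                  # an exception counts as a failed attempt
        if ok and (clock() - started) <= timeout_s:
            return "OK"
    return "NOK"
```

Note that this sketch only measures how long an attempt took; it does not interrupt a check that hangs. A real script would also need to set timeouts on the underlying socket or client call.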
Requirements for the script (run-types)
Depending on the run-type, the script has to provide its results in different ways.
For minimal scripts with very simple logic, run-type eval is the simplest option.
Scripts running in eval mode have to return a String matching one of the status codes.
If your script is more than a one-liner, run-type exec is essentially required.
Scripts running in exec mode need not return anything, but they have to add a status entry with a status code to the results object.
Additionally, the results object can also carry a "reason":"message" entry that is used in non-OK states.
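The difference between the two run-types can be summarised as a small model of how a script's outcome is read. This is a sketch under stated assumptions: `interpret` is a hypothetical helper, and treating an unrecognised status as UNK is an assumption, not documented BSFMonitor behavior.

```python
VALID_CODES = ("OK", "UNK", "UNR", "NOK")


def interpret(run_type, returned, results):
    """Return (status, reason) for one script run.

    run_type -- "eval" or "exec"
    returned -- value returned by the script (only used in eval mode)
    results  -- dict standing in for the `results` bean
    """
    if run_type == "eval":
        status = returned                 # eval: the return value is the status
    elif run_type == "exec":
        status = results.get("status")    # exec: the status lives in results
    else:
        raise ValueError("unknown run-type: %r" % (run_type,))
    if status not in VALID_CODES:
        status = "UNK"                    # assumption: unrecognised values mean unknown
    reason = results.get("reason") if status != "OK" else None
    return status, reason
```

For example, `interpret("exec", None, {"status": "NOK", "reason": "file does not exist"})` yields the status together with the reason, while in eval mode the `status` entry is never consulted.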
Commonly used language settings
The BSF supports many languages. The following table provides the required setup for commonly used languages.
Language | lang-class | bsf-engine | required library |
---|---|---|---|
beanshell |
|
supported by default |
|
groovy |
|
groovy-all-[version].jar |
|
jython |
|
jython-[version].jar |
Example Bean Shell
poller-configuration.xml
<service name="MinimalBeanShell" interval="300000" user-defined="true" status="on">
<parameter key="file-name" value="/tmp/MinimalBeanShell.bsh"/>
<parameter key="bsf-engine" value="bsh.util.BeanShellBSFEngine"/>
</service>
<monitor service="MinimalBeanShell" class-name="org.opennms.netmgt.poller.monitors.BSFMonitor" />
MinimalBeanShell.bsh script file
bsf_monitor.log("ERROR", "Starting MinimalBeanShell.bsh", null);
File testFile = new File("/tmp/TestFile");
if (testFile.exists()) {
return "OK";
} else {
results.put("reason", "file does not exist");
return "NOK";
}
Example Groovy
To use the Groovy language an additional library is required.
Copy a compatible groovy-all.jar into the opennms/lib folder and restart OpenNMS Horizon.
That makes Groovy available for the BSFMonitor.
poller-configuration.xml
with default run-type
set to eval
<service name="MinimalGroovy" interval="300000" user-defined="true" status="on">
<parameter key="file-name" value="/tmp/MinimalGroovy.groovy"/>
<parameter key="bsf-engine" value="org.codehaus.groovy.bsf.GroovyEngine"/>
</service>
<monitor service="MinimalGroovy" class-name="org.opennms.netmgt.poller.monitors.BSFMonitor" />
MinimalGroovy.groovy
script file for run-type
eval
bsf_monitor.log("ERROR", "Starting MinimalGroovy.groovy", null);
File testFile = new File("/tmp/TestFile");
if (testFile.exists()) {
return "OK";
} else {
results.put("reason", "file does not exist");
return "NOK";
}
poller-configuration.xml
with run-type
set to exec
<service name="MinimalGroovy" interval="300000" user-defined="true" status="on">
<parameter key="file-name" value="/tmp/MinimalGroovy.groovy"/>
<parameter key="bsf-engine" value="org.codehaus.groovy.bsf.GroovyEngine"/>
<parameter key="run-type" value="exec"/>
</service>
<monitor service="MinimalGroovy" class-name="org.opennms.netmgt.poller.monitors.BSFMonitor" />
MinimalGroovy.groovy
script file for run-type
set to exec
bsf_monitor.log("ERROR", "Starting MinimalGroovy", null);
def testFile = new File("/tmp/TestFile");
if (testFile.exists()) {
results.put("status", "OK")
} else {
results.put("reason", "file does not exist");
results.put("status", "NOK");
}
Example Jython
To use the Jython (Java implementation of Python) language an additional library is required.
Copy a compatible jython-x.y.z.jar
into the opennms/lib
folder and restart OpenNMS Horizon.
That makes Jython available for the BSFMonitor.
poller-configuration.xml
with run-type
exec
<service name="MinimalJython" interval="300000" user-defined="true" status="on">
<parameter key="file-name" value="/tmp/MinimalJython.py"/>
<parameter key="bsf-engine" value="org.apache.bsf.engines.jython.JythonEngine"/>
<parameter key="run-type" value="exec"/>
</service>
<monitor service="MinimalJython" class-name="org.opennms.netmgt.poller.monitors.BSFMonitor" />
MinimalJython.py
script file for run-type
set to exec
from java.io import File
bsf_monitor.log("ERROR", "Starting MinimalJython.py", None);
if (File("/tmp/TestFile").exists()):
results.put("status", "OK")
else:
results.put("reason", "file does not exist")
results.put("status", "NOK")
We have to use run-type exec here because Jython chokes on the import keyword in eval mode.
|
As proof that this is really Python, notice the substitution of Python’s None value for Java’s null in the log call. |
Advanced examples
The following example references all beans that are exposed to the script, including a custom parameter.
poller-configuration.xml
<service name="MinimalGroovy" interval="30000" user-defined="true" status="on">
<parameter key="file-name" value="/tmp/MinimalGroovy.groovy"/>
<parameter key="bsf-engine" value="org.codehaus.groovy.bsf.GroovyEngine"/>
<!-- custom parameters (passed to the script) -->
<parameter key="myParameter" value="Hello Groovy" />
<!-- optional for response time tracking -->
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response" />
<parameter key="rrd-base-name" value="minimalgroovybase" />
<parameter key="ds-name" value="minimalgroovyds" />
</service>
<monitor service="MinimalGroovy" class-name="org.opennms.netmgt.poller.monitors.BSFMonitor" />
bsf_monitor.log("ERROR", "Starting MinimalGroovy", null);
//list of all available objects from the BSFMonitor
Map<String, Object> map = map;
bsf_monitor.log("ERROR", "---- map ----", null);
bsf_monitor.log("ERROR", map.toString(), null);
String ip_addr = ip_addr;
bsf_monitor.log("ERROR", "---- ip_addr ----", null);
bsf_monitor.log("ERROR", ip_addr, null);
int node_id = node_id;
bsf_monitor.log("ERROR", "---- node_id ----", null);
bsf_monitor.log("ERROR", node_id.toString(), null);
String node_label = node_label;
bsf_monitor.log("ERROR", "---- node_label ----", null);
bsf_monitor.log("ERROR", node_label, null);
String svc_name = svc_name;
bsf_monitor.log("ERROR", "---- svc_name ----", null);
bsf_monitor.log("ERROR", svc_name, null);
org.opennms.netmgt.poller.monitors.BSFMonitor bsf_monitor = bsf_monitor;
bsf_monitor.log("ERROR", "---- bsf_monitor ----", null);
bsf_monitor.log("ERROR", bsf_monitor.toString(), null);
HashMap<String, String> results = results;
bsf_monitor.log("ERROR", "---- results ----", null);
bsf_monitor.log("ERROR", results.toString(), null);
LinkedHashMap<String, Number> times = times;
bsf_monitor.log("ERROR", "---- times ----", null);
bsf_monitor.log("ERROR", times.toString(), null);
// reading a parameter from the service definition
String myParameter = myParameter;
bsf_monitor.log("ERROR", "---- myParameter ----", null);
bsf_monitor.log("ERROR", myParameter, null);
// minimal example
def testFile = new File("/tmp/TestFile");
if (testFile.exists()) {
bsf_monitor.log("ERROR", "Done MinimalGroovy ---- OK ----", null);
return "OK";
} else {
results.put("reason", "file does not exist");
bsf_monitor.log("ERROR", "Done MinimalGroovy ---- NOK ----", null);
return "NOK";
}
5.6.7. CiscoIpSlaMonitor
This monitor can be used to monitor IP SLA configurations on your Cisco devices. This monitor supports the following SNMP OIDs from CISCO-RTT-MON-MIB:
RTT_ADMIN_TAG_OID = .1.3.6.1.4.1.9.9.42.1.2.1.1.3
RTT_OPER_STATE_OID = .1.3.6.1.4.1.9.9.42.1.2.9.1.10
RTT_LATEST_OPERSENSE_OID = .1.3.6.1.4.1.9.9.42.1.2.10.1.2
RTT_ADMIN_THRESH_OID = .1.3.6.1.4.1.9.9.42.1.2.1.1.5
RTT_ADMIN_TYPE_OID = .1.3.6.1.4.1.9.9.42.1.2.1.1.4
RTT_LATEST_OID = .1.3.6.1.4.1.9.9.42.1.2.10.1.1
The monitor can be run in two scenarios. The first one tests the RTT_LATEST_OPERSENSE which is a sense code for the completion status of the latest RTT operation. If the RTT_LATEST_OPERSENSE returns ok(1) the service is marked as up.
The second scenario is to monitor the configured threshold in the IP SLA config. If the RTT_LATEST_OPERSENSE returns with overThreshold(3) the service is marked down.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The |
required |
|
|
Boolean indicates if just the status or configured threshold should be monitored. |
required |
`` |
This monitor implements the Common Configuration Parameters.
Example for HTTP and ICMP echo reply
In this example we configure an IP SLA entry to monitor Google’s website with HTTP GET from the Cisco device.
We use 8.8.8.8 as our DNS resolver.
In our example our SLA says we should reach Google’s website within 200ms.
To advise co-workers that this entry is used for monitoring, we set the owner to OpenNMS.
The tag is used to identify the entry later in the SNMP table for monitoring.
ip sla monitor 1
type http operation get url http://www.google.de name-server 8.8.8.8
timeout 3000
threshold 200
owner OpenNMS
tag Google Website
ip sla monitor schedule 1 life forever start-time now
In the second example we configure an IP SLA entry to test if the IP address of www.opennms.org is reachable with ICMP from the perspective of the Cisco device. As in the example above, we set a threshold and a timeout.
ip sla 1
icmp-echo 64.146.64.212
timeout 3000
threshold 150
owner OpenNMS
tag OpenNMS Host
ip sla schedule 1 life forever start-time now
It's not possible to reconfigure an IP SLA entry. If you want to change parameters, you have to delete the whole configuration and reconfigure it with your new parameters. Back up your Cisco configuration manually or take a look at RANCID. |
To monitor both of the entries the configuration in poller-configuration.xml
requires two service definition entries:
<service name="IP-SLA-WEB-Google" interval="300000"
user-defined="false" status="on">
<parameter key="retry" value="2" />
<parameter key="timeout" value="3000" />
<parameter key="admin-tag" value="Google Website" />
<parameter key="ignore-thresh" value="false" />(1)
</service>
<service name="IP-SLA-PING-OpenNMS" interval="300000"
user-defined="false" status="on">
<parameter key="retry" value="2" />
<parameter key="timeout" value="3000" />
<parameter key="admin-tag" value="OpenNMS Host" />
<parameter key="ignore-thresh" value="true" />(2)
</service>
<monitor service="IP-SLA-WEB-Google" class-name="org.opennms.netmgt.poller.monitors.CiscoIpSlaMonitor" />
<monitor service="IP-SLA-PING-OpenNMS" class-name="org.opennms.netmgt.poller.monitors.CiscoIpSlaMonitor" />
1 | Service is up if the IP SLA state is ok(1) |
2 | Service is down if the IP SLA state is overThreshold(3) |
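Since the tag identifies the IP SLA entry in the SNMP table, locating the row for a service amounts to finding the table index whose admin tag matches the admin-tag parameter. The sketch below illustrates that lookup on data that has already been retrieved; `find_entry_index`, `opersense_oid`, and the sample walk result are hypothetical, and the monitor's actual implementation may differ.

```python
RTT_ADMIN_TAG_OID = ".1.3.6.1.4.1.9.9.42.1.2.1.1.3"
RTT_LATEST_OPERSENSE_OID = ".1.3.6.1.4.1.9.9.42.1.2.10.1.2"


def find_entry_index(tag_column, admin_tag):
    """Return the row index whose admin tag equals admin_tag, or None.

    tag_column -- dict of {instance OID: tag string}, e.g. from an SNMP walk
    """
    prefix = RTT_ADMIN_TAG_OID + "."
    for oid, tag in tag_column.items():
        if oid.startswith(prefix) and tag == admin_tag:
            return oid[len(prefix):]      # what follows the column OID is the row index
    return None


def opersense_oid(index):
    """Instance OID of the latest operation sense code for the given row."""
    return RTT_LATEST_OPERSENSE_OID + "." + index


# Hypothetical walk result for a device holding two IP SLA entries.
walk = {
    RTT_ADMIN_TAG_OID + ".1": "Google Website",
    RTT_ADMIN_TAG_OID + ".2": "OpenNMS Host",
}
index = find_entry_index(walk, "OpenNMS Host")
```

Once the index is known, the same suffix selects the matching instance in the other columns, such as the latest operation sense code.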
5.6.8. CiscoPingMibMonitor
This poller monitor’s purpose is to create conceptual rows (entries) in the ciscoPingTable on Cisco IOS devices that support the CISCO-PING-MIB. These entries direct the remote IOS device to ping an IPv4 or IPv6 address with a configurable set of parameters. After the IOS device has completed the requested ping operations, the poller monitor queries the IOS device to determine the results. If the results indicate success according to the configured parameters in the service configuration, then the monitored service is reported as available and the results are available for optional time-series (RRD) storage. If the results indicate failure, the monitored service is reported unavailable with a descriptive reason code. If something goes wrong during the setup of the entry or the subsequent querying of its status, the monitored service is reported to be in an unknown state.
Unlike most poller monitors, the CiscoPingMibMonitor does not interpret the timeout and retries parameters to determine when a poll attempt has timed out or whether it should be attempted again.
The packet-count and packet-timeout parameters instead serve this purpose from the perspective of the remote IOS device.
|
ciscoPingEntry 1.3.6.1.4.1.9.9.16.1.1.1
ciscoPingSerialNumber 1.3.6.1.4.1.9.9.16.1.1.1.1
ciscoPingProtocol 1.3.6.1.4.1.9.9.16.1.1.1.2
ciscoPingAddress 1.3.6.1.4.1.9.9.16.1.1.1.3
ciscoPingPacketCount 1.3.6.1.4.1.9.9.16.1.1.1.4
ciscoPingPacketSize 1.3.6.1.4.1.9.9.16.1.1.1.5
ciscoPingPacketTimeout 1.3.6.1.4.1.9.9.16.1.1.1.6
ciscoPingDelay 1.3.6.1.4.1.9.9.16.1.1.1.7
ciscoPingTrapOnCompletion 1.3.6.1.4.1.9.9.16.1.1.1.8
ciscoPingSentPackets 1.3.6.1.4.1.9.9.16.1.1.1.9
ciscoPingReceivedPackets 1.3.6.1.4.1.9.9.16.1.1.1.10
ciscoPingMinRtt 1.3.6.1.4.1.9.9.16.1.1.1.11
ciscoPingAvgRtt 1.3.6.1.4.1.9.9.16.1.1.1.12
ciscoPingMaxRtt 1.3.6.1.4.1.9.9.16.1.1.1.13
ciscoPingCompleted 1.3.6.1.4.1.9.9.16.1.1.1.14
ciscoPingEntryOwner 1.3.6.1.4.1.9.9.16.1.1.1.15
ciscoPingEntryStatus 1.3.6.1.4.1.9.9.16.1.1.1.16
ciscoPingVrfName 1.3.6.1.4.1.9.9.16.1.1.1.17
Prerequisites
-
One or more Cisco devices running an IOS image of recent vintage; any 12.2 or later image is probably fine. Even very low-end devices appear to support the CISCO-PING-MIB.
-
The IOS devices that will perform the remote pings must be configured with an SNMP write community string whose source address access-list includes the address of the OpenNMS Horizon server and whose MIB view (if any) includes the OID of the ciscoPingTable.
-
The corresponding SNMP write community string must be specified in the write-community attribute of either the top-level <snmp-config> element of snmp-config.xml or a <definition> child element that applies to the SNMP-primary interface of the IOS device(s) that will perform the remote pings.
Scalability concerns
This monitor spends a fair amount of time sleeping while it waits for the remote IOS device to complete the requested ping operations.
The monitor is pessimistic in calculating the delay between creation of the ciscoPingTable entry and its first attempt to retrieve the results of that entry's ping operations: it will always wait at least (packet-count * (packet-timeout + packet-delay)) milliseconds before even checking whether the remote pings have completed.
It's therefore prone to hogging poller threads if used with large values for the packet-count, packet-timeout, and/or packet-delay parameters.
Keep these values as small as practical to avoid tying up poller threads unnecessarily.
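The minimum wait described above is simple arithmetic, and working it out for a planned configuration shows how long a poller thread stays occupied. The function name and the sample values below are hypothetical; they are not the monitor's defaults.

```python
def minimum_wait_ms(packet_count, packet_timeout_ms, packet_delay_ms):
    """Lower bound on the time one poll blocks a poller thread, in milliseconds."""
    return packet_count * (packet_timeout_ms + packet_delay_ms)


# Hypothetical settings: 5 packets, 2 s timeout, no delay between packets.
small = minimum_wait_ms(5, 2000, 0)
# Hypothetical settings: 20 packets, 2 s timeout, 1 s delay between packets.
large = minimum_wait_ms(20, 2000, 1000)
```

Here `small` is 10000 ms while `large` is 60000 ms, so the second configuration holds a poller thread for at least a full minute per poll.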
This monitor always uses the current time in whole seconds since the UNIX epoch as the instance identifier of the ciscoPingTable entries that it creates. The object that holds this identifier is a signed 32-bit integer type, precluding a finer resolution. It's probably a good idea to mix in the least-significant byte of the millisecond-accurate time as a substitute for that of the whole-second-accurate value to avoid collisions. IOS seems to clean up entries in this table within a matter of minutes after their ping operations have completed.
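The suggested mixing of the millisecond time into the identifier could look like the following. This is one possible reading of the idea in the paragraph above, not a description of what the monitor does; `ping_entry_instance` is a hypothetical name.

```python
import time


def ping_entry_instance(now_ms=None):
    """Build an instance identifier from the current time.

    Takes the epoch time in whole seconds and replaces its least-significant
    byte with the least-significant byte of the epoch time in milliseconds.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    seconds = now_ms // 1000
    mixed = (seconds & ~0xFF) | (now_ms & 0xFF)
    return mixed & 0x7FFFFFFF       # stay within the positive signed 32-bit range


# Two polls that start within the same second but one millisecond apart.
first = ping_entry_instance(1700000000001)
second = ping_entry_instance(1700000000002)
```

The two identifiers differ even though both polls fall into the same second. The trade-off is that the low byte no longer reflects the seconds value, so identifiers are only roughly ordered by time.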
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
SNMP protocol version (1, 2c, or 3) to use for operations performed by this service monitor. Do not use without a very good reason. |
optional |
from |
|
Number of ping packets that the remote IOS device should send. |
optional |
|
|
Size, in bytes, of each ping packet that the remote IOS device should send. |
optional |
|
|
Timeout, in milliseconds, of each ping packet sent by the remote IOS device. |
optional |
|
|
Delay, in milliseconds, between ping packets sent by the remote IOS device. |
optional |
|
|
String value to set as the value of ciscoPingEntryOwner of entries created for this service. |
optional |
|
|
String value to set as the VRF (VLAN) name in whose context the remote IOS device should perform the pings for this service. |
optional |
empty String |
|
Numeric database identifier of the node whose primary SNMP interface should be used
as the proxy for this service. If specified along with the related
|
optional |
|
|
|
optional |
|
|
IP address of the interface that should be used as the proxy for this service.
Effective only if none of |
optional |
|
|
IP address that the remote IOS device should ping. A value of |
optional |
|
|
A whole-number percentage of pings that must succeed (from the perspective of the
remote IOS device) in order for this service to be considered available. As an
example, if |
optional |
|
|
Base directory of an RRD repository in which to store this service monitor’s response-time samples |
optional |
|
|
Name of the RRD datasource (DS) name in which to store this service monitor’s
response-time samples; rrd-base-name Base name of the RRD file (minus the |
optional |
|
This monitor implements the Common Configuration Parameters.
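The success percentage described in the table above compares the packets received with the packets sent by the remote IOS device. A minimal sketch of that comparison follows; `ping_service_available` and `required_percent` are hypothetical names, and how the monitor itself rounds the ratio is not stated here, so the sketch uses integer arithmetic to avoid rounding altogether.

```python
def ping_service_available(sent, received, required_percent):
    """Return True if enough of the remote pings succeeded.

    sent             -- packets the IOS device sent (ciscoPingSentPackets)
    received         -- replies it received (ciscoPingReceivedPackets)
    required_percent -- whole-number percentage that must succeed
    """
    if sent <= 0:
        return False                    # nothing was sent, so nothing succeeded
    return received * 100 >= required_percent * sent
```

With five packets sent and four received, the service would be available at a required percentage of 80 but not at 100.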
The following variables can be used in the configuration:
Variable | Description |
---|---|
|
This value will be substituted with the IP address of the interface on which the monitored service appears. |
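The effect of the variable can be pictured as a plain string replacement applied to each parameter value before the poll runs. The helper below is a hypothetical illustration of that substitution, not the monitor's code; the sample address is from a documentation range.

```python
def substitute_ipaddr(parameters, interface_ip):
    """Return a copy of the parameters with ${ipaddr} replaced.

    parameters   -- dict of parameter key to configured value
    interface_ip -- IP address of the interface the service appears on
    """
    return {key: value.replace("${ipaddr}", interface_ip)
            for key, value in parameters.items()}


configured = {"proxy-ip-addr": "${ipaddr}", "target-ip-addr": "192.168.255.1"}
effective = substitute_ipaddr(configured, "203.0.113.7")
```

Values that do not contain the variable, such as the static target address here, pass through unchanged.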
Example: Ping the same non-routable address from all routers of customer Foo
A service provider’s client, Foo Corporation, has network service at multiple locations. At each Foo location, a point-of-sale system is statically configured at IPv4 address 192.168.255.1. Foo wants to be notified any time a point-of-sale system becomes unreachable. Using an OpenNMS Horizon remote location monitor is not feasible. All of Foo Corporation’s CPE routers must be Cisco IOS devices in order to achieve full coverage in this scenario.
One approach to this requirement is to configure all of Foo Corporation’s premise routers to be in the surveillance categories Customer_Foo, CPE, and Routers, and to use a filter to create a poller package that applies only to those routers.
We will use the special value ${ipaddr}
for the proxy-ip-addr
parameter so that the remote pings will be provisioned on each Foo CPE router.
Since we want each Foo CPE router to ping the same IP address 192.168.255.1, we statically list that value for the target-ip-addr
address.
<package name="ciscoping-foo-pos">
<filter>catincCustomer_Foo &amp; catincCPE &amp; catincRouters &amp; nodeSysOID LIKE '.1.3.6.1.4.1.9.%'</filter>
<include-range begin="0.0.0.0" end="254.254.254.254" />
<rrd step="300">
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<service name="FooPOS" interval="300000" user-defined="false" status="on">
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response" />
<parameter key="rrd-base-name" value="ciscoping" />
<parameter key="ds-name" value="ciscoping" />
<parameter key="proxy-ip-addr" value="${ipaddr}" />
<parameter key="target-ip-addr" value="192.168.255.1" />
</service>
<downtime interval="30000" begin="0" end="300000" /><!-- 30s, 0, 5m -->
<downtime interval="300000" begin="300000" end="43200000" /><!-- 5m, 5m, 12h -->
<downtime interval="600000" begin="43200000" end="432000000" /><!-- 10m, 12h, 5d -->
<downtime begin="432000000" delete="true" /><!-- anything after 5 days delete -->
</package>
<monitor service="FooPOS" class-name="org.opennms.netmgt.poller.monitors.CiscoPingMibMonitor" />
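The downtime elements in the package above describe how polling changes as an outage gets older, as their inline comments indicate. The following sketch models that schedule; `downtime_action` is a hypothetical helper, and treating each begin value as inclusive and each end value as exclusive is an assumption.

```python
# (begin_ms, end_ms, interval_ms), taken from the downtime elements above
DOWNTIME_MODEL = [
    (0, 300000, 30000),              # first 5 minutes: poll every 30 seconds
    (300000, 43200000, 300000),      # up to 12 hours: poll every 5 minutes
    (43200000, 432000000, 600000),   # up to 5 days: poll every 10 minutes
]
DELETE_AFTER_MS = 432000000          # after 5 days the service is deleted


def downtime_action(outage_ms):
    """Return ("poll", interval_ms) or ("delete", None) for an outage age."""
    if outage_ms >= DELETE_AFTER_MS:
        return ("delete", None)
    for begin, end, interval in DOWNTIME_MODEL:
        if begin <= outage_ms < end:
            return ("poll", interval)
    raise ValueError("outage duration must not be negative")
```

A service that has been down for one hour (3600000 ms) falls into the second window and is polled every five minutes.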
Example: Ping the routable address of each router of customer Bar from a single IOS device
A service provider's client, Bar Limited, has network service at multiple locations. While OpenNMS Horizon's world-class service assurance is generally sufficient, Bar also wants to be notified any time a premise router at one of their locations becomes unreachable from the perspective of an IOS device in Bar's main data center. Some or all of the Bar Limited CPE routers may be non-Cisco devices in this scenario.
To meet this requirement, our approach is to configure Bar Limited’s premise routers to be in the surveillance categories Customer_Bar, CPE, and Routers, and to use a filter to create a poller package that applies only to those routers.
This time, though, we will use the special value ${ipaddr}
not in the proxy-ip-addr
parameter but in the target-ip-addr
parameter so that the remote pings will be performed for each Bar CPE router.
Since we want the same IOS device 20.11.5.11 to ping the CPE routers, we statically list that value for the proxy-ip-addr
address.
Example poller-configuration.xml
additions
<package name="ciscoping-bar-cpe">
<filter>catincCustomer_Bar &amp; catincCPE &amp; catincRouters</filter>
<include-range begin="0.0.0.0" end="254.254.254.254" />
<rrd step="300">
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<service name="BarCentral" interval="300000" user-defined="false" status="on">
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response" />
<parameter key="rrd-base-name" value="ciscoping" />
<parameter key="ds-name" value="ciscoping" />
<parameter key="proxy-ip-addr" value="20.11.5.11" />
<parameter key="target-ip-addr" value="${ipaddr}" />
</service>
<downtime interval="30000" begin="0" end="300000" /><!-- 30s, 0, 5m -->
<downtime interval="300000" begin="300000" end="43200000" /><!-- 5m, 5m, 12h -->
<downtime interval="600000" begin="43200000" end="432000000" /><!-- 10m, 12h, 5d -->
<downtime begin="432000000" delete="true" /><!-- anything after 5 days delete -->
</package>
<monitor service="BarCentral" class-name="org.opennms.netmgt.poller.monitors.CiscoPingMibMonitor" />
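The rra lines in both packages follow the RRDtool form RRA:CF:xff:steps:rows, so the time span each archive covers is step × steps × rows. The short sketch below works this out for the archives used in the examples; `rra_retention_days` is a hypothetical helper name.

```python
def rra_retention_days(step_s, rra):
    """Return how many days of data an RRA definition retains.

    step_s -- the rrd step in seconds (300 in the examples)
    rra    -- a definition such as "RRA:AVERAGE:0.5:12:1488"
    """
    _, _cf, _xff, steps, rows = rra.split(":")
    return step_s * int(steps) * int(rows) / 86400.0


five_minute = rra_retention_days(300, "RRA:AVERAGE:0.5:1:2016")
hourly = rra_retention_days(300, "RRA:AVERAGE:0.5:12:1488")
daily = rra_retention_days(300, "RRA:AVERAGE:0.5:288:366")
```

The three archives keep 7 days of five-minute samples, 62 days of hourly averages, and 366 days of daily values respectively.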
5.6.9. CitrixMonitor
This monitor is used to test if a Citrix® Server or XenApp Server® is providing the Independent Computing Architecture (ICA) protocol on TCP 1494.
The monitor opens a TCP socket and tests whether the greeting banner contains ICA; otherwise the service is unavailable.
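The same banner check can be reproduced by hand when troubleshooting. The following is a rough sketch using only the Python standard library; `check_ica` and `banner_has_ica` are hypothetical names, and reading a single chunk of up to 1024 bytes is a simplification.

```python
import socket


def banner_has_ica(banner):
    """Return True if the greeting banner bytes contain the ICA marker."""
    return b"ICA" in banner


def check_ica(host, port=1494, timeout_s=5.0):
    """Connect to host:port and test the greeting banner for ICA."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            conn.settimeout(timeout_s)
            banner = conn.recv(1024)    # the first chunk of the greeting is enough here
    except OSError:                     # refused, unreachable, or timed out
        return False
    return banner_has_ica(banner)
```

Calling `check_ica("citrix.example.org")` against a reachable server would return True only if the greeting contains ICA; the host name is a placeholder.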
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
TCP port where the ICA protocol is listening. |
optional |
|
This monitor implements the Common Configuration Parameters.
If you have configured the Metaframe Presentation Server Client to use Session Reliability, the TCP port is 2598 instead of 1494. You can find additional information in CTX104147. It has not been verified whether the monitor works in this case. |
Examples
The following example configures OpenNMS Horizon to monitor the ICA protocol on TCP 1494 with 2 retries and waiting 5 seconds for each retry.
<service name="Citrix-TCP-ICA" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2" />
<parameter key="timeout" value="5000" />
</service>
<monitor service="Citrix-TCP-ICA" class-name="org.opennms.netmgt.poller.monitors.CitrixMonitor" />
5.6.10. DhcpMonitor
This monitor is used to check the availability and functionality of DHCP servers.
The monitor class DhcpMonitor is executed by Pollerd and starts a background process that listens for incoming DHCP responses.
A DHCP server is tested by sending a DISCOVER message.
If the DHCP server responds with an OFFER the service is marked as up.
The background listening process is only started if the DhcpMonitor is used.
The behavior for testing the DHCP server can be modified in the poller-configuration.xml
configuration file.
Make sure no DHCP client is running on the OpenNMS Horizon server and using ports UDP/67 and UDP/68.
If UDP/67 and UDP/68 are already in use, you will find warning messages in your log files.
You can test if a process is listening on UDP/68 with sudo ss -lnpu sport = :68 .
|
Monitor facts
Class Name |
|
Remote Enabled |
true |
This monitor implements the Common Configuration Parameters.
DhcpMonitor configuration
Parameter |
Description |
Required |
Default value |
|
The MAC address that OpenNMS Horizon uses for a DHCP request |
optional |
|
|
Puts the poller in |
optional |
|
|
This parameter will usually be set to the IP address of the OpenNMS Horizon server,
if |
optional |
|
|
When extendedMode is false, the DHCP poller will send a DISCOVER and expect an OFFER in return. When extendedMode is true, the DHCP poller will first send a DISCOVER. If no valid response is received it will send an INFORM. If no valid response is received it will then send a REQUEST. OFFER, ACK, and NAK are all considered valid responses in extendedMode. |
optional |
|
|
This parameter only applies to REQUEST queries sent to the DHCP server when extendedMode is true. The IP address specified will be requested in the query. |
optional |
|
|
The location to write RRD data. Generally, you will not want to change this from default |
required |
|
|
The name of the RRD file to write (minus the extension, .rrd or .jrb) |
required |
|
|
This is the name as reference for this particular data source in the RRD file |
required |
|
Example testing DHCP server in the same subnet
The following example shows how to configure the monitor in poller-configuration.xml.
The monitor sends at most 3 DISCOVER messages and waits 3 seconds for the DHCP server's OFFER message each time.
poller-configuration.xml
<service name="DHCP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2" />
<parameter key="timeout" value="3000" />
<parameter key="relayMode" value="false"/>
<parameter key="extendedMode" value="false"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response" />
<parameter key="rrd-base-name" value="dhcp" />
<parameter key="ds-name" value="dhcp" />
</service>
<monitor service="DHCP" class-name="org.opennms.netmgt.poller.monitors.DhcpMonitor"/>
Example testing DHCP server in a different subnet in extended mode
You can use the same monitor in poller-configuration.xml as in the example above.
The service definition below enables relayMode and sets myIpAddress.
<service name="DHCP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2" />
<parameter key="timeout" value="3000" />
<parameter key="relayMode" value="true"/>
<parameter key="extendedMode" value="false"/>
<parameter key="myIpAddress" value="1.2.3.4"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response" />
<parameter key="rrd-base-name" value="dhcp" />
<parameter key="ds-name" value="dhcp" />
</service>
<monitor service="DHCP" class-name="org.opennms.netmgt.poller.monitors.DhcpMonitor"/>
In extendedMode, the time required to complete the poll for an unresponsive node increases by a factor of 3.
Thus it is a good idea to limit the number of retries to a small number.
|
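The message sequence for both modes, and the reason an unresponsive server costs three times as long in extendedMode, can be modelled as follows. This is a sketch of the behavior described in the parameter table, not the monitor's code; `dhcp_poll` and the `send` callable are hypothetical.

```python
VALID_EXTENDED = ("OFFER", "ACK", "NAK")


def dhcp_poll(send, extended_mode=False):
    """Return True if the DHCP server gave a valid response.

    send -- hypothetical callable that takes a message type and returns the
            response type as a string, or None if the request timed out
    """
    if not extended_mode:
        return send("DISCOVER") == "OFFER"
    for message in ("DISCOVER", "INFORM", "REQUEST"):
        if send(message) in VALID_EXTENDED:
            return True                 # the first valid answer ends the poll
    return False


sent = []

def silent_server(message):
    """Stand-in for a server that never answers; records what was sent."""
    sent.append(message)
    return None

result = dhcp_poll(silent_server, extended_mode=True)
```

Against the silent server all three message types are tried before the poll fails, and each of them waits for the full timeout.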
5.6.11. DiskUsageMonitor
The DiskUsageMonitor can be used to test the amount of free space available on certain storages of a node.
The monitor gets information about the available free storage space by inspecting the hrStorageTable of the HOST-RESOURCES-MIB.
A storage’s description (as found in the corresponding hrStorageDescr object) must match the criteria specified by the disk
and match-type
parameters to be monitored.
A storage’s available free space is calculated using the corresponding hrStorageSize and hrStorageUsed objects.
The hrStorageUsed doesn’t account for filesystem reserved blocks (i.e. for the super-user), so DiskUsageMonitor will report the service as
unavailable only when the amount of free disk space is actually lower than free minus the percentage of reserved filesystem blocks.
|
This monitor uses SNMP to accomplish its work. Therefore systems against which it is to be used must have an SNMP agent supporting the HOST-RESOURCES-MIB installed and configured. Most modern SNMP agents, including most distributions of the Net-SNMP agent and the SNMP service that ships with Microsoft Windows, support this MIB. Out-of-box support for HOST-RESOURCES-MIB among commercial Unix operating systems may be somewhat spotty.
Monitor facts
Class Name |
|
Remote Enabled |
false, relies on SNMP configuration. |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
A pattern that a storage’s description (hrStorageDescr) must match to be taken into account. |
required |
|
|
The minimum amount of free space that storages matching the criteria must have available. This parameter is evaluated as a percent of the storage’s reported maximum capacity. |
optional |
|
|
The way how the pattern specified by the |
optional |
|
|
Destination port where the SNMP requests shall be sent. |
optional |
|
|
Deprecated.
Same as |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
<!-- Make sure there's at least 5% of free space available on storages ending with "/home" -->
<service name="DiskUsage-home" interval="300000" user-defined="false" status="on">
<parameter key="timeout" value="3000" />
<parameter key="retry" value="2" />
<parameter key="disk" value="/home" />
<parameter key="match-type" value="endsWith" />
<parameter key="free" value="5" />
</service>
<monitor service="DiskUsage-home" class-name="org.opennms.netmgt.poller.monitors.DiskUsageMonitor" />
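The check configured above can be expressed as a small calculation on the hrStorageSize and hrStorageUsed values. The sketch below covers only the endsWith match type shown in the example; the function names and sample numbers are hypothetical, and reserved filesystem blocks are ignored.

```python
def free_percent(hr_storage_size, hr_storage_used):
    """Free space as a percentage of the storage's reported capacity."""
    if hr_storage_size <= 0:
        return 0.0
    return 100.0 * (hr_storage_size - hr_storage_used) / hr_storage_size


def storage_ok(descr, size, used, disk="/home", min_free=5):
    """Return True/False for a matching storage, or None if it is not selected.

    descr -- the storage description (hrStorageDescr)
    disk  -- pattern the description has to end with (match-type endsWith)
    """
    if not descr.endswith(disk):
        return None                     # this storage is not monitored by the service
    return free_percent(size, used) >= min_free
```

Both values are counted in allocation units, which cancel out in the ratio, so no unit conversion is needed. A storage described as `/export/home` with 1000 units of which 940 are used has 6% free and passes the 5% limit.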
DiskUsageMonitor vs thresholds
Storages' available free space can also be monitored using thresholds if you are already collecting these data.
5.6.12. DnsMonitor
This monitor is built to test the availability of the DNS service on remote IP interfaces. The monitor tests the service availability by sending a DNS query for A resource record types against the DNS server to test.
The monitor is marked as up if the DNS server is able to send a valid response to the monitor. For multiple records it is possible to test if the number of responses is within a given boundary.
The monitor can be simulated with the command line tool host:
~ % host -v -t a www.google.com 8.8.8.8
Trying "www.google.com"
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9324
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;www.google.com. IN A
;; ANSWER SECTION:
www.google.com. 283 IN A 74.125.232.17
www.google.com. 283 IN A 74.125.232.20
www.google.com. 283 IN A 74.125.232.19
www.google.com. 283 IN A 74.125.232.16
www.google.com. 283 IN A 74.125.232.18
Received 112 bytes from 8.8.8.8#53 in 41 ms
TIP: This monitor is intended for testing the availability of a DNS service. If you want to monitor the DNS resolution of some of your nodes from a client’s perspective, please use the DNSResolutionMonitor.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Number of retries before the service is marked as down |
optional |
|
|
Time in milliseconds to wait for the A Record response from the server |
optional |
|
|
UDP Port for the DNS server |
optional |
|
|
DNS A Record for lookup test |
optional |
|
|
A comma-separated list of numeric DNS response codes that will be considered fatal if
present in the server’s response. Default value is |
optional |
|
|
Minimal number of records in the DNS server response for the given lookup |
optional |
|
|
Maximal number of records in the DNS server response for the given lookup |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
The given example shows how to monitor whether the IP interface of a given DNS server resolves a DNS request.
This service should be bound to a DNS server that should be able to give a valid DNS response for the DNS request www.google.com.
The service is up if the DNS server gives between 1 and 10 A record responses.
<service name="DNS-www.google.com" interval="300000" user-defined="false" status="on">
<parameter key="lookup" value="www.google.com" />
<parameter key="fatal-response-code" value="2" />
<parameter key="min-answers" value="1" />
<parameter key="max-answers" value="10" />
</service>
<monitor service="DNS-www.google.com" class-name="org.opennms.netmgt.poller.monitors.DnsMonitor" />
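The decision made by the service above can be summarised as a check on the response code and the number of answers. This is a minimal sketch of that rule with hypothetical names; the defaults mirror the example values and are not the monitor's built-in defaults.

```python
def dns_service_up(rcode, answer_count, fatal_codes=(2,),
                   min_answers=1, max_answers=10):
    """Return True if a DNS response satisfies the service definition.

    rcode        -- numeric DNS response code (0 is NOERROR, 2 is SERVFAIL)
    answer_count -- number of records in the answer section
    """
    if rcode in fatal_codes:
        return False                    # a fatal response code always fails the poll
    return min_answers <= answer_count <= max_answers
```

The host output shown earlier, with status NOERROR and five answers, would satisfy this definition, while a SERVFAIL response or an empty answer section would not.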
5.6.13. DNSResolutionMonitor
The DNS resolution monitor tests if the node label of an OpenNMS Horizon node can be resolved. This monitor uses the name resolver configuration from the poller configuration or from the operating system on which OpenNMS Horizon is running. It can be used to test client behavior for a given host name. For example: Create a node with the node label www.google.com and an IP interface. Assigning the DNS resolution monitor to the IP interface will test if www.google.com can be resolved using the DNS configuration defined by the poller. The response from the A record lookup can be any address; it is not verified against the IP address of the OpenNMS Horizon IP interface to which the monitor is assigned. This monitor implements placeholder substitution in parameter values.
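A comparable client-side test can be sketched with the operating system's resolver. The example below uses only the Python standard library; `label_resolves` is a hypothetical helper, the v4 and v6 values mirror those used in this section's examples, and querying a specific name server is outside its scope.

```python
import socket

FAMILIES = {"v4": socket.AF_INET, "v6": socket.AF_INET6}


def label_resolves(node_label, resolution_type="v4", resolver=socket.getaddrinfo):
    """Return True if node_label resolves to at least one address.

    resolution_type -- "v4" for A records or "v6" for AAAA records
    resolver        -- injectable for testing; defaults to the OS resolver
    """
    family = FAMILIES[resolution_type]
    try:
        return len(resolver(node_label, None, family)) > 0
    except socket.gaierror:             # the name could not be resolved
        return False
```

As with the monitor, any returned address counts as success; the result is not compared with the IP address of the interface.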
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
Type of record for the node label test. |
optional |
|
No |
|
Alternate DNS record types to search for. |
optional |
`` |
No |
|
Alternate DNS record to lookup |
optional |
The node label. |
Yes |
|
The DNS server to query for the records. |
optional |
Use name server from host system running OpenNMS Horizon |
Yes |
This monitor implements the Common Configuration Parameters.
Examples
The following examples show service configurations for monitoring IPv4 and/or IPv6 resolution, as well as other record types:
<!-- Assigned service test if the node label is resolved for an A record -->
<service name="DNS-Resolution-v4" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="2000"/>
<parameter key="resolution-type" value="v4"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="rrd-base-name" value="dns-res-v4"/>
<parameter key="ds-name" value="dns-res-v4"/>
</service>
<!-- Assigned service test if www.google.com is resolved for an A record -->
<service name="DNS-Resolution-v4-lookup" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="2000"/>
<parameter key="resolution-type" value="v4"/>
<parameter key="lookup" value="www.google.com"/>
</service>
<!-- Assigned service test if the node label is resolved for an AAAA record using a specific DNS server -->
<service name="DNS-Resolution-v6" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="2000"/>
<parameter key="resolution-type" value="v6"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="rrd-base-name" value="dns-res-v6"/>
<parameter key="ds-name" value="dns-res-v6"/>
<parameter key="nameserver" value="8.8.8.8"/>
</service>
<!-- Use parameter substitution for nameserver and lookup parameter values -->
<service name="DNS-Resolution-Sub" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="2000"/>
<parameter key="resolution-type" value="v6"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="rrd-base-name" value="dns-res-v6"/>
<parameter key="ds-name" value="dns-res-v6"/>
<parameter key="nameserver" value="{ipAddr}"/>
<parameter key="lookup" value="{nodeLabel}"/>
</service>
<!-- Assigned service test if the node label is resolved for an AAAA record AND A record -->
<service name="DNS-Resolution-v4-and-v6" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="2000"/>
<parameter key="resolution-type" value="both"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="rrd-base-name" value="dns-res-both"/>
<parameter key="ds-name" value="dns-res-both"/>
</service>
<!-- Assigned service test if the node label is resolved for an AAAA record OR A record -->
<service name="DNS-Resolution-v4-or-v6" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="2000"/>
<parameter key="resolution-type" value="either"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="rrd-base-name" value="dns-res-either"/>
<parameter key="ds-name" value="dns-res-either"/>
</service>
<!-- Assigned service test if the node label is resolved for a CNAME record AND MX record -->
<service name="DNS-Resolution-CNAME-and-MX" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="2000"/>
<parameter key="record-types" value="CNAME,MX"/>
<parameter key="lookup" value="www.google.com"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="rrd-base-name" value="dns-res-cname-mx"/>
<parameter key="ds-name" value="dns-res-cname-mx"/>
</service>
<monitor service="DNS-Resolution-v4" class-name="org.opennms.netmgt.poller.monitors.DNSResolutionMonitor" />
<monitor service="DNS-Resolution-v4-lookup" class-name="org.opennms.netmgt.poller.monitors.DNSResolutionMonitor" />
<monitor service="DNS-Resolution-v6" class-name="org.opennms.netmgt.poller.monitors.DNSResolutionMonitor" />
<monitor service="DNS-Resolution-Sub" class-name="org.opennms.netmgt.poller.monitors.DNSResolutionMonitor" />
<monitor service="DNS-Resolution-v4-and-v6" class-name="org.opennms.netmgt.poller.monitors.DNSResolutionMonitor" />
<monitor service="DNS-Resolution-v4-or-v6" class-name="org.opennms.netmgt.poller.monitors.DNSResolutionMonitor" />
<monitor service="DNS-Resolution-CNAME-and-MX" class-name="org.opennms.netmgt.poller.monitors.DNSResolutionMonitor" />
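The four resolution-type values used in the examples above (v4, v6, both, either) can be read as a small truth table. The following Python sketch only illustrates that reading, based on the comments in the examples; the function name and arguments are hypothetical and it is not taken from the monitor's source.

```python
def resolution_ok(resolution_type, has_a, has_aaaa):
    """Illustrative only: combine A and AAAA lookup results.

    has_a / has_aaaa state whether an A or AAAA record was resolved.
    """
    checks = {
        "v4": has_a,                    # A record only
        "v6": has_aaaa,                 # AAAA record only
        "both": has_a and has_aaaa,     # AAAA record AND A record
        "either": has_a or has_aaaa,    # AAAA record OR A record
    }
    if resolution_type not in checks:
        raise ValueError(f"unknown resolution-type: {resolution_type}")
    return checks[resolution_type]
```

For a host that has only an A record, "v4" and "either" succeed, while "v6" and "both" fail.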
To have response time graphs for the name resolution you have to configure RRD graphs for the given ds-names (dns-res-v4, dns-res-v6, dns-res-both, dns-res-either, dns-res-cname-mx) in '$OPENNMS_HOME/etc/response-graph.properties'.
DNSResolutionMonitor vs DnsMonitor
The DNSResolutionMonitor is used to measure the availability and record outages of name resolution from a client perspective. The service is mainly used for websites or similar publicly available resources. It can be used in combination with the Page Sequence Monitor to indicate whether a website is unavailable for DNS reasons.
The DnsMonitor, on the other hand, is a test against a specific DNS server. In OpenNMS Horizon the DNS server is the node, and the DnsMonitor sends a lookup request for a given A record to the DNS server IP address. The service goes down if the DNS server doesn’t have a valid A record in its zone database or has some other issue resolving A records.
5.6.14. FtpMonitor
The FtpMonitor validates the FTP connection establishment process. The monitor can test FTP servers on multiple ports and with specific login data.
The service using the FtpMonitor is up if the FTP server responds with return codes between 200 and 299. For special cases the service is also marked as up for 425 and 530.
This monitor implements placeholder substitution in parameter values.
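As a rough illustration of the rule above, the following Python sketch classifies an FTP reply code. The function name is hypothetical and the sketch covers only the code ranges named in this section, not the full conversation the monitor performs.

```python
def ftp_reply_is_up(code):
    """Illustrative only: map an FTP reply code to up (True) or down (False)."""
    # 2xx replies indicate success; 425 and 530 are accepted as special cases.
    return 200 <= code <= 299 or code in (425, 530)
```

For example, 220 and 530 are treated as up, while 421 and 500 are treated as down.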
Monitor facts
Class Name |
|
Remote Enabled |
|
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
Number of attempts to get a valid FTP response/response-text |
optional |
|
No |
|
A list of TCP ports to which connection shall be tried. |
optional |
|
No |
|
This parameter is meant to be used together with the |
optional |
|
Yes |
|
This parameter is meant to be used together with the |
optional |
|
Yes |
This monitor implements the Common Configuration Parameters.
Examples
Example configurations for the monitor in 'poller-configuration.xml':
<service name="FTP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="21"/>
<parameter key="userid" value=""/>
<parameter key="password" value=""/>
</service>
<service name="FTP-With-Auth-From-Asset" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="21"/>
<parameter key="userid" value="{username}"/>
<parameter key="password" value="{password}"/>
</service>
<service name="FTP-Customer" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="21"/>
<parameter key="userid" value="Customer"/>
<parameter key="password" value="MySecretPassword"/>
</service>
<monitor service="FTP" class-name="org.opennms.netmgt.poller.monitors.FtpMonitor"/>
<monitor service="FTP-With-Auth-From-Asset" class-name="org.opennms.netmgt.poller.monitors.FtpMonitor"/>
<monitor service="FTP-Customer" class-name="org.opennms.netmgt.poller.monitors.FtpMonitor"/>
Hint
Comment from FtpMonitor source
Also want to accept the following ERROR message generated by some FTP servers following a QUIT command without a previous successful login: "530 QUIT : User not logged in. Please login with USER and PASS first."
Also want to accept the following ERROR message generated by some FTP servers following a QUIT command without a previously successful login: "425 Session is disconnected."
See also: http://tools.ietf.org/html/rfc959
5.6.15. HostResourceSwRunMonitor
This monitor tests the running state of one or more processes. It does this via SNMP by inspecting the hrSwRunTable of the HOST-RESOURCES-MIB. The test is done by matching a given process name against hrSwRunName and then checking the numeric value of hrSwRunStatus.
This monitor uses SNMP to accomplish its work. Therefore systems against which it is to be used must have an SNMP agent installed and configured. Furthermore, the SNMP agent on the system must support the HOST-RESOURCES-MIB. Most modern SNMP agents, including most distributions of the Net-SNMP agent and the SNMP service that ships with Microsoft Windows, support this MIB. Out-of-box support for HOST-RESOURCES-MIB among commercial Unix operating systems may be somewhat spotty.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The port of the SNMP agent of the server to test. |
optional |
|
|
The name of the process to be monitored. This parameter’s value is case-sensitive and is evaluated as an exact match. |
required |
|
|
If the process name appears multiple times in the hrSwRunTable, and this parameter is set to
|
optional |
|
|
The maximum allowable value of hrSwRunStatus among |
optional |
|
|
The numeric object identifier (OID) from which process names are queried. Defaults to
hrSwRunName and should never be changed under normal
circumstances. That said, changing it to hrSwRunParameters ( |
optional |
|
|
The numeric object identifier (OID) from which run status is queried. Defaults to hrSwRunStatus and should never be changed under normal circumstances. |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
The following example shows how to monitor the process called httpd running on a server using this monitor.
The configuration in poller-configuration.xml has to be defined as follows:
<service name="Process-httpd" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="3"/>
<parameter key="timeout" value="3000"/>
<parameter key="service-name" value="httpd"/>(1)
<parameter key="run-level" value="3"/>(2)
<parameter key="match-all" value="true"/>(3)
</service>
<monitor service="Process-httpd" class-name="org.opennms.netmgt.poller.monitors.HostResourceSwRunMonitor"/>
1 | Name of the process on the system |
2 | Test whether the process is in a valid state, i.e. has a run-level no higher than notRunnable(3) |
3 | If the httpd process runs multiple times the test is done for each instance of the process. |
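The interplay of service-name, run-level, and match-all can be sketched as follows. This is an illustration under stated assumptions rather than the monitor's code: the function and its arguments are hypothetical, a missing process is assumed to mean "down", and with match-all disabled a single acceptable instance is assumed to be sufficient. The status values follow the HOST-RESOURCES-MIB: running(1), runnable(2), notRunnable(3), invalid(4).

```python
def process_service_is_up(rows, service_name, run_level=3, match_all=False):
    """Illustrative only: evaluate hrSwRunTable rows.

    rows is a list of (hrSwRunName, hrSwRunStatus) tuples as read via SNMP.
    """
    # Process names are compared case-sensitively and as an exact match.
    statuses = [status for name, status in rows if name == service_name]
    if not statuses:
        return False  # assumption: no matching process means the service is down
    acceptable = [status <= run_level for status in statuses]
    # Assumption: match_all requires every instance to be acceptable,
    # otherwise one acceptable instance is enough.
    return all(acceptable) if match_all else any(acceptable)
```

Given two httpd rows with status 1 and 4, the sketch reports up without match-all and down with match-all set to true.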
5.6.16. HttpMonitor
The HTTP monitor tests the response of an HTTP server to a specific HTTP 'GET' command. During the poll, an attempt is made to connect on the specified port(s). The monitor can test a web server on multiple ports. By default the test is made against ports 80, 8080, and 8888. If the connection request is successful, an HTTP 'GET' command is sent to the interface. The response is parsed and a return code extracted and verified. This monitor implements placeholder substitution in parameter values.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
Authentication credentials to perform basic authentication. |
optional |
|
Yes |
|
Additional headers to be sent along with the request. |
optional |
|
No |
|
Specify the Host header’s value. |
optional |
|
No |
|
If the |
optional |
|
No |
|
This parameter is meant to be used together with the |
optional |
|
Yes |
|
A list of TCP ports to which connection shall be tried. |
optional |
|
No |
|
Number of attempts to get a valid HTTP response/response-text |
optional |
|
No |
|
If the |
optional |
|
No |
|
A comma-separated list of acceptable HTTP response code ranges.
Example: |
optional |
If the |
No |
|
Text to look for in the response body. This will be matched against every line, and it will
be considered a success at the first match. If there is a |
optional |
|
No |
|
URL to be retrieved via the HTTP 'GET' command |
optional |
|
Yes |
|
This parameter is meant to be used together with the |
optional |
|
Yes |
|
Allows you to set the User-Agent HTTP header (see also RFC2616 section 14.43). |
optional |
|
Yes |
|
When set to true, full communication between client and the webserver will be logged
(with a log level of |
optional |
|
No |
This monitor implements the Common Configuration Parameters.
Examples
<!-- Test HTTP service on port 80 only -->
<service name="HTTP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="80"/>
<parameter key="url" value="/"/>
</service>
<!-- Test for virtual host opennms.com running -->
<service name="OpenNMSdotCom" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="80"/>
<parameter key="host-name" value="opennms.com"/>
<parameter key="url" value="/solutions"/>
<parameter key="response" value="200-202,299"/>
<parameter key="response-text" value="~.*[Cc]onsulting.*"/>
</service>
<!-- Test for instance of OpenNMS 1.2.9 running -->
<service name="OpenNMS-129" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="8080"/>
<parameter key="url" value="/opennms/event/list"/>
<parameter key="basic-authentication" value="admin:admin"/>
<parameter key="response" value="200"/>
</service>
<!-- Test for instance of OpenNMS 1.2.9 with parameter substitution in basic-authentication parameter -->
<service name="OpenNMS-22" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="8080"/>
<parameter key="url" value="/opennms/event/list"/>
<parameter key="basic-authentication" value="{username}:{password}"/>
<parameter key="response" value="200"/>
</service>
<monitor service="HTTP" class-name="org.opennms.netmgt.poller.monitors.HttpMonitor" />
<monitor service="OpenNMSdotCom" class-name="org.opennms.netmgt.poller.monitors.HttpMonitor" />
<monitor service="OpenNMS-129" class-name="org.opennms.netmgt.poller.monitors.HttpMonitor" />
<monitor service="OpenNMS-22" class-name="org.opennms.netmgt.poller.monitors.HttpMonitor" />
Testing filtering proxies with HttpMonitor
If a filtering proxy server is set up to allow retrieval of some URLs but deny others, the HttpMonitor can be used to verify this behavior.
As an example, a proxy server is running on TCP port 3128 and serves http://www.opennms.org/ but never http://www.myspace.com/. To test this behavior, the HttpMonitor can be configured as follows:
<service name="HTTP-Allow-opennms.org" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="3128"/>
<parameter key="url" value="http://www.opennms.org/"/>
<parameter key="response" value="200-399"/>
</service>
<service name="HTTP-Block-myspace.com" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="3128"/>
<parameter key="url" value="http://www.myspace.com/"/>
<parameter key="response" value="400-599"/>
</service>
<monitor service="HTTP-Allow-opennms.org" class-name="org.opennms.netmgt.poller.monitors.HttpMonitor"/>
<monitor service="HTTP-Block-myspace.com" class-name="org.opennms.netmgt.poller.monitors.HttpMonitor"/>
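The response values used in the examples above, such as 200-202,299 or 400-599, are comma-separated lists of single codes and inclusive ranges. The following Python sketch shows one way to read such a list; the function names are hypothetical and the parsing is an assumption based on the examples, not the monitor's own parser.

```python
def parse_ranges(spec):
    """Illustrative only: turn '200-202,299' into [(200, 202), (299, 299)]."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, sep, high = part.partition("-")
        # A single code such as '299' is treated as a one-element range.
        ranges.append((int(low), int(high) if sep else int(low)))
    return ranges


def code_accepted(code, spec):
    """Return True if the HTTP status code falls inside any listed range."""
    return any(low <= code <= high for low, high in parse_ranges(spec))
```

Under this reading, the HTTP-Block-myspace.com service is up when the proxy answers with 403, because 403 lies in 400-599, and the HTTP-Allow-opennms.org service would be down for the same code.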
5.6.17. HttpPostMonitor
Use the HttpPostMonitor to HTTP POST arbitrary content to a remote URI. One use case is posting to a SOAP endpoint. This monitor implements placeholder substitution in parameter values.
Monitor facts
Class Name |
|
Remote Enabled |
|
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
The body of the POST, for example properly escaped XML. |
required |
|
No |
|
The password to use for HTTP BASIC auth. |
optional |
|
Yes |
|
The username to use for HTTP BASIC auth. |
optional |
|
Yes |
|
Additional headers to be sent along with the request. Example of valid
parameter’s names are |
optional |
|
No |
|
A string that is matched against the response of the HTTP POST.
If the output contains the banner, the service is determined as up.
Specify a regex by starting with |
optional |
|
Yes |
|
Set the character set for the POST. |
optional |
|
No |
|
Set the mimetype for the POST. |
optional |
|
No |
|
The port of the web server to which the POST is sent. |
optional |
|
No |
|
The connection scheme to use. |
optional |
|
No |
|
Enables or disables the SSL certificate validation. |
optional |
|
No |
|
The URI to use during the POST. |
optional |
|
Yes |
|
Should the system wide proxy settings be used? The system proxy settings can be configured in system properties |
optional |
|
No |
This monitor implements the Common Configuration Parameters.
Examples
The following example would create a POST that contains the payload World.
<service name="MyServlet" interval="300000" user-defined="false" status="on">
<parameter key="banner" value="Hello"/>
<parameter key="port" value="8080"/>
<parameter key="uri" value="/MyServlet"/>
<parameter key="payload" value="World"/>
<parameter key="retry" value="1"/>
<parameter key="timeout" value="30000"/>
</service>
<monitor service="MyServlet" class-name="org.opennms.netmgt.poller.monitors.HttpPostMonitor"/>
The resulting POST looks like this:
POST /MyServlet HTTP/1.1
Content-Type: text/xml; charset=utf-8
Host: <ip_addr_of_interface>:8080
Connection: Keep-Alive
World
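To reproduce a request like the one above outside of OpenNMS Horizon, for example while debugging a servlet, a small Python sketch can help. It is not part of the monitor; the function names, the example address 192.0.2.10, and the plain substring banner check are assumptions made for illustration, and the default MIME type and character set are copied from the request shown above.

```python
import urllib.request


def build_post(host, port, uri, payload, mimetype="text/xml", charset="utf-8"):
    """Illustrative only: build (but do not send) a POST like the one shown above."""
    url = f"http://{host}:{port}{uri}"
    return urllib.request.Request(
        url,
        data=payload.encode(charset),
        headers={"Content-Type": f"{mimetype}; charset={charset}"},
        method="POST",
    )


def banner_found(body, banner):
    """Return True if the response body contains the banner as a plain substring."""
    return banner in body


# Hypothetical target; sending would be urllib.request.urlopen(request, timeout=30).
request = build_post("192.0.2.10", 8080, "/MyServlet", "World")
```

The request object is only constructed here, so the sketch can be inspected without contacting a server.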
5.6.18. HttpsMonitor
The HTTPS monitor tests the response of an SSL-enabled HTTP server. The HTTPS monitor is an SSL-enabled extension of the HTTP monitor with a default TCP port value of 443. All HttpMonitor parameters apply, so please refer to HttpMonitor’s documentation for more information. This monitor implements placeholder substitution in parameter values.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
A list of TCP ports to which connection shall be tried. |
optional |
|
Examples
<!-- Test HTTPS service on port 8443 -->
<service name="HTTPS" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="8443"/>
<parameter key="url" value="/"/>
</service>
<monitor service="HTTPS" class-name="org.opennms.netmgt.poller.monitors.HttpsMonitor" />
5.6.19. IcmpMonitor
The ICMP monitor tests for ICMP service availability by sending echo request ICMP messages. The service is considered available when the node sends back an echo reply ICMP message within the specified amount of time.
Monitor facts
Class Name |
|
Remote Enabled |
true with some restrictions (see below) |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Time in milliseconds to wait for a response. |
optional |
|
|
Whether to set the "Don’t Fragment" bit on outgoing packets |
optional |
|
|
DSCP traffic-control value. |
optional |
|
|
Number of bytes of the ICMP packet to send. |
optional |
|
|
Enables ICMP thresholding. |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
<service name="ICMP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response"/>
<parameter key="rrd-base-name" value="icmp"/>
<parameter key="ds-name" value="icmp"/>
</service>
<monitor service="ICMP" class-name="org.opennms.netmgt.poller.monitors.IcmpMonitor"/>
<!-- Advanced example: set DSCP bits and send a large packet with allow-fragmentation=false -->
<service name="ICMP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="dscp" value="0x1C"/> <!-- AF32: Class 3, Medium drop probability -->
<parameter key="allow-fragmentation" value="false"/>
<parameter key="packet-size" value="2048"/>
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response"/>
<parameter key="rrd-base-name" value="icmp"/>
<parameter key="ds-name" value="icmp"/>
</service>
<monitor service="ICMP" class-name="org.opennms.netmgt.poller.monitors.IcmpMonitor"/>
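The value 0x1C in the advanced example corresponds to the Assured Forwarding code point AF32. As a worked check, the DSCP for AFxy is 8x + 2y, which the following Python sketch computes; the function name is hypothetical.

```python
def af_dscp(af_class, drop_precedence):
    """Return the DSCP value for Assured Forwarding class x, drop precedence y (AFxy)."""
    if not 1 <= af_class <= 4 or not 1 <= drop_precedence <= 3:
        raise ValueError("AF classes are 1-4 and drop precedences are 1-3")
    # The class occupies the upper three bits, the drop precedence the next two.
    return (af_class << 3) | (drop_precedence << 1)


# AF32: class 3, medium drop probability
print(hex(af_dscp(3, 2)))
```

For AF32 this gives 8 * 3 + 2 * 2 = 28, which is 0x1c in hexadecimal.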
5.6.20. ImapMonitor
This monitor checks if an IMAP server is functional. The test is done by initializing a very simple IMAP conversation. The ImapMonitor establishes a TCP connection, sends a logout command, and tests the IMAP server’s responses.
The behavior can be simulated with telnet:
telnet mail.myserver.de 143
Trying 62.108.41.197...
Connected to mail.myserver.de.
Escape character is '^]'.
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE STARTTLS LOGINDISABLED] Dovecot ready. (1)
ONMSPOLLER LOGOUT (2)
* BYE Logging out (3)
ONMSPOLLER OK Logout completed.
Connection closed by foreign host.
1 | Test the IMAP server banner; it has to start with * OK for the service to be up |
2 | Send ONMSPOLLER LOGOUT |
3 | Test the server response; it has to start with * BYE for the service to be up |
If one of the tests in the sample above fails the service is marked down.
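The same conversation can be scripted instead of typed into telnet. The following Python sketch mirrors the telnet session above using only the standard library; it is an approximation for manual troubleshooting, not the ImapMonitor code, and the function names are hypothetical.

```python
import socket


def imap_replies_ok(banner, logout_reply):
    """Apply the two checks from the session above to the received lines."""
    return banner.startswith("* OK") and logout_reply.startswith("* BYE")


def check_imap(host, port=143, timeout=3.0):
    """Illustrative only: connect, read the banner, send LOGOUT, read the reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            reader = sock.makefile("r", encoding="ascii", errors="replace",
                                   newline="\r\n")
            banner = reader.readline()
            sock.sendall(b"ONMSPOLLER LOGOUT\r\n")
            logout_reply = reader.readline()
            return imap_replies_ok(banner, logout_reply)
    except OSError:
        # Connection refused, timeout, or reset: treat as down.
        return False
```

Calling check_imap("mail.myserver.de") would perform the exchange shown in the telnet session against that host.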
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Number of attempts to get a valid IMAP response |
optional |
|
|
The port of the IMAP server. |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
Example configuration for the monitor in poller-configuration.xml:
<!-- Test IMAP service on port 143 only -->
<service name="IMAP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1"/>
<parameter key="port" value="143"/>
<parameter key="timeout" value="3000"/>
</service>
<monitor service="IMAP" class-name="org.opennms.netmgt.poller.monitors.ImapMonitor" />
5.6.21. ImapsMonitor
The IMAPS monitor tests the response of an SSL-enabled IMAP server. The IMAPS monitor is an SSL-enabled extension of the IMAP monitor with a default TCP port value of 993. All ImapMonitor parameters apply, so please refer to ImapMonitor’s documentation for more information.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The destination port where connections shall be attempted. |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
<!-- IMAPS service at OpenNMS.org is on port 9993 -->
<service name="IMAPS" interval="300000" user-defined="false" status="on">
<parameter key="port" value="9993"/>
<parameter key="version" value="3"/>
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response"/>
<parameter key="rrd-base-name" value="imaps"/>
<parameter key="ds-name" value="imaps"/>
</service>
<monitor service="IMAPS" class-name="org.opennms.netmgt.poller.monitors.ImapsMonitor" />
5.6.22. JCifsMonitor
This monitor tests a file sharing service based on the CIFS/SMB protocol. This monitor implements placeholder substitution in parameter values.
This monitor is not installed by default.
You have to install opennms-plugin-protocol-cifs from your OpenNMS Horizon installation repository.
With the JCifsMonitor it is possible to run tests for the following use cases:
-
share is available in the network
-
a given file exists in the share
-
a given folder exists in the share
-
a given folder should contain at least one (1) file
-
a given folder should contain no (0) files
-
when testing files and folders, you can use a regular expression to exclude specific file and folder names from the test
A network resource in SMB like a file or folder is addressed as a UNC Path.
\\server\share\folder\file.txt
The Java implementation jCIFS, which implements the CIFS/SMB network protocol, uses SMB URLs to access the network resource. The same resource as in our example would look like this as an SMB URL:
smb://workgroup;user:password@server/share/folder/file.txt
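The mapping between the two notations is mechanical, as the following Python sketch shows. It is only an illustration: the function name is hypothetical, and special characters in credentials or path segments are not escaped.

```python
def unc_to_smb_url(unc_path, user="", password="", domain=""):
    """Illustrative only: rewrite \\\\server\\share\\... as an smb:// URL."""
    if not unc_path.startswith("\\\\"):
        raise ValueError("a UNC path starts with two backslashes")
    parts = [part for part in unc_path[2:].split("\\") if part]
    if not parts:
        raise ValueError("the UNC path does not name a server")
    server, rest = parts[0], parts[1:]
    auth = ""
    if user:
        auth = user
        if password:
            auth += ":" + password
        if domain:
            auth = domain + ";" + auth
        auth += "@"
    return f"smb://{auth}{server}/" + "/".join(rest)


print(unc_to_smb_url(r"\\server\share\folder\file.txt",
                     user="user", password="password", domain="workgroup"))
```

For the UNC path from the example, the sketch produces the SMB URL shown above.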
The JCifsMonitor cannot test:
-
whether a file contains specific content
-
a specific number of files in a folder, for example folder should contain exactly / more or less than x files
-
Age or modification time stamps of files or folders
-
Permissions or other attributes of files or folders
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
Number of retries before the service is marked as down. |
optional |
|
No |
|
Windows domain where the user is located. You don’t have to use the domain parameter if you use local user accounts. |
optional |
empty String |
Yes |
|
Username to access the resource over a network |
optional |
empty String |
Yes |
|
Password for the user |
optional |
empty String |
Yes |
|
Path to the resource you want to test |
required |
empty String |
No |
|
The test mode which has the following options |
optional |
|
No |
|
Override the IP address of the SMB url to check shares on different file servers. |
optional |
empty String |
No |
|
Ignore specific files in folder with regular expression. This parameter will just be applied on
|
optional |
|
No |
Due to limitations in the JCifs library, only global timeouts can be used reliably. |
This monitor implements the Common Configuration Parameters.
It makes little sense to set retries higher than 1; doing so wastes resources during monitoring.
If you access shares from macOS, be aware of side effects caused by the hidden file '.DS_Store'. It can produce false positives in monitoring; use the folderIgnoreFiles parameter to exclude it.
Example: test for the existence of files in a folder
This example shows how to configure the JCifsMonitor to test whether a folder on a network file share is empty. In this example we have access to a share for error logs, and we want an outage whenever the folder contains any error log files. The share is named log. The service returns to normal once the error log files are deleted and the folder is empty.
<service name="CIFS-ErrorLog" interval="30000" user-defined="true" status="on">
<parameter key="retry" value="1" />
<parameter key="timeout" value="3000" />
<parameter key="domain" value="contoso" />(1)
<parameter key="username" value="MonitoringUser" />(2)
<parameter key="password" value="MonitoringPassword" />(3)
<parameter key="path" value="/fileshare/log/" />(4)
<parameter key="mode" value="folder_empty" />(5)
</service>
<monitor service="CIFS-ErrorLog" class-name="org.opennms.netmgt.poller.monitors.JCifsMonitor" />
1 | Name of the SMB or Microsoft Windows Domain |
2 | User for accessing the share |
3 | Password for accessing the share |
4 | Path to the folder inside of the share as part of the SMB URL |
5 | Mode is set to folder_empty |
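The effect of combining the folder_empty mode with an ignore pattern, as suggested for '.DS_Store' files, can be sketched like this. The function name is hypothetical, and matching the pattern against the whole file name is an assumption; the sketch illustrates the idea and is not the monitor's implementation.

```python
import re


def folder_counts_as_empty(entries, ignore_pattern=None):
    """Illustrative only: is the folder empty once ignored names are removed?"""
    if ignore_pattern:
        ignored = re.compile(ignore_pattern)
        # Assumption: the pattern has to match the complete file name.
        entries = [name for name in entries if not ignored.fullmatch(name)]
    return len(entries) == 0
```

A folder that contains only '.DS_Store' counts as empty with the pattern \.DS_Store and as non-empty without it.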
5.6.23. JDBCMonitor
The JDBCMonitor checks that it is able to connect to a database and checks if it is able to get the database catalog from that database management system (DBMS). It is based on the JDBC technology to connect and communicate with the database. This monitor implements placeholder substitution in parameter values.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
JDBC driver class to use |
required |
|
No |
|
JDBC URL to connect to. |
required |
|
Yes |
|
Database user |
required |
|
Yes |
|
Database password |
required |
|
Yes |
|
How many retries should be performed before failing the test |
optional |
|
No |
The OPENNMS_JDBC_HOSTNAME is replaced in the url parameter with the IP or resolved hostname of the interface the monitored service is assigned to. |
This monitor implements the Common Configuration Parameters.
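The OPENNMS_JDBC_HOSTNAME replacement amounts to a plain string substitution in the url parameter. The following Python sketch shows the effect; the function name and the example address 192.0.2.15 are hypothetical.

```python
def expand_jdbc_url(url_template, interface_host):
    """Illustrative only: substitute the interface's IP or hostname into the URL."""
    return url_template.replace("OPENNMS_JDBC_HOSTNAME", interface_host)


print(expand_jdbc_url("jdbc:postgresql://OPENNMS_JDBC_HOSTNAME:5432/opennms",
                      "192.0.2.15"))
```

One service definition can therefore be assigned to many database hosts, because the placeholder resolves per interface.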
Provide the database driver
The JDBCMonitor is based on JDBC and requires a JDBC driver to communicate with any database.
Because OpenNMS Horizon itself uses a PostgreSQL database, the PostgreSQL JDBC driver is available out of the box.
For all other database systems, a compatible JDBC driver has to be provided to OpenNMS Horizon as a jar file.
To provide a JDBC driver, place the driver jar in the opennms/lib folder of your OpenNMS Horizon installation.
Examples
The following example checks if the PostgreSQL database used by OpenNMS Horizon is available.
<service name="OpenNMS-DBMS" interval="30000" user-defined="true" status="on">
<parameter key="driver" value="org.postgresql.Driver"/>
<parameter key="url" value="jdbc:postgresql://OPENNMS_JDBC_HOSTNAME:5432/opennms"/>
<parameter key="user" value="opennms"/>
<parameter key="password" value="opennms"/>
</service>
<monitor service="OpenNMS-DBMS" class-name="org.opennms.netmgt.poller.monitors.JDBCMonitor" />
5.6.24. JDBCStoredProcedureMonitor
The JDBCStoredProcedureMonitor checks the result of a stored procedure in a remote database. The result of the stored procedure has to be a boolean value (representing true or false). The service associated with this monitor is marked as up if the stored procedure returns true and it is marked as down in all other cases. It is based on the JDBC technology to connect and communicate with the database. This monitor implements placeholder substitution in parameter values.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
JDBC driver class to use |
required |
|
No |
|
JDBC URL to connect to. |
required |
|
Yes |
|
Database user |
required |
|
Yes |
|
Database password |
required |
|
Yes |
|
How many retries should be performed before failing the test |
optional |
|
No |
|
Name of the database stored procedure to call |
required |
|
No |
|
Name of the database schema in which the stored procedure is |
optional |
|
No |
The OPENNMS_JDBC_HOSTNAME is replaced in the url parameter with the IP or resolved hostname of the interface the monitored service is assigned to. |
This monitor implements the Common Configuration Parameters.
Provide the database driver
The JDBCStoredProcedureMonitor is based on JDBC and requires a JDBC driver to communicate with any database.
Because OpenNMS Horizon itself uses a PostgreSQL database, the PostgreSQL JDBC driver is available out of the box.
For all other database systems, a compatible JDBC driver has to be provided to OpenNMS Horizon as a jar file.
To provide a JDBC driver, place the driver jar in the opennms/lib folder of your OpenNMS Horizon installation.
Examples
The following example checks a stored procedure added to the PostgreSQL database used by OpenNMS Horizon. The stored procedure returns true as long as fewer than 250000 events are in the events table of OpenNMS Horizon.
CREATE OR REPLACE FUNCTION eventlimit_sp() RETURNS boolean AS
$BODY$DECLARE
  num_events integer;
BEGIN
  -- Count all rows in the events table
  SELECT COUNT(*) INTO num_events FROM events;
  -- True (service up) while the table holds fewer than 250000 events
  RETURN num_events < 250000;
END;$BODY$
LANGUAGE plpgsql VOLATILE NOT LEAKPROOF
COST 100;
<service name="OpenNMS-DB-SP-Event-Limit" interval="300000" user-defined="true" status="on">
<parameter key="driver" value="org.postgresql.Driver"/>
<parameter key="url" value="jdbc:postgresql://OPENNMS_JDBC_HOSTNAME:5432/opennms"/>
<parameter key="user" value="opennms"/>
<parameter key="password" value="opennms"/>
<parameter key="stored-procedure" value="eventlimit_sp"/>
<parameter key="schema" value="public"/>
</service>
<monitor service="OpenNMS-DB-SP-Event-Limit" class-name="org.opennms.netmgt.poller.monitors.JDBCStoredProcedureMonitor"/>
5.6.25. JDBCQueryMonitor
The JDBCQueryMonitor runs an SQL query against a database and verifies the result of the query. The query runs over a read-only connection, so the data in the database is not altered. The monitor uses JDBC to connect to and communicate with the database. This monitor implements placeholder substitution in parameter values.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
JDBC driver class to use |
required |
|
No |
|
JDBC URL to connect to |
required |
|
Yes |
|
Database user |
required |
|
Yes |
|
Database password |
required |
|
Yes |
|
The SQL query to run |
required |
|
No |
|
What evaluation action to perform |
required |
|
No |
|
The result column to evaluate when using the compare_string action |
required |
|
No |
|
Operator to use for the evaluation |
required |
|
No |
|
The operand to compare against the SQL query result |
required |
depends on the action |
No |
|
The message to use if the service is down. Both operands and the operator are added to the message too. |
optional |
generic message depending on the action |
No |
|
How many retries should be performed before failing the test |
optional |
|
No |
The OPENNMS_JDBC_HOSTNAME placeholder in the url parameter is replaced with the IP address or resolved hostname of the interface to which the monitored service is assigned.
This monitor implements the Common Configuration Parameters.
Parameter | Description | Default operand |
---|---|---|
|
The number of returned rows is compared, not a value of the resulting rows |
|
|
Strings are always checked for equality with the operand |
|
|
An integer from a column of the first result row is compared |
|
Operator | XML entity to use in XML configs |
---|---|
= | = |
< | &lt; |
> | &gt; |
!= | != |
<= | &lt;= |
>= | &gt;= |
Evaluating the action - operator - operand
Only the first result row returned by the SQL query is evaluated. The evaluation can be against the value of one column or the number of rows returned by the SQL query.
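As a hedged illustration of how action, operator, and operand combine, the following sketch compares an integer from the first result row against an operand. It assumes that the integer comparison action is named compare_int and that the query alias event_count can be referenced through the column parameter. The service name is hypothetical; the remaining keys mirror the examples in this section.

```xml
<!-- Sketch only: service name, column alias, and the compare_int action name are assumptions -->
<service name="OpenNMS-DB-Event-Count" interval="300000" user-defined="true" status="on">
  <parameter key="driver" value="org.postgresql.Driver"/>
  <parameter key="url" value="jdbc:postgresql://OPENNMS_JDBC_HOSTNAME:5432/opennms"/>
  <parameter key="user" value="opennms"/>
  <parameter key="password" value="opennms"/>
  <!-- The query returns a single row with one integer column -->
  <parameter key="query" value="SELECT COUNT(*) AS event_count FROM events"/>
  <parameter key="column" value="event_count"/>
  <parameter key="action" value="compare_int"/>
  <!-- The less-than operator has to be written as an XML entity -->
  <parameter key="operator" value="&lt;"/>
  <parameter key="operand" value="250000"/>
  <parameter key="message" value="too many events in OpenNMS database"/>
</service>
<monitor service="OpenNMS-DB-Event-Count" class-name="org.opennms.netmgt.poller.monitors.JDBCQueryMonitor"/>
```

Because only the first result row is evaluated, an aggregate query that returns exactly one row keeps the comparison unambiguous.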
Provide the database driver
The JDBCQueryMonitor is based on JDBC and requires a JDBC driver to communicate with any database.
Because OpenNMS Horizon itself uses a PostgreSQL database, the PostgreSQL JDBC driver is available out of the box.
For all other database systems, you must provide a compatible JDBC driver to OpenNMS Horizon as a JAR file.
To provide a JDBC driver, place the driver JAR in the opennms/lib folder of your OpenNMS Horizon installation.
Examples
Row Count
The following example checks if the number of events in the OpenNMS Horizon database is fewer than 250,000.
<service name="OpenNMS-DB-Event-Limit" interval="30000" user-defined="true" status="on">
<parameter key="driver" value="org.postgresql.Driver"/>
<parameter key="url" value="jdbc:postgresql://OPENNMS_JDBC_HOSTNAME:5432/opennms"/>
<parameter key="user" value="opennms"/>
<parameter key="password" value="opennms"/>
<parameter key="query" value="select eventid from events" />
<parameter key="action" value="row_count" />
<parameter key="operand" value="250000" />
<parameter key="operator" value="&lt;" />
<parameter key="message" value="too many events in OpenNMS database" />
</service>
<monitor service="OpenNMS-DB-Event-Limit" class-name="org.opennms.netmgt.poller.monitors.JDBCQueryMonitor" />
String Comparison
The following example checks whether the queried string equals the defined operand.
<service name="MariaDB-Galera" interval="300000" user-defined="false" status="on">
<parameter key="driver" value="org.mariadb.jdbc.Driver"/>
<parameter key="user" value="opennms"/>
<parameter key="password" value="********"/>
<parameter key="url" value="jdbc:mysql://OPENNMS_JDBC_HOSTNAME"/>
<parameter key="query" value="SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_status'"/>
<parameter key="column" value="VARIABLE_VALUE"/>
<parameter key="action" value="compare_string"/>
<parameter key="operator" value="="/>
<parameter key="operand" value="Primary"/>
<parameter key="message" value="Galera Node is not in primary component"/>
</service>
<monitor service="MariaDB-Galera" class-name="org.opennms.netmgt.poller.monitors.JDBCQueryMonitor" />
5.6.26. JmxMonitor
The JMX monitor tests the service availability of Java applications. The monitor offers the following functionality:
-
test the application’s connectivity via JMX
-
test for the existence of management beans
-
test the status of one or more management beans and evaluate their values
Monitor facts
Class Name |
|
Remote Enabled |
|
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Number of attempts to get a response |
optional |
|
|
Time in milliseconds to wait for a response |
optional |
|
|
Destination port where the JMX requests shall be sent |
optional |
from |
|
Set this to |
optional |
|
|
Protocol used in the JMX connection string |
optional |
|
|
Path used in JMX connection string |
optional |
|
|
RMI port |
optional |
|
|
Use an alternative JMX URL scheme |
optional |
|
|
Defines an MBean object name to access. The <variable> name is arbitrary. |
optional |
|
|
Tests an MBean attribute value. The <variable> name is arbitrary. |
optional |
|
Examples
<service name="JMX-Connection-Test" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="3"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="18980"/>
</service>
<monitor service="JMX-Connection-Test" class-name="org.opennms.netmgt.poller.monitors.JmxMonitor"/>
<service name="JMX-BeanValue-Test" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="3"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="18980"/>
<parameter key="beans.connected" value="org.opennms.workflow:name=client.onms.connected"/>
<parameter key="tests.isConnected" value="connected.get(&quot;Value&quot;) == true"/>
</service>
<monitor service="JMX-BeanValue-Test" class-name="org.opennms.netmgt.poller.monitors.Jsr160Monitor"/>
Reserved XML characters such as >, <, and " must be escaped in parameter values.
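To illustrate the escaping rule, the sketch below writes a test expression that contains both double quotes and a less-than sign. The service name, the bean java.lang:type=OperatingSystem with its SystemLoadAverage attribute, and the threshold of 5 are assumptions chosen for illustration only; the point is how the reserved characters are written inside the value attribute.

```xml
<!-- Sketch only: service name, bean, attribute, and threshold are illustrative assumptions -->
<service name="JMX-Load-Test" interval="300000" user-defined="false" status="on">
  <parameter key="retry" value="3"/>
  <parameter key="timeout" value="3000"/>
  <parameter key="port" value="18980"/>
  <parameter key="beans.os" value="java.lang:type=OperatingSystem"/>
  <!-- Unescaped form of the expression: os.get("SystemLoadAverage") < 5 -->
  <parameter key="tests.lowLoad" value="os.get(&quot;SystemLoadAverage&quot;) &lt; 5"/>
</service>
<monitor service="JMX-Load-Test" class-name="org.opennms.netmgt.poller.monitors.JmxMonitor"/>
```

An unescaped double quote would end the attribute value early, and an unescaped less-than sign would make the file invalid XML, so the poller configuration would fail to parse.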
5.6.27. JolokiaBeanMonitor
The JolokiaBeanMonitor is a JMX monitor specialized for use with the Jolokia framework. Use it when you need to execute a method or poll an attribute via JMX. It requires a fully installed and configured Jolokia agent deployed in the JVM container. If required, attribute names, paths, and method parameters can be provided as additional arguments to the call. To determine the status of the service, the JolokiaBeanMonitor matches the output against a banner. If the banner is part of the output, the status is interpreted as up. If the banner is not found in the output, the status is interpreted as down. Banner matching supports regular expressions and substring matches. This monitor implements placeholder substitution in parameter values.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
The bean name to query against. |
required |
|
No |
|
The name of the JMX attribute to scrape. |
optional ( |
|
No |
|
The attribute path. |
optional |
|
No |
|
The username to use for HTTP BASIC auth. |
optional |
|
Yes |
|
The password to use for HTTP BASIC auth. |
optional |
|
Yes |
|
A string that is matched against the output of the system call. If the output contains the banner,
the service is determined as up. Specify a regex by starting with |
optional |
|
Yes |
|
Method input |
optional |
|
Yes |
|
Method input |
optional |
|
Yes |
|
The name of the bean method to execute; its output is compared to the banner. |
optional ( |
|
Yes |
|
The port of the jolokia agent. |
optional |
|
No |
|
The jolokia agent url. Defaults to "http://<ipaddr>:<port>/jolokia" |
optional |
|
Yes |
This monitor implements the Common Configuration Parameters.
Variable | Description |
---|---|
|
IP-address of the interface the service is bound to. |
|
Port the service is bound to. |
Examples
The following examples show how to configure the monitor in poller-configuration.xml.
<parameter key="url" value="http://${ipaddr}:${port}/jolokia"/>
<parameter key="url" value="https://${ipaddr}:${port}/jolokia"/>
AttrName vs MethodName
The JolokiaBeanMonitor has two modes of operation. It can either scrape an attribute from a bean, or execute a method and compare the output to a banner. Executing a method is useful when your application has its own test methods that you would like to trigger via OpenNMS Horizon.
The parameters to execute a test method called "superTest" that takes a string as input would look like this:
<parameter key="beanname" value="MyBean" />
<parameter key="methodname" value="superTest" />
<parameter key="input1" value="someString"/>
The parameters to scrape an attribute from the same bean would look like this:
<parameter key="beanname" value="MyBean" />
<parameter key="attrname" value="upTime" />
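For orientation, the fragments above could sit inside a complete service definition such as the following sketch of the attribute mode. Several names here are assumptions rather than confirmed values: the banner parameter key is assumed to be banner, the class name is assumed to follow the package of the other monitors in this chapter, and the service name, attribute name, port, and banner text are hypothetical.

```xml
<!-- Sketch only: service name, attribute name, port, and banner value are hypothetical -->
<service name="Jolokia-MyBean-Status" interval="300000" user-defined="false" status="on">
  <parameter key="retry" value="1"/>
  <parameter key="timeout" value="3000"/>
  <parameter key="port" value="8080"/>
  <parameter key="url" value="http://${ipaddr}:${port}/jolokia"/>
  <parameter key="beanname" value="MyBean"/>
  <!-- Attribute mode: the value of the "status" attribute is compared against the banner -->
  <parameter key="attrname" value="status"/>
  <!-- Assumed key name for the banner; plain substring match -->
  <parameter key="banner" value="RUNNING"/>
</service>
<!-- Assumed class name, following the package of the other monitors in this chapter -->
<monitor service="Jolokia-MyBean-Status" class-name="org.opennms.netmgt.poller.monitors.JolokiaBeanMonitor"/>
```

To switch this sketch to the method mode, the attrname parameter would be replaced by methodname and, if the method takes arguments, input1.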
5.6.28. LdapMonitor
The LDAP monitor tests for LDAP service availability. The LDAP monitor first tries to establish a TCP connection on the specified port. Then, if it succeeds, it will attempt to establish an LDAP connection and do a simple search. If the search returns a result within the specified timeout and attempts, the service will be considered available. The scope of the LDAP search is limited to the immediate subordinates of the base object. The LDAP search is anonymous by default. The LDAP monitor makes use of the com.novell.ldap.LDAPConnection class. This monitor implements placeholder substitution in parameter values.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
The distinguished name to use if authenticated search is needed. |
optional |
|
Yes |
|
The password to use if authenticated search is needed. |
optional |
|
Yes |
|
The destination port where connection shall be attempted. |
optional |
|
No |
|
Number of attempts to get a search result. |
optional |
|
No |
|
The base distinguished name to search from. |
optional |
|
No |
|
The LDAP search’s filter. |
optional |
|
No |
|
The version of the LDAP protocol to use, specified as an integer. Note: Only LDAPv3 is supported at the moment. |
optional |
|
No |
This monitor implements the Common Configuration Parameters.
Examples
<!-- OpenNMS.org -->
<service name="LDAP" interval="300000" user-defined="false" status="on">
<parameter key="port" value="389"/>
<parameter key="version" value="3"/>
<parameter key="searchbase" value="dc=opennms,dc=org"/>
<parameter key="searchfilter" value="uid=ulf"/>
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response"/>
<parameter key="rrd-base-name" value="ldap"/>
<parameter key="ds-name" value="ldap"/>
</service>
<monitor service="LDAP" class-name="org.opennms.netmgt.poller.monitors.LdapMonitor"/>
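The search in the example above is anonymous. A minimal sketch of an authenticated search might look like the following, assuming the bind parameters are keyed dn and password; these key names are an assumption, and the service name, directory entries, and credentials are placeholders.

```xml
<!-- Sketch only: the dn and password key names and all values are assumptions -->
<service name="LDAP-Auth" interval="300000" user-defined="false" status="on">
  <parameter key="port" value="389"/>
  <parameter key="version" value="3"/>
  <parameter key="searchbase" value="dc=example,dc=org"/>
  <parameter key="searchfilter" value="uid=monitor"/>
  <!-- Bind credentials used for the authenticated search -->
  <parameter key="dn" value="uid=monitor,ou=people,dc=example,dc=org"/>
  <parameter key="password" value="changeme"/>
  <parameter key="retry" value="2"/>
  <parameter key="timeout" value="3000"/>
</service>
<monitor service="LDAP-Auth" class-name="org.opennms.netmgt.poller.monitors.LdapMonitor"/>
```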
5.6.29. LdapsMonitor
The LDAPS monitor tests the response of an SSL-enabled LDAP server. The LDAPS monitor is an SSL-enabled extension of the LDAP monitor with a default TCP port value of 636. All LdapMonitor parameters apply, so please refer to LdapMonitor’s documentation for more information. This monitor implements the same placeholder substitution in parameter values as LdapMonitor.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The destination port where connections shall be attempted. |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
<!-- LDAPS service at OpenNMS.org is on port 6636 -->
<service name="LDAPS" interval="300000" user-defined="false" status="on">
<parameter key="port" value="6636"/>
<parameter key="version" value="3"/>
<parameter key="searchbase" value="dc=opennms,dc=org"/>
<parameter key="searchfilter" value="uid=ulf"/>
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response"/>
<parameter key="rrd-base-name" value="ldaps"/>
<parameter key="ds-name" value="ldaps"/>
</service>
<monitor service="LDAPS" class-name="org.opennms.netmgt.poller.monitors.LdapsMonitor" />
5.6.30. MailTransportMonitor
MailTransportMonitor is used to run a synthetic test of a complete email transaction, including sending a mail and determining that it has been delivered. It can also use both sendmail-test and readmail-test independently to determine whether an email can be sent or a mailbox can be read.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Defines the test for sending mail. Contains sendmail-host, sendmail-protocol, sendmail-message, and user-auth |
optional |
|
|
Defines the test for reading mail. Contains readmail-host, readmail-protocol, and user-auth. |
optional |
|
Attribute | Description | Required | Default value |
---|---|---|---|
|
Show additional debug output |
optional |
|
|
Whether to use authentication, in the event it is required |
optional |
|
|
Use the JavaMail Mail Transport Agent |
optional |
|
|
Interval in ms between send attempts |
optional |
|
Attribute | Description | Required | Default value |
---|---|---|---|
|
The SMTP server address for sending mail |
optional |
|
|
The SMTP server port |
optional |
|
Attribute | Description | Required | Default value |
---|---|---|---|
|
Set the character set |
optional |
|
|
Use smtpsend or an alternate mailer |
optional |
|
|
Set the message content-type |
optional |
|
|
Set the message encoding |
optional |
|
|
If set to false, the QUIT command is sent and the connection is immediately closed. If set to true (the default), causes the transport to wait for the response to the QUIT command. |
optional |
|
|
The transport protocol to use. One of: |
optional |
|
|
Use SSL or not |
optional |
|
|
Use the STARTTLS command (if supported or required by the server) to switch the connection to a TLS-protected connection before issuing any login commands |
optional |
|
Attribute | Description | Required | Default value |
---|---|---|---|
|
The destination address |
optional |
|
|
The address to insert into the From: field |
optional |
|
|
The message subject |
optional |
|
|
The body of the message |
optional |
|
Attribute | Description | Required | Default value |
---|---|---|---|
|
Show additional debug output |
optional |
|
|
The folder or IMAP label to check for mail |
optional |
|
|
Substring match in email subjects when looking for a specific email |
optional |
|
|
Attempt to read email after this many milliseconds have passed, also used for retry interval |
optional |
|
|
Delete all read mail after a successful match |
optional |
|
Attribute | Description | Required | Default value |
---|---|---|---|
|
The target host for reading mail |
optional |
|
|
The appropriate port for the protocol |
optional |
|
Attribute | Description | Required | Default value |
---|---|---|---|
|
The transport protocol to use. One of: |
optional |
|
|
Whether to enable SSL for the connection |
optional |
|
|
Use the STARTTLS command (if supported or required by the server) to switch the connection to a TLS-protected connection before issuing any login commands |
optional |
|
Attribute | Description | Required | Default value |
---|---|---|---|
|
The user name for SMTP, POP, or IMAP authentication |
optional |
|
|
The password for SMTP, POP, or IMAP authentication |
optional |
|
Variable | Description |
---|---|
|
This value will be substituted with the IP address of the interface on which the monitored service appears |
Examples
Test for an end-to-end email transaction.
<service name="MTM" interval="300000" user-defined="false" status="on">
<parameter key="mail-transport-test">
<mail-transport-test>
<mail-test>
<sendmail-test attempt-interval="30000" use-authentication="false" use-jmta="false" debug="false" >
<sendmail-host host="${ipaddr}" port="25" />
<sendmail-protocol mailer="smtpsend" />
<sendmail-message to="opennms@gmail.com" subject="OpenNMS Test Message"
body="This is an OpenNMS test message." />
<user-auth user-name="opennms" password="roolz" />
</sendmail-test>
<readmail-test attempt-interval="5000" subject-match="OpenNMS Test Message" mail-folder="OPENNMS" debug="false" >
<readmail-host host="imap.gmail.com" port="993">
<readmail-protocol ssl-enable="true" start-tls="false" transport="imaps" />
</readmail-host>
<user-auth user-name="opennms@gmail.com" password="opennms"/>
</readmail-test>
</mail-test>
</mail-transport-test>
</parameter>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="ds-name" value="mtm_lat"/>
<parameter key="retry" value="20" />
</service>
Test that we can connect via IMAPS and open the OPENNMS folder.
<service name="MTM-Readmail" interval="300000" user-defined="false" status="on">
<parameter key="mail-transport-test">
<mail-transport-test>
<mail-test>
<readmail-test attempt-interval="5000" mail-folder="OPENNMS" debug="false" >
<readmail-host host="imap.gmail.com" port="993">
<readmail-protocol ssl-enable="true" start-tls="false" transport="imaps" />
</readmail-host>
<user-auth user-name="opennms@gmail.com" password="opennms"/>
</readmail-test>
</mail-test>
</mail-transport-test>
</parameter>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="ds-name" value="rdmail_lat"/>
</service>
Tests
There are four basic tests that this monitor can perform.
Sending Mail: The most basic test, the sendmail-test is highly configurable. An exception thrown while sending the configured email message causes the poll to fail.
Access of Mail Store and Folder: Configure a readmail-test without a subject-match attribute. This tests only the ability to open the default mail store and the configured mail folder ("INBOX" by default). Folders are given as "INBOX<separator>Foldername"; the separator character may vary between IMAP implementations. Exchange, for example, uses "/" as the separator.
Specific Message in Folder: Configure a readmail-test with a matching subject. Optionally, configure the test to delete all read mail. Deleting mail is probably only appropriate for a mail folder that receives email from another system that the end-to-end test cannot cover.
Sending and Receipt (end-to-end test) of a Message: Tests your infrastructure’s ability to send and receive email. It tests sending and receiving an email message via one or two separate mail servers. For example, you can send email via SMTPS to a server outside of your organization, addressed to a recipient on your internal mail server, and verify delivery.
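The examples earlier in this section cover the end-to-end test and the read-only test. A send-only configuration could be sketched as follows by leaving out the readmail-test element; the service name, recipient address, and credentials are placeholders, and the element and attribute names are taken from the end-to-end example.

```xml
<!-- Sketch only: service name, addresses, and credentials are placeholders -->
<service name="MTM-Sendmail" interval="300000" user-defined="false" status="on">
  <parameter key="mail-transport-test">
    <mail-transport-test>
      <mail-test>
        <!-- No readmail-test element: only sending is exercised -->
        <sendmail-test attempt-interval="30000" use-authentication="false" use-jmta="false" debug="false">
          <sendmail-host host="${ipaddr}" port="25"/>
          <sendmail-protocol mailer="smtpsend"/>
          <sendmail-message to="postmaster@example.org" subject="OpenNMS Test Message"
                            body="This is an OpenNMS test message."/>
          <user-auth user-name="opennms" password="changeme"/>
        </sendmail-test>
      </mail-test>
    </mail-transport-test>
  </parameter>
  <parameter key="retry" value="2"/>
</service>
```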
5.6.31. MemcachedMonitor
This monitor lets you monitor Memcached, a distributed memory object caching system. To determine service availability, the monitor tests whether the Memcached statistics can be requested. The statistics are processed and stored in RRD files. The following metrics are collected:
Metric | Description |
---|---|
uptime |
Seconds the Memcached server has been running since last restart. |
rusageuser |
User time seconds for the server process. |
rusagesystem |
System time seconds for the server process. |
curritems |
Number of items in this server’s cache. |
totalitems |
Number of items stored on this server. |
bytes |
Number of bytes currently used for caching items. |
limitmaxbytes |
Maximum configured cache size. |
currconnections |
Number of open connections to this Memcached. |
totalconnections |
Number of successful connect attempts to this server since start. |
connectionstructure |
Number of internal connection handles currently held by the server. |
cmdget |
Number of GET commands received since server startup. |
cmdset |
Number of SET commands received since server startup. |
gethits |
Number of successful GET commands (cache hits) since startup. |
getmisses |
Number of failed GET requests, because nothing was cached. |
evictions |
Number of objects removed from the cache to free up memory. |
bytesread |
Number of bytes received from the network. |
byteswritten |
Number of bytes sent to the network. |
threads |
Number of threads used by this server. |
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Number of attempts to establish the Memcached connection. |
optional |
|
|
TCP port used to connect to Memcached. |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
The following example shows a configuration in poller-configuration.xml.
<service name="Memcached" interval="300000" user-defined="false" status="on">
<parameter key="port" value="11211" />
<parameter key="retry" value="2" />
<parameter key="timeout" value="3000" />
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response" />
<parameter key="ds-name" value="memcached" />
<parameter key="rrd-base-name" value="memcached" />
</service>
<monitor service="Memcached" class-name="org.opennms.netmgt.poller.monitors.MemcachedMonitor" />
5.6.32. NetScalerGroupHealthMonitor
This monitor is designed for Citrix® NetScaler® load-balancing checks. It checks whether more than x percent of the servers assigned to a specific group on a load-balanced service are active. The required data is gathered via SNMP from the NetScaler®. The status of the servers is determined by the NetScaler®. The provided service itself is not part of the check. The basis of this monitor is the SnmpMonitorStrategy. A valid SNMP configuration in OpenNMS Horizon for the NetScaler® is required.
A NetScaler® can manage several groups of servers per application. This monitor covers just one group at a time. If there are multiple groups to check, define one monitor per group.
This monitor does not check the load-balanced service itself.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the server group to check |
required |
|
|
The percentage of active servers versus total servers in the group, as an integer |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
The following example checks a server group called central_webfront_http.
If at least 70% of the servers are active, the service is up.
If fewer than 70% of the servers are active, the service is down.
A configuration like the following can be used for the example in poller-configuration.xml.
<service name="NetScaler_Health" interval="300000" user-defined="false" status="on">
<parameter key="group-name" value="central_webfront_http" />
<parameter key="group-health" value="70" />
</service>
<monitor service="NetScaler_Health" class-name="org.opennms.netmgt.poller.monitors.NetScalerGroupHealthMonitor" />
Details about the used SNMP checks
The monitor checks the status of the server group based on the NS-ROOT-MIB using the svcGrpMemberState.
svcGrpMemberState is part of the serviceGroupMemberTable.
The serviceGroupMemberTable is indexed by svcGrpMemberGroupName and svcGrpMemberName.
An initial lookup for the group-name is performed.
Based on the lookup, the serviceGroupMemberTable is walked with the numeric representation of the server group.
The monitor treats only the server status code 7 (up) as an active server.
Other status codes, such as 2 (unknown) or 3 (busy), count only toward the total number of servers.
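Because the monitor covers one server group at a time, checking several groups means defining one service and one monitor per group. The following sketch shows two such definitions; the second group name, both service names, and the 50 percent threshold are hypothetical.

```xml
<!-- Sketch only: the second group name, the service names, and the thresholds are hypothetical -->
<service name="NetScaler_Health_HTTP" interval="300000" user-defined="false" status="on">
  <parameter key="group-name" value="central_webfront_http"/>
  <parameter key="group-health" value="70"/>
</service>
<service name="NetScaler_Health_HTTPS" interval="300000" user-defined="false" status="on">
  <parameter key="group-name" value="central_webfront_https"/>
  <parameter key="group-health" value="50"/>
</service>
<!-- One monitor element per service, both using the same monitor class -->
<monitor service="NetScaler_Health_HTTP" class-name="org.opennms.netmgt.poller.monitors.NetScalerGroupHealthMonitor"/>
<monitor service="NetScaler_Health_HTTPS" class-name="org.opennms.netmgt.poller.monitors.NetScalerGroupHealthMonitor"/>
```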
5.6.33. NrpeMonitor
This monitor lets you test plugins and checks running on the Nagios Remote Plugin Executor (NRPE) framework. It tests the status output of any available check command executed by NRPE. There are some conceptual differences between OpenNMS Horizon and Nagios. In OpenNMS Horizon, a service can only be available or not available, and the response time for the service is measured. Nagios, on the other hand, combines service availability, performance data collection, and thresholding in one check command. For this reason, a Nagios check command can have more states than OK and CRITICAL. The NrpeMonitor marks all check command results other than OK as down. The full output message of the check command is passed into the service down event in OpenNMS Horizon.
NRPE configuration on the server is required and the check command has to be configured, e.g. command[check_apt]=/usr/lib/nagios/plugins/check_apt
OpenNMS Horizon executes every NRPE check in a Java thread without forking a process, which makes it more resource friendly.
Nevertheless, it is possible to run NRPE plugins that combine many external programs like sed, awk, or cut.
Be aware that each of these commands ends up forking additional processes.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Number of retries before the service is marked as down. |
optional |
|
|
The {check_name} of the command configured as command[{check_name}]="/path/to/plugin/check-script" |
required |
empty |
|
Port to access NRPE on the remote server. |
optional |
|
|
Padding for sending the command to the NRPE agent. |
optional |
|
|
Enable encryption of network communication. NRPE uses SSL with anonymous DH and the following cipher suite TLS_DH_anon_WITH_AES_128_CBC_SHA |
optional |
|
This monitor implements the Common Configuration Parameters.
Example: Using check_apt with NRPE
This example shows how to configure the NrpeMonitor to run the check_apt command on a configured NRPE.
command[check_apt]=/usr/lib/nagios/plugins/check_apt
<service name="NRPE-Check-APT" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="3" />
<parameter key="timeout" value="3000" />
<parameter key="port" value="5666" />
<parameter key="command" value="check_apt" />
<parameter key="padding" value="2" />
</service>
<monitor service="NRPE-Check-APT" class-name="org.opennms.netmgt.poller.monitors.NrpeMonitor" />
5.6.34. NtpMonitor
The NTP monitor tests for NTP service availability. During the poll an NTP request query packet is generated. If a response is received, it is parsed and validated. If the response is a valid NTP response, the service is considered available.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The destination port where the NTP request shall be sent. |
optional |
|
|
Number of attempts to get a response. |
optional |
|
|
Time in milliseconds to wait for a response. |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
<!-- Fast NTP server -->
<service name="NTP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="1000"/>
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response"/>
<parameter key="rrd-base-name" value="ntp"/>
<parameter key="ds-name" value="ntp"/>
</service>
<monitor service="NTP" class-name="org.opennms.netmgt.poller.monitors.NtpMonitor"/>
5.6.35. OmsaStorageMonitor
With the OmsaStorageMonitor you can monitor the RAID array status of your Dell OpenManage servers. The following OIDs from the STORAGEMANAGEMENT-MIB are supported by this monitor:
virtualDiskRollUpStatus .1.3.6.1.4.1.674.10893.1.20.140.1.1.19
arrayDiskLogicalConnectionVirtualDiskNumber .1.3.6.1.4.1.674.10893.1.20.140.3.1.5
arrayDiskNexusID .1.3.6.1.4.1.674.10893.1.20.130.4.1.26
arrayDiskLogicalConnectionArrayDiskNumber .1.3.6.1.4.1.674.10893.1.20.140.3.1.3
arrayDiskState .1.3.6.1.4.1.674.10893.1.20.130.4.1.4
To test the status of the disk array, the virtualDiskRollUpStatus is used.
If the result of the virtualDiskRollUpStatus is not 3, the monitor is marked as down.
Result | State description | Monitor state in OpenNMS Horizon |
---|---|---|
1 | other | DOWN |
2 | unknown | DOWN |
3 | ok | UP |
4 | non-critical | DOWN |
5 | critical | DOWN |
6 | non-recoverable | DOWN |
You’ll need to know the maximum number of possible logical disks in your environment. For example, if you have three RAID arrays, you need one service poller for each logical disk array.
To give more detailed information in case of a disk array error, the monitor tries to identify the problem using the other OIDs. These values are used to enrich the error reason in the service down event. The disk array state is resolved to a human-readable value by the following status table.
Value | Status |
---|---|
|
Ready |
|
Failed |
|
Online |
|
Offline |
|
Degraded |
|
Recovering |
|
Removed |
|
Resynching |
|
Rebuilding |
|
noMedia |
|
Formating |
|
Running Diagnostics |
|
Initializing |
Monitor facts
Class Name |
|
Remote Enabled |
|
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The disk index of your RAID array |
optional |
|
|
The TCP port OpenManage is listening on |
optional |
from |
This monitor implements the Common Configuration Parameters.
Examples
The following examples show how to configure the monitor in poller-configuration.xml.
The RAID array monitor for your first array is configured with virtualDiskNumber = 1 and can look like this:
<service name="OMSA-Disk-Array-1" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="3"/>
<parameter key="timeout" value="6000"/>
<parameter key="virtualDiskNumber" value="1"/>
</service>
<monitor service="OMSA-Disk-Array-1" class-name="org.opennms.netmgt.poller.monitors.OmsaStorageMonitor"/>
If there is more than one RAID array to monitor, you need an additional configuration.
In this case, virtualDiskNumber = 2.
<service name="OMSA-Disk-Array-2" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="3"/>
<parameter key="timeout" value="6000"/>
<parameter key="virtualDiskNumber" value="2"/>
</service>
<monitor service="OMSA-Disk-Array-2" class-name="org.opennms.netmgt.poller.monitors.OmsaStorageMonitor"/>
5.6.36. OpenManageChassisMonitor
The OpenManageChassis monitor tests the status of a Dell chassis by querying its SNMP agent. The monitor polls the value of the node’s SNMP OID .1.3.6.1.4.1.674.10892.1.300.10.1.4.1 (MIB-Dell-10892::chassisStatus). If the value is OK (3), the service is considered available.
As this monitor uses SNMP, the queried nodes must have proper SNMP configuration in snmp-config.xml.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The port to which connection shall be tried. |
optional |
from |
This monitor implements the Common Configuration Parameters.
Examples
<!-- Overriding default SNMP config -->
<service name="OMA-Chassis" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="3"/>
<parameter key="timeout" value="5000"/>
</service>
<monitor service="OMA-Chassis" class-name="org.opennms.netmgt.poller.monitors.OpenManageChassisMonitor" />
Dell MIBs
Dell MIBs can be found here. Download the DCMIB<version>.zip or DCMIB<version>.exe file corresponding to the version of your OpenManage agents. The latest version should also work with all previous agent versions.
5.6.37. PageSequenceMonitor
The PageSequenceMonitor (PSM) allows OpenNMS to monitor web applications. This monitor has several configuration options regarding IPv4, IPv6, and how to deal with name resolution. To add flexibility, the node label and IP address can be passed as variables into the monitor. This allows running the monitor with node-dependent configuration. Beyond testing a web application with a single URL, it can also test a path through a web application. A test path through a web application can look like this:
-
Log in to a web application
-
Execute an action while logged in
-
Log off
The service is considered up if all of these steps succeed. If an error occurs at any step, the service changes its state to down and your application needs attention.
Monitor facts
Class Name | org.opennms.netmgt.poller.monitors.PageSequenceMonitor |
Remote Enabled | true |
Configuration and Usage
The configuration for this monitor consists of several parts.
First is the overall configuration for retries and timeouts.
These parameters are global for the whole path through the web application.
In addition, you configure a page sequence that describes the path through the web application, which makes the overall layout of the monitor configuration more complex.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The number of retries per page. |
optional |
|
|
Defines a timer to wait before a retry attempt is made.
It is only used if at least one (1) retry is configured.
If |
optional |
|
|
Definition of the page-sequence to execute, see table with Page Sequence Parameter |
required |
|
|
The retry parameter for the entire page sequence. |
optional |
|
|
Should the system wide proxy settings be used? The system proxy settings can be configured via system properties |
optional |
|
This monitor implements the Common Configuration Parameters.
Parameter | Description | Required | Default |
---|---|---|---|
|
The name of the page-sequence. (Is this relevant/used?) |
optional |
|
|
HTTP method for example GET or POST |
|
|
|
HTTP protocol version number, 0.9, 1.0 or 1.1 |
optional |
|
|
Set the user agent field in HTTP header to identify the OpenNMS monitor |
optional |
|
|
Set the virtual host field in HTTP header. In case of an HTTPS request, this is also the virtual domain to send as part of the TLS negotiation, known as server name indication (SNI) (See: RFC3546 section 3.1) |
|
|
|
The relative URL to call in the request. |
required |
|
|
Define the URL scheme as |
optional |
|
|
Set user info field in the HTTP header |
|
|
|
Set host field in HTTP header |
optional |
|
|
Communication requires a connection to an IPv6 address. ( |
|
|
|
Communication requires a connection to an IPv4 address. ( |
|
|
|
Enable or disable SSL certificate verification for HTTPS tests. Please use this option carefully, for self-signed certificates import the CA certificate in the JVM and don’t just disable it. |
optional |
|
|
Port of the web server connecting to |
optional |
|
|
?? |
|
|
|
Text to look for in the response body. This regular expression is matched against every line; the first match is treated as a failure and sets the service with this monitor to Down. |
|
|
|
The failure message is used to construct the reason code.
|
|
|
|
Text to look for in the response body. This regular expression is matched against every line; the first match is treated as a success and sets the service with this monitor to Up. |
optional |
|
|
The relative URL which must be loaded for the request to be considered successful. |
optional |
|
|
Range for allowed HTTP error codes from the response. |
|
|
|
Assign the value of a regex match group to a session variable with a user-defined name. The match group is identified by number and must be zero or greater. |
|
|
|
A comma-separated list of acceptable HTTP response code ranges ( |
optional |
|
If you set both requireIPv4 and requireIPv6 to false, the host IP for the connection is resolved by the system name resolver, and the IP address associated with the IP interface is ignored.
|
Variable | Description |
---|---|
|
Nodelabel of the node the monitor is associated to. |
Session variables
It is possible to assign strings from a retrieved page to variables that can be used in page parameters later in the same sequence.
First, specify one or more capturing groups in the successMatch expression (see Java Class Pattern for more information on regular expressions in Java).
The captured values can then be assigned to variable names by using the session-variable parameter, and used in a later page load.
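The capture-and-substitute flow described above can be sketched in Python. This is an illustration only, not the monitor's implementation: the page body, the helper names `capture_session_variables` and `expand`, and the use of `string.Template` for the `${name}` placeholders are assumptions made for the example.

```python
import re
from string import Template


def capture_session_variables(success_match, body, assignments):
    """Return {variable name: captured text} for a page body.

    `assignments` maps a session variable name to a match-group number,
    mirroring <session-variable name="..." match-group="..."/>.
    Returns None when the expression does not match (the page fails).
    """
    match = re.search(success_match, body)
    if match is None:
        return None
    return {name: match.group(group) for name, group in assignments.items()}


def expand(value, variables):
    """Replace ${name} placeholders in a parameter of a later page."""
    return Template(value).substitute(variables)


# Hypothetical page body that advertises demo credentials.
body = "User: <strong>demo</strong>\nPassword: <strong>secret</strong>"
pattern = r"(?s)User:.*<strong>(.*?)</strong>.*?Password:.*?<strong>(.*?)</strong>"

variables = capture_session_variables(pattern, body, {"username": 1, "password": 2})
login_user = expand("${username}", variables)
```

The regular expression is the one used in the dynamic-credentials example further down; only the page body is invented.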
Per-page response times
It is possible to collect response times for individual pages in a sequence.
To use this functionality, a ds-name attribute must be added to each page whose load time should be tracked.
The response time for each page will be stored in the same RRD file specified for the service via the rrd-base-name parameter, under the specified datasource name.
When you add a ds-name attribute to a page in a sequence that is already storing response time data, you need to delete the existing RRD files and let them be recreated with the new list of datasources.
|
Examples
The following example shows how to monitor the OpenNMS web application using several mechanisms.
It first does an HTTP GET of ${ipaddr}/opennms (following redirects as a browser would) and then checks that the resulting page contains the phrase Password.
Next, it attempts a login using HTTP POST to the relative URL for submitting form data (usually the URL that the form action points to).
The parameters (j_username and j_password) indicate the form’s data and values to be submitted.
Furthermore, a custom header (foo) is set for demonstration purposes.
After the resulting page is retrieved, the expression specified in the page’s failureMatch attribute is checked first; if it is found anywhere on the page, the page has failed.
If the failureMatch expression is not found, the expression specified in the page’s successMatch attribute is checked to ensure it matches the resulting page.
If the successMatch expression is not found on the page, the page fails.
If the monitor logged in successfully, the next page is processed.
In the example, the monitor navigates to the Event page to ensure that the text Event Queries is found on the page.
Finally, the monitor calls the URL of the logout page to close the session.
The locationMatch parameter verifies that the logout was successful and a redirect was triggered.
Each page is checked to ensure its HTTP response code fits into the response-range before the failureMatch, successMatch, and locationMatch expressions are evaluated.
|
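The order of checks for a single page can be sketched in Python as follows. This is a simplified model under stated assumptions, not the monitor's code: the function names are hypothetical, the default range of `100-399` is assumed for the example, the expressions are searched in the whole body, and the locationMatch check is left out.

```python
import re


def in_ranges(code, response_range):
    """True if code falls in a comma-separated list such as '200,300-399'."""
    for part in response_range.split(","):
        low, _, high = part.strip().partition("-")
        # A single value such as '200' is treated as the range 200-200.
        if int(low) <= code <= int(high or low):
            return True
    return False


def evaluate_page(status, body, response_range="100-399",
                  failure_match=None, success_match=None):
    """Return (ok, reason), checking the response code before any expression."""
    if not in_ranges(status, response_range):
        return False, f"response code {status} outside {response_range}"
    if failure_match and re.search(failure_match, body):
        return False, "failureMatch found"
    if success_match and not re.search(success_match, body):
        return False, "successMatch not found"
    return True, "ok"
```

With this model, a page that returns status 500 fails even if its body contains the successMatch text, because the response code is checked first.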
<service name="OpenNMS-Web-Login" interval="30000" user-defined="true" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="5000"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="ds-name" value="opennmslogin"/>
<parameter key="page-sequence">
<page-sequence>
<page path="/opennms/login.jsp"
port="8980"
successMatch="Password" />
<page path="/opennms/j_spring_security_check"
port="8980"
method="POST">
<parameter key="j_username" value="admin"/>
<parameter key="j_password" value="admin"/>
<header name="foo" value="bar"/>
</page>
<page path="/opennms/index.jsp"
port="8980"
successMatch="Log Out" />
<page path="/opennms/event/index"
port="8980" successMatch="Event Queries" />
<page path="/opennms/j_spring_security_logout"
port="8980"
method="POST"
response-range="300-399"
locationMatch="/opennms" />
</page-sequence>
</parameter>
</service>
<monitor service="OpenNMS-Web-Login" class-name="org.opennms.netmgt.poller.monitors.PageSequenceMonitor"/>
<service name="OpenNMS-Web-Login" interval="30000" user-defined="true" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="5000"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="ds-name" value="opennmslogin"/>
<parameter key="page-sequence">
<page-sequence>
<page scheme="http"
host="ecomm.example.com"
port="80"
path="/ecomm/jsp/Login.jsp"
virtual-host="ecomm.example.com"
successMatch="eComm Login"
timeout="10000"
http-version="1.1"/>
<page scheme="https"
method="POST"
host="ecomm.example.com" port="443"
path="/ecomm/controller"
virtual-host="ecomm.example.com"
successMatch="requesttab_select.gif"
failureMessage="Login failed: ${1}"
timeout="10000"
http-version="1.1">
<parameter key="action_name" value="XbtnLogin"/>
<parameter key="session_timeout" value=""/>
<parameter key="userid" value="EXAMPLE"/>
<parameter key="password" value="econ"/>
</page>
<page scheme="http"
host="ecomm.example.com" port="80"
path="/econsult/controller"
virtual-host="ecomm.example.com"
successMatch="You have successfully logged out of eComm"
timeout="10000" http-version="1.1">
<parameter key="action_name" value="XbtnLogout"/>
</page>
</page-sequence>
</parameter>
</service>
<monitor service="OpenNMS-Web-Login" class-name="org.opennms.netmgt.poller.monitors.PageSequenceMonitor"/>
<service name="OpenNMS-Web-Login" interval="30000" user-defined="true" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="5000"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="ds-name" value="opennmslogin"/>
<parameter key="page-sequence">
<page-sequence name="opennms-login-seq-dynamic-credentials">
<page path="/opennms"
port="80"
virtual-host="demo.opennms.org"
successMatch="(?s)User:.*&lt;strong&gt;(.*?)&lt;/strong&gt;.*?Password:.*?&lt;strong&gt;(.*?)&lt;/strong&gt;">
<session-variable name="username" match-group="1" />
<session-variable name="password" match-group="2" />
</page>
<page path="/opennms/j_acegi_security_check"
port="80"
virtual-host="demo.opennms.org"
method="POST"
failureMatch="(?s)Your log-in attempt failed.*Reason: ([^&lt;]*)"
failureMessage="Login Failed: ${1}"
successMatch="Log out">
<parameter key="j_username" value="${username}" />
<parameter key="j_password" value="${password}" />
</page>
<page path="/opennms/event/index.jsp"
port="80"
virtual-host="demo.opennms.org"
successMatch="Event Queries" />
<page path="/opennms/j_acegi_logout"
port="80"
virtual-host="demo.opennms.org"
successMatch="logged off" />
</page-sequence>
</parameter>
</service>
<monitor service="OpenNMS-Web-Login" class-name="org.opennms.netmgt.poller.monitors.PageSequenceMonitor"/>
<service name="OpenNMS-Demo-Login" interval="300000" user-defined="true" status="on">
<parameter key="page-sequence">
<page-sequence>
<page path="/opennms"
port="80"
virtual-host="demo.opennms.org"
successMatch="(?s)User:.*&lt;strong&gt;(.*?)&lt;/strong&gt;.*?Password:.*?&lt;strong&gt;(.*?)&lt;/strong&gt;">
<session-variable name="username" match-group="1" />
<session-variable name="password" match-group="2" />
</page>
<page path="/opennms/j_acegi_security_check"
port="80"
virtual-host="demo.opennms.org"
method="POST"
successMatch="Log out">
<parameter key="j_username" value="${username}" />
<parameter key="j_password" value="${password}" />
</page>
<page path="/opennms/j_acegi_logout"
port="80"
virtual-host="demo.opennms.org"
successMatch="logged off" />
</page-sequence>
</parameter>
</service>
<monitor service="OpenNMS-Demo-Login" class-name="org.opennms.netmgt.poller.monitors.PageSequenceMonitor"/>
<service name="OpenNMS-Login" interval="300000" user-defined="false" status="on">
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="rrd-base-name" value="opennmslogin"/>
<parameter key="ds-name" value="overall"/>
<parameter key="page-sequence">
<page-sequence>
<page path="/opennms/acegilogin.jsp"
port="8980"
ds-name="login-page"/>
<page path="/opennms/event/index.jsp"
port="8980"
ds-name="event-page"/>
</page-sequence>
</parameter>
</service>
<monitor service="OpenNMS-Login" class-name="org.opennms.netmgt.poller.monitors.PageSequenceMonitor"/>
5.6.38. PercMonitor
This monitor tests the status of a PERC RAID array.
The monitor first polls the RAID-Adapter-MIB::logicaldriveTable (1.3.6.1.4.1.3582.1.1.2) to retrieve the status of the RAID array you want to monitor. If the value of the status object of the corresponding logicaldriveEntry is not 2, the array is degraded and the monitor further polls the RAID-Adapter-MIB::physicaldriveTable (1.3.6.1.4.1.3582.1.1.3) to detect the failed drive(s).
This monitor requires the outdated persnmpd software to be installed on the polled nodes. Prefer the OmsaStorageMonitor where possible. |
Monitor facts
Class Name | org.opennms.netmgt.poller.monitors.PercMonitor |
Remote Enabled | false (relies on SNMP configuration) |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The RAID array you want to monitor. |
optional |
|
|
The UDP port to connect to |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
<!-- Monitor the first RAID array using configuration from snmp-config.xml -->
<service name="PERC" interval="300000" user-defined="false" status="on" />
<monitor service="PERC" class-name="org.opennms.netmgt.poller.monitors.PercMonitor" />
5.6.39. Pop3Monitor
The POP3 monitor tests for POP3 service availability on a node.
The monitor first tries to establish a TCP connection on the specified port.
If a connection is established, a service banner should have been received.
The monitor makes sure the service banner is a valid POP3 banner (that is, it starts with +OK).
If the banner is valid, the monitor sends a QUIT POP3 command and makes sure the service answers with a valid response (that is, a response that starts with +OK).
The service is considered available if the service’s answer to the QUIT command is valid.
The behaviour can be simulated with telnet:
$ telnet mail.opennms.org 110
Trying 192.168.0.100
Connected to mail.opennms.org.
Escape character is '^]'.
+OK <21860.1076718099@mail.opennms.org>
quit
+OK
Connection closed by foreign host.
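The same banner and QUIT exchange can be sketched in Python. This is a minimal illustration of the check described above, not the Pop3Monitor source; the function names are hypothetical, and the demonstration at the end uses a canned reply instead of a real server so it runs offline.

```python
import io
import socket


def pop3_dialogue(reader, writer):
    """Check the banner, send QUIT, and check the reply on binary streams."""
    banner = reader.readline()
    if not banner.startswith(b"+OK"):
        return False
    writer.write(b"QUIT\r\n")
    writer.flush()
    return reader.readline().startswith(b"+OK")


def pop3_available(host, port=110, timeout=3.0):
    """Run the dialogue against a real server (not called in this sketch)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rwb") as stream:
            return pop3_dialogue(stream, stream)


# Offline demonstration with the replies from the telnet session above.
canned = io.BytesIO(b"+OK <21860.1076718099@mail.opennms.org>\r\n+OK\r\n")
sent = io.BytesIO()
result = pop3_dialogue(canned, sent)
```

Separating the dialogue from the socket handling keeps the protocol check testable without network access.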
Monitor facts
Class Name | org.opennms.netmgt.poller.monitors.Pop3Monitor |
Remote Enabled | true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
TCP port to connect to. |
optional |
|
|
Number of attempts to find the service available. |
optional |
|
|
If set to |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
<service name="POP3" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response"/>
<parameter key="rrd-base-name" value="pop3"/>
<parameter key="ds-name" value="pop3"/>
</service>
<monitor service="POP3" class-name="org.opennms.netmgt.poller.monitors.Pop3Monitor"/>
5.6.40. PrTableMonitor
The PrTableMonitor tests the prTable of a Net-SNMP agent. The UCD-SNMP-MIB describes this table as follows:
"A table containing information on running programs/daemons configured for monitoring in the snmpd.conf file of the agent. Processes violating the number of running processes required by the agent’s configuration file are flagged with numerical and textual errors."
The monitor looks up the prErrorFlag entries of this table. If the value of a prErrorFlag entry is set to "1", the service is considered unavailable. The MIB describes prErrorFlag as follows:
"An Error flag to indicate trouble with a process. It goes to 1 if there is an error, 0 if no error."
Monitor facts
Class Name | org.opennms.netmgt.poller.monitors.PrTableMonitor |
Remote Enabled | false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The port to which connection shall be tried. |
optional |
from |
|
Deprecated.
Same as |
optional |
from |
This monitor implements the Common Configuration Parameters.
Examples
<!-- Overriding default SNMP config -->
<service name="Process-Table" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="3"/>
<parameter key="timeout" value="5000"/>
</service>
<monitor service="Process-Table" class-name="org.opennms.netmgt.poller.monitors.PrTableMonitor" />
UCD-SNMP-MIB
The UCD-SNMP-MIB may be found here.
5.6.41. RadiusAuthMonitor
This monitor tests the functionality of the RADIUS authentication system. Availability is tested by sending an AUTH packet to the RADIUS server. If a valid ACCEPT response is received, the RADIUS service is considered up and available. This monitor implements placeholder substitution in parameter values.
To use this monitor, you must install the RADIUS protocol for OpenNMS Horizon. |
For RPM-based distributions:
yum install opennms-plugin-protocol-radius
For Debian-based distributions:
apt-get install opennms-plugin-protocol-radius
The test is similar to checking the behavior of a RADIUS server with the command line tool radtest and evaluating the result.
root@vagrant:~# radtest "John Doe" hello 127.0.0.1 1812 radiuspassword
Sending Access-Request of id 49 to 127.0.0.1 port 1812
User-Name = "John Doe"
User-Password = "hello"
NAS-IP-Address = 127.0.0.1
NAS-Port = 1812
Message-Authenticator = 0x00000000000000000000000000000000
rad_recv: Access-Accept packet from host 127.0.0.1 port 1812, id=49, length=37 (1)
Reply-Message = "Hello, John Doe"
1 | The Access-Accept message which is evaluated by the monitor. |
Monitor facts
Class Name | org.opennms.protocols.radius.monitor.RadiusAuthMonitor |
Remote Enabled | false |
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
Time in milliseconds to wait for the RADIUS service. |
optional |
|
No |
|
This is a placeholder for the second optional monitor parameter description. |
optional |
|
No |
|
RADIUS authentication port. |
optional |
|
No |
|
RADIUS accounting port. |
optional |
|
No |
|
Username to test the authentication |
optional |
|
Yes |
|
Password to test the authentication |
optional |
|
Yes |
|
The RADIUS shared secret used for communication between the client/NAS and the RADIUS server. |
optional |
|
Yes |
|
RADIUS authentication type. The following authentication types are supported:
|
optional |
|
No |
|
The Network Access Server identifier originating the Access-Request. |
optional |
|
Yes |
|
When using EAP-TTLS authentication, this property indicates the tunnelled authentication type.
Only |
optional |
|
No |
|
Username for the tunnelled |
optional |
|
Yes |
This monitor implements the Common Configuration Parameters.
Examples
The following example shows how to configure the monitor in poller-configuration.xml.
<service name="Radius-Authentication" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="3" />
<parameter key="timeout" value="3000" />
<parameter key="user" value="John Doe" />
<parameter key="password" value="hello" />
<parameter key="secret" value="radiuspassword" />
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response" />
<parameter key="ds-name" value="radiusauth" />
</service>
<monitor service="Radius-Authentication" class-name="org.opennms.protocols.radius.monitor.RadiusAuthMonitor" />
5.6.42. SmbMonitor
This monitor is used to test the NetBIOS over TCP/IP name resolution in Microsoft Windows environments. The monitor tries to retrieve a NetBIOS name for the IP address of the interface. Name services for NetBIOS in Microsoft Windows are provided on port 137/UDP or 137/TCP.
The service uses the IP address of the interface to which the monitor is assigned. The service is up if a NetBIOS name is registered for the given IP address and can be resolved.
For troubleshooting, see the usage of the Microsoft Windows command line tool nbtstat or, on Linux, nmblookup.
Microsoft deprecated the usage of NetBIOS. Since Windows Server 2000, DNS is used as the default name resolution. |
Monitor facts
Class Name | org.opennms.netmgt.poller.monitors.SmbMonitor |
Remote Enabled | false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Try to get the NetBIOS node status type for the given address |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
The following example shows how to configure the monitor in poller-configuration.xml.
<service name="SMB" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1"/>
<parameter key="timeout" value="3000"/>
</service>
<monitor service="SMB" class-name="org.opennms.netmgt.poller.monitors.SmbMonitor"/>
5.6.43. SmtpMonitor
The SMTP monitor tests for SMTP service availability on a node. The monitor first tries to establish a TCP connection on the specified port. If a connection is established, a service banner should have been received. The monitor makes sure the service banner is a valid SMTP banner (starts with "220"). If the banner is valid, the monitor sends a HELO SMTP command, identifying itself with the hostname of the OpenNMS server, and makes sure the service answers with a valid response (starts with "250"). If the response to the HELO is valid, the monitor issues a QUIT SMTP command. The service is considered available if the service’s answer to the QUIT command is valid (starts with "221").
The behaviour can be simulated with telnet or netcat:
$ nc -v gmail-smtp-in.l.google.com 25
Ncat: Version 7.60 ( https://nmap.org/ncat )
Ncat: Connected to 2607:f8b0:4002:c06::1a:25.
220 mx.google.com ESMTP j17-v6si13545102ywb.87 - gsmtp
HELO opennms.com
250 mx.google.com at your service
QUIT
221 2.0.0 closing connection j17-v6si13545102ywb.87 - gsmtp
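The three-step exchange (banner, HELO, QUIT) can be sketched in Python. This is an illustration under simplifying assumptions rather than the SmtpMonitor implementation: the function name is hypothetical, multi-line replies are not handled, and the demonstration replays shortened replies from the session above instead of contacting a server.

```python
import io


def smtp_dialogue(reader, writer, helo_name):
    """Walk through banner, HELO, and QUIT, checking each reply code prefix."""
    steps = [
        (None, b"220"),  # service banner, nothing is sent first
        (b"HELO " + helo_name.encode("ascii") + b"\r\n", b"250"),
        (b"QUIT\r\n", b"221"),
    ]
    for command, expected in steps:
        if command is not None:
            writer.write(command)
            writer.flush()
        if not reader.readline().startswith(expected):
            return False
    return True


# Offline demonstration with canned replies.
canned = io.BytesIO(
    b"220 mx.google.com ESMTP\r\n"
    b"250 mx.google.com at your service\r\n"
    b"221 2.0.0 closing connection\r\n"
)
sent = io.BytesIO()
result = smtp_dialogue(canned, sent, "opennms.com")
```

The dialogue stops at the first unexpected reply, so a server that rejects HELO never receives QUIT in this sketch.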
Monitor facts
Class Name | org.opennms.netmgt.poller.monitors.SmtpMonitor |
Remote Enabled | true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
TCP port to connect to. |
optional |
|
|
Number of attempts to find the service available. |
optional |
|
|
Timeout in milliseconds for the underlying socket’s connect and read operations. |
optional |
|
Examples
<service name="SMTP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1" />
<parameter key="timeout" value="3000" />
<parameter key="port" value="25" />
<parameter key="rrd-repository" value="${install.share.dir}/rrd/response" />
<parameter key="rrd-base-name" value="smtp" />
<parameter key="ds-name" value="smtp" />
</service>
<monitor service="SMTP" class-name="org.opennms.netmgt.poller.monitors.SmtpMonitor" />
5.6.44. SnmpMonitor
The SNMP monitor provides a generic way to monitor states and results from SNMP agents. This monitor has two basic operation modes:
-
Test the response value of one specific OID (scalar object identifier);
-
Test multiple values in a whole table.
To decide which mode should be used, the walk and match-all parameters are used.
See the "Operating mode selection" and "Monitor specific parameters for the SnmpMonitor" tables below for more information about these operation modes.
walk | match-all | Operating mode |
---|---|---|
|
|
tabular, all values must match |
|
tabular, any value must match |
|
|
specifies that the value of at least minimum and at most maximum objects encountered in |
|
|
|
scalar |
|
scalar |
|
|
tabular, between |
Monitor facts
Class Name | org.opennms.netmgt.poller.monitors.SnmpMonitor |
Remote Enabled | false |
When the monitor is configured to persist the response time, it records the total time spent until a successful response is obtained, including retries, rather than only the time spent during the last successful attempt.
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Specifies that the value monitored should be compared against its hexadecimal representation. Useful when the monitored value is a string containing non-printable characters. |
optional |
|
|
Can be set to: |
optional |
|
|
Valid only when |
optional |
|
|
Valid only when |
optional |
|
|
The object identifier of the MIB object to monitor. If no other parameters are present, the monitor asserts that the agent’s response for this object must include a valid value (as opposed to an error, no-such-name, or end-of-view condition) that is non-null. |
optional |
|
|
The value to be compared against the observed value of the monitored object.
Note: Comparison will always succeed if either the |
optional |
|
|
The operator to be used for comparing the monitored object against the |
optional |
|
|
Destination port where the SNMP requests shall be sent. |
optional |
from |
|
A user-provided template used for the monitor’s reason code if the service is unavailable. Defaults to a reasonable value if unset. See below for an explanation of the possible template parameters. |
optional |
depends on operation mode |
|
Deprecated Same as |
optional |
from |
|
|
optional |
|
This monitor implements the Common Configuration Parameters.
Variable | Description |
---|---|
|
Value of the |
|
IP address polled. |
|
Value of the |
|
When |
|
Value of the |
|
Value of the |
|
Polled value that made the monitor succeed or fail. |
|
Value of the |
|
Value of the |
|
Value of the |
|
Value of the |
|
Value of the |
|
Value of the |
|
Value of the |
Example for monitoring scalar object
As a working example we want to monitor the thermal system fan status which is provided as a scalar object ID.
cpqHeThermalSystemFanStatus .1.3.6.1.4.1.232.6.2.6.4.0
The manufacturer MIB gives the following information:
SYNTAX INTEGER {
other (1),
ok (2),
degraded (3),
failed (4)
}
ACCESS read-only
DESCRIPTION
"The status of the fan(s) in the system.
This value will be one of the following:
other(1)
Fan status detection is not supported by this system or driver.
ok(2)
All fans are operating properly.
degraded(3)
A non-required fan is not operating properly.
failed(4)
A required fan is not operating properly.
If the cpqHeThermalDegradedAction is set to shutdown(3) the
system will be shutdown if the failed(4) condition occurs."
The SnmpMonitor is configured to test if the fan status returns ok(2). If so, the service is marked as up. Any other value indicates a problem with the thermal fan status and marks the service down.
<service name="HP-Insight-Fan-System" interval="300000" user-defined="false" status="on">
<parameter key="oid" value=".1.3.6.1.4.1.232.6.2.6.4.0"/> <!-- (1) -->
<parameter key="operator" value="="/> <!-- (2) -->
<parameter key="operand" value="2"/> <!-- (3) -->
<parameter key="reason-template" value="System fan status is not ok. The state should be ok(${operand}) the observed value is ${observedValue}. Please check your HP Insight Manager. Syntax: other(1), ok(2), degraded(3), failed(4)"/> <!-- (4) -->
</service>
<monitor service="HP-Insight-Fan-System" class-name="org.opennms.netmgt.poller.monitors.SnmpMonitor" />
1 | Scalar object ID to test |
2 | Operator for testing the response value |
3 | Integer 2 as operand for the test |
4 | Encode MIB status in the reason code to give more detailed information if the service goes down |
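To illustrate how the operator, operand, and reason-template fit together in scalar mode, here is a small Python sketch. It models only the `=` operator and the two template variables used in the example above; the function name is hypothetical, and `string.Template` merely stands in for the monitor's own `${...}` substitution.

```python
from string import Template


def poll_scalar(observed_value, operand, reason_template):
    """Return (available, reason) for an '=' comparison on one scalar value."""
    if str(observed_value) == str(operand):
        return True, None
    # safe_substitute leaves any other ${...} variable in the template untouched.
    reason = Template(reason_template).safe_substitute(
        operand=operand, observedValue=observed_value
    )
    return False, reason


template = "The state should be ok(${operand}) the observed value is ${observedValue}."
healthy = poll_scalar(2, "2", template)
failed_fan = poll_scalar(4, "2", template)
```

Here `failed_fan` carries a reason text that names both the expected and the observed status, which is what the operator sees when the service goes down.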
Example test SNMP table with all matching values
The second mode shows how to monitor values of a whole SNMP table. As a practical use case the status of a set of physical drives is monitored. This example configuration shows the status monitoring from the CPQIDA-MIB.
As the object to test, we use the physical drive status given by the following tabular OID:
cpqDaPhyDrvStatus .1.3.6.1.4.1.232.3.2.5.1.1.6
SYNTAX INTEGER {
other (1),
ok (2),
failed (3),
predictiveFailure (4)
}
ACCESS read-only
DESCRIPTION
Physical Drive Status.
This shows the status of the physical drive.
The following values are valid for the physical drive status:
other (1)
Indicates that the instrument agent does not recognize
the drive. You may need to upgrade your instrument agent
and/or driver software.
ok (2)
Indicates the drive is functioning properly.
failed (3)
Indicates that the drive is no longer operating and
should be replaced.
predictiveFailure(4)
Indicates that the drive has a predictive failure error and
should be replaced.
The configuration in our monitor will test all physical drives for status ok(2).
<service name="HP-Insight-Drive-Physical" interval="300000" user-defined="false" status="on">
<parameter key="oid" value=".1.3.6.1.4.1.232.3.2.5.1.1.6"/> <!-- (1) -->
<parameter key="walk" value="true"/> <!-- (2) -->
<parameter key="operator" value="="/> <!-- (3) -->
<parameter key="operand" value="2"/> <!-- (4) -->
<parameter key="match-all" value="true"/> <!-- (5) -->
<parameter key="reason-template" value="One or more physical drives are not ok. The state should be ok(${operand}) the observed value is ${observedValue}. Please check your HP Insight Manager. Syntax: other(1), ok(2), failed(3), predictiveFailure(4), erasing(5), eraseDone(6), eraseQueued(7)"/> <!-- (6) -->
</service>
<monitor service="HP-Insight-Drive-Physical" class-name="org.opennms.netmgt.poller.monitors.SnmpMonitor" />
1 | OID for SNMP table with all physical drive states |
2 | Enable walk mode to test every entry in the table against the test criteria |
3 | Test operator for integer |
4 | Integer 2 as operand for the test |
5 | Test in walk mode has to be passed for every entry in the table |
6 | Encode MIB status in the reason code to give more detailed information if the service goes down |
Example test SNMP table with some matching values
This example shows how to use the SnmpMonitor to test whether the number of static routes is within a given boundary. The service is marked as up if at least 3 and at most 10 static routes are set on a network device. This status can be monitored by polling the table ipRouteProto from the RFC1213-MIB2.
ipRouteProto 1.3.6.1.2.1.4.21.1.9
The MIB description gives us the following information:
SYNTAX INTEGER {
other(1),
local(2),
netmgmt(3),
icmp(4),
egp(5),
ggp(6),
hello(7),
rip(8),
is-is(9),
es-is(10),
ciscoIgrp(11),
bbnSpfIgp(12),
ospf(13),
bgp(14)
}
ACCESS read-only
DESCRIPTION
"The routing mechanism via which this route was learned.
Inclusion of values for gateway routing protocols is not
intended to imply that hosts should support those protocols."
To monitor only local routes, the test should be applied only to entries in the ipRouteProto table with value 2.
The matching entries across the whole ipRouteProto table are then counted, and the boundaries are applied to that number.
<service name="All-Static-Routes" interval="300000" user-defined="false" status="on">
<parameter key="oid" value=".1.3.6.1.2.1.4.21.1.9" /> <!-- (1) -->
<parameter key="walk" value="true" /> <!-- (2) -->
<parameter key="operator" value="=" /> <!-- (3) -->
<parameter key="operand" value="2" /> <!-- (4) -->
<parameter key="match-all" value="count" /> <!-- (5) -->
<parameter key="minimum" value="3" /> <!-- (6) -->
<parameter key="maximum" value="10" /> <!-- (7) -->
</service>
<monitor service="All-Static-Routes" class-name="org.opennms.netmgt.poller.monitors.SnmpMonitor" />
1 | OID for SNMP table ipRouteProto |
2 | Enable walk mode to test every entry in the table against the test criteria |
3 | Test operator for integer |
4 | Integer 2 as operand for testing local route entries |
5 | Walk mode test set to count, to get the number of table entries that satisfy the operator and operand |
6 | Lower count boundary set to 3 |
7 | Upper count boundary set to 10 |
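The three tabular modes used in the examples above (all values match, any value matches, and a bounded count of matches) can be modelled in a few lines of Python. This is a sketch for the `=` operator with walk enabled only; the function name and the sample column values are hypothetical, and the handling of an empty walk result follows Python's `all`/`any` semantics, which may differ from the real monitor.

```python
def evaluate_walk(values, operand, match_all, minimum=0, maximum=0):
    """Evaluate a walked table column against operand for one match-all setting."""
    hits = [value == operand for value in values]
    if match_all == "true":
        return all(hits)  # every entry must match
    if match_all == "false":
        return any(hits)  # at least one entry must match
    if match_all == "count":
        return minimum <= sum(hits) <= maximum  # bounded number of matches
    raise ValueError(f"unsupported match-all value: {match_all}")


# Hypothetical ipRouteProto column: four local(2) routes, one ospf(13), one bgp(14).
route_protocols = [2, 2, 2, 13, 14, 2]
within_bounds = evaluate_walk(route_protocols, 2, "count", minimum=3, maximum=10)
```

With the sample column, four entries equal 2, which lies between the minimum of 3 and the maximum of 10, so the service would be up in count mode but down with match-all set to true.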
5.6.45. SshMonitor
The SshMonitor tests the availability of an SSH service. During the poll, an attempt is made to connect on the specified port. If the connection request is successful, the service is considered up. Optionally, the banner line generated by the service may be parsed and compared against a pattern before the service is considered up.
Monitor facts
Class Name | org.opennms.netmgt.poller.monitors.SshMonitor |
Remote Enabled | true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Regular expression to be matched against the service’s banner. |
optional |
|
|
The client banner that OpenNMS Horizon will use to identify itself on the service. |
optional |
|
|
Regular expression to be matched against the service’s banner. |
optional |
|
|
TCP port to which SSH connection shall be tried. |
optional |
|
|
Number of attempts to establish the SSH connection. |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
<service name="SSH" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1"/>
<parameter key="banner" value="SSH"/>
<parameter key="client-banner" value="OpenNMS poller"/>
<parameter key="timeout" value="5000"/>
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response"/>
<parameter key="rrd-base-name" value="ssh"/>
<parameter key="ds-name" value="ssh"/>
</service>
<monitor service="SSH" class-name="org.opennms.netmgt.poller.monitors.SshMonitor"/>
5.6.46. SSLCertMonitor
This monitor tests whether the SSL certificate presented by a remote network server is valid. A certificate is considered invalid if its start time (notBefore) is later than the current time, or if the current time is later than 7 days (configurable) before its expiration time (notAfter).
You can simulate the behavior by running a command like this:
echo | openssl s_client -connect <site>:<port> 2>/dev/null | openssl x509 -noout -dates
The output shows you the time range a certificate is valid:
notBefore=Dec 24 14:11:34 2013 GMT
notAfter=Dec 25 10:37:40 2014 GMT
You can configure a threshold in days applied on the notAfter
date.
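As a rough illustration of how such a threshold could be evaluated, the following sketch parses the two dates printed by openssl and applies a threshold in days to notAfter. The function names are hypothetical and the logic is an assumption based on the description in this section, not the monitor's actual code.

```python
from datetime import datetime, timedelta, timezone

# Date format printed by `openssl x509 -noout -dates`.
DATE_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_openssl_date(text):
    """Parse a date such as 'Dec 25 10:37:40 2014 GMT' into an aware datetime."""
    return datetime.strptime(text, DATE_FORMAT).replace(tzinfo=timezone.utc)


def certificate_ok(not_before, not_after, now, days=7):
    """True if the certificate is already valid and does not expire within `days`."""
    already_valid = not_before <= now
    not_expiring_soon = now < not_after - timedelta(days=days)
    return already_valid and not_expiring_soon


not_before = parse_openssl_date("Dec 24 14:11:34 2013 GMT")
not_after = parse_openssl_date("Dec 25 10:37:40 2014 GMT")
print(certificate_ok(not_before, not_after, datetime.now(timezone.utc)))
```

Passing the current time as an argument keeps the check easy to test against fixed dates.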
While the monitor is mainly useful for plain SSL sockets, the monitor does provide limited support for STARTTLS protocols by providing the user with the ability to specify a STARTTLS message to be sent prior to the SSL negotiation and a regular expression to match to the response received from the server. An additional preliminary message and response regular expression pair is available for protocols that require it (such as XMPP).
This monitor implements placeholder substitution in parameter values.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
TCP port for the service with SSL certificate. |
required |
|
No |
|
Number of attempts to get the certificate state |
optional |
|
No |
|
Number of days before the certificate expires that we mark the service as failed. |
optional |
|
No |
|
This is the DNS hostname to send as part of the TLS negotiation, known as server name indication (SNI) (See: RFC3546 section 3.1) |
optional |
|
No |
|
Preliminary message to send to server prior to STARTTLS command. |
optional |
`` |
Yes |
|
Regular expression which must match response to preliminary message sent to server prior to STARTTLS command. |
optional |
`` |
Yes |
|
STARTTLS command. |
optional |
`` |
Yes |
|
Regular expression which must match response to STARTTLS command sent to server. |
optional |
`` |
Yes |
This monitor implements the Common Configuration Parameters.
Variable | Description |
---|---|
|
The node’s IP-Address |
|
The node ID |
|
Label of the node the monitor is associated to. |
|
The service name |
The monitor has limited support for communicating on other protocol layers above the SSL session layer. The STARTTLS support has only been tested with a single XMPP server. It is not known if the same approach will prove useful for other use cases, like sending a Host header for HTTPS, or issuing a STARTTLS command for IMAP, POP3, SMTP, FTP, LDAP, or NNTP. |
Examples
The following examples show how to monitor SSL certificates on services like IMAPS, SMTPS and HTTPS as well as an example use of the STARTTLS feature for XMPP.
If a certificate expires within 30 days, the service goes down and the monitor reports this in its reason.
In this example the monitoring interval is reduced to test the certificate every 2 hours (7,200,000 ms).
The configuration in poller-configuration.xml is as follows:
<service name="SSL-Cert-IMAPS-993" interval="7200000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="2000"/>
<parameter key="port" value="993"/>
<parameter key="days" value="30"/>
</service>
<service name="SSL-Cert-SMTPS-465" interval="7200000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="2000"/>
<parameter key="port" value="465"/>
<parameter key="days" value="30"/>
</service>
<service name="SSL-Cert-HTTPS-443" interval="7200000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="443"/>
<parameter key="days" value="30"/>
<parameter key="server-name" value="${nodelabel}.example.com"/>
</service>
<service name="XMPP-STARTTLS-5222" interval="7200000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="5222"/>
<parameter key="days" value="30"/>
<parameter key="starttls-preamble" value="&lt;stream:stream xmlns:stream='http://etherx.jabber.org/streams' xmlns='jabber:client' to='{ipAddr}' version='1.0'&gt;"/>
<parameter key="starttls-preamble-response" value="^.*starttls.*$"/>
<parameter key="starttls-start" value="&lt;starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/&gt;"/>
<parameter key="starttls-start-response" value="^.*starttls.*$"/>
</service>
<monitor service="SSL-Cert-IMAPS-993" class-name="org.opennms.netmgt.poller.monitors.SSLCertMonitor" />
<monitor service="SSL-Cert-SMTPS-465" class-name="org.opennms.netmgt.poller.monitors.SSLCertMonitor" />
<monitor service="SSL-Cert-HTTPS-443" class-name="org.opennms.netmgt.poller.monitors.SSLCertMonitor" />
<monitor service="XMPP-STARTTLS-5222" class-name="org.opennms.netmgt.poller.monitors.SSLCertMonitor" />
5.6.47. StrafePingMonitor
This monitor is used to monitor packet delay variation to a specific endpoint using ICMP. The main use case is to monitor a WAN endpoint and visualize packet loss and ICMP packet round-trip time deviation. The StrafePingMonitor performs multiple ICMP echo requests (pings) and stores the response time of each, as well as the packet loss, in an RRD file. Credit is due to Tobias Oetiker, as this graphing feature is an adaptation of the SmokePing tool that he developed.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Monitor specific parameters for the StrafePingMonitor
Parameter | Description | Required | Default value |
---|---|---|---|
|
Time in milliseconds to wait before assuming that a packet has not responded |
optional |
|
|
The number of retries to attempt when a packet fails to respond in the given timeout |
optional |
|
|
The number of pings to attempt each interval |
required |
|
|
The number of pings that need to fail for the service to be considered down |
required |
|
|
Whether to set the "Don’t Fragment" bit on outgoing packets |
optional |
|
|
DSCP traffic-control value. |
optional |
|
|
Number of bytes of the ICMP packet to send. |
optional |
|
|
Time in milliseconds to wait between each ICMP echo-request packet |
required |
|
|
The location to write RRD data. Generally, you will not want to change this from default |
required |
|
|
The name of the RRD file to write (minus the extension, |
required |
|
This monitor implements the Common Configuration Parameters.
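The relationship between the number of pings per interval and the number of failed pings can be sketched as follows. This is a simplified model under stated assumptions: the function name is hypothetical, a lost ping is represented as `None`, and the service is assumed to be down once the number of lost pings reaches the failure threshold.

```python
def strafe_ping_status(round_trip_times, failure_ping_count):
    """Evaluate one polling interval.

    round_trip_times holds one entry per ping: a time in milliseconds,
    or None when no reply arrived within the timeout.
    """
    lost = sum(1 for rtt in round_trip_times if rtt is None)
    loss_percent = 100.0 * lost / len(round_trip_times)
    # Assumption: the service is down once the lost pings reach the threshold.
    is_up = lost < failure_ping_count
    return is_up, loss_percent


# Hypothetical interval: 20 pings, 2 of them lost, threshold of 20 failures.
samples = [12.5] * 18 + [None, None]
print(strafe_ping_status(samples, failure_ping_count=20))
```

Setting the failure threshold equal to the ping count, as in this sketch, means the service only goes down when every ping of the interval is lost, while partial loss is still recorded.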
Examples
The StrafePingMonitor is typically used on WAN connections and not activated for every ICMP-enabled device in your network.
It also generates considerably more I/O than a simple RRD graph with a single ICMP response time measurement.
By default, poller-configuration.xml contains a separate poller package called strafer.
Configure the include-range or a filter to enable monitoring for devices with the service StrafePing.
Don’t forget to assign the service StrafePing to the IP interface to activate it. |
The following example enables monitoring of the service StrafePing on IP interfaces in the range 10.0.0.1 through 10.0.0.20.
Additionally, the nodes have to be in a surveillance category named Latency.
<package name="strafer" >
<filter>categoryName == 'Latency'</filter>
<include-range begin="10.0.0.1" end="10.0.0.20"/>
<rrd step="300">
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<service name="StrafePing" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="0"/>
<parameter key="timeout" value="3000"/>
<parameter key="ping-count" value="20"/>
<parameter key="failure-ping-count" value="20"/>
<parameter key="wait-interval" value="50"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="rrd-base-name" value="strafeping"/>
</service>
<downtime interval="30000" begin="0" end="300000"/>
<downtime interval="300000" begin="300000" end="43200000"/>
<downtime interval="600000" begin="43200000" end="432000000"/>
<downtime begin="432000000" delete="true"/>
</package>
<monitor service="StrafePing" class-name="org.opennms.netmgt.poller.monitors.StrafePingMonitor"/>
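To see how much history the rra definitions in this package retain, the arithmetic can be worked out with a short sketch. It assumes the standard RRDtool meaning of the fields (consolidation function, xff, steps per row, rows); the helper name is hypothetical.

```python
def rra_retention_days(step_seconds, rra):
    """Return how many days of data an RRA definition covers."""
    # Fields: "RRA", consolidation function, xff, steps per row, rows.
    _, _, _, steps_per_row, rows = rra.split(":")
    covered_seconds = step_seconds * int(steps_per_row) * int(rows)
    return covered_seconds / 86400


for definition in (
    "RRA:AVERAGE:0.5:1:2016",
    "RRA:AVERAGE:0.5:12:1488",
    "RRA:AVERAGE:0.5:288:366",
):
    print(definition, rra_retention_days(300, definition))
```

With a step of 300 seconds, the three AVERAGE archives cover 7 days at 5-minute resolution, 62 days at 1-hour resolution, and 366 days at 1-day resolution.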
5.6.48. TcpMonitor
This monitor is used to test IP Layer 4 connectivity using TCP.
The monitor establishes a TCP connection to a specific port.
To test the availability of the service, the greeting banner of the application is evaluated.
The behavior is similar to a simple test using the telnet command, as shown in the example.
telnet
root@vagrant:~# telnet 127.0.0.1 22
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (1)
1 | Service greeting banner |
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
TCP port of the application. |
required |
|
|
Number of retries before the service is marked as down. |
optional |
|
|
Evaluation of the service connection banner with regular expression. By default any banner result is valid. |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
This example shows how to test whether the ICA service is available on TCP port 1494.
The test checks that the connection banner starts with ICA.
<service name="TCP-Citrix-ICA" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="0" />
<parameter key="banner" value="ICA" />
<parameter key="port" value="1494" />
<parameter key="timeout" value="3000" />
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response" />
<parameter key="rrd-base-name" value="tcpCitrixIca" />
<parameter key="ds-name" value="tcpCitrixIca" />
</service>
<monitor service="TCP-Citrix-ICA" class-name="org.opennms.netmgt.poller.monitors.TcpMonitor" />
5.6.49. SystemExecuteMonitor
If you need to execute a system call or run a script to determine a service status, use the SystemExecuteMonitor.
It calls a script or system command and, if required, passes additional arguments to the call.
To determine the status of the service, the SystemExecuteMonitor can rely on a 0 or non-0 exit code of the system call.
As an alternative, the output of the system call can be matched against a banner.
If the banner is part of the output, the status is interpreted as up.
If the banner is not found in the output, the status is interpreted as down.
Monitor facts
Class Name |
|
Remote Enabled |
true |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The system-call to execute. |
required |
|
|
The arguments to hand over to the system-call. It supports variable replacement, see below. |
optional |
|
|
A string that is matched against the output of the system-call. If the output contains the banner, the service is determined as UP. |
optional |
|
The parameter args supports variable replacement for the following set of variables.
Always providing script output with a detailed test error makes it easier to diagnose the problem when the nodeLostDown event occurs. |
This monitor implements the Common Configuration Parameters.
Variable | Description |
---|---|
|
Timeout in milliseconds, based on config of the service. |
|
Timeout in seconds, based on config of the service. |
|
Amount of retries based on config of the service. |
|
Service name based on the config of the service. |
|
IP-address of the interface the service is bound to. |
|
Nodeid of the node the monitor is associated to. |
|
Nodelabel of the node the monitor is associated to. |
Examples
Placeholder usage
<parameter key="args" value="-i ${ipaddr} -t ${timeout}"/>
<parameter key="args" value="http://${nodelabel}/${svcname}/static"/>
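The effect of placeholder replacement on an args value can be imitated with Python's `string.Template`, which happens to use the same `${name}` syntax. This only models the behavior described here; the helper name and the sample values are hypothetical, and the monitor's own substitution code may treat unknown placeholders differently.

```python
from string import Template


def expand_args(args, variables):
    """Replace ${name} placeholders in an args string."""
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(args).safe_substitute(variables)


# Hypothetical values for one polled service.
variables = {
    "ipaddr": "192.0.2.10",
    "timeout": "5000",
    "nodelabel": "web01",
    "svcname": "Script_Example",
}
print(expand_args("-i ${ipaddr} -t ${timeout}", variables))
print(expand_args("http://${nodelabel}/${svcname}/static", variables))
```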
Exit status example
<service name="Script_Example" interval="300000" user-defined="true" status="on">
<parameter key="script" value="/opt/opennms/contrib/Script_Example.sh"/>
<parameter key="timeout" value="5000"/>
</service>
<monitor service="Script_Example" class-name="org.opennms.netmgt.poller.monitors.SystemExecuteMonitor"/>
#!/usr/bin/env bash
# ...some test logic
RESULT="TEST OK"
if [[ "TEST OK" == "${RESULT}" ]]; then
echo "This test passed"
exit 0
else
echo "This test failed because of ..."
exit 1
fi
Banner matching example
<service name="Script_Example" interval="300000" user-defined="true" status="on">
<parameter key="script" value="/opt/opennms/contrib/Script_Example.sh"/>
<parameter key="banner" value="PASSED"/>
<parameter key="timeout" value="5000"/>
</service>
<monitor service="Script_Example" class-name="org.opennms.netmgt.poller.monitors.SystemExecuteMonitor"/>
#!/usr/bin/env bash
# ...some test logic
RESULT="TEST OK"
if [[ "TEST OK" == "${RESULT}" ]]; then
echo "PASSED"
else
echo "FAILED"
fi
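The two ways of determining the status (exit code or banner) can be compared in a minimal sketch. It is not the monitor's implementation: the function name is hypothetical, and it assumes that a timeout or a command that cannot be started counts as down.

```python
import subprocess
import sys


def run_check(command, banner=None, timeout_ms=5000):
    """Return True (up) or False (down) for a system call."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_ms / 1000
        )
    except (subprocess.TimeoutExpired, OSError):
        # Assumption: a timeout or a command that cannot be started means down.
        return False
    if banner is not None:
        # Banner mode: up only if the banner appears in the output.
        return banner in result.stdout
    # Exit status mode: up only if the call returned 0.
    return result.returncode == 0


# The Python interpreter stands in for a real check script here.
print(run_check([sys.executable, "-c", "print('PASSED')"], banner="PASSED"))
print(run_check([sys.executable, "-c", "raise SystemExit(1)"]))
```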
SystemExecuteMonitor vs GpMonitor
The SystemExecuteMonitor is the successor of the GpMonitor. The main differences are:
-
Variable replacement for the parameter args
-
There are no fixed arguments handed to the system-call
-
The SystemExecuteMonitor supports RemotePoller deployment
To migrate services from the GpMonitor to the SystemExecuteMonitor, you must alter the args parameter so that it explicitly passes the arguments the GpMonitor supplied through hoption (for the hostAddress) and toption (for the timeoutInSeconds).
The args string that matches the GpMonitor call looks like this:
<parameter key="args" value="--hostname ${ipaddr} --timeout ${timeoutsec}" />
To migrate custom values of the GpMonitor parameters hoption and toption, replace --hostname and --timeout directly in the args value.
5.6.50. VmwareCimMonitor
This monitor is part of the VMware integration provided in Provisiond. The monitor is specialized to test the health status derived from the sensor data of a Host System (host).
This monitor is only executed if the host is in power state on. |
This monitor requires hosts to be imported with Provisiond and the VMware import. OpenNMS Horizon requires network access to VMware vCenter and the hosts. To get the sensor data, the credentials from vmware-config.xml for the responsible vCenter are used. The following asset fields are filled in by Provisiond through the VMware import feature: VMware Management Server, VMware Managed Entity Type and the foreignId, which contains an internal VMware vCenter identifier. |
The global health status is evaluated by testing all available host sensors and evaluating the state of each sensor. A sensor state could be represented as the following:
-
Unknown(0)
-
OK(5)
-
Degraded/Warning(10)
-
Minor failure(15)
-
Major failure(20)
-
Critical failure(25)
-
Non-recoverable error(30)
The service is up if all sensors have the status OK(5). If any sensor reports a status other than OK(5), the service is marked as down. The monitor error reason contains a list of all sensors that did not return the status OK(5).
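The evaluation rule can be summarized in a few lines. The sketch below is illustrative only: the sensor names are made up, and the format of the reason string is an assumption rather than the monitor's real output.

```python
OK = 5  # numeric value of the OK sensor state listed above


def host_health(sensors):
    """Return (is_up, reason) for a mapping of sensor name to numeric state."""
    failed = [f"{name}={state}" for name, state in sensors.items() if state != OK]
    # The service is up only if no sensor deviates from OK(5).
    return not failed, ", ".join(failed)


# Hypothetical sensor readings for one host.
readings = {"Fan 1": 5, "Power Supply 2": 20, "CPU 1 Temp": 10}
print(host_health(readings))
```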
When Distributed Power Management is used, the standBy state forces the service down: the health status is gathered over a direct connection to the host, and in standby this connection is unavailable. To deal with standby states, use the ignoreStandBy configuration parameter; the service is then considered up while the host is in standby. |
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Number of retries before the service is marked as down. |
optional |
|
|
Treat power state standBy as up. |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
The following example shows how to configure the monitor in poller-configuration.xml.
<service name="VMwareCim-HostSystem" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
</service>
<monitor service="VMwareCim-HostSystem" class-name="org.opennms.netmgt.poller.monitors.VmwareCimMonitor"/>
5.6.51. VmwareMonitor
This monitor is part of the VMware integration provided in Provisiond and tests the power state of a virtual machine (VM) or a host system (host).
If the power state of a VM or host is poweredOn, the service is up.
If the power state is off, the service on the VM or host is marked as down.
By default, standBy is also considered as down.
When Distributed Power Management is used, the handling of the standBy state can be changed; see the ignoreStandBy configuration parameter.
The status of a virtual machine is collected from the responsible VMware vCenter using the credentials from vmware-config.xml. Specific asset fields must also be assigned to the imported virtual machine or host system. The following asset fields are required and are populated by the VMware integration in Provisiond: VMware Management Server, VMware Managed Entity Type and the foreignId, which contains an internal VMware vCenter identifier. |
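The mapping from power state to service status, including the effect of ignoreStandBy, can be written down as a small decision function. This is a sketch of the rules stated in this section; the function name is hypothetical and any power state not mentioned here is assumed to mean down.

```python
def power_state_is_up(power_state, ignore_stand_by=False):
    """Map a VMware power state to a service status (True means up)."""
    if power_state == "poweredOn":
        return True
    if power_state == "standBy":
        # standBy counts as down unless ignoreStandBy is enabled.
        return ignore_stand_by
    # Assumption: every other state is treated as down.
    return False


for state in ("poweredOn", "standBy", "poweredOff"):
    print(state, power_state_is_up(state), power_state_is_up(state, ignore_stand_by=True))
```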
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
Number of retries before the service is marked as down. |
optional |
|
|
Treat power state standBy as up. |
optional |
|
|
Checks for unacknowledged vSphere alarms for a given comma-separated list of severities (red, yellow, green, gray). |
optional |
`` |
This monitor implements the Common Configuration Parameters.
Examples
The following example shows how to configure the monitor in poller-configuration.xml.
With this configuration, the service goes down if any unacknowledged vSphere alarms with severity red or yellow exist for this managed entity.
<service name="VMware-ManagedEntity" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="reportAlarms" value="red, yellow"/>
</service>
<monitor service="VMware-ManagedEntity" class-name="org.opennms.netmgt.poller.monitors.VmwareMonitor"/>
5.6.52. WebMonitor
WebMonitor is a clone of HttpMonitor that uses a different underlying library for HTTP connections. WebMonitor uses Apache HttpClient, which acts more like a real browser (follows redirects, etc.) than HttpMonitor.
Monitor facts
Class Name |
|
Configuration and Usage
Note that all parameters listed are optional.
Parameter | Description | Default value |
---|---|---|
|
Specifies that system-wide proxy settings be used. The system proxy settings can be configured via system properties. |
|
|
Protocol/scheme to use. |
|
|
The port to connect to. |
80 |
|
The path of the URL to request (e.g., |
|
|
The query string to add to the URL after a |
|
|
The connection/socket timeout. |
|
|
The |
|
|
The |
|
|
True/false whether to use HTTP 1.0 or 1.1. |
|
|
Headers to add |
|
|
Defaults to false, if true it will trust self-signed certificates. |
false |
|
Whether to enable basic authentication. |
|
|
The username for basic authentication. |
|
|
The password |
|
|
Whether to send basic authentication even if the site did not ask for it. |
true |
|
The response text to look for. |
|
|
What HTTP status ranges are considered success. |
100-399 |
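To make the meaning of a status range such as 100-399 concrete, the following sketch checks an HTTP status code against a range string. The function name is hypothetical, and support for comma-separated lists and single values is an assumption added for illustration; only the plain low-high form appears in the table above.

```python
def status_in_range(status, ranges="100-399"):
    """Return True if an HTTP status code falls in one of the given ranges."""
    for part in ranges.split(","):
        low, _, high = part.strip().partition("-")
        # A single value such as "404" is treated as the range 404-404.
        if int(low) <= status <= int(high or low):
            return True
    return False


print(status_in_range(302))
print(status_in_range(404))
```

With the default range, redirects (3xx) count as success while client and server errors (4xx, 5xx) do not.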
5.6.53. Win32ServiceMonitor
The Win32ServiceMonitor enables OpenNMS Horizon to monitor the running state of any Windows service. The service status is monitored using the SNMP agent provided by Microsoft Windows®, which implements the LAN Manager MIB-II. For this reason, the SNMP agent and OpenNMS Horizon must be correctly configured to allow queries against this part of the MIB tree. The status of the service is monitored by polling the
svSvcOperatingState = 1.3.6.1.4.1.77.1.2.3.1.3
of a given service, identified by its display name.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the service, this should be the exact name of the Windows service to monitor as it appears in the Services MSC snap-in. Short names such as you might use with net start will not work here. |
required |
|
This monitor implements the Common Configuration Parameters.
Non-English Windows
The service name is sometimes encoded in a language other than English.
In French, for example, the Task Scheduler service is Planificateur de tâches.
Because of the "â" (a non-ASCII character), the OID value is encoded in hexadecimal (0x50 6C 61 6E 69 66 69 63 61 74 65 75 72 20 64 65 20 74 C3 A2 63 68 65 73).
|
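The hexadecimal form shown above is simply the UTF-8 encoding of the service name, which can be reproduced with a short sketch. The helper name is hypothetical, and `bytes.hex` with a separator requires Python 3.8 or later.

```python
def to_hex_string(name):
    """Return the UTF-8 bytes of a service name as space-separated hex."""
    return name.encode("utf-8").hex(" ").upper()


print(to_hex_string("Planificateur de tâches"))
```

The character "â" becomes the two bytes C3 A2, which is why an agent may return the value in hexadecimal rather than as a plain string.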
Troubleshooting
If you’ve created a Win32ServiceMonitor poller and it is not being monitored properly on your hosts, chances are there is a difference between the service name you configured and the actual name in the registry.
For example, I need to monitor a process called Example Service on one of our production servers.
I retrieve the Display name by looking at the service in the service manager, and create an entry in the poller-configuration.xml file using the exact name in the Display name field.
However, what I don’t see is the errant space at the end of the service display name, which is revealed by running the following:
snmpwalk -v 2c -c <communitystring> <hostname> .1.3.6.1.4.1.77.1.2.3.1.1
This provides the critical piece of information I am missing:
iso.3.6.1.4.1.77.1.2.3.1.1.31.83.116.97.102.102.119.97.114.101.32.83.84.65.70.70.86.73.69.87.32.66.97.99.107.103.114.111.117.110.100.32 = STRING: "Example Service "
Note the extra space before the close quote. |
The extra space at the end of the name was difficult to notice in the service manager GUI, but is easily visible in the snmpwalk output.
The right way to fix this would be to correct the service Display name field on the server. However, the intent of this procedure is to recommend verifying the true name using snmpwalk rather than relying on the service manager GUI.
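In the snmpwalk output, the service name is also carried in the OID index as a length followed by one sub-identifier per character, so a trailing 32 is a trailing space. The sketch below, with hypothetical helper names, encodes and decodes such an index for ASCII names to make stray whitespace visible.

```python
def encode_index(name):
    """Encode an ASCII name as a length-prefixed list of OID sub-identifiers."""
    return [len(name)] + [ord(char) for char in name]


def decode_index(sub_ids):
    """Decode a length-prefixed list of OID sub-identifiers into a string."""
    length, codes = sub_ids[0], sub_ids[1:]
    if length != len(codes):
        raise ValueError("length prefix does not match the number of characters")
    return "".join(chr(code) for code in codes)


index = encode_index("Example Service ")
name = decode_index(index)
# repr() shows the quotes, so surrounding whitespace is easy to spot.
print(index[-1], repr(name), name != name.strip())
```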
Examples
Monitoring the running state of the Task Scheduler service on an English-locale Microsoft Windows® Server requires at minimum the following entry in poller-configuration.xml.
<service name="Windows-Task-Scheduler" interval="300000" user-defined="false" status="on">
<parameter key="service-name" value="Task Scheduler"/>
</service>
<monitor service="Windows-Task-Scheduler" class-name="org.opennms.netmgt.poller.monitors.Win32ServiceMonitor"/>
5.6.54. WsManMonitor
This monitor can be used to issue a WS-Man Get command and validate the results using a SPEL expression. This monitor implements placeholder substitution in parameter values.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
Parameter | Description | Required | Default value | Placeholder substitution |
---|---|---|---|---|
|
Resource URI |
required |
|
No |
|
SPEL expression applied against the result of the Get |
required |
|
Yes |
|
Used to filter the result set. All selectors must be prefixed with |
optional |
|
No |
This monitor implements the Common Configuration Parameters.
Examples
The following monitor will issue a Get against the configured resource and verify that the correct service tag is returned:
<service name="WsMan-ServiceTag-Check" interval="300000" user-defined="false" status="on">
<parameter key="resource-uri" value="http://schemas.dell.com/wbem/wscim/1/cim-schema/2/root/dcim/DCIM_ComputerSystem"/>
<parameter key="selector.CreationClassName" value="DCIM_ComputerSystem"/>
<parameter key="selector.Name" value="srv:system"/>
<parameter key="rule" value="#IdentifyingDescriptions matches '.*ServiceTag' and #OtherIdentifyingInfo matches 'C7BBBP1'"/>
</service>
<monitor service="WsMan-ServiceTag-Check" class-name="org.opennms.netmgt.poller.monitors.WsManMonitor"/>
5.6.55. XmpMonitor
The XMP monitor tests for XMP service/agent availability by establishing an XMP session and querying the target agent’s sysObjectID variable contained in the Core MIB. The service is considered available when the session attempt succeeds and the agent returns its sysObjectID without error.
Monitor facts
Class Name |
|
Remote Enabled |
false |
Configuration and Usage
These parameters can be set in the XMP service entry in poller-configuration.xml and will override settings from xmp-config.xml. Also, don’t forget to add an entry in response-graph.properties so that response values will be graphed.
Parameter | Description | Required | Default value |
---|---|---|---|
|
Time in milliseconds to wait for a successful session. |
optional |
|
|
The authenUser parameter for use with the XMP session. |
optional |
|
|
TCP port to connect to for XMP session establishment |
optional |
|
|
Name of MIB to query |
optional |
|
|
Name of MIB object to query |
optional |
|
This monitor implements the Common Configuration Parameters.
Examples
<service name="XMP" interval="300000" user-defined="false" status="on">
<parameter key="timeout" value="3000"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="rrd-base-name" value="xmp"/>
<parameter key="ds-name" value="xmp"/>
</service>
<monitor service="XMP" class-name="org.opennms.netmgt.poller.monitors.XmpMonitor"/>
reports=icmp, \
xmp, \ . . . .
report.xmp.name=XMP
report.xmp.columns=xmp
report.xmp.type=responseTime
report.xmp.command=--title="XMP Response Time" \
--vertical-label="Seconds" \
DEF:rtMills={rrd1}:xmp:AVERAGE \
DEF:minRtMills={rrd1}:xmp:MIN \
DEF:maxRtMills={rrd1}:xmp:MAX \
CDEF:rt=rtMills,1000,/ \
CDEF:minRt=minRtMills,1000,/ \
CDEF:maxRt=maxRtMills,1000,/ \
LINE1:rt#0000ff:"Response Time" \
GPRINT:rt:AVERAGE:" Avg \\: %8.2lf %s" \
GPRINT:rt:MIN:"Min \\: %8.2lf %s" \
GPRINT:rt:MAX:"Max \\: %8.2lf %s\\n"
5.7. Application Perspective Monitoring
With OpenNMS Horizon Application Perspective Monitoring you can see the availability of a service hosted in Houston that is accessed in Seattle from your central location in New York. If a service outage occurs, understanding the perspective from which that outage was monitored makes it easier to troubleshoot the problem.
Application Perspective Monitoring uses the Minion infrastructure to monitor a service’s availability from these different perspectives. When a service is not responsive an outage will be generated with the corresponding perspective.
The service monitor configuration is looked up in the poller-configuration.xml
.
Use the perspective-only flag in the package definition to define packages that will be used only for Application Perspective Monitoring.
|
5.7.1. Configuring Application Perspective Monitoring
Application Perspective Monitoring requires at least one OpenNMS Horizon Minion on your network. Refer to the Installation Guide for more information.
To configure Application Perspective Monitoring, create an application and then associate a set of services and perspectives from which to monitor those services with the application.
-
Log in to the web UI.
-
Click the gear icon and select Manage Applications.
-
Specify a name for the application and click Add New Application.
Figure 33. Create a new application
-
Click the edit icon.
-
In the upper section, select the services you want to monitor from perspective locations with this application.
-
In the lower section, select the perspective locations from which to monitor the specified services.
After configuring the application, Minions at the perspective locations start to monitor the services associated with this application. The next figure shows an HTTP outage noticed from all perspective locations and the OpenNMS Horizon poller daemon itself.
The Perspective column shows the perspective location from which a Minion has detected this outage. An empty Perspective column indicates that the normal process detected the outage: either the OpenNMS Horizon instance detected it in the default location or a Minion detected it in the corresponding node’s location.
6. Performance Management
OpenNMS Horizon collects performance data using the Collectd daemon, which is enabled by default. Collectd schedules data collection on OpenNMS Horizon entities (currently nodes and interfaces), using management agents and protocol-specific collectors (SNMP, HTTPS, JMX, JDBC, etc.) to collect performance metrics. Each collector has its own associated configuration that defines parameters for the collector.
By default, data collection is enabled for SNMP and for OpenNMS-JVM (to monitor itself through JMX).
Data collection works out of the box with SNMP, provided you have your SNMP community string configured properly.
The default value of the community string is public
.
If your community string is different, you need to change the value:
-
Log in to the web UI.
-
Go to admin>Configure OpenNMS.
-
In the Provisioning section, select Configure SNMP Community Names by IP Address.
-
Under v1/v2c specific parameters change the
Read Community String
value and click Save Config.
Performance data collection on other protocols (HTTPS, JMX, JDBC, etc.) requires additional configuration. You may also want to change how collectd works: when, how, and what data it collects.
Learn how to manage performance data collection:
-
collectd administration (logging, graphing, and event properties)
6.1. Configuring Collectd
The collectd-configuration.xml file defines the nodes, services and parameters on which collectd collects metrics. It also specifies the list of available collectors.
The file is located in $OPENNMS_HOME/etc.
Edit the collectd-configuration.xml file to:
In addition to editing collectd-configuration.xml, you need to configure collectors for the protocols from which you want to collect data by editing the configuration files associated with them.
6.1.1. Setting the Thread Pool
A globally defined thread attribute limits the number of threads the data collection process uses in parallel.
Increase or decrease this value based on your network and the size of your server by changing the value in $OPENNMS_HOME/etc/collectd-configuration.xml
:
<collectd-configuration
threads="50">
6.1.2. Configuring Collector Packages
Collector packages in the collectd-configuration.xml file contain the information (IP addresses, interfaces, services, and connection parameters) that collectd needs to activate data collection.
Collectd activates data collection for each node that contains an IP interface in the configured range and also contains any of the services listed in the package associated with the selected IP interface.
Edit existing collector packages or create new ones to customize data collection for your needs. If you create a new collector package, we recommend copying and pasting an existing package in the collectd-configuration.xml to use as a template.
A collector package has two categories of information to edit or specify:
At a minimum, collector package attributes include a package name and a filter that specifies the interfaces to include in the collector package:
<package name="cassandra-via-jmx" remote="false">
<filter>IPADDR != '0.0.0.0'</filter>
Note that remote="false"
means that the services in this package are tested only from the OpenNMS core system itself and not from a different remote location.
Each package must have a filter tag that performs the initial test to see if an interface should be included in a package.
Filters operate on interfaces (not nodes).
Each package can have only one filter
tag.
The following tags are also available for an interface filter:
Tag | Description | Example |
---|---|---|
specific | Specify an actual IP address to include in the package. | |
include-range | Specify a range of IP addresses to include in a package. | |
exclude-range | Specify a range of IP addresses to exclude in a package. This will override an include-range tag. | |
include-url | Specify a file that contains a list of IP addresses to include. | |
The following example illustrates collector package attributes that use some of these additional tags:
<package name="example1" remote="false">(1)
<filter>IPADDR != '0.0.0.0'</filter>(2)
<include-range begin="1.1.1.1" end="254.254.254.254"/>(3)
<include-range begin="::1" end="ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"/>(4)
1 | Unique name of the collection package. |
2 | Apply this package to all IP interfaces with a configured IPv4 address (not equal to 0.0.0.0) |
3 | Evaluate IPv4 rule to collect for all IPv4 interfaces in the given range |
4 | Evaluate IPv6 rule to collect for all IPv6 interfaces in the given range |
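As a rough illustration of how the include-range tags narrow the set of interfaces, the following Python sketch checks an address against the two ranges from the example1 package. It is a simplification under stated assumptions: the helper name in_include_ranges is hypothetical, the filter expression is not modeled, and the real evaluation happens inside OpenNMS Horizon rather than in code like this.

```python
import ipaddress

# Ranges copied from the example1 package above.
INCLUDE_RANGES = [
    ("1.1.1.1", "254.254.254.254"),
    ("::1", "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"),
]


def in_include_ranges(address, ranges=INCLUDE_RANGES):
    """Return True if address lies inside any begin/end range (hypothetical helper)."""
    ip = ipaddress.ip_address(address)
    for begin, end in ranges:
        low = ipaddress.ip_address(begin)
        high = ipaddress.ip_address(end)
        # Only compare addresses of the same IP version (IPv4 vs. IPv6).
        if ip.version == low.version == high.version and low <= ip <= high:
            return True
    return False


if __name__ == "__main__":
    for candidate in ("192.168.0.10", "0.0.0.0", "2001:db8::1"):
        print(candidate, in_include_ranges(candidate))
```

An address outside both ranges, such as 0.0.0.0, is not selected by either include-range tag.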
Service configuration attributes define the collector to use and which performance metrics to collect. Each service is associated with a specific collector; the collector and its related Java class must appear at the bottom of the collectd-configuration.xml file:
<service name="SNMP"(1)
interval="300000"(2)
user-defined="false"(3)
status="on">(4)
<parameter key="collection" value="default"/>(5)
<parameter key="thresholding-enabled" value="true"/>(6)
</service>
<collector service="SNMP" class-name="org.opennms.netmgt.collectd.SnmpCollector"/>(7)
1 | Service configuration name, which is mapped to a specific collector. |
2 | The interval at which to collect the service (in milliseconds). |
3 | Marker to say if service is user defined (used for UI purposes). |
4 | Service is collected only if "on". |
5 | Assign the performance data collection schema named default (found in the corresponding configuration file for the type of collection, in this case datacollection-config.xml). |
6 | Enable threshold evaluation for metrics provided by this service. |
7 | Run the SnmpCollector implementation for the service named SNMP . |
The following table lists service attributes common to all services. For a list of collector-specific parameters and their default values, refer to the specific collector listed in the Collectors section.
Attribute | Description |
---|---|
name | Service name |
interval | Polling interval, in milliseconds (5 minutes by default). |
user-defined | Set to "true" if user defined the collection source in the UI. |
status | Indicates that data collection for the service is on or off. |
6.1.3. Guidelines for Collector Packages
You can configure multiple packages, and an interface can exist in more than one package. This gives great flexibility in determining the service levels for a given device.
When IP interfaces match multiple collector packages with the same service configuration, collectd applies the last collector package on the service:
-
Use this "final" collector package as a less-specific, catch-all filter for a default configuration.
OR
-
Use it as a more-specific collector package to overwrite the default setting.
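The "last package wins" rule described above can be sketched in Python. This is only an illustration: the package names, the dictionary layout, and the function effective_package are hypothetical and do not reflect how collectd is implemented internally.

```python
def effective_package(packages, ip, service):
    """Return the name of the last package (in file order) that matches.

    A package matches when it covers the IP interface and lists the service.
    """
    chosen = None
    for pkg in packages:
        if ip in pkg["interfaces"] and service in pkg["services"]:
            chosen = pkg["name"]  # a later match replaces an earlier one
    return chosen


# Hypothetical packages, listed in the order they would appear in the file:
# a catch-all first, then a more-specific package that overrides it.
PACKAGES = [
    {"name": "default", "interfaces": {"10.0.0.1", "10.0.0.2"}, "services": {"SNMP"}},
    {"name": "core-routers", "interfaces": {"10.0.0.2"}, "services": {"SNMP"}},
]

if __name__ == "__main__":
    print(effective_package(PACKAGES, "10.0.0.1", "SNMP"))
    print(effective_package(PACKAGES, "10.0.0.2", "SNMP"))
```

Reversing the order of the two packages would make the catch-all package apply to both interfaces, which is why the position of a package in the file matters.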
Meta-Data-DSL
Metadata-DSL allows you to use dynamic configuration in each parameter value to interpolate metadata into a parameter. The syntax allows for the use of patterns in an expression, whereby the metadata is replaced with a corresponding value during the collection process.
During evaluation of an expression the following scopes are available:
-
Node metadata
-
Interface metadata
-
Service metadata
6.2. Configuring Collectors
Collectors collect performance data via specific agents and protocols. This section includes the following information for each collector:
-
collector-specific parameters (used in the collectd-configuration.xml file)
-
configuration file(s)
Understanding resource types helps when editing collector-specific configuration files. |
6.3. Collectors
6.3.1. SnmpCollector
The SnmpCollector collects performance data through the SNMP protocol.
Configure access to the SNMP Agent through the SNMP configuration in the Web UI (Admin>Configure SNMP Community Names by IP Address
).
Collector Facts
Class Name |
|
Package |
core |
Supported on Minion |
|
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the SNMP Collection to use. |
required |
|
|
Whether collected performance data should be tested against thresholds. |
optional |
|
|
Timeout in milliseconds to wait for SNMP responses. |
optional |
SNMP configuration |
SNMP Collection Configuration
Understanding resource types helps when editing collector-specific configuration files. |
Define SNMP Collection in etc/datacollection-config.xml
and etc/datacollection.d/*.xml
.
<?xml version="1.0"?>
<datacollection-config rrd-repository="/var/lib/opennms/rrd/snmp/">(1)
<snmp-collection name="default"(2)
snmpStorageFlag="select">(3)
<rrd step="300">(4)
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<include-collection dataCollectionGroup="MIB2"/>(5)
<include-collection dataCollectionGroup="3Com"/>
...
<include-collection dataCollectionGroup="VMware-Cim"/>
</snmp-collection>
</datacollection-config>
1 | Directory where to persist RRD files on the file system, ignored if NewTS is used as time-series storage. |
2 | Name of the SNMP data collection referenced in the collection package in collectd-configuration.xml . |
3 | Configure SNMP MIB-II interface metric collection behavior: all means collect metrics from all interfaces, primary only from interface provisioned as primary interface, select only from manually selected interfaces from the Web UI. |
4 | RRD archive configuration for this set of performance metrics, ignored when NewTS is used as time series storage. |
5 | Include device- or application-specific performance metric OIDs to collect. |
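To make the RRD archive definitions easier to read, the following Python sketch works out how much history each rra line retains for a given step. It relies only on the standard RRDtool layout RRA:CF:xff:steps:rows; the function name rra_retention is hypothetical.

```python
def rra_retention(step_seconds, rra):
    """Return (seconds_per_row, retention_days) for an RRA definition string."""
    _prefix, _cf, _xff, steps, rows = rra.split(":")
    seconds_per_row = step_seconds * int(steps)  # primary data points consolidated per row
    retention_days = seconds_per_row * int(rows) / 86400
    return seconds_per_row, retention_days


if __name__ == "__main__":
    for definition in (
        "RRA:AVERAGE:0.5:1:2016",
        "RRA:AVERAGE:0.5:12:1488",
        "RRA:AVERAGE:0.5:288:366",
    ):
        print(definition, rra_retention(300, definition))
```

With a step of 300 seconds, the three AVERAGE archives keep 5-minute samples for 7 days, hourly averages for 62 days, and daily averages for 366 days.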
SnmpCollectorNG
The SnmpCollectorNG
provides an alternate implementation to the SnmpCollector
that takes advantage of new APIs in the platform.
It is provided as a separate collector while we work to validate its functionality and run-time characteristics, with the goal of eventually having it replace the SnmpCollector
.
Use this new collector by updating existing references from org.opennms.netmgt.collectd.SnmpCollector
to org.opennms.netmgt.collectd.SnmpCollectorNG
.
Known caveats include:
-
No support for alias type resources
-
No support for min/max values
6.3.2. JmxCollector
The JmxCollector collects performance data via JMX. Attributes are extracted from the available MBeans.
Collector Facts
Class Name |
|
Package |
core |
Supported on Minion |
|
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the JMX Collection to use. |
required |
(none) |
|
Whether collected performance data should be tested against thresholds |
optional |
|
|
Number of retries |
optional |
|
|
Name of the path in which the metrics should be stored |
optional |
Value of the port, or 'jsr160' if no port is set. |
|
The password strategy to use.
Supported values are: |
optional |
|
|
The connection url, e.g., |
optional |
(none) |
|
The username if authentication is required. |
optional |
(none) |
|
The password if authentication is required. |
optional |
(none) |
|
Deprecated. JMX port. |
optional |
|
|
Deprecated. Protocol used in the |
optional |
|
|
Deprecated. Path used in |
optional |
|
|
Deprecated. RMI port. |
optional |
|
|
Deprecated. Use an alternative |
optional |
|
The deprecated parameters port , protocol , urlPath , rmiServerPort and remoteJMX should be replaced with the url parameter.
If url is not defined the collector falls back to legacy mode and the deprecated parameters are used instead to build the connection url.
|
If a service requires different configuration, an entry in $OPENNMS_HOME/etc/jmx-config.xml can overwrite it.
|
JMX Collection Configuration
Understanding resource types helps when editing collector-specific configuration files. |
Define JMX Collections in etc/jmx-datacollection-config.xml
and etc/jmx-datacollection-config.d/
.
This snippet provides a collection definition named opennms-poller
:
<jmx-collection name="opennms-poller">
<rrd step="300">
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<mbeans>
<mbean name="OpenNMS Pollerd" objectname="OpenNMS:Name=Pollerd">
<attrib name="NumPolls" alias="ONMSPollCount" type="counter"/>
</mbean>
</mbeans>
</jmx-collection>
Once added to etc/jmx-datacollection-config.xml
you can test it using the collect
command available in the Karaf Shell:
opennms:collect org.opennms.netmgt.collectd.Jsr160Collector 127.0.0.1 collection=opennms-poller port=18980
Generic Resource Type
To support wildcard (*) in objectname, JMX collector supports generic resource types. JMX configuration requires two changes for this to work:
-
Create a custom resource type in
etc/resource-types.d/
. For example, there is already a definition in jmx-resource.xml
that defines a custom resource for Kafka lag:
<resource-types>
<resourceType name="kafkaLag" label="Kafka Lag"
resourceLabel="${index}">
<persistenceSelectorStrategy class="org.opennms.netmgt.collection.support.PersistAllSelectorStrategy"/>
<storageStrategy class="org.opennms.netmgt.dao.support.SiblingColumnStorageStrategy">
<parameter key="sibling-column-name" value="name" />
</storageStrategy>
</resourceType>
</resource-types>
-
Match the resourceType name as
resource-type
in MBean definition:
<mbean name="org.opennms.core.ipc.sink.kafka.heartbeat" resource-type="kafkaLag" objectname="org.opennms.core.ipc.sink.kafka:name=OpenNMS.Sink.*.Lag">
<attrib name="Value" alias="Lag" type="gauge"/>
</mbean>
Resource definition
The JMX objectname is the full name of the MBean, in the form (domain:key=value, key=value, ...)
.
Wildcard (*)
can exist anywhere in the objectname.
Depending on wildcard definition, use SiblingColumnStorageStrategy
to extract resource label.
If wildcard exists in the value (usual case), use corresponding key
as the sibling-column-name
parameter. For example:
org.apache.activemq:BrokerName=*,Type=Queue,Destination=com.mycompany.myqueue
Here BrokerName
can be defined as parameter for SiblingColumnStorageStrategy
<parameter key="sibling-column-name" value="BrokerName" />
The extracted BrokerNames from the wildcard will be the resource folders in the form of nodeId/resourceTypeName/{resource-label}
Wildcard may exist in domain as well. For example: org.apache.*:BrokerName=trap, Type=Queue
.
Then domain
can be defined as the sibling-column-name
parameter.
<parameter key="sibling-column-name" value="domain" />
To use the objectname
itself as a resource label, use IndexStorageStrategy
as storageStrategy in resource-type
definition.
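The way a sibling-column-name selects the resource label from an objectname can be sketched in Python. This is a simplified illustration, not the SiblingColumnStorageStrategy implementation: the parser ignores quoted values, and the concrete names broker1 and org.apache.kafka are hypothetical stand-ins for whatever the wildcard matches at run time.

```python
def parse_objectname(objectname):
    """Split 'domain:key=value,key=value' into (domain, {key: value})."""
    domain, _, properties = objectname.partition(":")
    keys = {}
    for pair in properties.split(","):
        key, _, value = pair.strip().partition("=")
        keys[key] = value
    return domain, keys


def resource_label(objectname, sibling_column_name):
    """Pick the resource label: the domain itself, or the value of the named key."""
    domain, keys = parse_objectname(objectname)
    if sibling_column_name == "domain":
        return domain
    return keys[sibling_column_name]


if __name__ == "__main__":
    # A concrete MBean that the BrokerName=* pattern could match (hypothetical).
    name = "org.apache.activemq:BrokerName=broker1,Type=Queue,Destination=com.mycompany.myqueue"
    print(resource_label(name, "BrokerName"))
    # A concrete MBean that the org.apache.* domain pattern could match (hypothetical).
    print(resource_label("org.apache.kafka:BrokerName=trap, Type=Queue", "domain"))
```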
Third-Party JMX Services
Some Java applications provide their own JMX implementation and require certain libraries to be present on the classpath, e.g., the Java application server Wildfly. To successfully collect data, you may need to do the following:
-
Place the JMX client library in the $OPENNMS_HOME/lib folder (e.g., jboss-cli-client.jar)
-
Configure the collection accordingly (see above)
-
Configure the JMX-Collector in collectd-configuration.xml (see below)
<service name="JMX-WILDFLY" interval="300000" user-defined="false" status="on">
<parameter key="url" value="service:jmx:http-remoting-jmx://${ipaddr}:9990"/>
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="factory" value="PASSWORD-CLEAR"/>
<parameter key="username" value="admin"/>
<parameter key="password" value="admin"/>
<parameter key="rrd-base-name" value="java"/>
<parameter key="collection" value="jsr160"/>
<parameter key="thresholding-enabled" value="true"/>
<parameter key="ds-name" value="jmx-wildfly"/>
<parameter key="friendly-name" value="jmx-wildfly"/>
</service>
<collector service="JMX-WILDFLY" class-name="org.opennms.netmgt.collectd.Jsr160Collector"/>
6.3.3. HttpCollector
The HttpCollector collects performance data via HTTP and HTTPS. Attributes are extracted from the HTTP responses using a regular expression.
Collector Facts
Class Name |
|
Package |
core |
Supported on Minion |
|
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the HTTP Collection to use. |
required |
(none) |
|
Whether collected performance data should be tested against thresholds. |
optional |
|
|
Override the default port in all of the URIs |
optional |
80 |
|
Connection and socket timeout in milliseconds |
optional |
3000 |
|
Number of retries |
optional |
2 |
|
Should the system-wide proxy settings be used? Configure system proxy settings via system properties |
optional |
|
HTTP Collection Configuration
Understanding resource types helps when editing collector-specific configuration files. |
Define HTTP Collections in etc/http-datacollection-config.xml
.
This snippet provides a collection definition named opennms-copyright
:
<http-collection name="opennms-copyright">
<rrd step="300">
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<uris>
<uri name="login-page">
<url path="/opennms/login.jsp"
matches=".*2002\-([0-9]+).*" response-range="100-399" dotall="true" >
</url>
<attributes>
<attrib alias="copyrightYear" match-group="1" type="gauge"/>
</attributes>
</uri>
</uris>
</http-collection>
Once added to etc/http-datacollection-config.xml
you can test it using the collect
command available in the Karaf Shell:
opennms:collect org.opennms.netmgt.collectd.HttpCollector 127.0.0.1 collection=opennms-copyright port=8980
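To see what the matches expression and match-group="1" in the opennms-copyright collection extract, the following Python sketch applies the same regular expression to a sample page body. The sample body and the helper names are hypothetical, and the sketch assumes the whole response must match the pattern, as it would with a full-string regular expression match in Java.

```python
import re

# Same pattern as the matches attribute; DOTALL mirrors dotall="true".
PATTERN = re.compile(r".*2002\-([0-9]+).*", re.DOTALL)


def in_response_range(status, low=100, high=399):
    """Mirror response-range="100-399": accept only status codes in the range."""
    return low <= status <= high


def extract_copyright_year(body):
    """Return match group 1 as a gauge value, or None if the body does not match."""
    match = PATTERN.fullmatch(body)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical response body spanning several lines.
    sample = "<html>\n<p>Copyright 2002-2021 The OpenNMS Group</p>\n</html>"
    if in_response_range(200):
        print(extract_copyright_year(sample))
```

Because the body spans multiple lines, the pattern only matches when "." is allowed to match newlines, which is what the dotall attribute enables.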
6.3.4. JdbcCollector
The JdbcCollector collects performance data via JDBC drivers. Attributes are retrieved using SQL queries.
Collector Facts
Class Name |
|
Package |
core |
Supported on Minion |
|
Limitations on Minion
When running on Minion the data sources in opennms-datasources.xml
cannot be referenced.
Instead, you must set the JDBC connection settings using the service parameters.
Also, the JDBC driver must be properly loaded in the Minion container (see Installing JDBC drivers in Minion). By default, only the JDBC driver for PostgreSQL is available.
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the JDBC Collection to use. |
required |
(empty) |
|
Use an existing datasource defined in opennms-datasources.xml |
optional |
NO_DATASOURCE_FOUND |
|
Driver class name |
optional |
org.postgresql.Driver |
|
JDBC URL |
optional |
jdbc:postgresql://:OPENNMS_JDBC_HOSTNAME/opennms |
|
JDBC username |
optional |
postgres |
|
JDBC password |
optional |
(empty string) |
JDBC Collection Configuration
Understanding resource types helps when editing collector-specific configuration files. |
Define JDBC Collections in etc/jdbc-datacollection-config.xml
.
This snippet provides a collection definition named opennms-stats
:
<jdbc-collection name="opennms-stats">
<rrd step="300">
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<queries>
<query name="opennmsQuery" ifType="ignore">
<statement data-source="opennms">
<queryString>select count(*) as event_count from events;</queryString>
</statement>
<columns>
<column name="event_count" data-source-name="event_count" alias="event_count" type="GAUGE"/>
</columns>
</query>
</queries>
</jdbc-collection>
Once added to etc/jdbc-datacollection-config.xml
you can test it using the collect
command available in the Karaf Shell:
opennms:collect org.opennms.netmgt.collectd.JdbcCollector 127.0.0.1 collection=opennms-stats data-source=opennms
To test this same collection on Minion you must specify the JDBC settings as service attributes, for example:
opennms:collect -l MINION org.opennms.netmgt.collectd.JdbcCollector 127.0.0.1 collection=opennms-stats driver=org.postgresql.Driver url=jdbc:postgresql://localhost:5432/opennms user=opennms password=opennms
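The relationship between the query and its column definition in the opennms-stats collection can be sketched in Python. This is only an analogy: it uses an in-memory SQLite database as a stand-in for the JDBC data source, and the table contents, the COLUMNS mapping, and the collect function are hypothetical.

```python
import sqlite3

QUERY = "select count(*) as event_count from events;"
# Result-set column name -> alias under which the value is stored.
COLUMNS = {"event_count": "event_count"}


def collect(conn):
    """Run the query and map each configured column to its alias as a gauge."""
    cursor = conn.execute(QUERY)
    row = cursor.fetchone()
    names = [description[0] for description in cursor.description]
    record = dict(zip(names, row))
    return {alias: float(record[name]) for name, alias in COLUMNS.items()}


def demo_connection():
    """Create a stand-in database with three rows in an events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table events (id integer primary key)")
    conn.executemany("insert into events (id) values (?)", [(1,), (2,), (3,)])
    return conn


if __name__ == "__main__":
    print(collect(demo_connection()))
```

The column alias in the SQL statement (as event_count) is what ties the result set to the column name referenced in the collection definition.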
6.3.5. NSClientCollector
The NSClientCollector collects performance data over HTTP from NSClient++.
Collector Facts
Class Name |
|
Package |
opennms-plugin-protocol-nsclient |
Supported on Minion |
|
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the NSClient Collection to use. |
optional |
default |
6.3.6. PrometheusCollector
The PrometheusCollector collects performance metrics via HTTP(S) using the text-based Prometheus Exposition format. This has been adopted by many applications and is in the process of being standardized in the OpenMetrics project.
This collector provides tools for parsing and mapping the metrics to the collection model used by OpenNMS Horizon.
Collector Facts
Class Name |
|
Package |
core |
Supported on Minion |
|
Collector Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the Prometheus Collection to use |
required |
|
|
HTTP URL to query for the metrics |
required |
|
|
HTTP socket and read timeout in milliseconds |
optional |
10000 (10 seconds) |
|
Number of retries before failing |
optional |
2 |
header-* |
Optional headers to pass in the HTTP request |
optional |
(none) |
Prometheus Collector Usage
Let’s demonstrate the usage of the collector with an example running against node_exporter.
Obtain a copy of the appropriate release binary from the node_exporter release page.
Extract and start the service:
$ tar xvf node_exporter-0.18.1.linux-amd64.tar.gz
$ ./node_exporter-0.18.1.linux-amd64/node_exporter
INFO[0000] Starting node_exporter (version=0.18.1, branch=HEAD, revision=3db77732e925c08f675d7404a8c46466b2ece83e) source="node_exporter.go:156"
INFO[0000] Build context (go=go1.12.5, user=root@b50852a1acba, date=20190604-16:41:18) source="node_exporter.go:157"
INFO[0000] Enabled collectors: source="node_exporter.go:97"
INFO[0000] - arp source="node_exporter.go:104"
INFO[0000] - bcache source="node_exporter.go:104"
INFO[0000] - bonding source="node_exporter.go:104"
INFO[0000] - conntrack source="node_exporter.go:104"
INFO[0000] - cpu source="node_exporter.go:104"
INFO[0000] - cpufreq source="node_exporter.go:104"
...
INFO[0000] - uname source="node_exporter.go:104"
INFO[0000] - vmstat source="node_exporter.go:104"
INFO[0000] - xfs source="node_exporter.go:104"
INFO[0000] - zfs source="node_exporter.go:104"
INFO[0000] Listening on :9100 source="node_exporter.go:170"
From the Karaf Shell, you can now issue an ad hoc collection request against the node_exporter
process:
admin@opennms> opennms:collect org.opennms.netmgt.collectd.prometheus.PrometheusCollector 127.0.0.1 collection=node_exporter url='http://127.0.0.1:9100/metrics'
NOTE: Some collectors require a database node and IP interface.
NodeLevelResource[nodeId=0,path=null]
Group: node_exporter_loadavg
Attribute[load1:1.26]
Attribute[load15:1.0]
Attribute[load5:0.59]
Group: node_exporter_memory
Attribute[Active_anon_bytes:1.1776770048E10]
Attribute[Active_bytes:2.4471535616E10]
Attribute[Active_file_bytes:1.2694765568E10]
Update the IP addresses in the command as necessary.
Prometheus Collector Configuration
Prometheus collection definitions are maintained in etc/prometheus-datacollection.d/
.
Let’s look at an excerpt of the node_exporter
collection:
<!--
node_memory_Active 1.3626548224e+10
node_memory_Active_anon 6.314020864e+09
node_memory_Active_file 7.31252736e+09
...
node_memory_HugePages_Free 0
...
-->
<group name="node_exporter_memory"
resource-type="node"
filter-exp="name matches 'node_memory_.*'">
<numeric-attribute alias-exp="name.substring('node_memory_'.length())"/>
</group>
This group definition matches metrics that start with the node_memory_
prefix, extracts the suffix as the metric name and associates these metrics with the node_exporter_memory
group in the node-level resource.
Expressions are written in Spring Expression Language (SpEL).
The metric instances are used as the expression context, which means you have access to the name
and label
properties.
Here’s another excerpt where we extract metrics grouped by CPU:
<!--
node_cpu{cpu="cpu0",mode="guest"} 0
node_cpu{cpu="cpu0",mode="idle"} 16594.88
...
node_cpu{cpu="cpu1",mode="guest"} 0
node_cpu{cpu="cpu1",mode="idle"} 17790.51
-->
<group name="node_exporter_cpus"
resource-type="nodeExporterCPU"
filter-exp="name matches 'node_cpu'"
group-by-exp="labels[cpu]">
<numeric-attribute alias-exp="labels[mode]"/>
</group>
This group definition matches metrics called 'node_cpu', groups them by the value of the cpu
label and extracts the name of the mode
for the name of the numeric attributes.
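The effect of the two group definitions above can be approximated in Python. This sketch is not the collector's implementation and does not evaluate SpEL: it uses a deliberately small parser for the text exposition format and plain Python in place of filter-exp, alias-exp, and group-by-exp. All function names are hypothetical.

```python
import re
from collections import defaultdict

LINE = re.compile(r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$")
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Parse exposition-format lines into (name, labels, value) tuples."""
    metrics = []
    for line in text.strip().splitlines():
        match = LINE.match(line.strip())
        if match:  # comment and blank lines do not match and are skipped
            labels = dict(LABEL.findall(match.group("labels") or ""))
            metrics.append((match.group("name"), labels, float(match.group("value"))))
    return metrics


def memory_group(metrics):
    """Like node_exporter_memory: keep node_memory_* and strip the prefix."""
    prefix = "node_memory_"
    return {name[len(prefix):]: value
            for name, _labels, value in metrics
            if re.fullmatch(r"node_memory_.*", name)}


def cpu_groups(metrics):
    """Like node_exporter_cpus: group node_cpu by the cpu label, alias by mode."""
    groups = defaultdict(dict)
    for name, labels, value in metrics:
        if re.fullmatch(r"node_cpu", name):
            groups[labels["cpu"]][labels["mode"]] = value
    return dict(groups)


SAMPLE = """
node_memory_Active 1.3626548224e+10
node_memory_HugePages_Free 0
node_cpu{cpu="cpu0",mode="guest"} 0
node_cpu{cpu="cpu0",mode="idle"} 16594.88
node_cpu{cpu="cpu1",mode="guest"} 0
node_cpu{cpu="cpu1",mode="idle"} 17790.51
"""

if __name__ == "__main__":
    parsed = parse(SAMPLE)
    print(memory_group(parsed))
    print(cpu_groups(parsed))
```

The memory metrics end up as attributes on a single node-level group, while the CPU metrics produce one resource per cpu label value, each with one attribute per mode.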
6.3.7. TcaCollector
The TcaCollector collects special SNMP data from Juniper TCA Devices.
Collector Facts
Class Name |
|
Package |
opennms-plugin-collector-juniper-tca |
Supported on Minion |
|
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the TCA Collection to use. |
required |
6.3.8. VmwareCimCollector
The VmwareCimCollector collects ESXi host and sensor metrics from vCenter.
Collector Facts
Class Name |
|
Package |
core |
Supported on Minion |
|
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the VMWare CIM Collection to use. |
required |
|
|
Connection timeout in milliseconds |
optional |
6.3.9. VmwareCollector
The VmwareCollector collects performance metrics for managed entities from vCenter.
Collector Facts
Class Name |
|
Package |
core |
Supported on Minion |
|
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the VMWare Collection to use. |
required |
|
|
Connection timeout in milliseconds |
optional |
6.3.10. WmiCollector
The WmiCollector collects performance metrics from Windows systems using Windows Management Instrumentation (WMI).
Collector Facts
Class Name |
|
Package |
core |
Supported on Minion |
|
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the WMI Collection to use. |
required |
6.3.11. WsManCollector
The WsManCollector collects performance metrics using the Web Services-Management (WS-Management) protocol.
Web Services-Management (WS-Management) is a DMTF open standard defining a SOAP-based protocol for the management of servers, devices, applications and various Web services. Windows Remote Management (WinRM) is the Microsoft implementation of WS-Management Protocol.
Collector Facts
Class Name |
|
Package |
core |
Supported on Minion |
|
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the WS-Man Collection to use. |
required |
WS-Management Setup
Before setting up OpenNMS Horizon to communicate with a WS-Management agent, you should confirm that it is properly configured and reachable from the OpenNMS Horizon system. If you need help enabling the WS-Management agent, consult the documentation from the manufacturer. Here are some resources that could help:
We suggest using the Openwsman command line client to validate authentication and connectivity.
Packages are available for most distributions under wsmancli
.
For example:
wsman identify -h localhost -P 5985 -u wsman -p secret
Once validated, add the agent-specific details to the OpenNMS Horizon configuration, defined in the next section.
Troubleshooting and Commands
For troubleshooting, you can use the following set of commands in PowerShell, verified on Microsoft Windows Server 2012.
Enable-PSRemoting
netsh advfirewall firewall add rule name="WinRM-HTTP" dir=in localport=5985 protocol=TCP action=allow
netsh advfirewall firewall add rule name="WinRM-HTTPS" dir=in localport=5986 protocol=TCP action=allow
winrm id
winrm get winrm/config
winrm e winrm/config/listener
nc -z -w1 <windows-server-ip-or-host> 5985;echo $?
In production environments, use basic authentication only with WinRM over HTTPS with verifiable certificates. |
winrm set winrm/config/client/auth '@{Basic="true"}'
winrm set winrm/config/service/auth '@{Basic="true"}'
winrm set winrm/config/service '@{AllowUnencrypted="true"}'
WS-Management Agent Configuration
Understanding resource types helps when editing collector-specific configuration files. |
The agent-specific configuration details are maintained in etc/wsman-config.xml
.
This file has a structure similar to etc/snmp-config.xml
, which the reader may already be familiar with.
This file is consulted when a connection to a WS-Man Agent is made.
If the IP address of the agent is matched by the range
, specific
or ip-match
elements of a definition, then the attributes on that definition are used to connect to the agent.
Otherwise, the attributes on the outer wsman-config
definition are used.
The etc/wsman-config.xml
file automatically reloads when modified.
Here is an example with several definitions:
<?xml version="1.0"?>
<wsman-config retry="3" timeout="1500" ssl="true" strict-ssl="false" path="/wsman">
<definition ssl="true" strict-ssl="false" path="/wsman" username="root" password="calvin" product-vendor="Dell" product-version="iDRAC 6">
<range begin="192.168.1.1" end="192.168.1.10"/>
</definition>
<definition ssl="false" port="5985" path="/wsman" username="Administrator" password="P@ssword">
<ip-match>172.23.1-4.1-255</ip-match>
<specific>172.23.1.105</specific>
</definition>
</wsman-config>
Attribute | Description | Default |
---|---|---|
|
HTTP Connection and response timeout in milliseconds. |
HTTP client default |
|
Number of retries on connection failure. |
|
|
Username for basic authentication. |
none |
|
Password used for basic authentication. |
none |
|
HTTP/S port |
Default for protocol |
|
Maximum number of elements to retrieve in a single request. |
no limit |
|
Enable SSL |
|
|
Enforce SSL certificate verification. |
|
|
Path in the URL to the WS-Management service. |
|
|
Used to overwrite the detected product vendor. |
none |
|
Used to overwrite the detected product version. |
none |
|
Enables GSS authentication. When enabled a reverse lookup is performed on the target IP address in order to determine the canonical host name. |
|
When connecting to Microsoft Windows Server, make sure to set specific ports for WinRM connections.
By default Microsoft Windows Server uses port TCP/5985 for plain text and port TCP/5986 for SSL connections.
|
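The definition lookup described in this section (match on range, specific, or ip-match, otherwise fall back to the outer wsman-config attributes) can be sketched in Python using the addresses from the example. This is an illustration under stated assumptions: the ip_match helper handles only literal octets and a-b octet ranges, the "first matching definition" order is assumed, and the data layout and function names are hypothetical.

```python
import ipaddress


def ip_match(pattern, address):
    """Match each octet against a literal value or an 'a-b' range (simplified)."""
    for spec, octet in zip(pattern.split("."), address.split(".")):
        low, _, high = spec.partition("-")
        high = high or low  # a literal octet is a range of one value
        if not int(low) <= int(octet) <= int(high):
            return False
    return True


# The two definitions from the example, reduced to their matching elements.
DEFINITIONS = [
    {"username": "root",
     "ranges": [("192.168.1.1", "192.168.1.10")], "ip_matches": [], "specifics": []},
    {"username": "Administrator",
     "ranges": [], "ip_matches": ["172.23.1-4.1-255"], "specifics": ["172.23.1.105"]},
]


def find_definition(address):
    """Return the username of the first matching definition, or None for the defaults."""
    ip = ipaddress.ip_address(address)
    for definition in DEFINITIONS:
        in_range = any(ipaddress.ip_address(begin) <= ip <= ipaddress.ip_address(end)
                       for begin, end in definition["ranges"])
        if (in_range
                or address in definition["specifics"]
                or any(ip_match(p, address) for p in definition["ip_matches"])):
            return definition["username"]
    return None  # the attributes on the outer wsman-config element apply


if __name__ == "__main__":
    for candidate in ("192.168.1.5", "172.23.3.7", "10.0.0.1"):
        print(candidate, find_definition(candidate))
```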
WS-Management Collection Configuration
Configuration for the WS-Management collector is stored in etc/wsman-datacollection-config.xml
and etc/wsman-datacollection.d/*.xml
.
The contents of these files are automatically merged and reloaded when changed.
The default WS-Management collection looks as follows:
|
<?xml version="1.0"?>
<wsman-datacollection-config rrd-repository="${install.share.dir}/rrd/snmp/">
<collection name="default">
<rrd step="300">
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<!--
Include all of the available system definitions
-->
<include-all-system-definitions/>
</collection>
</wsman-datacollection-config>
The magic happens with the <include-all-system-definitions/>
element which automatically includes all of the system definitions into the collection group.
If required, you can include a specific system-definition with <include-system-definition>sys-def-name</include-system-definition> .
|
System definitions and related groups can be defined in the root etc/wsman-datacollection-config.xml
file, but it is preferred that they be added to device-specific configuration files in etc/wsman-datacollection-config.d/*.xml
.
Avoid modifying any of the distribution configuration files and create new ones to store your specific details instead. |
Here is an example configuration file for a Dell iDRAC:
<?xml version="1.0"?>
<wsman-datacollection-config>
<group name="drac-system"
resource-uri="http://schemas.dell.com/wbem/wscim/1/cim-schema/2/root/dcim/DCIM_ComputerSystem"
resource-type="node">
<attrib name="OtherIdentifyingInfo" index-of="#IdentifyingDescriptions matches '.*ServiceTag'" alias="serviceTag" type="String"/>
</group>
<group name="drac-power-supply"
resource-uri="http://schemas.dmtf.org/wbem/wscim/1/*"
dialect="http://schemas.microsoft.com/wbem/wsman/1/WQL"
filter="select InputVoltage,InstanceID,PrimaryStatus,SerialNumber,TotalOutputPower from DCIM_PowerSupplyView where DetailedState != 'Absent'"
resource-type="dracPowerSupplyIndex">
<attrib name="InputVoltage" alias="inputVoltage" type="Gauge"/>
<attrib name="InstanceID" alias="instanceId" type="String"/>
<attrib name="PrimaryStatus" alias="primaryStatus" type="Gauge"/>
<attrib name="SerialNumber" alias="serialNumber" type="String"/>
<attrib name="TotalOutputPower" alias="totalOutputPower" type="Gauge"/>
</group>
<system-definition name="Dell iDRAC (All Version)">
<rule>#productVendor matches '^Dell.*' and #productVersion matches '.*iDRAC.*'</rule>
<include-group>drac-system</include-group>
<include-group>drac-power-supply</include-group>
</system-definition>
</wsman-datacollection-config>
System Definitions
Rules in the system definition are written using SpEL expressions.
The expression has access to the following variables in its evaluation context:
Name | Type |
---|---|
(root) |
org.opennms.netmgt.model.OnmsNode |
agent |
org.opennms.netmgt.collection.api.CollectionAgent |
productVendor |
java.lang.String |
productVersion |
java.lang.String |
If a particular agent is matched by any of the rules, then the collector will attempt to collect the referenced groups from the agent.
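For readers unfamiliar with SpEL, the rule from the Dell iDRAC system definition can be approximated in Python. This is an analogy rather than a SpEL evaluation: it assumes that the matches operator requires the whole string to match the pattern, and the function name and sample values are hypothetical.

```python
import re


def matches_dell_idrac(product_vendor, product_version):
    """Approximate: #productVendor matches '^Dell.*' and #productVersion matches '.*iDRAC.*'."""
    vendor_ok = re.fullmatch(r"^Dell.*", product_vendor) is not None
    version_ok = re.fullmatch(r".*iDRAC.*", product_version) is not None
    return vendor_ok and version_ok


if __name__ == "__main__":
    print(matches_dell_idrac("Dell", "iDRAC 6"))
    print(matches_dell_idrac("HP", "iLO 4"))
```

When the rule evaluates to true for an agent, the groups listed in the include-group elements of that system definition are collected.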
Group Definitions
Groups are retrieved by issuing an Enumerate command against a particular Resource URI
and parsing the results.
The Enumerate commands can include an optional filter
in order to filter the records and attributes that are returned.
When configuring a filter, you must also specify the dialect. |
The resource type used by the group must be of type node
or a generic resource type.
Interface level resources are not supported.
When using a generic resource type, the IndexStorageStrategy
cannot be used since records have no implicit index.
Instead, you must use an alternative such as the SiblingColumnStorageStrategy
.
If a record includes a multi-valued key, you can collect the value at a specific index with an index-of
expression.
This is best demonstrated with an example. Let's assume we wanted to collect the ServiceTag
from the following record:
<IdentifyingDescriptions>CIM:GUID</IdentifyingDescriptions>
<IdentifyingDescriptions>CIM:Tag</IdentifyingDescriptions>
<IdentifyingDescriptions>DCIM:ServiceTag</IdentifyingDescriptions>
<OtherIdentifyingInfo>45454C4C-3700-104A-8052-C3C01BB25031</OtherIdentifyingInfo>
<OtherIdentifyingInfo>mainsystemchassis</OtherIdentifyingInfo>
<OtherIdentifyingInfo>C8BBBP1</OtherIdentifyingInfo>
Specifying the attribute name OtherIdentifyingInfo
would not be sufficient, since there are multiple values for that key.
Instead, we want to retrieve the value for the OtherIdentifyingInfo
key at the same index where IdentifyingDescriptions
is set to DCIM:ServiceTag
.
This can be achieved using the following attribute definition:
<attrib name="OtherIdentifyingInfo" index-of="#IdentifyingDescriptions matches '.*ServiceTag'" alias="serviceTag" type="String"/>
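The index-of lookup can be sketched in Python using the record shown above. This is an illustration of the idea only, not the collector's code: the record is modeled as a dictionary of lists, and the function value_at_index_of is hypothetical.

```python
import re

# The multi-valued keys from the record above, in document order.
RECORD = {
    "IdentifyingDescriptions": ["CIM:GUID", "CIM:Tag", "DCIM:ServiceTag"],
    "OtherIdentifyingInfo": [
        "45454C4C-3700-104A-8052-C3C01BB25031",
        "mainsystemchassis",
        "C8BBBP1",
    ],
}


def value_at_index_of(record, name, index_key, pattern):
    """Return record[name][i] for the first i where record[index_key][i] matches pattern."""
    for index, candidate in enumerate(record[index_key]):
        if re.fullmatch(pattern, candidate):
            return record[name][index]
    return None


if __name__ == "__main__":
    print(value_at_index_of(RECORD, "OtherIdentifyingInfo",
                            "IdentifyingDescriptions", r".*ServiceTag"))
```

The description DCIM:ServiceTag sits at the third position, so the third OtherIdentifyingInfo value is the one stored under the serviceTag alias.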
Special Attributes
A group can contain the placeholder attribute ElementCount
that, during collection, will be populated with the total number of results returned for that group.
This can be used to threshold on the number of results returned by an enumeration.
<group name="Event-1234"
resource-uri="http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/*"
dialect="http://schemas.microsoft.com/wbem/wsman/1/WQL"
filter="select * from Win32_NTLogEvent where LogFile = 'Some-Application-Specific-Logfile/Operational' AND EventCode = '1234'"
resource-type="node">
<attrib name="##ElementCount##" alias="elementCount" type="Gauge"/>
</group>
6.3.12. XmlCollector
The XmlCollector collects and extracts metrics from XML and JSON documents.
Collector Facts
Class Name |
org.opennms.protocols.xml.collector.XmlCollector |
Package |
core |
Supported on Minion |
|
Limitations on Minion
The following handlers are not currently supported on Minion:
-
DefaultJsonCollectionHandler
-
Sftp3gppXmlCollectionHandler
-
Sftp3gppVTDXmlCollectionHandler
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
collection |
The name of the XML Collection to use. |
required |
|
|
Class that performs the collection. |
optional |
|
The available handlers include:
-
org.opennms.protocols.xml.collector.DefaultXmlCollectionHandler
-
org.opennms.protocols.xml.collector.Sftp3gppXmlCollectionHandler
-
org.opennms.protocols.xml.vtdxml.DefaultVTDXmlCollectionHandler
-
org.opennms.protocols.xml.vtdxml.Sftp3gppVTDXmlCollectionHandler
-
org.opennms.protocols.json.collector.DefaultJsonCollectionHandler
-
org.opennms.protocols.http.collector.HttpCollectionHandler
XML Collection Configuration
Understanding resource types helps when editing collector-specific configuration files.
XML Collections are defined in etc/xml-datacollection-config.xml
and etc/xml-datacollection/
.
This snippet provides a collection definition named xml-opennms-nodes
:
<xml-collection name="xml-opennms-nodes">
<rrd step="300">
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<xml-source url="http://admin:admin@{ipaddr}:8980/opennms/rest/nodes">
<request method="GET">
<parameter name="use-system-proxy" value="true"/>
</request>
<import-groups>xml-datacollection/opennms-nodes.xml</import-groups>
</xml-source>
</xml-collection>
The request element can have the following child elements:
Parameter |
Description |
Required |
Default value |
|
The connection and socket timeout in milliseconds |
optional |
|
|
How often should the request be repeated in case of an error? |
optional |
0 |
use-system-proxy |
Should the system-wide proxy settings be used? Configure the system proxy settings via system properties |
optional |
false |
The referenced opennms-nodes.xml
file contains:
<xml-groups>
<xml-group name="nodes" resource-type="node" resource-xpath="/nodes">
<xml-object name="totalCount" type="GAUGE" xpath="@totalCount"/>
</xml-group>
</xml-groups>
With the configuration in place, you can test it using the collect
command available in the Karaf Shell:
opennms:collect -n 1 org.opennms.protocols.xml.collector.XmlCollector 127.0.0.1 collection=xml-opennms-nodes
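To see what the nodes group extracts, it can help to apply the same two selections by hand. The Python sketch below is an illustration only: the sample response body is hypothetical, no HTTP request is made, and the function name collect_total_count is invented for this example. It treats resource-xpath="/nodes" as selecting the root element and xpath="@totalCount" as reading an attribute of that element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed response body from the nodes REST endpoint.
SAMPLE = (
    '<nodes count="2" offset="0" totalCount="2">'
    '<node id="1"/><node id="2"/>'
    "</nodes>"
)


def collect_total_count(document):
    """Mimic the 'nodes' xml-group: select /nodes, then read @totalCount."""
    root = ET.fromstring(document)
    if root.tag != "nodes":  # resource-xpath="/nodes" matches only this root
        return None
    value = root.get("totalCount")  # xpath="@totalCount"
    return float(value) if value is not None else None


print(collect_total_count(SAMPLE))
```

If the document has a different root element or lacks the attribute, the sketch returns None, which corresponds to nothing being collected for the group.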
Caveats
The org.opennms.protocols.json.collector.DefaultJsonCollectionHandler
requires the fetched document to be a single element of type object for XPath queries to work.
If the root element is an array, it is wrapped in an object, and the original array is accessible as /elements.
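The wrapping rule can be pictured with a few lines of Python. This is a sketch of the behavior described above, not the handler's code; the function name normalize_root is hypothetical, and the key name elements is taken from the /elements path mentioned in the caveat.

```python
import json


def normalize_root(text):
    """Parse JSON and wrap a top-level array as {"elements": [...]}."""
    document = json.loads(text)
    if isinstance(document, list):
        return {"elements": document}
    return document  # already an object, left unchanged


print(normalize_root('[{"id": 1}, {"id": 2}]'))
```

After wrapping, a path that starts at /elements addresses the entries of the original array.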
6.3.13. XmpCollector
The XmpCollector collects performance metrics via the X/Open Management Protocol API (XMP).
Collector Facts
Class Name |
|
Package |
opennms-plugin-protocol-xmp |
Supported on Minion |
|
Collector Parameters
Use these parameters in the collectd-configuration.xml file.
Parameter | Description | Required | Default value |
---|---|---|---|
|
The name of the XMP Collection to use. |
required |
|
|
The TCP port on which the agent communicates. |
required |
|
|
The username used for authenticating to the agent. |
optional |
(none) |
|
The timeout used when communicating with the agent. |
optional |
3000 |
|
The number of retries permitted when timeout expires. |
optional |
0 |
6.4. Resource Types
Resource types group sets of performance data measurements for persisting, indexing, and display in the web UI. Each resource type has a unique name, label definitions for display in the UI, and strategy definitions for archiving the measurements for long-term analysis.
There are two labels for a resource type.
The first, label
, defines a string to display in the UI.
The second, resourceLabel
, defines the template used when displaying the name of each unique group of measurements for the resource type.
There are two types of strategy definitions for resource types: persistence selector strategies and storage strategies.
The persistence selector strategy filters the group indexes down to a subset for storage on disk.
The storage strategy is used to convert an index into a resource path label for persistence.
There are two special resource types that do not have a resource-type definition.
They are node
and ifIndex
.
Resource types can be defined inside files in either $OPENNMS_HOME/etc/resource-types.d
or $OPENNMS_HOME/etc/datacollection
, with the latter being specific for SNMP.
Here is the diskIOIndex resource type definition from $OPENNMS_HOME/etc/datacollection/netsnmp.xml
:
<resourceType name="diskIOIndex" label="Disk IO (UCD-SNMP MIB)" resourceLabel="${diskIODevice} (index ${index})">
<persistenceSelectorStrategy class="org.opennms.netmgt.collection.support.PersistRegexSelectorStrategy">
<parameter key="match-expression" value="not(#diskIODevice matches '^(loop|ram).*')" />
</persistenceSelectorStrategy>
<storageStrategy class="org.opennms.netmgt.dao.support.SiblingColumnStorageStrategy">
<parameter key="sibling-column-name" value="diskIODevice" />
<parameter key="replace-all" value="s/^-//" />
<parameter key="replace-all" value="s/\s//" />
<parameter key="replace-all" value="s/:\\.*//" />
</storageStrategy>
</resourceType>
6.4.1. Persistence Selector Strategies
Class | Description |
---|---|
org.opennms.netmgt.collection.support.PersistAllSelectorStrategy |
Persist All indexes |
org.opennms.netmgt.collection.support.PersistRegexSelectorStrategy |
Persist indexes based on JEXL evaluation |
PersistRegexSelectorStrategy
The PersistRegexSelectorStrategy class takes a single parameter, match-expression, which defines a JEXL expression.
On evaluation, this expression should return either true (persist the index to storage) or false (discard the data).
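As an illustration of the true/false decision, the Python sketch below mirrors the match-expression from the diskIOIndex definition, not(#diskIODevice matches '^(loop|ram).*'). It is not JEXL and not the selector's implementation; it assumes that matches tests the whole value, and the device names are made up for the example.

```python
import re

# Mirrors: not(#diskIODevice matches '^(loop|ram).*')
MATCH = re.compile(r"^(loop|ram).*")


def should_persist(disk_io_device):
    """Return True to persist the index, False to discard its data."""
    return MATCH.fullmatch(disk_io_device) is None


devices = ["sda", "loop0", "ram1", "nvme0n1"]
persisted = [d for d in devices if should_persist(d)]
print(persisted)
```

Only the indexes whose diskIODevice does not start with loop or ram remain in the persisted list.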
6.4.2. Storage Strategies
Class | Storage Path Value |
---|---|
org.opennms.netmgt.collection.support.IndexStorageStrategy |
Index |
org.opennms.netmgt.collection.support.JexlIndexStorageStrategy |
Value after JexlExpression evaluation |
org.opennms.netmgt.collection.support.ObjectNameStorageStrategy |
Value after JexlExpression evaluation |
org.opennms.netmgt.dao.support.FrameRelayStorageStrategy |
interface label + '.' + dlci |
org.opennms.netmgt.dao.support.HostFileSystemStorageStrategy |
Uses the value from the hrStorageDescr column in the hrStorageTable, cleaned up for unix filesystems. |
org.opennms.netmgt.dao.support.SiblingColumnStorageStrategy |
Uses the value from an SNMP lookup of OID in sibling-column-name parameter, cleaned up for unix filesystems. |
org.opennms.protocols.xml.collector.XmlStorageStrategy |
Index, but cleaned up for unix filesystems. |
IndexStorageStrategy
The IndexStorageStrategy takes no parameters.
JexlIndexStorageStrategy
The JexlIndexStorageStrategy takes two parameters, index-format
which is required, and clean-output
which is optional.
Parameter | Description |
---|---|
index-format |
The JexlExpression to evaluate |
clean-output |
Boolean to indicate whether the index value is cleaned up. |
If clean-output is true, all whitespace, colons, forward and back slashes, and vertical bars in the index value are replaced with underscores, and all equal signs are removed.
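The cleanup rule can be approximated in Python as follows. This sketch only restates the rule in code; it assumes each affected character is replaced individually (runs are not collapsed), and the function name clean_index is hypothetical.

```python
import re


def clean_index(value):
    """Replace whitespace, ':', '/', backslash and '|' with '_', then drop '='."""
    replaced = re.sub(r"[\s:/\\|]", "_", value)
    return replaced.replace("=", "")


print(clean_index("Survivor Space"))
```

Under these assumptions, a value such as "a=b:c/d|e" becomes "ab_c_d_e".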
This class can be extended to create custom storage strategies by overriding the updateContext
method to set additional key/value pairs to use in your index-format
template.
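// Import packages are assumed here (JEXL 2 API and the OpenNMS collection API); adjust them to your build.
import org.apache.commons.jexl2.JexlContext;
import org.opennms.netmgt.collection.api.CollectionResource;
import org.opennms.netmgt.collection.support.JexlIndexStorageStrategy;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ExampleStorageStrategy extends JexlIndexStorageStrategy {
    private static final Logger LOG = LoggerFactory.getLogger(ExampleStorageStrategy.class);

    public ExampleStorageStrategy() {
        super();
    }

    // Expose the resource instance to the index-format template as ${Example}.
    @Override
    public void updateContext(JexlContext context, CollectionResource resource) {
        context.set("Example", resource.getInstance());
    }
}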
ObjectNameStorageStrategy
The ObjectNameStorageStrategy extends the JexlIndexStorageStrategy, so its requirements are the same. Extra key/value pairs are added to the JexlContext, which can then be used in the index-format
template.
The original index string is converted to an ObjectName and can be referenced as ${ObjectName}
. The domain from the ObjectName can be referenced as ${domain}
. All key properties
from the ObjectName can also be referenced by ${key}
.
This storage strategy is meant to be used with JMX MBean data collections where multiple MBeans can return the same set of attributes. As of OpenNMS Horizon 20, this is only supported using an HTTP-to-JMX proxy together with the XmlCollector, because the JmxCollector does not yet support indexed groups.
Given an MBean like java.lang:type=MemoryPool,name=Survivor Space
, and a storage strategy like this:
<storageStrategy class="org.opennms.netmgt.collection.support.ObjectNameStorageStrategy">
<parameter key="index-format" value="${domain}_${type}_${name}" />
<parameter key="clean-output" value="true" />
</storageStrategy>
Then the index value would be java_lang_MemoryPool_Survivor_Space
.
FrameRelayStorageStrategy
The FrameRelayStorageStrategy takes no parameters.
HostFileSystemStorageStrategy
The HostFileSystemStorageStrategy takes no parameters. This class is marked as deprecated, and can be replaced with:
<storageStrategy class="org.opennms.netmgt.dao.support.SiblingColumnStorageStrategy">
<parameter key="sibling-column-name" value="hrStorageDescr" />
<parameter key="replace-first" value="s/^-$/_root_fs/" />
<parameter key="replace-all" value="s/^-//" />
<parameter key="replace-all" value="s/\\s//" />
<parameter key="replace-all" value="s/:\\\\.*//" />
</storageStrategy>
SiblingColumnStorageStrategy
Parameter | Description |
---|---|
sibling-column-name |
Alternate string value to use for index |
replace-first |
Regex Pattern, replaces only the first match |
replace-all |
Regex Pattern, replaces all matches |
Values for replace-first and replace-all must match the pattern s/regex/replacement/ or an error will be thrown.
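The s/regex/replacement/ form can be explored with a small Python sketch. It is an approximation for experimenting with expressions, not the strategy's code: the parsing expression, the function name apply_operation, and the sample values are assumptions, and Python's regular expression dialect differs from Java's in some details.

```python
import re

# A sed-like operation: "s/", a non-empty regex, "/", a replacement, "/".
OPERATION = re.compile(r"s/([^/]+)/([^/]*)/")


def apply_operation(value, operation, first_only):
    """Apply one replace-first (first_only=True) or replace-all operation."""
    parsed = OPERATION.fullmatch(operation)
    if parsed is None:
        raise ValueError(f"not of the form s/regex/replacement/: {operation!r}")
    regex, replacement = parsed.groups()
    # count=1 replaces only the first match; count=0 replaces every match.
    # A callable replacement inserts the replacement text literally.
    return re.sub(regex, lambda _m: replacement, value, count=1 if first_only else 0)


print(apply_operation("my disk 1", r"s/\s//", False))
```

An operation that does not follow the expected form, such as one missing its closing slash, raises an error in this sketch, analogous to the error mentioned above.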
XmlStorageStrategy
The XmlStorageStrategy takes no parameters. The index value will have all whitespace, colons, forward and back slashes, and vertical bars replaced with underscores. All equal signs are removed.
6.5. SNMP Property Extenders
When collecting tabular numeric metrics from a given MIB table, it’s helpful to include one or more string properties from each conceptual row of the table in question.
These properties can be used in the resourceLabel
attribute of the resourceType
associated with the collected data.
When the string property exists as a column in the same table that contains the numeric metrics, it’s easy to associate the string to the correct resource by adding a mibObj
with the same instance
attribute and a type of string
.
For example, the Cisco ENVMON MIB’s temperature status table contains both a numeric gauge for the temperature value and a string describing the associated temperature sensor. A partial walk of this table illustrates this very direct relationship:
ciscoEnvMonTemperatureStatusIndex |
ciscoEnvMonTemperatureStatusDescr (.1.3.6.1.4.1.9.9.13.1.3.1.2) |
ciscoEnvMonTemperatureStatusValue (.1.3.6.1.4.1.9.9.13.1.3.1.3) |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
To collect the ciscoEnvMonTemperatureStatusDescr
and ciscoEnvMonTemperatureStatusValue
columns within an SNMP data-collection group, all that’s needed is a resourceType
and a group
to hold the two mibObj
elements corresponding to these two columns.
The mibObj
aliases are shortened to maintain compatibility with storage engines that limit the length of column names to 19 characters.
<resourceType name="ciscoEnvMonTemperatureStatusIndex" label="Cisco Temperature" resourceLabel="${cvmTempStatusDescr} (index ${index})">
<persistenceSelectorStrategy class="org.opennms.netmgt.collection.support.PersistAllSelectorStrategy"/>
<storageStrategy class="org.opennms.netmgt.collection.support.IndexStorageStrategy"/>
</resourceType>
...
<group name="cisco-temperature" ifType="all">
<mibObj oid=".1.3.6.1.4.1.9.9.13.1.3.1.2" instance="ciscoEnvMonTemperatureStatusIndex" alias="cvmTempStatusDescr" type="string"/>
<mibObj oid=".1.3.6.1.4.1.9.9.13.1.3.1.3" instance="ciscoEnvMonTemperatureStatusIndex" alias="cvmTempStatusValue" type="gauge"/>
</group>
Even in cases where the string property exists in a separate MIB table, it’s straightforward to include it as long as the "source" table uses an identical set of index variables.
For example, the ifXTable
augments the ifTable
, meaning the two tables use the same set of instance identifiers – namely ifIndex
.
Whether or not the MIB definition of the second table declares an AUGMENTS
relationship to the first table, objects from tables with this kind of relationship can be mixed in the same group.
In this contrived configuration example, ifDescr
(which is from ifTable
) is freely mixed with ifName
and ifAlias
(from ifXTable
):
<group name="mib2-string-properties-example" ifType="all">
<mibObj oid=".1.3.6.1.2.1.2.2.1.2" instance="ifIndex" alias="ifDescr" type="string"/>
<mibObj oid=".1.3.6.1.2.1.31.1.1.1.1" instance="ifIndex" alias="ifName" type="string"/>
<mibObj oid=".1.3.6.1.2.1.31.1.1.1.18" instance="ifIndex" alias="ifAlias" type="string"/>
</group>
Most SNMP property extenders make it possible to include string properties from a "source" MIB table that is indexed differently from the table containing most of the relevant data. For purposes of configuring property extenders, the table containing the majority of the data (and into which we want to include the string properties) is called the target table, and the table containing the string property is called the source table. Several different extenders are available; selecting the right one depends on the relationship between the target table and the source table.
A few property extenders also exist whose effect is strictly local to the "target" resource. These extenders are useful for dealing with partial indices and other similar operations that do not involve looking outside the target MIB table.
SNMP Property Extenders are used in the context of a property
element inside an SNMP data-collection group
parent element.
The property
element, when it appears, is a sibling of any mibObj
elements beneath the same parent group
.
The instance
and alias
attributes of the property
element are both required, and serve the same purpose as the same attributes of mibObj
.
The class-name
attribute of the property
element contains the full class name (including package) of the Property Extender class needed to join the source and target tables.
The property
element takes a number of parameter
child elements; these parameters are used to configure the property extender class named in class-name
.
Each extender class recognizes a different set of parameters.
6.5.1. Cisco CBQoS Property Extender
This property extender is used only in very specific circumstances.
When to Use Cisco CBQoS Property Extender
The Cisco CBQoS Property Extender is designed specifically and exclusively for the purpose of including string properties across ifXTable
and the several MIB tables that make up the Cisco Class-Based QoS MIB.
It is not useful for any other sets of target and source tables.
Configuring Cisco CBQoS Extended Properties
The complex relationships among the various Cisco CBQoS tables are encapsulated in the code of this property extender class.
As a result, this extender takes only a single parameter, target-property
, whose value must be one of policyName
, classMapName
, interfaceAlias
, or interfaceName
.
6.5.2. Enum Lookup Property Extender
The Enum Lookup property extender provides a mechanism that works like a lookup table for values of a local MIB table column.
When to use the Enum Lookup Property Extender
The Enum Lookup property extender may be used to map an enumerated set of integer values to a corresponding set of human-sensible textual values.
For example, the dot1dStpPortTable
contains two integer columns whose values reflect attributes of a port.
dot1dStpPortState OBJECT-TYPE (1)
SYNTAX INTEGER {
disabled(1),
blocking(2),
listening(3),
learning(4),
forwarding(5),
broken(6)
}
-- ...
dot1dStpPortEnable OBJECT-TYPE (2)
SYNTAX INTEGER {
enabled(1),
disabled(2)
}
1 | Port STP state enumerated type |
2 | Port enablement status enumerated type |
This extender enables persisting the values of these enumerated integer columns as text that an operator can easily recognize.
While this extender is intended primarily for translating integer values to more descriptive ones as shown in the example below, it could also be used to translate from one set of alphanumeric values to another set.
Configuring the Enum Lookup Property Extender
The Enum Lookup property extender expects zero or more parameters.
Only the default-value
parameter has a fixed name; if it is present, its value is used any time a lookup cannot be completed.
If default-value
is not provided and a lookup fails, no value will be returned for the property.
The remaining parameters are named for the input values, and their values represent the output values.
This example shows how to map values of dot1dStpPortState
and dot1dStpPortEnable
to their textual equivalents.
<resourceType name="dot1dStpPortEntry" label="dot1d STP Port" resourceLabel="${index}">
<persistenceSelectorStrategy class="org.opennms.netmgt.collection.support.PersistAllSelectorStrategy" />
<storageStrategy class="org.opennms.netmgt.collection.support.IndexStorageStrategy" />
</resourceType>
...
<groups>
<group name="dot1dStpPortTable" ifType="all">
<mibObj oid=".1.3.6.1.2.1.17.2.15.1.3" instance="dot1dStpPortEntry" alias="dot1dStpPortState" type="string"/> (1)
<mibObj oid=".1.3.6.1.2.1.17.2.15.1.4" instance="dot1dStpPortEntry" alias="dot1dStpPortEnable" type="string"/> (2)
<mibObj oid=".1.3.6.1.2.1.17.2.15.1.10" instance="dot1dStpPortEntry" alias="dot1dStpPortFwTrans" type="counter" />
<property instance="dot1dStpPortEntry" alias="dot1dStpPortStateText" class-name="org.opennms.netmgt.collectd.EnumLookupPropertyExtender"> (3)
<parameter key="enum-attribute" value="dot1dStpPortState"/>
<parameter key="1" value="disabled(1)"/>
<parameter key="2" value="blocking(2)"/>
<parameter key="3" value="listening(3)"/>
<parameter key="4" value="learning(4)"/>
<parameter key="5" value="forwarding(5)"/>
<parameter key="6" value="broken(6)"/>
</property>
<property instance="dot1dStpPortEntry" alias="dot1dStpPortEnableText" class-name="org.opennms.netmgt.collectd.EnumLookupPropertyExtender"> (4)
<!-- Note absence of parenthetical numeric values; they are entirely optional -->
<parameter key="enum-attribute" value="dot1dStpPortEnable"/>
<parameter key="1" value="enabled"/>
<parameter key="2" value="disabled"/>
</property>
</group>
</groups>
1 | Port STP state enumerated integer attribute |
2 | Port enablement status enumerated integer attribute |
3 | Derived port STP state textual attribute dot1dStpPortStateText |
4 | Derived port enablement status textual attribute dot1dStpPortEnableText |
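The lookup behavior described for this extender amounts to a table lookup with an optional fallback. The Python sketch below illustrates that idea only; it is not the extender's code, and the function name enum_lookup is hypothetical. Returning None stands for "no value is returned for the property".

```python
def enum_lookup(raw_value, mapping, default_value=None):
    """Translate raw_value through mapping, falling back to default_value."""
    return mapping.get(str(raw_value), default_value)


# Same input/output pairs as the dot1dStpPortState parameters in the example.
STP_PORT_STATE = {
    "1": "disabled(1)",
    "2": "blocking(2)",
    "3": "listening(3)",
    "4": "learning(4)",
    "5": "forwarding(5)",
    "6": "broken(6)",
}

print(enum_lookup(5, STP_PORT_STATE))
```

A value outside the mapping yields None unless a default is supplied, which corresponds to configuring default-value.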
6.5.3. Index Split Property Extender
The Index Split property extender enables extraction of part of a resource’s local instance identifier.
When to use the Index Split Property Extender
The Index Split property extender is useful when collecting data from tables with compound indices, because it enables extraction of a single index component.
For example, the Cisco Airespace bsnAPIfLoadParametersTable
is indexed using the tuple of bsnAPDot3MacAddress
and bsnAPIfSlotId
.
bsnAPIfLoadParametersEntry OBJECT-TYPE
-- ...
DESCRIPTION
"An entry (conceptual row) in the Table.
Entries in this MIB are indexed by
bsnAPDot3MacAddress and bsnAPIfSlotId"
INDEX {
bsnAPDot3MacAddress,
bsnAPIfSlotId
} (1)
-- ...
1 | bsnAPDot3MacAddress is the first component of the compound index for the entry type for bsnAPIfLoadParametersTable |
This extender enables extraction of just the bsnAPIfSlotId
component for use in a resource label.
Configuring the Index Split Property Extender
The Index Split property extender expects a single parameter, index-pattern
, whose value is a regular expression.
The expression must be general enough to match all possible index values for the table at hand, and should include one capturing group.
The subpattern matched by the expression’s first capturing group will be returned; any further groups are ignored.
This example shows how to extract just the bsnAPIfSlotId
index component as a string property.
<group name="bsnAPIfLoadParametersTable" ifType="all">
<mibObj oid=".1.3.6.1.4.1.14179.2.2.13.1.4" instance="bsnAPIfLoadParametersEntry" alias="bsnAPIfLoadNumOfCli" type="integer" />
<property instance="bsnAPIfLoadParametersEntry" alias="slotNumber" class-name="org.opennms.netmgt.collectd.IndexSplitPropertyExtender"> (1)
<parameter key="index-pattern" value="^.+\.(\d+)$" /> (2)
</property>
</group>
1 | Derived string property slotNumber |
2 | Regular expression; the portion in parentheses is what gets extracted. \d+ means "one or more decimal digit characters". |
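Before deploying an index-pattern, it can be useful to try it against a few instance identifiers. The Python sketch below does this for the pattern from the example; the instance value (six MAC address octets followed by a slot ID) is hypothetical, and Python's regular expression engine is used here as a stand-in for the one used by the collector.

```python
import re

INDEX_PATTERN = re.compile(r"^.+\.(\d+)$")


def split_index(instance):
    """Return the text captured by the first group, or None if no match."""
    match = INDEX_PATTERN.search(instance)
    return match.group(1) if match else None


# Hypothetical instance: six MAC address octets followed by the slot ID.
print(split_index("0.31.158.12.34.56.1"))
```

Because the leading .+ is greedy, the capturing group receives only the final index component.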
6.5.4. Regex Property Extender
The Regex property extender works similarly to the Index Split property extender, with the added capability of importing a string property from a source table.
When to Use the Regex Property Extender
The Regex property extender is useful when some portion of the target MIB table’s index can be used as an index to the source MIB table.
For example, the Cisco Airespace bsnAPIfLoadParametersTable
is indexed using the tuple of bsnAPDot3MacAddress
and bsnAPIfSlotId
, whereas the bsnAPTable
is indexed on bsnAPDot3MacAddress
alone.
bsnAPIfLoadParametersEntry OBJECT-TYPE
-- ...
DESCRIPTION
"An entry (conceptual row) in the Table.
Entries in this MIB are indexed by
bsnAPDot3MacAddress and bsnAPIfSlotId"
INDEX {
bsnAPDot3MacAddress,
bsnAPIfSlotId
} (1)
-- ...
bsnAPEntry OBJECT-TYPE
-- ...
DESCRIPTION
"An entry in the bsnAPTable."
INDEX { bsnAPDot3MacAddress } (2)
-- ...
1 | bsnAPDot3MacAddress is the first component of the compound index for the entry type for bsnAPIfLoadParametersTable |
2 | bsnAPDot3MacAddress is the sole index for the entry type for bsnAPTable |
By extracting just the first index component and using the result as an index into the source MIB table, it’s possible to import the human-sensible bsnAPName
string property from the source MIB table.
Configuring the Regex Property Extender
The Regex property extender expects three parameters, all of which are required:
Name | Description |
---|---|
source-type |
The name of the resourceType associated with the source MIB table |
source-alias |
The alias name of the string property to be imported from the source MIB table |
index-pattern |
A regular expression containing one capturing group |
The index-pattern
expression must meet the same criteria as for the Index Split property extender.
The subpattern matched by its first capturing group will be used as an index into the source MIB table; any further groups are ignored.
This example shows how to use the value of bsnAPDot3MacAddress
as an index into the bsnAPTable
.
<resourceType name="bsnAPEntry" label="Cisco Wireless AP" resourceLabel="${bsnAPName} (index ${index})">
<persistenceSelectorStrategy class="org.opennms.netmgt.collection.support.PersistAllSelectorStrategy" />
<storageStrategy class="org.opennms.netmgt.collection.support.IndexStorageStrategy" />
</resourceType>
<resourceType name="bsnAPIfLoadParametersEntry" label="Cisco Wireless AP Resources" resourceLabel="${bsnAPName} (index ${index})">
<persistenceSelectorStrategy class="org.opennms.netmgt.collection.support.PersistAllSelectorStrategy" />
<storageStrategy class="org.opennms.netmgt.collection.support.IndexStorageStrategy" />
</resourceType>
<groups>
<group name="bsnAPTable" ifType="all">
<mibObj oid=".1.3.6.1.4.1.14179.2.2.1.1.3" instance="bsnAPEntry" alias="bsnAPName" type="string" /> (1)
</group>
<group name="bsnAPIfLoadParametersTable" ifType="all">
<mibObj oid=".1.3.6.1.4.1.14179.2.2.13.1.4" instance="bsnAPIfLoadParametersEntry" alias="bsnAPIfLoadNumOfCli" type="integer" />
<property instance="bsnAPIfLoadParametersEntry" alias="bsnAPName" class-name="org.opennms.netmgt.collectd.RegexPropertyExtender"> (2)
<parameter key="source-type" value="bsnAPEntry" />
<parameter key="source-alias" value="bsnAPName" />
<parameter key="index-pattern" value="^(.+)\.\d+$" /> (3)
</property>
</group>
</groups>
1 | Regular string property bsnAPName on the source table |
2 | Extended string property bsnAPName on the target table |
3 | Regular expression; the portion in parentheses is what gets extracted. \d+ means "one or more decimal digit characters". |
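Conceptually, this extender performs a two-step join: extract part of the target index, then use it to find a row in the source table. The Python sketch below illustrates that flow with in-memory dictionaries; the row contents, the AP name, and the function name import_property are hypothetical and do not reflect how the collector stores its data.

```python
import re

INDEX_PATTERN = re.compile(r"^(.+)\.\d+$")

# Hypothetical source rows: bsnAPEntry instance -> collected string properties.
SOURCE_ROWS = {
    "0.31.158.12.34.56": {"bsnAPName": "ap-lobby"},
}


def import_property(target_instance, source_alias):
    """Look up source_alias in the source row addressed by the captured index."""
    match = INDEX_PATTERN.search(target_instance)
    if match is None:
        return None
    source_row = SOURCE_ROWS.get(match.group(1))
    return source_row.get(source_alias) if source_row else None


print(import_property("0.31.158.12.34.56.1", "bsnAPName"))
```

When the captured index has no counterpart in the source table, the sketch returns None and no property is imported.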
6.5.5. Pointer-Like Index Property Extender
The Pointer-Like Index property extender makes it possible to use the value of an attribute from the target MIB table as the index into the source MIB table. Unlike the Index Split and Regex extenders, this extender class does not require the target and source MIB tables to share any index components.
When to Use the Pointer-Like Index Property Extender
The Pointer-Like Index property extender is useful when the target MIB table contains a column whose value can be used as an index into the source MIB table.
For example, the Cisco Process MIB’s cpmCPUTotalTable
has its own index that is not shared with any other tables, but its cpmCPUTotalPhysicalIndex
column contains an integer which can be used as an index into the entPhysicalTable
.
cpmCPUTotalEntry OBJECT-TYPE
-- ...
DESCRIPTION
"Overall information about the CPU load. Entries in this
table come and go as CPUs are added and removed from the
system."
INDEX { cpmCPUTotalIndex } (1)
-- ...
cpmCPUTotalPhysicalIndex OBJECT-TYPE (2)
-- ...
DESCRIPTION
"The entPhysicalIndex of the physical entity for which
the CPU statistics in this entry are maintained.
The physical entity can be a CPU chip, a group of CPUs,
a CPU card etc. The exact type of this entity is described by
its entPhysicalVendorType value. If the CPU statistics
in this entry correspond to more than one physical entity
(or to no physical entity), or if the entPhysicalTable is
not supported on the SNMP agent, the value of this object
must be zero."
-- ...
entPhysicalEntry OBJECT-TYPE
-- ...
DESCRIPTION
"Information about a particular physical entity.
Each entry provides objects (entPhysicalDescr,
entPhysicalVendorType, and entPhysicalClass) to help an NMS
identify and characterize the entry, and objects
(entPhysicalContainedIn and entPhysicalParentRelPos) to help
an NMS relate the particular entry to other entries in this
table."
INDEX { entPhysicalIndex } (3)
-- ...
1 | The cpmCPUTotalTable entry type is indexed on cpmCPUTotalIndex, which has no meaning outside this table |
2 | The cpmCPUTotalPhysicalIndex column contains a value of entPhysicalIndex corresponding to the CPU referenced in a given row |
3 | The entPhysicalTable entry type is indexed on entPhysicalIndex and provides many useful textual columns. |
By treating cpmCPUTotalPhysicalIndex
somewhat like a pointer, it’s possible to import string properties from the entPhysicalTable
for use in the resource-label.
Some combinations of Cisco hardware and software appear to use values of cpmCPUTotalIndex that are directly interchangeable with entPhysicalIndex.
This relationship does not hold across all product lines or software revisions.
Configuring the Pointer-Like Index Property Extender
The Pointer-Like Index property extender expects three parameters, all of which are required:
Name | Description |
---|---|
source-type |
The name of the resourceType associated with the source MIB table |
source-attribute |
The alias name of the string property to be imported from the source MIB table |
target-index-pointer-column |
The alias name of the column in the target MIB table whose value may be used as an index into the source MIB table |
This example shows how to use cpmCPUTotalPhysicalIndex
as a pointer-like index into the entPhysicalTable
.
The target resource gains a pair of string properties, which we will call cpmCPUTotalName
and cpmCPUTotalDescr
.
<resourceType name="entPhysicalEntry" label="Physical Entity" resourceLabel="${entPhysicalName} (${entPhysicalDescr})">
<persistenceSelectorStrategy class="org.opennms.netmgt.collection.support.PersistAllSelectorStrategy"/>
<storageStrategy class="org.opennms.netmgt.collection.support.IndexStorageStrategy"/>
</resourceType>
<resourceType name="cpmCPUTotalEntry" label="Cisco CPU Total" resourceLabel="${cpmCPUTotalName} (${cpmCPUTotalDescr})">
<persistenceSelectorStrategy class="org.opennms.netmgt.collection.support.PersistAllSelectorStrategy" />
<storageStrategy class="org.opennms.netmgt.collection.support.IndexStorageStrategy" />
</resourceType>
<groups>
<group name="entity-physical-table" ifType="all">
<mibObj oid=".1.3.6.1.2.1.47.1.1.1.1.2" instance="entPhysicalEntry" alias="entPhysicalDescr" type="string"/> (1)
<mibObj oid=".1.3.6.1.2.1.47.1.1.1.1.7" instance="entPhysicalEntry" alias="entPhysicalName" type="string"/> (2)
</group>
<group name="cpm-cpu-total" ifType="all">
<mibObj oid=".1.3.6.1.4.1.9.9.109.1.1.1.1.2" instance="cpmCPUTotalEntry" alias="cpmCPUTotalPhysicalIndex" type="string" /> (3)
<mibObj oid=".1.3.6.1.4.1.9.9.109.1.1.1.1.8" instance="cpmCPUTotalEntry" alias="cpmCPUTotal5minRev" type="gauge" />
<property instance="cpmCPUTotalEntry" alias="cpmCPUTotalName" class-name="org.opennms.netmgt.collectd.PointerLikeIndexPropertyExtender"> (4)
<parameter key="source-type" value="entPhysicalEntry"/>
<parameter key="source-attribute" value="entPhysicalName"/> (5)
<parameter key="target-index-pointer-column" value="cpmCPUTotalPhysicalIndex"/>
</property>
<property instance="cpmCPUTotalEntry" alias="cpmCPUTotalDescr" class-name="org.opennms.netmgt.collectd.PointerLikeIndexPropertyExtender"> (6)
<parameter key="source-type" value="entPhysicalEntry"/>
<parameter key="source-attribute" value="entPhysicalDescr"/> (7)
<parameter key="target-index-pointer-column" value="cpmCPUTotalPhysicalIndex"/>
</property>
</group>
</groups>
1, 2 | First we collect entPhysicalDescr and entPhysicalName in the source group, which uses a resource-type associated with the entPhysicalTable |
3 | Then we collect the pointer-like cpmCPUTotalPhysicalIndex in the target group, whose resource-type is associated with the cpmCPUTotalTable |
4, 5 | We derive cpmCPUTotalName in the target group, telling the extender to use the pointer-like property's value as an index into the source table, and specify that we want to "pull over" the source attribute entPhysicalName |
6, 7 | Deriving cpmCPUTotalDescr is almost identical, except that this time we are pulling over the value of entPhysicalDescr |
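The pointer-like join can be pictured as a dictionary lookup keyed by the pointer column's value. The Python sketch below is a conceptual illustration only; the table contents and the function name pointer_lookup are hypothetical, and the dictionaries merely stand in for the collected source and target resources.

```python
# Hypothetical collected rows, keyed by each table's own instance identifier.
ENT_PHYSICAL = {  # source: entPhysicalEntry, keyed by entPhysicalIndex
    "1000": {
        "entPhysicalName": "Supervisor 1",
        "entPhysicalDescr": "Supervisor Engine",
    },
}
CPM_CPU_TOTAL = {  # target: cpmCPUTotalEntry, keyed by cpmCPUTotalIndex
    "1": {"cpmCPUTotalPhysicalIndex": "1000"},
    "2": {"cpmCPUTotalPhysicalIndex": "0"},  # zero: no matching physical entity
}


def pointer_lookup(target_row, pointer_column, source_rows, source_attribute):
    """Follow the pointer column into the source table and read one attribute."""
    pointer = target_row.get(pointer_column)
    source_row = source_rows.get(pointer)
    return source_row.get(source_attribute) if source_row else None


name = pointer_lookup(
    CPM_CPU_TOTAL["1"], "cpmCPUTotalPhysicalIndex", ENT_PHYSICAL, "entPhysicalName"
)
print(name)
```

The second target row points at index 0, which has no source row in this sketch, so no property would be derived for it.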
6.5.6. SNMP Interface Property Extender
The SNMP Interface property extender does much the same job as the Pointer-Like Index property extender, but it is specialized for importing properties from the ifTable
.
Resources representing rows in the ifTable
are modeled differently in OpenNMS Horizon compared to other tabular resource types, and this extender accounts for those differences.
When to Use the SNMP Interface Property Extender
Use the SNMP Interface property extender when the string property you want to import is associated with a network interface which is represented by a row in the ifTable
.
For example, the dot1dBasePortTable
has its own index which does not share any components with any other table, but its dot1dBasePortIfIndex
column contains a value that is a valid ifIndex
.
dot1dBasePortEntry OBJECT-TYPE
-- ...
DESCRIPTION
"A list of information for each port of the bridge."
-- ...
INDEX { dot1dBasePort } (1)
-- ...
dot1dBasePortIfIndex OBJECT-TYPE (2)
-- ...
DESCRIPTION
"The value of the instance of the ifIndex object,
defined in IF-MIB, for the interface corresponding
to this port."
::= { dot1dBasePortEntry 2 }
-- ...
ifEntry OBJECT-TYPE
-- ...
DESCRIPTION
"An entry containing management information applicable to a
particular interface."
INDEX { ifIndex } (3)
::= { ifTable 1 }
1 | The entry type for dot1dBasePortTable is indexed on dot1dBasePort , which has no significance outside this table |
2 | But dot1dBasePortTable contains column dot1dBasePortIfIndex , which tells us the ifIndex corresponding to the physical port underlying the associated bridge base port |
3 | ifIndex is the index of the ifTable entry type (and also of the ifXTable entry type) |
By using this extender, it’s possible to import string attributes from the ifTable, ifXTable, or another table that augments the ifTable.
Configuring the SNMP Interface Property Extender
The SNMP Interface property extender expects two or three parameters:
Name | Description | Required | Default value |
---|---|---|---|
source-attribute | The alias name of the string property to be imported from the source MIB table | required | – |
source-ifindex-attribute | The name of the column in the source MIB table that contains a value of ifIndex | optional | ifIndex |
target-ifindex-pointer-column | The name of the column in the target MIB table that contains a value of ifIndex | required | – |
This example shows how to use dot1dBasePortIfIndex as a pointer-like index to import ifDescr from the ifTable, and ifName and ifAlias from the ifXTable, into a trio of new string properties in the target resource.
<resourceType name="dot1dBasePortEntry" label="dot1d Base Port" resourceLabel="${index}">
<persistenceSelectorStrategy class="org.opennms.netmgt.collection.support.PersistAllSelectorStrategy" />
<storageStrategy class="org.opennms.netmgt.collection.support.IndexStorageStrategy" />
</resourceType>
<groups>
<group name="ifTable" ifType="all">
<mibObj oid=".1.3.6.1.2.1.2.2.1.1" instance="ifIndex" alias="interfaceIndex" type="string" /> (1)
<mibObj oid=".1.3.6.1.2.1.2.2.1.2" instance="ifIndex" alias="interfaceDescr" type="string" />
<mibObj oid=".1.3.6.1.2.1.31.1.1.1.1" instance="ifIndex" alias="interfaceName" type="string" />
<mibObj oid=".1.3.6.1.2.1.31.1.1.1.18" instance="ifIndex" alias="interfaceAlias" type="string" />
</group>
<group name="dot1dBasePortTable" ifType="all">
<mibObj oid=".1.3.6.1.2.1.17.1.4.1.1" instance="dot1dBasePortEntry" alias="dot1dBasePort" type="string" />
<mibObj oid=".1.3.6.1.2.1.17.1.4.1.2" instance="dot1dBasePortEntry" alias="dot1dBasePortIfIndex" type="string" /> (2)
<mibObj oid=".1.3.6.1.2.1.17.1.4.1.4" instance="dot1dBasePortEntry" alias="d1dBPDelayExDiscard" type="counter" />
<mibObj oid=".1.3.6.1.2.1.17.1.4.1.5" instance="dot1dBasePortEntry" alias="d1dBPMtuExDiscard" type="counter" />
<property instance="dot1dBasePortEntry" alias="dot1dBasePortIfDescr" class-name="org.opennms.netmgt.collectd.InterfaceSnmpPropertyExtender"> (3)
<parameter key="source-ifindex-attribute" value="interfaceIndex"/>
<parameter key="source-attribute" value="interfaceDescr"/> (4)
<parameter key="target-ifindex-pointer-column" value="dot1dBasePortIfIndex"/>
</property>
<property instance="dot1dBasePortEntry" alias="dot1dBasePortIfName" class-name="org.opennms.netmgt.collectd.InterfaceSnmpPropertyExtender"> (5)
<parameter key="source-ifindex-attribute" value="interfaceIndex"/>
<parameter key="source-attribute" value="interfaceName"/> (6)
<parameter key="target-ifindex-pointer-column" value="dot1dBasePortIfIndex"/>
</property>
<property instance="dot1dBasePortEntry" alias="dot1dBasePortIfAlias" class-name="org.opennms.netmgt.collectd.InterfaceSnmpPropertyExtender"> (7)
<parameter key="source-ifindex-attribute" value="interfaceIndex"/>
<parameter key="source-attribute" value="interfaceAlias"/> (8)
<parameter key="target-ifindex-pointer-column" value="dot1dBasePortIfIndex"/>
</property>
</group>
</groups>
1 | First we collect all of ifIndex , ifDescr , ifName , and ifAlias in a group associated with the ifIndex source resource-type, using modified names to avoid collisions with internal workings (the ifIndex type is built in, so we do not need a custom resource-type definition for it) |
2 | Then we collect the pointer-like column dot1dBasePortIfIndex in the target group |
3 | To derive the dot1dBasePortIfDescr string property, we tell the extender which target attribute contains the pointer-like value, which source column needs to have a matching value, and that we want to "pull over" the interfaceDescr property <4> from the source group |
5 | Deriving dot1dBasePortIfName is almost identical, except that we want the property interfaceName <6> from the source group instead |
7 | Again with dot1dBasePortIfAlias , we repeat ourselves except that our desired property from the source group is interfaceAlias <8> |
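The lookup that the SNMP Interface property extender performs can be pictured as a join on the ifIndex value. The following is a minimal sketch of that idea only, not the extender's actual code: the sample rows are made up, and the function name `extend` is hypothetical. The parameter names mirror the keys used in the configuration above.

```python
# Simplified model of the lookup described above. All sample data is made up.

# Source group: string attributes collected for each ifTable row.
if_table = [
    {"interfaceIndex": "1", "interfaceDescr": "lo",
     "interfaceName": "lo", "interfaceAlias": ""},
    {"interfaceIndex": "2", "interfaceDescr": "GigabitEthernet0/1",
     "interfaceName": "Gi0/1", "interfaceAlias": "uplink"},
]

# Target group: rows of dot1dBasePortTable, including the pointer-like column.
base_ports = [
    {"dot1dBasePort": "7", "dot1dBasePortIfIndex": "2"},
]


def extend(target_row, source_rows, source_attribute,
           target_ifindex_pointer_column,
           source_ifindex_attribute="interfaceIndex"):
    """Return source_attribute from the source row whose ifIndex column
    matches the pointer-like value in the target row (hypothetical helper)."""
    pointer = target_row[target_ifindex_pointer_column]
    for row in source_rows:
        if row[source_ifindex_attribute] == pointer:
            return row[source_attribute]
    return None  # no matching interface, so no property is derived


# Derive the three new string properties for every bridge port.
for port in base_ports:
    for alias, source in (("dot1dBasePortIfDescr", "interfaceDescr"),
                          ("dot1dBasePortIfName", "interfaceName"),
                          ("dot1dBasePortIfAlias", "interfaceAlias")):
        port[alias] = extend(port, if_table, source, "dot1dBasePortIfIndex")

print(base_ports[0])
```

In this sketch, bridge port 7 points at ifIndex 2, so it picks up the description, name, and alias of that interface.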
6.6. Administration and Troubleshooting
6.6.1. Collectd Administration
This section describes reference and administrative information associated with collectd.
File | Description |
---|---|
|
Configuration file for global collectd daemon and collectors configuration. (See Configuring Collectd.) |
|
Log file for all collectors and the global collectd daemon. |
|
RRD graph definitions to render performance data measurements in the UI. |
|
Directory with RRD graph definitions for devices and applications to render performance data measurements in the UI. |
|
Event definitions for collectd, e.g., dataCollectionSucceeded and dataCollectionFailed. |
|
Directory to store generic resource type definitions. (See Resource Types.) |
6.6.2. Shell Commands
A number of Karaf Shell commands are made available to help administer and diagnose issues related to performance data collection.
To use the commands, log into the Karaf Shell on your system using:
ssh -p 8101 admin@localhost
The Karaf shell uses the same credentials as the web interface.
Users must be associated with the ADMIN role to access the shell.
To keep the session open while executing long-running tasks without any user input, add -o ServerAliveInterval=10 to your ssh command. |
Ad hoc collection
The opennms:collect Karaf Shell command can be used to trigger and perform a collection on any of the available collectors.
The results of the collection (also referred to as the "collection set") will be displayed in the console after a successful collection. The resulting collection set will not be persisted, nor will any thresholding be applied.
List all of the available collectors:
opennms:list-collectors
Invoke the SnmpCollector against interface 127.0.0.1 on NODES:n1:
opennms:collect -n NODES:n1 org.opennms.netmgt.collectd.SnmpCollector 127.0.0.1
Invoke the SnmpCollector against interface 127.0.0.1 on NODES:n1 via the MINION location:
opennms:collect -l MINION -n NODES:n1 org.opennms.netmgt.collectd.SnmpCollector 127.0.0.1
Setting the location on the command line will override the node’s location. |
If you see errors caused by RequestTimedOutException when invoking a collector at a remote location, consider increasing the time to live.
By default, collectd will use the service interval as the time to live. |
Invoke the JdbcCollector against 127.0.0.1 while specifying some of the collector parameters:
opennms:collect org.opennms.netmgt.collectd.JdbcCollector 127.0.0.1 collection=PostgreSQL driver=org.postgresql.Driver url=jdbc:postgresql://OPENNMS_JDBC_HOSTNAME/postgres user=postgres
Some collectors, such as the JdbcCollector, can be invoked without specifying a node. |
Persist a collection:
opennms:collect -l MINION -n NODES:n1 -p org.opennms.netmgt.collectd.SnmpCollector 127.0.0.1
The -p/--persist option persists the collection set, thereby introducing an extra data point in addition to the data collected at the configured collection interval. |
A complete list of options is available using:
opennms:collect --help
Interpreting the output
After a successful collection, the collection set will be displayed in the following format:
resource a
group 1
attribute
attribute
group 2
attribute
resource b
group 1
attribute
...
The description of the resources, groups, and attributes may differ between collectors. This output is independent of the persistence strategy that is being used.
Measurements & Resources
The following Karaf Shell commands are made available to help enumerate, view and manage measurement related resources.
The opennms:show-measurement-resources command can be used to enumerate or look up resources:
admin@opennms> opennms:show-measurement-resources --node NODES:node --no-children
ID: node[NODES:node]
Name: NODES:node
Label: node
Type: Node
Link: element/node.jsp?node=NODES:node
Children:
node[NODES:node].nodeSnmp[]
node[NODES:node].interfaceSnmp[lo]
node[NODES:node].interfaceSnmp[opennms-jvm]
node[NODES:node].responseTime[192.168.238.140]
node[NODES:node].responseTime[192.168.39.1]
node[NODES:node].responseTime[172.17.0.1]
node[NODES:node].responseTime[127.0.0.1]
...
The opennms:delete-measurement-resources command can be used to delete resources and all of the associated metrics:
admin@opennms> opennms:delete-measurement-resources "node[NODES:node].responseTime[127.0.0.1]"
Deleting measurements and metadata associated with resource ID 'node[NODES:node].responseTime[127.0.0.1]'...
Done.
The opennms:show-measurements command can be used to render the values of the attributes (measurements) associated with a particular resource:
admin@opennms> opennms:show-measurements -a ifHCInOctets "node[NODES:node].interfaceSnmp[lo]"
Resource with ID 'node[NODES:node].interfaceSnmp[lo]' has attributes: [ifHCOutUcastPkts, ifInDiscards, ifHCInBroadcastPkts, ifHCInOctets, ifHCOutOctets, ifOutErrors, ifHCOutMulticastPkt, ifHCInUcastPkts, ifInErrors, ifHCInMulticastPkts, ifHCOutBroadcastPkt, ifOutDiscards]
Limiting attributes to: [ifHCInOctets]
timestamp,ifHCInOctets
Fri Sep 13 13:30:00 EDT 2019,NaN
Fri Sep 13 13:35:00 EDT 2019,NaN
Fri Sep 13 13:40:00 EDT 2019,NaN
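The comma-separated portion of this output is easy to post-process. The following is a small sketch, assuming you have captured the lines starting at the `timestamp,...` header into a string; the variable names are hypothetical. It counts the intervals for which no value was stored (NaN).

```python
import csv
import io
import math

# Output copied from the example above, starting at the CSV header line.
OUTPUT = """timestamp,ifHCInOctets
Fri Sep 13 13:30:00 EDT 2019,NaN
Fri Sep 13 13:35:00 EDT 2019,NaN
Fri Sep 13 13:40:00 EDT 2019,NaN
"""

rows = list(csv.DictReader(io.StringIO(OUTPUT)))

# Timestamps whose measurement is NaN, i.e. no value for that interval.
missing = [row["timestamp"] for row in rows
           if math.isnan(float(row["ifHCInOctets"]))]

print(f"{len(missing)} of {len(rows)} intervals have no data")
```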
The opennms:show-newts-samples command can be used to view the raw samples (collected values) associated with a particular resource:
admin@opennms> opennms:show-newts-samples -a ifHCInOctets "node[NODES:node].interfaceSnmp[lo]"
Resource with ID 'node[NODES:node].interfaceSnmp[lo]' has attributes: [ifHCOutUcastPkts, ifInDiscards, ifHCInBroadcastPkts, ifOutErrors, ifHCInOctets, ifHCOutMulticastPkt, ifHCOutOctets, ifHCInUcastPkts, ifInErrors, ifHCInMulticastPkts, ifOutDiscards, ifHCOutBroadcastPkt]
Limiting attributes to: [ifHCInOctets]
Fetching samples for Newts resource ID 'snmp:2:lo:mib2-X-interfaces'...
Fri Sep 13 14:31:05 EDT 2019,ifHCInOctets,1271178704.0000
Stress Testing
The opennms:stress-metrics Karaf Shell command can be used to simulate load on the active persistence strategy, whether it be RRDtool, JRobin, or Newts.
The tool works by generating collection sets, similar to those built when performing data collection, and sending these to the active persistence layer. By using the active persistence layer, we ensure that we use the same write path which is used by the actual data collection services.
Generate samples for 10 nodes every 15 seconds and print the statistics report every 30 seconds:
opennms:stress-metrics -n 10 -i 15 -r 30
While active, the command will continue to generate and persist collection sets. During this time you can monitor the system I/O and other relevant statistics.
When you're done, use CTRL+C to stop the stress tool.
A complete list of options is available using:
opennms:stress-metrics --help
Interpreting the output
The statistics output by the tool can be interpreted as follows:
- numeric-attributes-generated
-
The number of numeric attributes that were sent to the persistence layer. We have no guarantee as to whether or not these were actually persisted.
- string-attributes-generated
-
The number of string attributes that were sent to the persistence layer. We have no guarantee as to whether or not these were actually persisted.
- batches
-
The count is used to indicate how many batches of collection sets (one at every interval) were sent to the persistence layer. The timers show how much time was spent generating the batch, and sending the batch to the persistence layer.
7. Thresholding
Thresholding allows you to define limits against network performance metrics of a managed entity to trigger an event when a value goes above or below the specified limit.
-
High
-
Low
-
Absolute Value
-
Relative Change
7.1. How Thresholding Works in OpenNMS Horizon
OpenNMS Horizon uses collectors to implement data collection for a particular protocol or family of protocols (SNMP, JMX, HTTP, XML/JSON, WS-Management/WinRM, JDBC, etc.). You can specify configuration for a particular collector in a collection package: essentially the set of instructions that drives the behavior of the collector.
The collectd daemon gathers and stores performance data from these collectors. This is the data against which OpenNMS Horizon applies thresholds. Thresholds trigger events when a specified threshold value is met. You can further create notifications and alarms for threshold events.
7.2. What Triggers a Thresholding Event?
OpenNMS Horizon uses four thresholding algorithms that trigger an event when the datasource value:
-
Low - equals or drops below the threshold value and re-arms when it equals or comes back up above the re-arm value (e.g., available disk space falls under the specified value)
-
High - equals or exceeds the threshold value, and re-arms when it equals or drops below the re-arm value (e.g., bandwidth use exceeds the specified amount)
-
Absolute - changes by the specified amount (e.g., on a fiber-optic link, a change in loss of anything greater than 3 dB is a problem regardless of what the original or final value is)
-
Relative - changes by percent (e.g., available disk space changes more than 5% from the last poll)
These thresholds can be basic (tested against a single value) or an expression (evaluated against multiple values in an expression).
OpenNMS Horizon applies these algorithms against any performance data (telemetry) collected by collectd or pushed to telemetryd. This includes, but is not limited to, metrics such as CPU load, bandwidth, disk space, etc.
The basic walkthrough focuses on how to set simple thresholds using default values in the OpenNMS Horizon setup. For information on setting and configuring collectors, collectd, and the collectd-configuration.xml file, see Performance Management. |
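To make the trigger and re-arm behavior concrete, here is a minimal sketch of the High algorithm as described above, including a count of consecutive samples required before an event fires. It is a simplified model for illustration, not the OpenNMS Horizon implementation; the class name, method name, and return strings are hypothetical. A Low threshold would mirror it with the comparisons reversed.

```python
from dataclasses import dataclass


@dataclass
class HighThreshold:
    """Simplified model of a High threshold (hypothetical class)."""
    value: float            # trigger when a sample equals or exceeds this
    rearm: float            # re-arm when a sample equals or drops below this
    trigger: int            # consecutive qualifying samples required
    armed: bool = True
    exceeded_count: int = 0

    def evaluate(self, sample: float) -> str:
        """Process one sample and return 'triggered', 'rearmed', or 'none'."""
        if self.armed:
            if sample >= self.value:
                self.exceeded_count += 1
                if self.exceeded_count >= self.trigger:
                    self.armed = False
                    self.exceeded_count = 0
                    return "triggered"
            else:
                # The run of consecutive samples was interrupted.
                self.exceeded_count = 0
        elif sample <= self.rearm:
            self.armed = True
            return "rearmed"
        return "none"


threshold = HighThreshold(value=70.0, rearm=50.0, trigger=2)
results = [threshold.evaluate(s) for s in (75, 60, 72, 71, 65, 50)]
print(results)
```

In this run the first sample above 70 does not trigger an event because the next sample drops back to 60. Only the two consecutive samples 72 and 71 trigger it, and the threshold stays triggered until a sample reaches the re-arm value of 50.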
7.3. Basic Walk-through – Thresholding
This section describes how to create a basic threshold for a single, system-wide variable: the number of logged-in users. Our threshold will tell OpenNMS Horizon to create an event when the number of logged-in users on the device exceeds two, and re-arm when it falls below two.
Before creating a threshold, you need to make sure you are collecting the metric against which you want to threshold.
7.3.1. Determine Whether You Are Collecting the Metric
In this case, we have chosen a metric (number of logged-in users) that is collected by default. We are also using data collected via SNMP. (For information on other collectors, see Collectors.)
-
In the OpenNMS Horizon UI, choose Reports>Resource Graphs.
-
Select one of the listed resources.
-
Under SNMP Node Data, select Node-level Performance Data and choose Graph Selection.
-
Scroll to find the Number of Users graph. You can click the binoculars icon to display only this graph.
7.3.2. Create a Threshold
-
Select <User_Name>>Configure OpenNMS from the top-right menu.
-
Under Performance Measurement, choose Configure Thresholds. A screen with a list of preconfigured threshold groups appears. We will work with netsnmp. For information on how to create a threshold group, see Creating a Threshold Group.
-
Click Edit beside the netsnmp group.
-
Click Create New Threshold at the bottom of the Basic Thresholds area of the screen.
-
Set the following information and click Save:
Field | Value | Description |
---|---|---|
Type | high | Triggers an event when the datasource value equals or exceeds the threshold value, and re-arms when it equals or drops below the re-arm value |
Datasource | hrSystemNumUsers | Name of the datasource you want to threshold against. For this tutorial, we have provided the datasource for logged-in users. For information on how to determine a metric’s datasource, see Determine the Datasource. |
Datasource label | leave blank | Optional text label. Not required for this tutorial. |
Value | 2 | The value above which we want to trigger an event. In this case, we want to trigger an event when the number of logged-in users exceeds two. |
Re-arm | 2 | The value below which we want the system to re-arm. In this case, once the number of logged-in users falls below two. |
Trigger | 3 | The number of consecutive times the threshold value can occur before the system triggers an event. Since our default polling period is 5 minutes, a value of 3 means OpenNMS Horizon would create a threshold event if there are more than 2 users for 15 minutes. |
Description | leave blank | Optional text to describe your threshold. |
Triggered UEI | leave blank | A custom uniform event identifier (UEI) sent into the events system when the threshold is triggered. A custom UEI for each threshold makes it easier to create notifications. If left blank, it defaults to the standard thresholds UEIs. |
Re-armed UEI | leave blank | A custom uniform event identifier (UEI) sent into the events system when the threshold is re-armed. |
7.3.3. Testing the Threshold
To test the threshold we just created, log a second person into the node you are monitoring.
Navigate to the Events page.
You should see an event that indicates your threshold triggered when more than one user logged in.
Log out the second user.
The Events page should indicate that the system has re-armed.
7.3.4. Creating a Threshold for CPU Usage
This procedure describes how to create an expression-based threshold when the five-minute CPU load average metric reaches or goes above 70% for two consecutive measurement intervals. Expression-based thresholds are useful when you need to threshold on a percentage, not the actual value of the data collected.
Expression-based thresholds work only if the data sources in question lie in the same directory. |
-
Select <User_Name>>Configure OpenNMS from the top-right menu.
-
Under Performance Measurement, choose Configure Thresholds.
-
Click Edit beside the netsnmp group.
-
Click Create New Expression-based Threshold.
-
Fill in the following information:
Field | Value | Description |
---|---|---|
Type | high | Triggers an event when the datasource value equals or exceeds the threshold value, and re-arms when it equals or drops below the re-arm value |
Expression | ((loadavg5 / 100) / CpuNumCpus) * 100 | Divides the five-minute CPU load average by 100 (to obtain the effective load average), which is then divided by the number of CPUs. This value is then multiplied by 100 to provide a percentage. (SNMP does not report in decimals, which is why the expression divides the loadavg5 by 100.) |
Datasource type | node | The type of datasource from which you are collecting data. |
Datasource label | leave blank | Optional text label. Not required for this tutorial. |
Value | 70 | Trigger an event when the five-minute CPU load average goes above 70%. |
Re-arm | 50 | Re-arm the system when the five-minute CPU load average drops below 50%. |
Trigger | 2 | The number of consecutive times the threshold value can occur before the system triggers an event. In this case, when the five-minute CPU load average goes above 70% for two consecutive polling periods. |
Description | Trigger an alert when the five-minute CPU load average metric reaches or goes above 70% for two consecutive measurement intervals | Optional text to describe your threshold. |
Triggered UEI | leave blank | See the table in Create a Threshold for details. |
Re-armed UEI | leave blank | See the table in Create a Threshold for details. |
-
Click Save.
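As a quick check of the arithmetic in the expression, the sketch below evaluates it for one made-up sample. The function name and the sample values are assumptions chosen for illustration: a raw loadavg5 reading of 280 (an effective load average of 2.80) on a node with 4 CPUs.

```python
def cpu_load_percent(loadavg5: float, cpu_num_cpus: int) -> float:
    """Evaluate ((loadavg5 / 100) / CpuNumCpus) * 100 for one sample."""
    return ((loadavg5 / 100) / cpu_num_cpus) * 100


# Made-up sample: raw SNMP value 280 means an effective load average of 2.80.
percent = cpu_load_percent(280, 4)
print(f"{percent:.1f}%")
```

A load of 2.80 spread over 4 CPUs works out to 70%, which meets the threshold value configured above. The same raw reading on an 8-CPU node would be 35% and would not.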
7.3.5. Using Metadata in a Threshold
Metadata in expression-based thresholds can streamline threshold creation. The Metadata DSL (domain specific language) allows for the use of patterns in an expression, whereby the metadata is replaced with a corresponding value during the collection process. A single expression can behave differently based on the node being tested against.
During evaluation of an expression, the following scopes are available:
-
Node metadata
-
Interface metadata
-
Service metadata
Metadata is also supported in Value, Re-arm, and Trigger fields for Single-DS and expression-based thresholds.
For more information on metadata and how to define it, see Metadata.
This procedure uses metadata to trigger an event when the number of logged-in users exceeds 1.
The expression is in the form ${context:key|context_fallback:key_fallback|…|default}.
Before using metadata in a threshold, you need to add the metadata context pair, in this case, a requisition key called userLimit (see Adding Metadata through the Web UI).
-
Select <User_Name>>Configure OpenNMS from the top-right menu.
-
Under Performance Measurement, choose Configure Thresholds.
-
Click Edit beside the netsnmp group.
-
Click Create New Expression-based Threshold.
-
Fill in the following information:
-
Type: High
-
Expression: hrSystemNumUsers / ${requisition:userLimit|1}
-
Datasource type: Node
-
Value: 1
-
Rearm: 1
-
Description: Too many logged-in users
-
Click Save.
This expression will trigger an event when the number of logged-in users reaches the limit defined by the userLimit metadata value, or 1 if that value is not defined.
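The fallback behavior of the ${context:key|context_fallback:key_fallback|…|default} form can be sketched as follows. This is an illustration of the lookup order only, not the Metadata DSL implementation: the function name `interpolate` and the nested-dictionary layout of the metadata are assumptions, and the sketch treats any candidate without a colon as the literal default.

```python
import re

# Matches one ${...} placeholder and captures its contents.
PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")


def interpolate(expression: str, metadata: dict) -> str:
    """Replace each placeholder with the first candidate that resolves.

    metadata is assumed to look like {"context": {"key": "value"}}.
    """
    def resolve(match):
        for candidate in match.group(1).split("|"):
            context, sep, key = candidate.partition(":")
            if not sep:
                # No "context:key" form, so treat it as the literal default.
                return candidate
            value = metadata.get(context, {}).get(key)
            if value is not None:
                return str(value)
        # Nothing resolved and no default: leave the placeholder untouched.
        return match.group(0)

    return PLACEHOLDER.sub(resolve, expression)


expression = "hrSystemNumUsers / ${requisition:userLimit|1}"
with_limit = interpolate(expression, {"requisition": {"userLimit": "5"}})
without_limit = interpolate(expression, {})
print(with_limit)
print(without_limit)
```

With userLimit set to 5 on a node, the expression becomes a division by 5, so the threshold value of 1 is reached at five logged-in users. On a node without the key, the default of 1 applies.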
7.3.6. Determining the Datasource
Creating a threshold requires the name of the datasource generating the metrics on which you want to threshold.
Datasource names for the SNMP protocol appear in etc/snmp-graph.properties.d/.
-
To determine the name of the datasource, navigate to the Resource Graphs screen. For example, choose Reports>Resource Graphs, select one of the listed resources, and then, under SNMP Node Data, select Node-level Performance Data and choose Graph Selection.
-
Scroll through the graphs to find the title of the graph that displays the metric on which you want to threshold. For example, "Number of Processes" or "System Uptime".
-
Go to etc/snmp-graph.properties.d/ and search for the title of the graph (for example, "System Uptime").
-
Note the name of the datasource, and enter it in the Datasource field when you create your threshold.
7.3.7. Create a Threshold Group
A threshold group associates a set of thresholds to a service (e.g., thresholds that apply to all Cisco devices). OpenNMS Horizon includes seven preconfigured, editable threshold groups:
-
mib2
-
cisco
-
hrstorage
-
netsnmp
-
juniper-srx
-
netsnmp-memory-linux
-
netsnmp-memory-nonlinux
You can edit an existing group (through the UI) or create a new one (in the thresholds.xml file located in $OPENNMS_HOME/etc/thresholds.xml).
Once you create the group, you can then define its thresholds in the thresholds.xml file or in the UI.
We will create a threshold group called "demo_group".
-
Type the following in the thresholds.xml file.
<group name="demo_group" rrdRepository="/opt/opennms/share/rrd/snmp/"> </group>
-
Once you have created the group in the thresholds.xml file, switch to the UI, go to the threshold screen and click Request a reload threshold packages configuration. The group you created should appear in the UI.
-
Click Edit to edit it.
The following is a sample of how the threshold appears in the thresholds.xml file:
<group name="demo_group" rrdRepository="/opt/opennms/share/rrd/snmp/"> (1)
<expression type="high" ds-type="hrStorageIndex" value="90.0"
rearm="75.0" trigger="2" ds-label="hrStorageDescr"
filterOperator="or" expression="hrStorageUsed / hrStorageSize * 100.0">
<resource-filter field="hrStorageType">^\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4$</resource-filter> (2)
</expression>
</group>
1 | The name of the group and the directory of the stored data. |
2 | The details of the threshold including type, datasource type, threshold value, rearm value, etc. |
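To see how the pieces of this definition fit together, the sketch below parses the sample group with Python's standard XML library and applies the resource-filter regular expression to a resource's hrStorageType value. It only illustrates how to read the file; how OpenNMS Horizon itself applies filters is not shown here, and the helper name `filter_matches` and the sample resources are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The sample group from above, without the callout markers.
SAMPLE = r"""
<group name="demo_group" rrdRepository="/opt/opennms/share/rrd/snmp/">
  <expression type="high" ds-type="hrStorageIndex" value="90.0"
              rearm="75.0" trigger="2" ds-label="hrStorageDescr"
              filterOperator="or" expression="hrStorageUsed / hrStorageSize * 100.0">
    <resource-filter field="hrStorageType">^\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4$</resource-filter>
  </expression>
</group>
"""

group = ET.fromstring(SAMPLE.strip())
expression = group.find("expression")
resource_filter = expression.find("resource-filter")
pattern = re.compile(resource_filter.text)


def filter_matches(resource: dict) -> bool:
    """Return True if the filtered field of the resource matches the regex."""
    field_value = resource.get(resource_filter.get("field"), "")
    return pattern.search(field_value) is not None


print(group.get("name"), expression.get("type"), expression.get("value"))
# Made-up resources: only the first carries the OID named in the filter.
print(filter_matches({"hrStorageType": ".1.3.6.1.2.1.25.2.1.4"}))
print(filter_matches({"hrStorageType": ".1.3.6.1.2.1.25.2.1.2"}))
```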
7.3.8. Create a Notification on a Threshold Event
A custom UEI for each threshold makes it easier to create notifications.
7.4. Thresholding Service
The Thresholding Service is the component responsible for maintaining the state of the performance metrics and for generating alarms from these when thresholds are triggered (armed) or cleared (unarmed).
The thresholding service listens for and visits performance metrics after they are persisted to the time series database.
The state of the thresholds is held in memory and pushed to persistent storage only when it changes.
7.4.1. Distributed Thresholding with Sentinel
Thresholding for streaming telemetry with telemetryd is supported on Sentinel when using Newts. When running on Sentinel, the thresholding state can be stored in either Cassandra or PostgreSQL. Given that Newts already requires Cassandra, we recommend using Cassandra in order to help minimize the load on PostgreSQL.
Thresholding on Sentinel uses the same configuration files as OpenNMS Horizon and operates similarly. When a threshold changes to or from triggered or cleared, an event is published. OpenNMS Horizon processes this event and creates or updates the alarm.
7.5. Shell Commands
The following shell commands are made available to help debug and manage thresholding.
Enumerate the persisted threshold states using opennms:threshold-enumerate:
admin@opennms> opennms:threshold-enumerate
Index State Key
1 23-127.0.0.1-hrStorageIndex-hrStorageUsed / hrStorageSize * 100.0-/opt/opennms/share/rrd/snmp-RELATIVE_CHANGE
2 23-127.0.0.1-if-ifHCInOctets * 8 / 1000000 / ifHighSpeed * 100-/opt/opennms/share/rrd/snmp-HIGH
3 23-127.0.0.1-node-((loadavg5 / 100) / CpuNumCpus) * 100.0-/opt/opennms/share/rrd/snmp-HIGH
4 23-127.0.0.1-if-ifInDiscards + ifOutDiscards-/opt/opennms/share/rrd/snmp-HIGH
Each state is uniquely identified by a state key and aliased by the given index.
Indexes are scoped to the particular shell session and provided as an alternative to specifying the complete state key in subsequent commands.
Display state details using opennms:threshold-details:
admin@opennms> opennms:threshold-details 1
multiplier=1.333
lastSample=64.77758166043765
previousTriggeringSample=28.862826722171075
interpolatedExpression='hrStorageUsed / hrStorageSize * 100.0'
admin@opennms> opennms:threshold-details 2
exceededCount=0
armed=true
interpolatedExpression='ifHCInOctets * 8 / 1000000 / ifHighSpeed * 100'
Different types of thresholds will display different properties. |
Clear a particular persisted state using opennms:threshold-clear:
admin@opennms> opennms:threshold-clear 2
Or clear all the persisted states with opennms:threshold-clear-all:
admin@opennms> opennms:threshold-clear-all
Clearing all thresholding states....done
8. Events
Events are central to the operation of the OpenNMS Horizon platform, so it’s critical to have a firm grasp of this topic.
Whenever something in OpenNMS Horizon appears to work by magic, it’s probably events working behind the curtain. |
8.1. Anatomy of an Event
Events are structured historical records of things that happen in OpenNMS Horizon and the nodes, interfaces, and services it manages. Every event has a number of fixed fields and zero or more parameters.
- UEI (Universal Event Identifier)
-
A string uniquely identifying the event’s type. UEIs are typically formatted in the style of a URI, but the only requirement is that they start with the string "uei." (trailing period included).
- Event Label
-
A short, static label summarizing the gist of all instances of this event.
- Description
-
A long-form description describing all instances of this event.
- Log Message
-
A long-form log message describing this event, optionally including expansions of fields and parameters so that the value is tailored to the event at hand.
- Severity
-
A severity for this event type. Possible values range from Cleared to Critical.
- Event ID
-
A numeric identifier used to look up a specific event in the OpenNMS Horizon system.
- Operator Instruction
-
A set of instructions for an operator to respond appropriately to an event of this type.
- Alarm Data
-
If this field is provided for an event, OpenNMS Horizon will create, update, or clear alarms for events of that type according to the alarm-data specifics.
8.2. Sources of Events
Events may originate within OpenNMS Horizon itself or from outside.
Internally-generated events can be the result of the platform’s monitoring and management functions (e.g., a monitored node becoming totally unavailable results in an event with the UEI uei.opennms.org/nodes/nodeDown) or they may act as inputs or outputs of housekeeping processes.
The following subsections summarize the mechanisms by which externally-created events can arrive.
8.2.1. SNMP Traps
If SNMP-capable devices in the network are configured to send traps to OpenNMS Horizon, these traps are transformed into events according to pre-configured rules. The Trapd service daemon, which enables OpenNMS Horizon to receive SNMP traps, is enabled by default.
Disabling the Trapd service daemon will render OpenNMS Horizon incapable of receiving SNMP traps. |
Event definitions are included with OpenNMS Horizon for traps from many vendors' equipment.
Traps forwarded via proxy
When SNMP traps are forwarded through a proxy using SNMPv2c or SNMPv3, preserving the original source IP address is a challenge due to the lack of an agent-addr field in the TRAP-V2 PDU used in those protocol versions.
RFC 3584 defines an optional varbind snmpTrapAddress (.1.3.6.1.6.3.18.1.3.0) which can be added to forwarded traps to convey the original source IP address.
To configure OpenNMS Horizon to honor snmpTrapAddress when present, set use-address-from-varbind="true" in the top-level element of ${OPENNMS_HOME}/etc/trapd-configuration.xml and restart OpenNMS Horizon.
<trapd-configuration (1)
    snmp-trap-port="162" new-suspect-on-trap="false"
    use-address-from-varbind="true"/> (2)
1 | Top-level trapd-configuration element |
2 | New attribute to enable use of snmpTrapAddress varbind, when present |
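The effect of this setting can be summarized in a few lines. The sketch below is a simplified model of the decision, not Trapd's code: the function name `effective_source` and the representation of varbinds as a dictionary keyed by OID are assumptions, and the addresses are made-up documentation addresses.

```python
# OID of the snmpTrapAddress varbind named in the text above.
SNMP_TRAP_ADDRESS_OID = ".1.3.6.1.6.3.18.1.3.0"


def effective_source(sender_ip: str, varbinds: dict,
                     use_address_from_varbind: bool) -> str:
    """Pick the address a forwarded trap should be attributed to."""
    if use_address_from_varbind:
        # Fall back to the packet's sender when the varbind is absent.
        return varbinds.get(SNMP_TRAP_ADDRESS_OID, sender_ip)
    return sender_ip


# A trap relayed by a proxy at 192.0.2.10 on behalf of 198.51.100.7.
forwarded = {SNMP_TRAP_ADDRESS_OID: "198.51.100.7"}
print(effective_source("192.0.2.10", forwarded, use_address_from_varbind=True))
print(effective_source("192.0.2.10", forwarded, use_address_from_varbind=False))
print(effective_source("192.0.2.10", {}, use_address_from_varbind=True))
```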
8.2.2. Syslog Messages
Syslog messages sent over the network to OpenNMS Horizon can be transformed into events according to pre-configured rules.
The Syslogd service daemon, which enables OpenNMS Horizon to receive syslog messages over the network, must be enabled for this functionality to work. This service daemon is disabled by default. |
Parsers
Different parsers can be used to convert the syslog message fields into OpenNMS Horizon event fields.
Parser | Description |
---|---|
|
Parser that uses a regex statement to parse the syslog header. |
|
Parser that uses an internal list of grok-style statements to parse the syslog header. |
|
Parser that strictly parses messages in the default pattern of syslog-ng. |
|
Parser that strictly parses the RFC 5424 format for syslog messages. |
RadixTreeSyslogParser
The RadixTreeSyslogParser normally uses a set of internally-defined patterns to parse multiple syslog message formats.
If you wish to customize the set of patterns, you can put a new set of patterns into a syslog-grok-patterns.txt file in the etc directory for OpenNMS Horizon.
The patterns are defined in grok-style statements where each token is defined by a %{PATTERN:semantic} clause.
Whitespace in the pattern will match 0…n whitespace characters and character literals in the pattern will match the corresponding characters.
The '%' character literal must be escaped by using a backslash, i.e., '\%'.
The RadixTreeSyslogParser’s grok implementation only supports a limited number of pattern types. However, these patterns should be sufficient to parse any syslog message format. |
The patterns should be arranged in the file from most specific to least specific since the first pattern to successfully match the syslog message will be used to construct the OpenNMS Horizon event.
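To illustrate the %{PATTERN:semantic} syntax and the backslash escape, the sketch below splits a pattern line into its tokens while ignoring escaped percent signs. It is not the RadixTreeSyslogParser implementation. The example pattern line is made up, and apart from HOSTNAMEORIP and IPADDRESS, the pattern and semantic names in it are placeholders chosen for illustration.

```python
import re

# A token is %{PATTERN:semantic}; a "%" preceded by a backslash is a literal.
TOKEN = re.compile(r"(?<!\\)%\{(\w+):(\w+)\}")


def tokens(pattern_line: str) -> list:
    """Return (pattern, semantic) pairs in the order they appear."""
    return TOKEN.findall(pattern_line)


# Made-up pattern line; the escaped \% is a literal percent sign.
line = r"<%{INT:facilityPriority}> %{HOSTNAMEORIP:hostname} load at 90\% %{STRING:message}"
for pattern, semantic in tokens(line):
    print(f"{pattern:14} -> {semantic}")
```

Because the first matching pattern wins, a file would list a line like this one before a more general catch-all line.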
Pattern | Description |
---|---|
|
String containing only valid hostname characters (alphanumeric plus '.', '-' and '_'). |
HOSTNAMEORIP |
String containing only valid hostname characters or IP address characters (IPv4 or IPv6). |
|
Positive integer. |
IPADDRESS |
String containing only valid IP address characters (IPv4 or IPv6). |
|
3-character English month abbreviation. |
|
String that contains no whitespace. |
|
String. Because this matches any character, it must be followed by a delimiter in the pattern string. |
|
String that contains only whitespace (spaces and/or tabs). |
Semantic Token | Description |
---|---|
|
2-digit day of month (1-31). |
|
Facility-priority integer. |
|
String hostname (unqualified or FQDN), IPv4 address, or IPv6 address. |
|
2-digit hour of day (0-23). |
|
Remaining string message. |
|
String message ID. |
|
2-digit minute (0-59). |
|
2-digit month (1-12). |
|
String generic parameter where the parameter’s key is the identifier following "parm" in the semantic token (e.g. parmComponentId maps to a string parameter with key "ComponentId"). |
|
String process ID. |
|
String process name. |
|
2-digit second (0-59). |
|
1- to 6-digit fractional second value as a string. |
|
String timezone value. |
|
Version. |
|
4-digit year. |
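To make the grok-style statements and the first-match rule described for the RadixTreeSyslogParser more concrete, the following Python sketch translates statements into regular expressions and tries them in file order. It is an illustration only: the real parser is built on a radix tree rather than regular expressions, the regex chosen for each pattern type is a simplification, and the two sample statements are hypothetical rather than the built-in set.

```python
import re

# Simplified stand-ins for a few pattern types (assumptions for illustration).
PATTERN_TYPES = {
    "INT": r"\d+",
    "MONTH": r"[A-Z][a-z]{2}",
    "NOSPACE": r"\S+",
    "STRING": r".*?",
    "HOSTNAME": r"[A-Za-z0-9._-]+",
}

# Matches one %{PATTERN:semantic} clause.
TOKEN = re.compile(r"%\{(\w+):(\w+)\}")


def _literal(text):
    # An escaped percent sign is a literal '%'; whitespace matches 0..n
    # whitespace characters; everything else matches itself.
    text = text.replace("\\%", "%")
    out = []
    for chunk in re.split(r"(\s+)", text):
        out.append(r"\s*" if chunk.isspace() else re.escape(chunk))
    return "".join(out)


def compile_grok(statement):
    """Translate one grok-style statement into a compiled regex."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(statement):
        parts.append(_literal(statement[pos:m.start()]))
        pattern_type, semantic = m.groups()
        parts.append(f"(?P<{semantic}>{PATTERN_TYPES[pattern_type]})")
        pos = m.end()
    parts.append(_literal(statement[pos:]))
    return re.compile("".join(parts) + "$")


def parse(message, statements):
    """Return the fields of the first statement that matches, else None."""
    for statement in statements:
        match = compile_grok(statement).match(message)
        if match:
            return match.groupdict()
    return None


# Hypothetical statements, ordered from most specific to least specific.
STATEMENTS = [
    "<%{INT:facilityPriority}> %{MONTH:month} %{INT:day} "
    "%{INT:hour}:%{INT:minute}:%{INT:second} %{HOSTNAME:hostname} "
    "%{NOSPACE:processName}[%{INT:processId}]: %{STRING:message}",
    "<%{INT:facilityPriority}> %{STRING:message}",
]

fields = parse("<34>Oct 11 22:14:15 host1 su[230]: login failed", STATEMENTS)
```

If the two statements were swapped, the catch-all would match first and the hostname, process, and timestamp fields would never be extracted, which is why the file should list the most specific patterns first.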
8.2.3. ReST
Posting an event in XML format to the appropriate endpoint in the OpenNMS Horizon ReST API will cause the creation of a corresponding event, just as with the XML-TCP interface.
8.2.4. XML-TCP
Any application or script can create custom events in OpenNMS Horizon by sending properly-formatted XML data over a TCP socket.
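As a rough illustration of the XML-TCP approach, the Python sketch below builds a single event wrapped in a log/events envelope and writes it to a TCP socket. The envelope layout, the timestamp format, the attributes on the value element, and the example UEI are assumptions modeled on what the bundled send-event.pl script is understood to emit; check them against your installation before relying on them. The address and port are the Eventd listener defaults listed under Event Configuration.

```python
import socket
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_event_log(uei, source, host, parms=None):
    """Wrap one event in a <log><events> envelope and return it as bytes."""
    log = ET.Element("log")
    events = ET.SubElement(log, "events")
    event = ET.SubElement(events, "event")
    ET.SubElement(event, "uei").text = uei
    ET.SubElement(event, "source").text = source
    # Assumption: an ISO 8601 timestamp is accepted by the listener.
    ET.SubElement(event, "time").text = datetime.now(timezone.utc).isoformat()
    ET.SubElement(event, "host").text = host
    if parms:
        parms_el = ET.SubElement(event, "parms")
        for name, value in parms.items():
            parm = ET.SubElement(parms_el, "parm")
            ET.SubElement(parm, "parmName").text = name
            value_el = ET.SubElement(
                parm, "value", {"type": "string", "encoding": "text"}
            )
            value_el.text = str(value)
    return ET.tostring(log, encoding="utf-8")


def send_event(payload, address="127.0.0.1", port=5817):
    """Send the payload to the Eventd XML/TCP listener."""
    with socket.create_connection((address, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical UEI and parameter, for illustration only.
payload = build_event_log(
    "uei.opennms.org/custom/example",
    source="example-script",
    host="localhost",
    parms={"reason": "disk full"},
)
# send_event(payload)  # uncomment on a host that can reach the listener
```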
8.2.5. Receiving IBM Tivoli Event Integration Facility Events
OpenNMS can be configured to receive Events sent using the Tivoli Event Integration Facility.
These EIF events are translated into OpenNMS events using preconfigured rules. The resulting UEIs are anchored in the uei.opennms.org/vendor/IBM/EIF/
namespace, with the name of the EIF event class appended.
A sample event configuration for the OMEGAMON_BASE
class is included with OpenNMS.
Configuring the EIF Adapter
Once OpenNMS is started and the Karaf shell is accessible, you can install the EIF Adapter feature and configure it to listen on a specific interface and port.
By default the EIF Adapter is configured to listen on TCP port 1828 on all interfaces. |
[root@localhost /root]# ssh -p 8101 admin@localhost
...
opennms> feature:install eif-adapter
opennms> config:edit org.opennms.features.eifadapter
opennms> config:property-set interface 0.0.0.0
opennms> config:property-set port 1828
opennms> config:update
You can check the routes status with the camel:*
commands and/or inspect the log with log:tail
for any obvious errors.
The feature supports debug-level logging, which can help when troubleshooting its operation.
See the documentation on using the OSGi console embedded in OpenNMS and the related camel commands. |
Features installed through the Karaf shell persist only as long as the ${OPENNMS_HOME}/data directory remains intact. To enable the feature more permanently, add it to the featuresBoot list in ${OPENNMS_HOME}/etc/org.apache.karaf.features.cfg.
|
You should now be able to configure your EIF forwarders to send to this destination, and their events will be translated into OpenNMS Events and written to the event bus.
Troubleshooting
If events are not reaching OpenNMS, check whether the event source (EIF Forwarder) is correctly configured.
Check your event destination configuration. In particular review the HOSTNAME
and PORT
parameters. Also check that your situations are configured to forward to that EIF destination.
If those appear to be correct, verify that the EIF Forwarder can communicate with OpenNMS over the configured port (default 1828).
Review the OSGi log with log:tail
or the camel:*
commands.
8.2.6. TL1 Autonomous Messages
Autonomous messages can be retrieved from certain TL1-enabled equipment and transformed into events.
The Tl1d service daemon, which enables OpenNMS Horizon to receive TL1 autonomous messages, must be enabled for this functionality to work. This service daemon is disabled by default.
|
8.2.7. Sink
Events can also be created by routing them to a specific topic on Kafka / ActiveMQ.
The topic name should be of the form OpenNMS.Sink.Events, where OpenNMS is the default instance ID of OpenNMS Horizon.
The instance ID is configurable through the system property org.opennms.instance.id.
8.3. The Event Bus
At the heart of OpenNMS Horizon lies an event bus. Any OpenNMS Horizon component can publish events to the bus, and any component can subscribe to receive events of interest that have been published on the bus. This publish-subscribe model enables components to use events as a mechanism to send messages to each other. For example, the provisioning subsystem of OpenNMS Horizon publishes a node-added event whenever a new node is added to the system. Other subsystems with an interest in new nodes subscribe to the node-added event and automatically receive these events, so they know to start monitoring and managing the new node if their configuration dictates. The publisher and subscriber components do not need to have any knowledge of each other, allowing for a clean division of labor and lessening the programming burden to add entirely new OpenNMS Horizon subsystems or modify the behavior of existing ones.
8.3.1. Associate an Event to a given node
There are two ways to associate an existing node with an event before sending it to the Event Bus:
-
Set the nodeId of the node in question on the event.
-
For requisitioned nodes, set _foreignSource and _foreignId as parameters on the event. Any incoming event that has no nodeId but carries these two parameters triggers a lookup in the database; if a node is found, the nodeId attribute is set on the event dynamically, regardless of which method was used to send it to the Event Bus.
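The lookup described in the second option can be pictured with the short Python sketch below. It is a simplified model, not the Eventd implementation: the event is a plain dictionary, and the nodes_by_foreign_key mapping is a hypothetical stand-in for the database query.

```python
def resolve_node_id(event, nodes_by_foreign_key):
    """Fill in nodeid from _foreignSource/_foreignId when it is missing.

    nodes_by_foreign_key is a hypothetical stand-in for the database lookup,
    mapping (foreignSource, foreignId) pairs to node IDs.
    """
    if event.get("nodeid") is not None:
        return event  # an explicit nodeId always wins
    parms = event.get("parms", {})
    key = (parms.get("_foreignSource"), parms.get("_foreignId"))
    if None in key:
        return event  # both parameters are required for the lookup
    node_id = nodes_by_foreign_key.get(key)
    if node_id is None:
        return event  # no matching node, leave the event untouched
    return {**event, "nodeid": node_id}


# Hypothetical requisition data for illustration.
known_nodes = {("Servers", "web-01"): 42}

resolved = resolve_node_id(
    {"uei": "uei.opennms.org/custom/example",
     "parms": {"_foreignSource": "Servers", "_foreignId": "web-01"}},
    known_nodes,
)
```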
8.4. Event Configuration
The back-end configuration surrounding events is broken into two areas: the configuration of Eventd
itself, and the configuration of all types of events known to OpenNMS Horizon.
8.4.1. The eventd-configuration.xml file
The overall behavior of Eventd
is configured in the file OPENNMS_HOME/etc/eventd-configuration.xml
.
This file does not need to be changed in most installations.
The configurable items include:
TCPAddress
-
The IP address to which the Eventd XML/TCP listener will bind. Defaults to 127.0.0.1.
TCPPort
-
The TCP port number on TCPAddress to which the Eventd XML/TCP listener will bind. Defaults to 5817.
UDPAddress
-
The IP address to which the Eventd XML/UDP listener will bind. Defaults to 127.0.0.1.
UDPPort
-
The UDP port number on UDPAddress to which the Eventd XML/UDP listener will bind. Defaults to 5817.
receivers
-
The number of threads allocated to service the event intake work done by Eventd.
queueLength
-
The maximum number of events that may be queued for processing. Additional events will be dropped. Defaults to unlimited.
getNextEventID
-
An SQL query statement used to retrieve the ID of the next new event. Changing this setting is not recommended.
socketSoTimeoutRequired
-
Whether to set a timeout value on the Eventd receiver socket.
socketSoTimeoutPeriod
-
The socket timeout, in milliseconds, to set if socketSoTimeoutRequired is set to yes.
logEventSummaries
-
Whether to log a simple (terse) summary of every event at level INFO. Useful when troubleshooting event processing on busy systems where DEBUG logging is not practical.
8.4.2. The eventconf.xml file and its tributaries
The set of known events is configured in OPENNMS_HOME/etc/eventconf.xml
.
This file opens with a <global>
element, whose <security>
child element defines which event fields may not be overridden in the body of an event submitted via any Eventd
listener.
This mechanism stops a malicious actor from, for instance, sending an event whose operator-action
field amounts to a phishing attack.
After the <global>
element, this file consists of a series of <event-file>
elements.
The content of each <event-file>
element specifies the path of a tributary file whose contents will be read and incorporated into the event configuration.
These paths are resolved relative to the OPENNMS_HOME/etc
directory; absolute paths are not allowed.
Each tributary file contains a top-level <events>
element with one or more <event>
child elements.
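The way event-file entries are resolved can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the XML namespace, the /opt/opennms/etc location, and the sample file names are illustrative and may not match your installation.

```python
import posixpath
import xml.etree.ElementTree as ET

# Assumed namespace of eventconf.xml; verify against your own file.
NS = {"ec": "http://xmlns.opennms.org/xsd/eventconf"}

# Hypothetical, heavily trimmed eventconf.xml for illustration.
SAMPLE = """<events xmlns="http://xmlns.opennms.org/xsd/eventconf">
  <global>
    <security>
      <doNotOverride>logmsg</doNotOverride>
    </security>
  </global>
  <event-file>events/MyVendor.events.xml</event-file>
  <event-file>events/opennms.catch-all.events.xml</event-file>
</events>"""


def tributary_paths(eventconf_xml, etc_dir="/opt/opennms/etc"):
    """Return the tributary files in load order, resolved against etc_dir."""
    root = ET.fromstring(eventconf_xml)
    paths = []
    for element in root.findall("ec:event-file", NS):
        relative = element.text.strip()
        if posixpath.isabs(relative):
            raise ValueError(f"absolute path not allowed: {relative}")
        paths.append(posixpath.join(etc_dir, relative))
    return paths


paths = tributary_paths(SAMPLE)
```

The returned list preserves document order, which matters because tributary files are loaded, and their events matched, in the order listed.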
Consider the following event definition:
<event>
  <uei>uei.opennms.org/nodes/nodeLostService</uei>
  <event-label>OpenNMS-defined node event: nodeLostService</event-label>
  <descr><p>A %service% outage was identified on interface
    %interface% because of the following condition: %parm[eventReason]%.</p> <p>
    A new Outage record has been created and service level
    availability calculations will be impacted until this outage is
    resolved.</p></descr>
  <logmsg dest="logndisplay">
    %service% outage identified on interface %interface%.
  </logmsg>
  <severity>Minor</severity>
  <alarm-data reduction-key="%uei%:%dpname%:%nodeid%:%interface%:%service%" alarm-type="1" auto-clean="false"/>
</event>
Every event definition has this same basic structure. See Anatomy of an Event for a discussion of the structural elements.
When setting severities of events, it’s important to consider each event in the context of your infrastructure as a whole.
Events whose severity is critical at the zoomed-in level of a single device may not merit a Critical
severity in the zoomed-out view of your entire enterprise.
Since an event with Critical
severity can never have its alarms escalated, this severity level should usually be reserved for events that unequivocally indicate a truly critical impact to the business.
Rock legend Nigel Tufnel offered some wisdom on the subject.
Various tokens can be included in the description, log message, operator instruction and automatic actions for each event. These tokens will be replaced by values from the current event when the text for the event is constructed. Not all events will have values for all tokens, and some refer specifically to information available only in events derived from SNMP traps.
%eventid%
-
The event’s numeric database ID
%uei%
-
The Universal Event Identifier for the event.
%source%
-
The source of the event (which OpenNMS Horizon service daemon created it).
%descr%
-
The event description.
%logmsg%
-
The event logmsg.
%time%
-
The time of the event.
%shorttime%
-
The time of the event formatted using DateFormat.SHORT for a completely numeric date/time.
%dpname%
-
The ID of the Minion (formerly distributed poller) that the event was received on.
%nodeid%
-
The numeric node ID of the device that caused the event, if any.
%nodelabel%
-
The node label for the node given in %nodeid%, if available.
%nodelocation%
-
The node location for the node given in %nodeid%, if available.
%host%
-
The host at which the event was generated.
%interface%
-
The IP interface associated with the event, if any.
%foreignsource%
-
The Requisition name for the node given in %nodeid%, if available.
%foreignid%
-
The Requisition ID for the node given in %nodeid%, if available.
%ifindex%
-
The interface’s SNMP ifIndex.
%interfaceresolv%
-
Does a reverse lookup on the %interface% and returns its name if available.
%service%
-
The service associated with the event, if any.
%severity%
-
The severity of the event.
%snmphost%
-
The host of the SNMP agent that generated the event.
%id%
-
The SNMP Enterprise OID for the event.
%idtext%
-
The decoded (human-readable) SNMP Enterprise OID for the event.
%ifalias%
-
The interface’s SNMP ifAlias.
%generic%
-
The Generic trap-type number for the event.
%specific%
-
The Specific trap-type number for the event.
%community%
-
The community string for the trap.
%version%
-
The SNMP version of the trap.
%snmp%
-
The SNMP information associated with the event.
%operinstruct%
-
The operator instructions for the event.
%mouseovertext%
-
The mouse over text for the event.
%tticketid%
-
The trouble ticket id associated with the event if available.
%primaryinterface%
-
The primary interface IP address for the node given in
%nodeid%
if available.
A node may have additional asset records stored for it.
You can access these records using the asset
replacement token, which takes the form:
%asset[<token>]%
-
The asset field <token>'s value, or "Unknown" if it does not exist.
A node may have additional hardware details stored for it.
You can access these details using the hardware
replacement token, which takes the form:
%hardware[<token>]%
-
The hardware field <token>'s value.
Many events carry additional information in parameters (see Anatomy of an Event).
These parameters may start life as SNMP trap variable bindings, or varbinds for short.
You can access event parameters using the parm
replacement token, which takes several forms:
%parm[all]%
-
Space-separated list of all parameters in the form parmName1="parmValue1" parmName2="parmValue2" and so on.
%parm[values-all]%
-
Space-separated list of all parameter values (without their names) associated with the event.
%parm[names-all]%
-
Space-separated list of all parameter names (without their values) associated with the event.
%parm[<name>]%
-
Will return the value of the parameter named <name> if it exists.
%parm[##]%
-
Will return the total number of parameters as an integer.
%parm[#<num>]%
-
Will return the value of parameter number <num> (one-indexed).
%parm[name-#<num>]%
-
Will return the name of parameter number <num> (one-indexed).
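The replacement tokens above can be modeled with a small Python function. This is a sketch of the substitution rules as listed, not the code OpenNMS Horizon runs; in particular, leaving unknown tokens untouched and returning an empty string for a missing parameter are assumptions.

```python
import re


def expand_tokens(text, fields, parms):
    """Expand %token% and %parm[...]% references.

    fields maps token names (e.g. "service") to values; parms is an ordered
    list of (name, value) pairs.
    """

    def replace(match):
        token = match.group(1)
        parm = re.fullmatch(r"parm\[(.+)\]", token)
        if not parm:
            value = fields.get(token)
            # Assumption: unknown tokens are left as written.
            return match.group(0) if value is None else str(value)
        arg = parm.group(1)
        if arg == "all":
            return " ".join(f'{name}="{value}"' for name, value in parms)
        if arg == "values-all":
            return " ".join(str(value) for _, value in parms)
        if arg == "names-all":
            return " ".join(name for name, _ in parms)
        if arg == "##":
            return str(len(parms))
        numbered = re.fullmatch(r"(name-)?#(\d+)", arg)
        if numbered:
            index = int(numbered.group(2)) - 1  # tokens are one-indexed
            if not 0 <= index < len(parms):
                return ""
            name, value = parms[index]
            return name if numbered.group(1) else str(value)
        # %parm[<name>]%: look the parameter up by name.
        return next((str(value) for name, value in parms if name == arg), "")

    return re.sub(r"%([^%\s]+)%", replace, text)


fields = {"service": "ICMP", "interface": "192.0.2.10"}
parms = [("eventReason", "timeout"), ("retries", "3")]
logmsg = expand_tokens(
    "%service% outage identified on interface %interface%.", fields, parms
)
```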
eventconf.xml tributary files
The ordering of event definitions is very important, as an incoming event is matched against them in order. It is possible and often useful to have several event definitions which could match variant forms of a given event, for example based on the values of SNMP trap variable bindings.
The tributary files included via the <event-file>
tag have been broken up by vendor. When OpenNMS Horizon starts, each tributary file is loaded in order.
The ordering of events inside each tributary file is also preserved.
The tributary files listed at the very end of eventconf.xml
contain catch-all event definitions.
When slotting in your own event definitions, take care not to place them below these catch-all files; otherwise your definitions will be effectively unreachable.
-
To save memory and shorten startup times, you may wish to remove event definition files that you know you do not need.
-
If you need to customize some events in one of the default tributary files, you may wish to make a copy of the file containing only the customized events, and slot the copy above the original; this practice will make it easier to maintain your customizations in case the default file changes in a future release of OpenNMS Horizon.
8.4.3. Reloading the event configuration
After making manual changes to OPENNMS_HOME/etc/eventconf.xml
or any of its tributary files, you can trigger a reload of the event configuration by issuing the following command on the OpenNMS Horizon server:
OPENNMS_HOME/bin/send-event.pl uei.opennms.org/internal/reloadDaemonConfig -p 'daemonName Eventd'
8.5. Debugging
When debugging events, it may be helpful to lower the minimum severity at which Eventd
will log from the default level of WARN
.
To change this setting, edit OPENNMS_HOME/etc/log4j2.xml
and locate the following line:
<KeyValuePair key="eventd" value="WARN" />
Changes to log4j2.xml
will take effect within 60 seconds with no extra action needed.
At level DEBUG
, Eventd
will log a verbose description of every event it handles to OPENNMS_HOME/logs/eventd.log
.
On busy systems, this setting may create so much noise as to be impractical.
In these cases, you can get terse event summaries by setting Eventd
to log at level INFO
and setting logEventSummaries="yes"
in OPENNMS_HOME/etc/eventd-configuration.xml
.
Note that changes to eventd-configuration.xml
require a full restart of OpenNMS Horizon.
8.5.1. Karaf Shell
The opennms:show-event-config
command can be used to render the event definition for one or more event UEIs (matching a substring) to XML.
This command is useful for displaying event definitions which may not be easily accessible on disk, or verifying that particular events were actually loaded.
$ ssh -p 8101 admin@localhost
...
admin@opennms()> opennms:show-event-config -u uei.opennms.org/alarms
9. Alarms
OpenNMS Horizon has the ability to monitor the state of problems with its managed entities (ME), their resources, the services they provide, as well as the applications they host; or more simply, the Network. In OpenNMS Horizon, the state of these problems is characterized as Alarms.
Before Alarmd was created, OpenNMS' Events (or messages) were used not only as interprocess communication messages (IPC), but also as indications of problems in the network. Even today, OpenNMS Events still carry problem state attributes such as Acknowledgement and Severity. However, these attributes have long since been functionally deprecated now that Alarms are used as the indicator for problems in the network (see also Situations and Business Services).
A significant change occurred with the release of Horizon 23.0.0 (H23). Prior to H23 and since the introduction of Alarms in OpenNMS, Alarmd was designed and configured to track the state of a problem using two Alarms; a Down and an Up Alarm. Now, OpenNMS is designed with the intention to use a single Alarm to track the state of a problem. The old behavior can be re-enabled by setting the system property org.opennms.alarmd.legacyAlarmState = true. |
9.1. Single Alarm Tracking Problem States
9.2. Alarm Service Daemon
Alarmd, the Alarm Service Daemon, has the very simple task of processing Events representing problems in the Network. It either instantiates a new Alarm to track a problem’s state or reduces a recurring Event of an existing problem into the same Alarm (also known as Alarm de-duplication).
Prior to OpenNMS Horizon version 23.0.0 (H23), Alarmd had no configuration. With the release of H23, Drools is embedded directly inline with Alarmd’s Event processing function. This provides users with a more robust infrastructure for the effective management of workflow and problem states in the Network. Business rules now replace the function of the "Automations" that were previously defined in Vacuumd’s configuration. You will find these new business rules in the etc/alarmd/drools-rules.d/ folder.
alarmd.drl
9.3. Configuring Alarms
Since Alarmd instantiates Alarms from Events, defining Alarms in OpenNMS Horizon entails defining an additional XML element of an Event indicating a problem or resolution in the Network. This additional element is the "alarm-data" element.
Any Event that is marked as "donotpersist" in the logmsg element’s "dest" attribute, will not be processed as an Alarm. |
<element name="alarm-data">
  <annotation>
    <documentation>This element is used for converting events into alarms.</documentation>
  </annotation>
  <complexType>
    <sequence>
      <element ref="this:update-field" minOccurs="0" maxOccurs="unbounded" />
    </sequence>
    <attribute name="reduction-key" type="string" use="required" />
    <attribute name="alarm-type" use="required" >
      <simpleType>
        <restriction base="int">
          <minInclusive value="1"/>
        </restriction>
      </simpleType>
    </attribute>
    <attribute name="clear-key" type="string" use="optional" />
    <attribute name="auto-clean" type="boolean" use="optional" default="false" />
    <attribute name="x733-alarm-type" type="this:x733-alarm-type" use="optional" />
    <attribute name="x733-probable-cause" type="int" use="optional" />
  </complexType>
</element>
<element name="update-field">
  <complexType>
    <attribute name="field-name" type="string" use="required" />
    <attribute name="update-on-reduction" type="boolean" use="optional" default="true" />
    <attribute name="value-expression" type="string" use="optional" default="" />
  </complexType>
</element>
<simpleType name="x733-alarm-type">
  <restriction base="string" >
    <pattern value="CommunicationsAlarm|ProcessingErrorAlarm|EnvironmentalAlarm|QualityOfServiceAlarm|EquipmentAlarm|IntegrityViolation|SecurityViolation|TimeDomainViolation|OperationalViolation|PhysicalViolation" />
  </restriction>
</simpleType>
NOTE See also: Anatomy of an Event
The critical attribute when defining the alarm-data of an Event is the reduction-key. This attribute can contain literal strings as well as references to properties (fields and parameters) of the Event. The purpose of the reduction-key is to uniquely identify the signature of a problem and, as such, it is used to reduce (de-duplicate) Events so that only one problem is instantiated. Most commonly, the event’s identifier (UEI) is used as the leftmost (least significant) portion of the reduction-key, followed by other properties of the Event from least to most significant, traditionally separated by the literal ':'.
<event>
<uei>uei.opennms.org/nodes/nodeDown</uei>
...
<alarm-data reduction-key="%uei%:%dpname%:%nodeid%" alarm-type="1" auto-clean="false"/>
</event>
Decreasing the significance of the reduction-key is a way to aggregate, for example, all nodes down into a single alarm. However, there are caveats: |
<event>
<uei>uei.opennms.org/nodes/nodeDown</uei>
<alarm-data reduction-key="%uei%" alarm-type="1"/>
</event>
With this reduction-key, a single alarm would be instantiated for all nodes that the Poller determined to be down, with the count representing the number of nodes down. However, the UEI uei.opennms.org/nodes/nodeUp would not be a good "pair-wise" reduction-key for resolving this alarm, because a single "node up" would be enough to clear all of the nodes down tracked by this one alarm.
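The effect of the reduction-key on de-duplication can be seen in a small Python model. It is a deliberately simplified sketch (plain dictionaries, only %name% field tokens, hypothetical event data) and not how Alarmd is implemented.

```python
import re


def reduction_key(template, event):
    """Expand %name% references in a reduction-key template."""
    return re.sub(r"%(\w+)%", lambda m: str(event[m.group(1)]), template)


def reduce_events(events, template):
    """Fold events into alarms keyed by their expanded reduction-key."""
    alarms = {}
    for event in events:
        key = reduction_key(template, event)
        alarm = alarms.setdefault(key, {"reductionKey": key, "counter": 0})
        alarm["counter"] += 1
        alarm["lastEvent"] = event
    return alarms


# Hypothetical nodeDown events: node 1 goes down twice, node 2 once.
UEI = "uei.opennms.org/nodes/nodeDown"
events = [
    {"uei": UEI, "dpname": "00000000", "nodeid": 1},
    {"uei": UEI, "dpname": "00000000", "nodeid": 2},
    {"uei": UEI, "dpname": "00000000", "nodeid": 1},
]

per_node = reduce_events(events, "%uei%:%dpname%:%nodeid%")
aggregated = reduce_events(events, "%uei%")
```

With the full key, each node gets its own alarm and node 1's alarm has a counter of 2. With the key reduced to %uei%, all three events collapse into one alarm, which illustrates why a single resolving event can no longer be matched to a specific node.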
The second most critical attribute is the alarm-type. There are currently three types of alarms: problem (1), resolution (2), and notification (3). The alarm-type attribute helps Alarmd with pair-wise resolution… the matching of resolution events to problem events.
clear-key
-
This attribute is used in the pair-wise correlation feature of Alarmd. When configuring a resolution Alarm, set this attribute to match the reduction-key of the corresponding problem Alarm.
auto-clean
-
This attribute instructs Alarmd to retain only the most recent Event reduced into an alarm. For very chatty alarms, this is a way to limit the number of Events kept in the database.
Do not use this feature with Alarms that have pair-wise correlation (matching problems with resolutions). |
update-field
-
Use this element to override Alarmd’s default behavior for which fields are updated during reduction. The Alarm fields that can currently be controlled this way are:
-
distpoller
-
ipaddr
-
mouseover
-
operinstruct
-
severity
-
descr
-
acktime
-
ackuser
With the new single alarm behavior in H23, if an Alarm transitions from alarm-type 2 back to alarm-type 1, the Severity is set to the most recent Event’s value. |
Alarmd is designed to reduce multiple occurrences of an Alarm into a single alarm.
Alarmd is also intrinsically designed to automatically match resolving events with an existing Alarm. Alarms whose problems have matching resolutions (Downs with Ups) should be indicated with the alarm-type attribute:
-
alarm-type="1" (problem alarm)
-
alarm-type="2" (resolving alarm)
-
alarm-type="3" (notification alarm: an alarm with no resolution, such as SNMP Authentication Failures)
Instantiate new Alarms for existing cleared problem
Also new in H23 is a global property setting that controls how Events are reduced into Alarms that are currently cleared.
|
Create a properties file called alarmd.properties in the $OPENNMS_ETC/opennms.properties.d/ folder and add the following property set to true:
###### Alarmd Properties ######
#
# Enable this property to force Alarmd to create new alarms when a problem re-occurs and the
# existing Alarm is in a "Cleared" state.
#
# Default: false
#org.opennms.alarmd.newIfClearedAlarmExists = false
org.opennms.alarmd.newIfClearedAlarmExists = true
Now, with this property set, when a repeat incident occurs and the current state of the Alarm tracking the problem is "Cleared", instead of resetting the current Alarm to its default severity and incrementing the counter, a new instance of the Alarm will be created.
Figure: New node down Alarm with existing cleared Alarm
Alarmd alters the existing Alarm’s reductionKey to make it unique (the literal ":ID:" and the alarm ID are appended to the reductionKey), which prevents it from ever being reused for a recurring problem in the Network.
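The difference between the two settings of newIfClearedAlarmExists can be modeled in a few lines of Python. The AlarmTable class below is a hypothetical toy, not Alarmd's data model; it only mirrors the behavior described above: reuse the cleared Alarm by default, or retire it under a key suffixed with ":ID:" and its alarm ID when the property is enabled.

```python
from itertools import count


class AlarmTable:
    """Toy model of alarm reduction with cleared alarms (illustrative only)."""

    def __init__(self, new_if_cleared_alarm_exists=False):
        self.new_if_cleared = new_if_cleared_alarm_exists
        self.alarms = {}  # reductionKey -> alarm dictionary
        self._ids = count(1)

    def on_problem(self, reduction_key, severity):
        alarm = self.alarms.get(reduction_key)
        if alarm and alarm["severity"] == "Cleared" and self.new_if_cleared:
            # Retire the cleared alarm under a unique key so it is never reused.
            retired_key = f'{reduction_key}:ID:{alarm["id"]}'
            alarm["reductionKey"] = retired_key
            self.alarms[retired_key] = self.alarms.pop(reduction_key)
            alarm = None
        if alarm is None:
            alarm = {
                "id": next(self._ids),
                "reductionKey": reduction_key,
                "severity": severity,
                "counter": 1,
            }
            self.alarms[reduction_key] = alarm
        else:
            # Default behavior: reset severity and bump the counter.
            alarm["severity"] = severity
            alarm["counter"] += 1
        return alarm

    def clear(self, reduction_key):
        self.alarms[reduction_key]["severity"] = "Cleared"


KEY = "uei.opennms.org/nodes/nodeDown::1"  # hypothetical reduction key

table = AlarmTable(new_if_cleared_alarm_exists=True)
first = table.on_problem(KEY, "Major")
table.clear(KEY)
second = table.on_problem(KEY, "Major")
```

Here first ends up with the reductionKey "uei.opennms.org/nodes/nodeDown::1:ID:1", while second is a fresh alarm under the original key.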
Re-enable legacy dual Alarm state behavior
In H23, a global property setting can be set to re-enable the legacy dual Alarm behavior.
|
Create a properties file called alarmd.properties in the $OPENNMS_ETC/opennms.properties.d/ folder and add the following property set to true:
###### Alarmd Properties ######
# Enable this property to have the traditional dual alarm handling of alarms state
# for Alarm pairwise correlation.
# Default: false
#org.opennms.alarmd.legacyAlarmState = false
org.opennms.alarmd.legacyAlarmState = true
Setting legacyAlarmState will nullify newIfClearedAlarmExists |
9.4. Alarm Notes
OpenNMS Horizon creates an Alarm for issues in the network. When working in a team, it is helpful to share information about a current Alarm. Alarm Notes can be used to assign comments to a specific Alarm or to a whole class of Alarms. The figure Alarm Detail View shows the component used to add this information to the Alarm as Memos.
Alarm Notes lets you add two types of notes to an existing Alarm or Alarm class:
-
Sticky Memo: A user defined note for a specific instance of an Alarm. Deleting the Alarm will also delete the sticky memo.
-
Journal Memo: A user-defined note for a whole class of alarms based on the resolved reduction key. The Journal Memo is shown for all Alarms matching a specific reduction key. Deleting an Alarm doesn’t remove the Journal Memo; it can be removed by pressing the "Clear" button on an Alarm that has the Journal Memo.
If an Alarm has a sticky and/or a Journal Memo it is indicated with two icons on the "Alarm list Summary" and "Alarm List Detail".
9.5. Alarm Sounds
Often users want an audible indication of a change in alarm state. The OpenNMS Horizon alarm list page has the optional ability to generate a sound either on each new alarm or (more annoyingly) on each change to an alarm event count on the page.
The figure Alarm Sounds View shows the alarm list page when alarms sounds are enabled.
By default, the alarm sound feature is disabled. System Administrators must activate the sound feature and also set the default sound setting for all users. However, users can modify the default sound setting for the duration of their logged-in session using a drop-down menu with the following options:
-
Sound off: no sounds generated by the page.
-
Sound on new alarm: sounds generated for every new alarm on the page.
-
Sound on new alarm count: sounds generated for every increase in alarm event count for alarms on the page.
9.6. Flashing Unacknowledged Alarms
By default OpenNMS Horizon displays the alarm list page with acknowledged and unacknowledged alarms listed in separate search tabs. In a number of operational environments it is useful to see all of the alarms on the same page with unacknowledged alarms flashing to indicate that they haven’t yet been noticed by one of the team. This allows everyone to see at a glance the real time status of all alarms and which alarms still need attention.
The figure Alarm Sounds View also shows the alarm list page when flashing unacknowledged alarms are enabled. Alarms which are unacknowledged flash steadily. Alarms which have been acknowledged do not flash and also have a small tick beside the selection check box. All alarms can be selected to be escalated, cleared, acknowledged and unacknowledged.
9.7. Configuring Alarm Sounds and Flashing
By default, OpenNMS Horizon does not enable alarm sounds or flashing alarms. The default settings are included in opennms.properties. However, rather than editing the default opennms.properties file, the system administrator should enable these features by creating a new file in opennms.properties.d and applying the following settings:
${OPENNMS_HOME}/etc/opennms.properties.d/alarm.listpage.properties
# ###### Alarm List Page Options ######
# Several options are available to change the default behaviour of the Alarm List Page.
# <opennms url>/opennms/alarm/list.htm
#
# The alarm list page has the ability to generate a sound either on each new alarm
# or (more annoyingly) on each change to an alarm event count on the page.
#
# Turn on the sound feature. Set true and Alarm List Pages can generate sounds in the web browser.
opennms.alarmlist.sound.enable=true
#
# Set the default setting for how the Alarm List Pages generates sounds. The default setting can be
# modified by users for the duration of their logged-in session using a drop down menu .
# off = no sounds generated by the page.
# newalarm = sounds generated for every new alarm in the page
# newalarmcount = sounds generated for every increase in alarm event count for alarms on the page
#
opennms.alarmlist.sound.status=off
# By default the alarm list page displays acknowledged and unacknowledged alarms in separate search tabs
# Some users have asked to be able to see both on the same page. This option allows the alarm list page
# to display acknowledged and unacknowledged alarms on the same list but unacknowledged alarms
# flash until they are acknowledged.
#
opennms.alarmlist.unackflash=true
The sound played is determined by the contents of the following file ${OPENNMS_HOME}/jetty-webapps/opennms/sounds/alert.wav
If you want to change the sound, create a new wav file with your desired sound, name it alert.wav
and replace the default file in the same directory.
9.8. Alarm History
The Alarm History feature integrates with Elasticsearch to provide long term storage and maintain a history of alarm state changes.
When enabled, alarms are indexed in Elasticsearch when they are created, deleted, or when any of the "interesting" fields on the alarm are updated (more on this below.)
Alarms are indexed in such a fashion that allows operators to answer the following questions:
-
What were all the state changes of a particular alarm?
-
What was the last known state of an alarm at a given point in time?
-
Which alarms were present (i.e. not deleted) on the system at a given point in time?
-
Which alarms are currently present on the system?
A simple REST API is also made available for the purposes of evaluating the results, verifying the data that is stored and providing examples on how to query the data.
9.8.1. Requirements
This feature requires Elasticsearch 7.x.
9.8.2. Setup
Alarm history indexing can be enabled as follows:
First, login to the Karaf shell of your OpenNMS Horizon instance and configure the Elasticsearch client settings to point to your Elasticsearch cluster. See Elasticsearch Integration Configuration for a complete list of available options.
$ ssh -p 8101 admin@localhost
...
admin@opennms()> config:edit org.opennms.features.alarms.history.elastic
admin@opennms()> config:property-set elasticUrl http://es:9200
admin@opennms()> config:update
Next, install the opennms-alarm-history-elastic
feature from that same shell using:
admin@opennms()> feature:install opennms-alarm-history-elastic
To ensure that the feature remains installed across subsequent restarts, add opennms-alarm-history-elastic
to the featuresBoot
property in the ${OPENNMS_HOME}/etc/org.apache.karaf.features.cfg
.
9.8.3. Alarm indexing
When alarms are initially created, we push a document to Elasticsearch that includes all of the alarm fields as well as additional details on some of the related objects (i.e. the node.)
To avoid pushing a new document every time a new event is reduced onto an existing alarm, we only push a new document when at least one of these conditions is met:
-
We have not recently pushed a document for that alarm. (See alarmReindexDurationMs.)
-
The severity of the alarm has changed.
-
The alarm has been acknowledged or unacknowledged.
-
Either of the associated sticky or journal memos have changed.
-
The state of the associated ticket has changed.
-
The alarm has been associated with, or removed from, a situation.
-
A related alarm has been added or removed from the situation.
To change this behaviour and push a new document for every change, you can set indexAllUpdates to true.
|
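The indexing conditions listed above amount to a simple decision function, sketched here in Python. The dictionary field names are hypothetical placeholders for the corresponding alarm attributes, and the function is a reading of the rules in this section rather than the feature's actual code.

```python
# Hypothetical names for the "interesting" alarm fields listed above.
WATCHED_FIELDS = (
    "severity",
    "acknowledged",
    "sticky_memo",
    "journal_memo",
    "ticket_state",
    "situation_id",
    "related_alarm_ids",
)


def should_index(previous, current, last_indexed_ms, now_ms,
                 alarm_reindex_duration_ms, index_all_updates=False):
    """Decide whether a new alarm document should be pushed.

    previous is the last indexed snapshot of the alarm (None if the alarm
    has never been indexed); current is its present state.
    """
    if previous is None or index_all_updates:
        return True
    if now_ms - last_indexed_ms >= alarm_reindex_duration_ms:
        return True  # no document has been pushed recently
    return any(previous.get(f) != current.get(f) for f in WATCHED_FIELDS)


before = {"severity": "Minor", "acknowledged": False, "counter": 1}
after_reduction = {"severity": "Minor", "acknowledged": False, "counter": 2}
after_escalation = {"severity": "Major", "acknowledged": False, "counter": 2}

# A repeat event that only bumps the counter does not trigger a new document.
skip = should_index(before, after_reduction, 0, 1_000, 3_600_000)
# A severity change does.
push = should_index(before, after_escalation, 0, 1_000, 3_600_000)
```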
When alarms are deleted, we push a new document that contains the alarm id, reduction key, and deletion time.
The following table describes a subset of the fields in the alarm document:
Field | Description |
---|---|
|
Timestamp in milliseconds associated with the first event that triggered this alarm. |
|
Timestamp in milliseconds associated with the last event that triggered this alarm. |
|
Timestamp in milliseconds at which the document was created. |
|
Timestamp in milliseconds when the alarm was deleted. |
|
Database ID associated with the alarm. |
|
Key used to reduce events on to the alarm. |
|
Severity of the alarm. |
|
Numerical ID used to represent the severity. |
9.8.4. Options
In addition to those mentioned in Elasticsearch Integration Configuration, the following properties can be set in ${OPENNMS_HOME}/etc/org.opennms.features.alarms.history.elastic.cfg
:
Property | Description | Required | default |
---|---|---|---|
indexAllUpdates |