Copyright © 2004-2005 The OpenNMS Group, Inc.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts and with no Back-Cover Texts. A copy of the license is available at http://www.gnu.org/copyleft/fdl.html
OpenNMS is the creation of numerous people and organizations, operating under the umbrella of the OpenNMS project. The original code base was developed and published under the GPL by the Oculan Corporation until 2002, when the project administration was passed on to Tarus Balog.
The current corporate sponsor of OpenNMS is The OpenNMS Group, which also owns the OpenNMS trademark.
OpenNMS is a derivative work, containing both original code, included code and modified code that was published under the GNU General Public License. Please see the source for detailed copyright notices, but some notable copyright owners are listed below:
Copyright © 2002-2005 The OpenNMS Group, Inc.
Original code base for OpenNMS version 1.0.0 Copyright © 1999-2001 Oculan Corporation.
Mapping code Copyright © 2003 Networked Knowledge Systems, Inc.
ScriptD code Copyright © 2003 Tavve Software Company.
Release 1.2.3 is another maintenance release in the OpenNMS stable branch. It addresses a polling issue that may cause polling to stop, as well as adding a final version of the poll-outages calendar. Also included are a number of new events and reports.
The code word for 1.2.3 is "sith".
Release 1.2.2 is 1.2.1 with one annoying webUI bug fixed.
Release 1.2.1 is a maintenance release in the OpenNMS stable branch. It contains a few bug fixes and several small features. "Stable" releases may contain new features, as long as it is determined that they pose a very small risk to current features. Usually, these features will add new functionality without modifying existing feature behavior.
The code word for 1.2.1 is "luckycharms".
Release 1.2.0 is the first stable, or production, release of OpenNMS in a long time. It is basically 1.1.5 with numerous bug fixes. With the release of 1.2.0, we will begin development on 1.3 - the next unstable release.
The code word for 1.2.0 is "valentine".
Release 1.1.5 is the second release candidate for 1.2. The biggest change involves poller code that has been nearly rewritten. This was in response to a new feature concerning notifications, and while we believe it to be very robust, it does represent a lot of code churn, so if you are using 1.1.4 in production, you may wish to wait a bit before upgrading to 1.1.5.
Also, with this version of the release notes, we are implementing a "code word". We try to make these notes useful, including tips on the majority of issues people will face while using the application, but still people tend to skip reading them. So now, if you want to post a 1.1.5-related question on the mailing lists, be sure to include the code word so people will know you have at least read this far into the release notes (grin).
The code word for 1.1.5 is "monsoon".
A quick summary of the new changes in 1.1.5, in no particular order:
New notification behavior - users can configure OpenNMS to send notices when an event is auto-acknowledged by notifd and they will only be sent to those people who received notice of the outage in the first place.
A new RADIUS poller
A number of new event definitions, including those from NetBotz, PATROL, Network Appliance and others.
Reorganized the datacollection-config.xml and snmp-graph.properties files to make them easier to manage. Also added a really cool CPU report for Net-SNMP agents.
KSC Reports can now be accessed from the main page.
Removed the need for a primary SNMP interface (i.e., the ability to map an ifIndex to an IP address in the ipAddrTable). This will help with data collection from some devices.
Added database support for the ifAlias field.
Added a MATCH-ANY-UEI event. Useful in notifications, this event will cause a notification on any UEI. Now, for example, all events from a particular IP Address or Node can be sent as notifications.
There were also a large number of bug fixes and other small enhancements made to the code. Please test the heck out of this release and be sure to add issues to bugzilla (http://bugzilla.opennms.org) and we'll get 1.2.0 out as soon as possible.
Release 1.1.4 is the release candidate for 1.2 - the next stable branch of OpenNMS. It represents a tremendous amount of work in the areas of speed and performance. It incorporates many changes and bug fixes to make OpenNMS more Java centric - including a new installer. Current production users should note that this release contains a lot more changes to the underlying code than any release since 1.1.0, so if you are currently happy with 1.1.3 or an earlier version, you may want to wait a week or two before upgrading, or wait until 1.2.
OpenNMS 1.1.3 is a major milestone on the way to the next stable release, 1.2. It contains a number of improvements, especially under the covers, and is the first release created under the new build system.
OpenNMS 1.1.2 adds several new features and bug fixes. New features include a JDBC poller, a script-based poller and script-based event handler, as well as contributed support for a map.
OpenNMS 1.1.1 is the next step towards the production 1.2 release. It contains a number of new features and bug fixes.
OpenNMS 1.1 extends the work that was begun with 1.0 to make OpenNMS more powerful and easier to use. Almost all of the new functionality was suggested by current OpenNMS users. It is hoped that these improvements will prove useful, and will lead to even more suggestions on how to improve the product.
OpenNMS 1.0.2 is a maintenance release that fixes several code issues.
OpenNMS 1.0.1 is a maintenance release that fixes several code issues.
Please, let us know if you have any problems at all at the OpenNMS Bugzilla page.
OpenNMS 1.1.0 represents a refinement of the functionality introduced in 1.0.0. The 1.1 tree is a development, or "unstable" tree, implementing a number of new features but without the testing that went into 1.0. When 1.1 is mature enough, it will become 1.2, the next production or "stable" release. Release 1.1.4 was a major step toward that next stable release, and version 1.1.5 continues that progress.
The final version of Craig Miskell's (OGP) poll-outages editor is included in this release. Accessed from the Admin menu, it allows one to configure weekly, monthly and specific poll-outages from the webUI. A poll outage is a window during which polling is suspended for a particular set of devices or a package.
Choose "Scheduled Outages" from the admin page, and the GUI will display the currently configured outages. You can edit those or create your own. You can then apply them to notifications, or to particular packages in the threshd, collectd and poller configurations.
Please note that when applied to polling, no checks occur during the outage period; the services are not automatically considered "up". For example, if you have a maintenance window that starts at 2am and ends at 4am on Sundays, and you take the service down at 1:59am so that OpenNMS marks it as down, it will stay marked down until 4am.
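As a rough illustration only (the outage name, node ID and times below are made up, and the exact schema should be checked against the poll-outages.xml shipped with this release), a weekly outage definition might look something like this:

<outages>
  <outage name="sunday-maintenance" type="weekly">
    <!-- suspend polling between 2am and 4am every Sunday -->
    <time day="sunday" begins="02:00:00" ends="04:00:00"/>
    <!-- apply the outage to node 42 (placeholder id) -->
    <node id="42"/>
  </outage>
</outages>

The outage name is then referenced from the poller, collectd or threshd package (or from notifications) as described above.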
Hat off to Craig for this feature.
If you happen to run OpenNMS on OS X, you can now send notifications via Growl. Simply use "growlMessage" as one of the services in your destination path. Note that you have to be on the machine running OpenNMS to see them.
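A minimal sketch of what that might look like in destinationPaths.xml, assuming a user named Admin already exists (the path name here is made up):

<path name="Growl-Admin">
  <target>
    <name>Admin</name>
    <!-- growlMessage is the Growl notification command mentioned above -->
    <command>growlMessage</command>
  </target>
</path>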
There have been a number of event definitions added to OpenNMS, including those for NORTEL Contivity and Foundry devices. In addition, more support has been added for the SNMP Informant agent.
Some people have reported an issue with service polling stopping. It was determined that, for some reason on some systems, a particular device or devices will cause a polling thread to hang. This would block polling from occurring. While the root cause of the issue has not been found (i.e. why the thread hangs), the code was modified so that a hung thread would not block the whole queue.
There is only one bug addressed in 1.2.2, concerning possible webUI exceptions if there are null values in the issnmpprimary field in the interface table of the database.
Chris Abernethy contributed some code to allow notifications using the XMPP instant messaging protocol. This is most commonly associated with the Jabber open-source messaging project.
Edit the xmpp-configuration.properties file to include the username and password of the user you want to use as the originator of the notice. Also include the server you wish to connect to in order to send the notice.
Finally, add an XMPP Address to each user that should receive the notification, such as "myname@jabber.org".
If your destination path command includes xmppMessage, then the notification text will be sent via XMPP to the user's XMPP address. Note: be sure that the user has allowed messages from the OpenNMS XMPP user. When I first tested this, I was blocking all messages from people not on my "buddy" list and I never received the notice.
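For reference, the properties file looks roughly like the sketch below. The key names and values here are illustrative assumptions, so check the xmpp-configuration.properties shipped with the release for the exact keys:

# illustrative keys and values -- the shipped file is authoritative
xmpp.server=jabber.example.org
xmpp.user=opennms
xmpp.pass=secret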
This was enhancement bug 1168.
Craig Miskell contributed some code that allows for text strings to be collected via SNMP and used to label SNMP Performance reports.
For example, the Net-SNMP agent allows one to collect disk statistics:
disk /
disk /boot
disk /opt/distros
This will create a table where the first row is disk space on root, the second row is the disk space on /boot, etc.
The problem is that this configuration:
disk /boot
disk /
disk /opt/distros
is also valid, but the /boot partition is now index one, and / is index two.
Previously, one had to be a little savvy about how to configure the order of disk volumes, but now the disk label can just be collected and displayed.
For example, we currently collect on the first instance of the Net-SNMP disk table, and just assume it's root:
<group name="net-snmp-root-disk" ifType="ignore">
  <mibObj oid=".1.3.6.1.4.1.2021.9.1.6" instance="1" alias="ns-disk-root-tot" type="gauge"/>
  <mibObj oid=".1.3.6.1.4.1.2021.9.1.7" instance="1" alias="ns-disk-root-avail" type="gauge"/>
  <mibObj oid=".1.3.6.1.4.1.2021.9.1.8" instance="1" alias="ns-disk-root-used" type="gauge"/>
  <mibObj oid=".1.3.6.1.4.1.2021.9.1.9" instance="1" alias="ns-disk-root-pct" type="gauge"/>
</group>
To use this feature, we convert it to something more generic, disk 1 instead of root:
<mibObj oid=".1.3.6.1.4.1.2021.9.1.2" instance="1" alias="ns-disk-1-name" type="string"/>
<mibObj oid=".1.3.6.1.4.1.2021.9.1.6" instance="1" alias="ns-disk-1-tot" type="gauge"/>
<mibObj oid=".1.3.6.1.4.1.2021.9.1.7" instance="1" alias="ns-disk-1-avail" type="gauge"/>
<mibObj oid=".1.3.6.1.4.1.2021.9.1.8" instance="1" alias="ns-disk-1-used" type="gauge"/>
<mibObj oid=".1.3.6.1.4.1.2021.9.1.9" instance="1" alias="ns-disk-1-pct" type="gauge"/>
Note the new first line. The ".2" OID in this table is the disk label, which we now collect. It gets stored in a file called $OPENNMS_HOME/share/rrd/snmp/$NODEID/strings.properties, where $NODEID is the node number of the device being collected.
A sample strings.properties file looks like this:
#Wed Mar 16 17:46:25 EST 2005
ns-disk-3-name=/opt/distros
ns-disk-2-name=/boot
ns-disk-1-name=/
Once the datacollection-config.xml file is changed and OpenNMS is restarted, collectd will begin storing the collected strings.
So, how are they used? To display them in reports, edit the snmp-graph.properties file. Everything in this file that is in "curly braces" is something that OpenNMS supplies to RRDTool (or JRobin).
report.netsnmp.disk1percent.name=Percentage Disk Space on 1
report.netsnmp.disk1percent.columns=ns-disk-1-pct
report.netsnmp.disk1percent.type=node
report.netsnmp.disk1percent.propertiesValues=ns-disk-1-name
report.netsnmp.disk1percent.command=--title="Percentage Disk Space on {ns-disk-1-name}" \
 DEF:dpercent={rrd1}:ns-disk-1-pct:AVERAGE \
 LINE2:dpercent#0000ff:"% Disk Space Used" \
 GPRINT:dpercent:AVERAGE:"Avg \\: %8.2lf %s" \
 GPRINT:dpercent:MIN:"Min \\: %8.2lf %s" \
 GPRINT:dpercent:MAX:"Max \\: %8.2lf %s\n"
Note the new "propertiesValues" line. This line will look for the value(s) listed in the appropriate strings.properties file, such as ns-disk-1-name. Putting that value in curly braces, as in {ns-disk-1-name}, will cause OpenNMS to replace it with the proper value.
By default, OpenNMS now collects on the first 5 disks defined by Net-SNMP and disks 2 through 4 as defined by the host-resources MIB. Please feel free to improve on these configurations, and post your diff as an enhancement bug on http://bugzilla.opennms.org for inclusion in a future release.
Some caveats: Currently this only works for node level information. Also, the labels can't be used in notifications or events, so if you get a highThreshold alert on disk usage, you won't be able to tell exactly which disk (but it should be easy to look up).
Many thanks to Craig for this work.
Bill Ayres (OGP) has improved the DHCP poller. The new default dhcp-configuration.xml file now looks like this:
<DhcpdConfiguration port="5818" macAddress="00:06:0D:BE:9C:B2"
                    myIpAddress="127.0.0.1" extendedMode="false"
                    requestIpAddress="127.0.0.1">
</DhcpdConfiguration>
The port and macAddress options were there before, but now you can add myIpAddress, extendedMode and requestIpAddress.
With the default configuration, the DHCP poller should act as before (although on my test network it now correctly sees that my Airport Express does not have a DHCP server on it that is accessible from the "wired" side). This is "broadcast" mode.
The following describes each of the new variables:
myIpAddress: This parameter will usually be set to the ip address of the OpenNMS server, which puts the DHCP poller in "relay" mode as opposed to "broadcast" mode. In "relay" mode, the DHCP server being polled will unicast its responses directly back to the specified ip address rather than broadcasting its responses. This allows DHCP servers to be polled even though they are not on the same subnet as the OpenNMS server, and without the aid of an external relay.
usage: myIpAddress="10.11.12.13" or myIpAddress="broadcast" (default)
extendedMode: When extendedMode is false, the DHCP poller will send a DISCOVER and expect an OFFER in return. When extendedMode is true, the DHCP poller will first send a DISCOVER. If no valid response is received it will send an INFORM. If no valid response is received it will then send a REQUEST. OFFER, ACK, and NAK are all considered valid responses in extendedMode.
usage: extendedMode="true" or extendedMode="false" (default)
requestIpAddress: This parameter only applies to REQUEST queries sent to the DHCP service when extendedMode is true. If an ip address is specified, that ip address will be requested in the query. If "targetHost" is specified, the DHCP server's own ip address will be requested. Since a well-managed server will probably not respond to a request for its own ip, this parameter can also be set to "targetSubnet". This is similar to "targetHost" except the DHCP server's ip address is incremented or decremented by 1 to obtain an ip address that is on the same subnet. (The resulting address will not be on the same subnet if the DHCP server's subnet is a /32 or /31. Otherwise, the algorithm used should be reliable.)
usage: requestIpAddress="10.77.88.99" or requestIpAddress="targetHost" or requestIpAddress="targetSubnet" (default)
Caution on usage: If in extended mode, the time required to complete the poll for an unresponsive node is increased by a factor of 3. Thus it is a good idea to limit the number of retries to a small number.
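Putting the three new options together, a relay-mode, extended-mode configuration might look like the sketch below (the port, IP and MAC addresses are just placeholders carried over from the default file above):

<DhcpdConfiguration port="5818" macAddress="00:06:0D:BE:9C:B2"
                    myIpAddress="10.11.12.13" extendedMode="true"
                    requestIpAddress="targetSubnet">
</DhcpdConfiguration>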
Thanks to Bill for his hard work on this.
snmp-config.xml
Gerald Turner has written a useful admin GUI for managing the snmp-config.xml file. It allows you to add specific IP addresses and IP address ranges and associate them with community strings.
For example, if you have the following default snmp-config.xml file:
<snmp-config retry="3" timeout="800"
             read-community="public" write-community="private">
</snmp-config>
This will use the default community string of "public" for SNMP requests. Using the GUI, one can add a range that, say, uses "public2":
<snmp-config retry="3" timeout="800"
             read-community="public" write-community="private">
  <definition read-community="public2">
    <range begin="192.168.0.1" end="192.168.0.254"/>
  </definition>
</snmp-config>
You can also add specific IP addresses, say this one for "public3":
<snmp-config retry="3" timeout="800"
             read-community="public" write-community="private">
  <definition read-community="public2">
    <range begin="192.168.0.1" end="192.168.0.254"/>
  </definition>
  <definition read-community="public3">
    <specific>192.168.1.1</specific>
  </definition>
</snmp-config>
This feature will automatically do things like split ranges. For example, add "public4" as the string for 192.168.0.50:
<snmp-config retry="3" timeout="800"
             read-community="public" write-community="private">
  <definition read-community="public2">
    <range begin="192.168.0.1" end="192.168.0.49"/>
    <range begin="192.168.0.51" end="192.168.0.254"/>
  </definition>
  <definition read-community="public3">
    <specific>192.168.1.1</specific>
  </definition>
  <definition read-community="public4">
    <specific>192.168.0.50</specific>
  </definition>
</snmp-config>
There is a caveat. If you add a specific entry like 192.168.1.1 above, and then go back and add a range that includes that specific IP, the range will overwrite the specific entry, as in this example for "public5":
<snmp-config retry="3" timeout="800"
             read-community="public" write-community="private">
  <definition read-community="public2">
    <range begin="192.168.0.1" end="192.168.0.49"/>
    <range begin="192.168.0.51" end="192.168.0.254"/>
  </definition>
  <definition read-community="public4">
    <specific>192.168.0.50</specific>
  </definition>
  <definition read-community="public5">
    <range begin="192.168.1.1" end="192.168.1.254"/>
  </definition>
</snmp-config>
This feature was designed to help manage this file before discovery. It will not update the community string for an SNMP agent that is already being collected. If the community string for a device changes and this results in a data collection failure event, it is possible to change the community string to the correct one without restarting OpenNMS. Use this GUI to make sure the community string is correct in the snmp-config.xml file, and then go to the device's node page. Click the link that says "Update SNMP" and collection should resume.
This was bug 1167, and I'm leaving it open as an enhancement because there is one thing left I'd like to see: the current contents of the snmp-config.xml file displayed on this page as well. It is hoped that it will be available for the next release.
Hats off to Gerald for this work.
There were a number of smaller features included in this release:
Added the RESOLVED tag to the body as well as the subject of notifications.
Added a Linksys event to stop connection traps from Linksys gear.
Added a "focus" to the Add Interface admin page so that it is no longer required to select the field before typing the IP address.
Greatly improved the execution speed of "Configure SNMP Data Collection per Interface" in the Admin webUI.
Added the ability to display the ifAlias in events or notifications using %ifalias%.
Improved the error message when entering an invalid Outage ID in a search.
Numerous small improvements to the node page in the web UI.
The following is a list of bug fixes included in this release:
There was an issue with SNMP version 2 data collection on large devices (i.e. lots of interfaces) where the ifIndex values weren't contiguous. This has been corrected. Bug 1141.
Applied the patch to help with 30 second outages. Bug 1180.
Many people reported errors in 1.2.0 with the new ifAlias field. It was too small to hold the entire string associated with some interfaces, so the size of this field has been increased to 256.
Fixed an issue with the service unresponsive behavior being active even when not configured. Bug 1151.
Corrected a problem with Top 20 availability reports including the wrong services. Bug 945.
The correct local time is now shown in notifications, rather than GMT. Bug 1114.
In addition, DJ Gregor (OGP) has done a number of improvements to the build system.
The only changes between 1.1.5 and 1.2.0 are some bug fixes and the addition of a few more datacollection values and reports.
The current development goal of the OpenNMS team is to get a new, stable release out as soon as time will allow. This will allow for a new development cycle that can be very daring in its scope while still providing a stable, production application. This release is our second release candidate for 1.2.0, and if testing goes well it will be released shortly.
The reason 1.1.4 did not become 1.2.0 was due to a lot of code changes to the poller. In fact, the poller has been pretty much rewritten and the code reorganized. Those who look at the new poller code should be impressed with the work Matt (OGP) did.
1.1.5 also represents a new development philosophy for the team. OpenNMS is now written using "test driven development". In a nutshell: when a new feature is desired, or a bug needs fixing, the first piece of code that is written is a test to ensure that the feature works or that the bug does not occur. Since no change is made to the code at this point, the test obviously will fail. Then the feature is written or the bug fixed until the test passes, and the final step is to clean up the code to remove duplication, etc.
Over the next year, this main change in the development philosophy of OpenNMS will reap huge benefits in terms of speed of development and code quality.
The reason the poller code was rewritten was mainly in support of this one new feature. OpenNMS has three main functions: service polling, data collection and event management. This last feature was hampered by an issue with auto-acknowledgement.
Here's the scenario: A service is down and an event is generated. This event triggers a notification, which walks a destination path. The first step in the path is to wait for 2 minutes before doing anything, then page the on-call person; if the notification is not acknowledged within 15 minutes, the manager gets paged.
Well, let's assume the outage occurs at 2 in the morning and lasts for four minutes. After the first two minutes, a page is sent to the on-call person and it wakes him up.
Now, prior to this release, when the service is restored two minutes later, the auto-acknowledge code would acknowledge the "down" notice, so the manager would not get paged. However, unless notices were configured for "up" events, the on call person would not know that the service has been restored, so they would have to wake up, get logged in and then notice that the problem had gone away.
If notices were configured for "up" events, the on call person would know that the service was up and could go back to bed. Unfortunately, if the "up" notice walked the same path as the "down" notice, it would have to be manually acknowledged or the manager would get a page saying that the service was now "up" without ever receiving the "down" page.
This obviously isn't very useful.
So now there is an option to send a "resolution" notice when a "down" event gets auto-acknowledged by an "up" event. The same notice that gets sent with the "down" will be sent with the "up" with the word "RESOLVED:" prefixed to the message. It will only be sent to those people who got the "down" notice and in the same manner, i.e. e-mail, page, etc.
There are a few things to consider:
Notices concerning "up" events will still get treated as separate notifications. This only concerns auto-acknowledged events. To avoid the scenario above, "up" notices should be turned off, and they have been in the default configuration.
OpenNMS "down" events, such as nodeLostService and nodeDown will now always be matched with the corresponding "up" event. For example: HTTP stops on a server, so a nodeLostService event is sent. A short while later the node itself goes down, causing a nodeDown event. When the server is restored, everything is running, so a nodeUp event gets generated and a nodeRegainedService event for HTTP. In the past this would not happen. Once the node came back up, if HTTP was running no event would be sent. If the service was still down, a second nodeLostService event would be generated.
For those of you who keep up with the code, notice that there is no longer a separate outage daemon. All of that functionality has been integrated into the poller.
To configure this functionality, look at the notifd-configuration.xml file. Here is an example:
<auto-acknowledge resolution-prefix="RESOLVED: " notify="true"
                  uei="uei.opennms.org/nodes/nodeRegainedService"
                  acknowledge="uei.opennms.org/nodes/nodeLostService">
  <match>nodeid</match>
  <match>interfaceid</match>
  <match>serviceid</match>
</auto-acknowledge>
This says "when a nodeRegainedService event is generated, auto-acknowledge all notices based on a nodeLostService event where the nodeid, interfaceid and serviceid match. In addition, send an notification to all parties who received the "down" notice prefixed with the word RESOLVED:".
To disable this feature, set notify="false". "true" is the default. The resolution-prefix can be changed to suit non-English languages or user preference.
Jonathan Sartin (OGP) has written a new poller to poll RADIUS servers. The relevant configuration in capsd is:
<protocol-plugin protocol="RadiusAuth" class-name="org.opennms.netmgt.capsd.RadiusAuthPlugin"
                 scan="on" user-defined="false">
  <property key="timeout" value="3000"/>
  <property key="user" value="TEST"/>
  <property key="password" value="test"/>
  <property key="secret" value="opennms"/>
  <property key="retry" value="2"/>
</protocol-plugin>
Note that this requires that a username of "TEST" with a password of "test" and secret of "opennms" is configured on the server in order for this service to be discovered.
The poller configuration looks similar:
<service name="RadiusAuth" interval="300000" user-defined="false" status="on"> <parameter key="retry" value="3"/> <parameter key="timeout" value="3000"/> <parameter key="user" value="TEST"/> <parameter key="password" value="test"/> <parameter key="secret" value="opennms"/> <parameter key="rrd-repository" value="/var/opennms/rrd/response"/> <parameter key="ds-name" value="radiusauth"/> </service>
This will also store and graph the response time.
Thanks to Bill Ayres (OGP) we have some new behavior in capsd with respect to the primary SNMP interface.
Previously, there had to be a way to map an IP Address from the ipAddrTable to an ifIndex in the ifTable. This was so interface-level statistics could be matched up with IP addresses. There were cases, however, especially with devices like firewalls where no valid primary interface could be determined and thus no data could be collected from the device.
This new code addresses that, and OpenNMS will still be able to poll some SNMP data without having the ifIndex mapping.
This new event will never be generated internally, but it does serve a useful purpose. Suppose there is a need to get all events from a particular server sent as notifications. By choosing this UEI when building the notification, it will match the UEI of any event generated by the system.
Note that this can be dangerous, especially on systems that receive a lot of SNMP traps. To use this properly, set up a limiting rule, such as:
<notification name="Test-Match-any" status="off">
  <uei>MATCH-ANY-UEI</uei>
  <description>Test</description>
  <rule>(IPADDR == '10.1.4.10')</rule>
  <destinationPath>Email-Admin</destinationPath>
  <text-message><p>This is a generic notice to match any event UEI that matches the filter rule.</p> <ul><li>uei: %uei% </text-message>
  <subject>Notice #%noticeid%: Match Any event: %uei%</subject>
</notification>
This will send a notice on any event where the IP address is 10.1.4.10.
A number of people use the ifAlias field to add a useful description to a particular interface. Bill Ayres (OGP), Mike Huot (OGP) and Wrolf Courtney are responsible for adding the field to the database and printing it out on various pages in the webUI.
DJ Gregor (OGP) has made a lot of changes to our documentation as well as the installer. Please, please read the installation guide and note that the install program that sets up the database, etc., is no longer run when the OpenNMS packages are installed. It will need to be run manually after every install. The command:
$OPENNMS_HOME/bin/install -disU
should get things working most of the time.
A bug was fixed with Availability Reports using the JavaMailer class. Some installations of Tomcat included the JavaMail API, and therefore our JavaMailer would work with those installations. Other users were experiencing problems because their install did not have the API in the Tomcat common/lib directory. The build task was modified to include the appropriate jar files in our WEB-INF/lib directory in case they were missing.
An enhancement was also made to the JavaMailer to include a Mail Transport Agent (MTA). This MTA is free to use and was provided by John Udell (found here: http://zoe.nu/). It is included in the OpenNMS libs as "jmta.jar". This enables OpenNMS to use a Java mailer without needing a relay SMTP host.
A new flag has been added to the javamail-configuration.properties file: "org.opennms.core.utils.useJMTA". It defaults to "true".
There are a number of other enhancements worth noting:
Custom KSC reports can now be accessed directly from the main page.
A number of new event definitions have been added, including those for BMC PATROL, MGE UPSs, Network Appliance, Snort, NetBotz, and Compaq Insight Manager.
Both the datacollection-config.xml and snmp-graph.properties files (responsible for data collection and display) have been enhanced and reorganized. The default loadavg report for Net-SNMP agents has been replaced with a really nifty CPU report by Ray Van Dolson.
OpenNMS will recognize if a web page is listening on the Dell OpenManage port, and there will be a link from the webUI if so.
There are other small fixes and improvements too numerous to list. Please feel free to check out CVS and the CHANGELOG for more.
Longtime users of OpenNMS will be pleasantly surprised at the speed enhancements of this release. Most of the new code changes focused on performance or our goal of making OpenNMS more purely Java (eliminating the need for JNI calls to C++ code and perl).
A new queuing system for writing the data collection RRD files has been added. Experimental tests have shown that the I/O subsystem is the bottleneck for Data Collector performance.
In order to increase the capacity of our data collector we have added an intelligent queuing system for all of our RRD Output. This improves performance for us in two ways. One is that it frees up the data collection threads to get back to the work of data collection right away. As a side benefit it also frees up the polling threads that output RRD data related to latency. This makes the data much more accurate.
The second way that the queuing system provides performance improvements is by queuing collected data on a per-file basis. Since writing a few extra bytes to an RRD file while it's open anyway adds very little to the update time, this causes the average update cost per data point to go down the further the collector gets behind. What this means is that you will not lose any data points if the collector is configured to collect more than your I/O system can read/write without queuing. It just means that the data that is in the files is behind the actual collection.
A last way the queuing system improves data collector performance is by giving priority to non-zero data. On many networks, as much as 60% of the collected data is zero valued. In most situations, data that has remained zero for a long period of time is unimportant data. For this reason we give files with non-zero data points priority over files that have only zero data points. Monitored interfaces that have only zero ifErrors or ifDiscards, for example, will not hold up writing of the data for systems that have positive ifErrors and ifDiscards. Please see $OPENNMS_HOME/etc/rrd-configuration.properties for configuration details.
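As a very rough sketch of the sort of entries involved (the property names below are assumptions from memory and may not match this release exactly, so treat the shipped rrd-configuration.properties as the authoritative reference):

# illustrative keys -- verify against the shipped file
org.opennms.rrd.usequeue=true
org.opennms.rrd.queuing.writethreads=2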
Thanks to Rackspace and Eric Evans for providing equipment to help us implement this and to General Electric's John Lee and Neeraj Malve for helping to test this.
Though the Collection Queuing above provides a great deal of performance benefit, for very high capacity data collection environments it was not sufficient. We have found that JRobin 1.4.0 performs better than RRDTool in our environment. The primary reason for this is that JRobin reads a much smaller amount of data on an RRD open than RRDTool does. This increases throughput on systems like ours that max out the I/O subsystem doing an open, update, and close on a huge number of RRDs. An additional benefit of JRobin is that it is multi-thread enabled (unlike our libjrrd.so shared library that interacts with RRDTool); however, this does not impact performance very much because contention is not the bottleneck. In fact, on some systems that have as many as 100 data collection threads, we found that as few as 2-10 RRD write threads were sufficient to populate the RRD files.
The only unfortunate part of using JRobin is that the file format for JRobin is not the same as for RRDTool. This means other tools you may be using will not work with JRobin RRD files and will have to be converted or replaced. JRobin does have some tools available, and you should check out their website at www.jrobin.org to see what is available. Additionally, JRobin programmatically supports all of the features of RRDTool, so it should be easy to write a compatible tool. If you do so, please consider contributing it either to us or to the JRobin team.
The last benefit of JRobin is that it helps us to reach our goal of a 100% Pure Java version of OpenNMS that can work 'out of the box' on any system.
Thanks go to the JRobin team for providing this excellent RRD implementation. Thanks also to Rackspace for providing the equipment to let us implement this.
Thanks to Chris Fedde of On Command, we have found that the SQL queries produced by our filtering code were, at times, extremely inefficient. On a moderately sized network of only 15K interfaces the queries used to define categories could take as long as three minutes on a 4 CPU system. As a result of this, we have reworked the filter parser and now produce much better queries. The equivalent query would now only take about a second on the same system. Thanks a lot to Ted Kaczmarek and Alexander Hoogerhuis for helping us test it.
There have been a large number of bugs reported with respect to deleting nodes from the OpenNMS GUI. This code has been completely overhauled in OpenNMS 1.1.4. The new strategy is to mark a node/interface/service deleted rather than actually deleting it. (The node is completely removed from the database by Vacuumd, discussed below.) This allows any web pages that are currently viewing the node to react properly. Additionally, deletes now happen as database transactions, providing for performance and data integrity (i.e., no orphaned interface records), and you are not left with a half-deleted node.
In addition to overhauling node deletion, we have also added the ability to delete interfaces and services. Administrators have a delete link at the top of service and interface detail pages.
All the users of OpenNMS helped by writing defects and complaining about this bug. Special thanks to Ted Kaczmarek, Jonathan Sartin, and Mike Huot for their help testing and characterizing these problems and their solution.
Since nodes, interfaces and services are only marked deleted and not actually removed from the database when a delete occurs, some way was needed to ensure that the data actually got removed eventually. To do this Vacuumd was introduced to provide a mechanism to periodically run database maintenance operations. Vacuumd's configuration consists of a set of database statements to be run daily (or whichever period is preferable). There are four maintenance operations provided by OpenNMS by default. These are as follows: delete nodes that have been marked as deleted, delete interfaces that have been marked as deleted, delete services that have been marked as deleted, and delete any event that is not related to a current outage and is older than 6 weeks.
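For reference, the configuration is essentially a run period plus a list of SQL statements; the sketch below shows the general shape only (the period value and the exact statement are assumptions, so check the vacuumd-configuration.xml shipped with the release):

<VacuumdConfiguration period="86400000">
  <!-- example only: remove nodes that have been marked as deleted -->
  <statement>DELETE FROM node WHERE node.nodeType = 'D';</statement>
</VacuumdConfiguration>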
In order to make sure that Vacuumd properly deletes all the data associated with the nodes, interfaces, services or other entries that it deletes, we have added cascading delete constraints to the database. With these, if you do a 'DELETE FROM node WHERE nodeid=11', for example, you delete all the related information: its interfaces, the services on those interfaces, the outages related to the node, the events, etc.
Though not an actual leak because the connections are garbage collected, we found a problem in the code where database connections were being dropped and left for the garbage collector to clean up. This problem, which occurred primarily during discovery, would result in an abnormally large number of outstanding connections to the database to be cleaned up by the garbage collector. Discovery of large networks caused the number of connections to the database to easily grow beyond the number allowed. Thanks to Chris Fedde and OnCommand for helping us track down this problem.
An OpenNMS objective is to become a 100% Java application, which requires eliminating the following dependencies:
/bin/mail
Perl
Metamail package
These dependencies are used for two functions of the OpenNMS application: Notification and Availability reporting.
Installing a fresh copy of OpenNMS will now use the Java Mail API by default. Upgrade installs will create two new files that the user will have to merge into their existing configurations: destinationPaths.xml.rpmnew and notificationCommands.xml.rpmnew.
The javamail-configuration.properties file should be modified to specify the sender's address and the SMTP server address. This file also provides support for SMTP servers requiring user and password authentication.
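A sketch of the sort of entries involved is below; the key names are from memory and should be treated as assumptions, with the shipped javamail-configuration.properties as the authoritative reference:

# illustrative keys and values -- verify against the shipped file
org.opennms.core.utils.fromAddress=opennms@example.com
org.opennms.core.utils.mailHost=smtp.example.com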
There are also major changes to the notificationCommands.xml file. The schema definition for this file has changed, requiring that the <command> tag now has an attribute called 'binary' with a value of either "true" or "false". This flag indicates to the OpenNMS notification processes that the command definition is either a system command (as it always has been prior to this release) or an OpenNMS Java class (see developer notes below). The 1.1.4 version of this file adds the following new command definitions:
<command binary="false">
  <name>javaPagerEmail</name>
  <execute>org.opennms.netmgt.notifd.JavaMailNotificationStrategy</execute>
  <comment>class for sending pager email notifications</comment>
  <argument streamed="false">
    <switch>-subject</switch>
  </argument>
  <argument streamed="false">
    <switch>-pemail</switch>
  </argument>
  <argument streamed="false">
    <switch>-tm</switch>
  </argument>
</command>
<command binary="false">
  <name>javaEmail</name>
  <execute>org.opennms.netmgt.notifd.JavaMailNotificationStrategy</execute>
  <comment>class for sending email notifications</comment>
  <argument streamed="false">
    <switch>-subject</switch>
  </argument>
  <argument streamed="false">
    <switch>-email</switch>
  </argument>
  <argument streamed="false">
    <switch>-tm</switch>
  </argument>
</command>
Notice a few changes:
The <command> tag now has an attribute called 'binary' that when set to "true" will cause the notification process to attempt to execute a system command specified in the <execute> tag. When not set or set to anything other than "true", the notification process assumes it is an OpenNMS notification class and will instantiate it and execute its 'send()' method. By convention, the "binary" attribute will be set to "false" when new notification classes are implemented.
The <substitution> arguments are not used for notification classes but are still supported for notification commands. These "substitution" arguments (although not grammatically intuitive) are passed to system commands as command line arguments when the 'binary' attribute is set to "true" and are otherwise ignored.
The <execute> tag now allows the specification of an OpenNMS Java class name. This class name must implement the NotificationStrategy interface (see developer notes below). The new JavaMailNotificationStrategy class accepts the following <switch> tags: -subject, -email, -pemail, -tm.
The destinationPaths.xml file has changed, making the default email and pager email notifications use the new Java Mail API. All 'email' and 'pagerEmail' <command> tags have been changed to the new 'javaEmail' and 'javaPagerEmail' commands in $OPENNMS_HOME/etc/notificationCommands.xml.
Thanks so much to new OpenNMS committer DJ Gregor for all his hard work on improving the OpenNMS build and install code. Thanks to DJ's efforts the installer has improved greatly and is now written completely in Java removing (along with the JavaMail feature and Dave Hustace) OpenNMS's dependency on Perl.
That said, we do expect some issues with this, as the installer does do a lot of database checks and conversions, and some users have large databases.
Some notifications, like one based on a highThresholdExceeded event, would benefit from being able to key off of not only the event UEI, but a value within the event itself.
With 1.1.4, the ability has been added to key off the parameter name and to match it with a particular value.
<notification name="HighThreshold" status="on">
  <uei>uei.opennms.org/threshold/highThresholdExceeded</uei>
  <description>A high threshold event</description>
  <rule>(IPADDR IPLIKE *.*.*.*)</rule>
  <destinationPath>Email-Admin</destinationPath>
  <text-message>A high threshold was reached on data source %parm[#1]% with %parm[#2]% and %parm[#3]% and %parm[#4]%.</text-message>
  <subject>Notice #%noticeid%</subject>
  <varbind>
    <vbname>ds</vbname>
    <vbvalue>cpu</vbvalue>
  </varbind>
</notification>
This will cause the notification to match if it is a high threshold event involving the data source name of "cpu" only.
Note that you must use the name of the parameter, and you can only use one. There is a TODO to add the ability to use more than one parameter for the match and to use a particular parameter number instead of just the name.
Here is a small list of other improvements in 1.1.4:
The OpenNMS Group is partnering with the creators of the SNMP Informant Windows SNMP agent (formerly SNMP4PC), and so a number of SNMP Informant data collections and reports have been added.
A bug was fixed with respect to the RTC Categories section which could cause the services of new nodes to be counted more than once.
To make better use of Bill Ayres' displayCategory field in the Assets table, a new event was created that will update RTC when the Assets table is changed (which will cause such changes to be reflected without a restart).
Also to make this work better, new nodes are created with an empty asset record. Thus queries like displayCategory != "test" will work.
Michael Huot added some code that causes the webUI to turn light blue, similar to "Calculating ...", when the webUI loses connectivity with OpenNMS. Code has also been added so the webUI will attempt to reconnect to OpenNMS when this happens. This should remove the need to restart Tomcat after restarting OpenNMS.
RPM dependencies have been fixed (bug #906). A packaging error caused previous releases to only include a subset of package dependencies. Some dependencies have also been fixed for Mandrake distributions (bug #933).
A generalized solution has been implemented for choosing a proper Java Runtime Environment at runtime. See $OPENNMS_HOME/bin/runjava -h for details.
The OpenNMS startup script was rewritten to be more cross-platform friendly and uses the JRE chosen by runjava.
Upgraded to the latest PostgreSQL JDBC driver, pg74.215.jdbc3.jar.
WebUI to OpenNMS communication has been changed to use a separate connection for each request, instead of a single connection that persists until Tomcat is restarted (or the webapp is reloaded). This eliminates exceptions that are thrown when certain administrative actions are performed (e.g.: adding a new node through the webUI) if OpenNMS is restarted without restarting Tomcat. See bug #897.
A tremendous amount of work has been done "under the covers" to OpenNMS, and the following features were added in 1.1.3:
Prior to this release, the algorithm that OpenNMS used to determine if a particular interface belonged to a particular node was simple. An SNMP walk was done on the device, and all of the IP addresses on that device were associated with the node. If that walk discovered a "duplicate" address, say from a private network or some backup link, it would assume that all of the addresses on that device belonged to the device that was discovered with that IP address first.
This could result in "merged" nodes, especially in environments with HSRP.
This release now supports duplicate IP addresses. The nodes will not be merged and an event will be generated.
Note that networks are not supposed to have duplicate IP addresses. In other words, if there are two "10.1.1.1" addresses on a network, and OpenNMS sends a "ping" to 10.1.1.1, it will assume that a response means that interface 10.1.1.1 is "up", regardless of which "10.1.1.1" interface responds.
Since this feature was mainly written to support inactive or unreachable interfaces that were discovered by SNMP, this behavior should not present a problem, although it does have the added benefit of being able to monitor highly available IP addresses.
For example, if your website lives at 10.1.1.1, which lives on two devices, as long as an HTTP request to 10.1.1.1 is answered (by either machine) OpenNMS will mark the service as up.
The rules used in categories and filters, usually along the lines of <rule>IPADDR IPLIKE *.*.*.*</rule>, are actually quite flexible and can be built on almost anything in the database. However, it would be nice to easily place a particular device into a category for display on the main page, notifications, etc.
There are four categories:
Display Category (database field displayCategory): This is to be used for grouping devices into a particular category.
Poller Category (database field pollerCategory): This is to be used to define devices in a particular poller package.
Notification Category (database field notifyCategory): This could be something like "serverAdmin" or "networkAdmin" to be used for directing notifications.
Threshold Category (database field thresholdCategory): This is to be used to define devices in a particular thresholding package.
Note that there is no "hard coded" meaning to these categories; you could use "poller" for "threshold", etc. They are just labeled for convenience.
How would you use them? Well, you would need to modify the <filter> or <rule> tags in the configuration files. Suppose you had two types of polling packages, like "Gold" and "Silver". You would then have a filter like <filter>pollerCategory == "Gold"</filter> for that package. By just adding the name "Gold" or "Silver" to the proper category on the asset screen you can place a particular device into that poller package.
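As a rough sketch (the package name and address range are made up, and a real package would also carry its usual rrd, service and downtime entries), the poller-configuration.xml package might start out like this:

<package name="Gold">
  <filter>pollerCategory == "Gold"</filter>
  <include-range begin="1.1.1.1" end="254.254.254.254"/>
  <!-- ... the rest of the package (rrd, service, downtime) as usual ... -->
</package>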
Note that once you have sorted all of your devices, you will need to restart OpenNMS for the poller to reload the proper configuration.
One user of OpenNMS has integrated it into their provisioning/billing/support package. They use multiple instances of OpenNMS to poll the services on their network, and all of these instances talk to a single database. By sending events to eventd, they can effect changes in how these devices are polled (without a restart).
In order to alert this system to events from OpenNMS, like "nodeLostService", we send events out via xmlrpcd.
Unfortunately, there is not time to describe in detail how this system works, but it will be documented as soon as possible (the hope is by 1.2).
People who used previous versions of OpenNMS on Java 1.4.2 found out that it would use up all of the resources on the system and then die. This turned out to be due to a very obscure bug in Java. The code was re-written to avoid this and now we recommend that OpenNMS is run on 1.4.2.
John Rodriguez has created a great MIB Compiler to convert native MIB information into a format that can be used by datacollection-config.xml.
We hope to integrate it into the webUI in the future, but for now it is located in the contrib directory under mibparser.
In that directory is the complete code (in Java) as well as a helpful README. In a nutshell this is how you would use it.
Change into the dist directory and run the parseMib.sh wrapper script. The format is:
Usage: parseMib.sh <MIB File 1> [<MIB file 2>...]
Example: parseMib.sh RFC-1213.my
Thus:
$ ./parseMib.sh /usr/share/snmp/mibs/RFC1213-MIB.txt
Looking for a good java...
Using java in user's path...
Checking Java version for 1.4+...
Version is: java version "1.4.2_04"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_04-b05)
Java HotSpot(TM) Client VM (build 1.4.2_04-b05, mixed mode)
Checking for JAVA_HOME...
JAVA_HOME not set, trying to find it...
JAVA_HOME set to: .
Calling parser...
will generate output that is very familiar to people used to modifying datacollection-config.xml. For example:
<mibObj oid=".1.3.6.1.2.1.11.1" instance="0" alias="snmpInPkts" type="Counter" />
<mibObj oid=".1.3.6.1.2.1.11.2" instance="0" alias="snmpOutPkts" type="Counter" />
<mibObj oid=".1.3.6.1.2.1.11.3" instance="0" alias="snmpInBadVersions" type="Counter" />
<mibObj oid=".1.3.6.1.2.1.11.4" instance="0" alias="snmpInBadCommunityNamesTOOLONG" type="Counter" />
<mibObj oid=".1.3.6.1.2.1.11.5" instance="0" alias="snmpInBadCommunityUsesTOOLONG" type="Counter" />
<mibObj oid=".1.3.6.1.2.1.11.6" instance="0" alias="snmpInASNParseErrs" type="Counter" />
<mibObj oid=".1.3.6.1.2.1.11.8" instance="0" alias="snmpInTooBigs" type="Counter" />
<mibObj oid=".1.3.6.1.2.1.11.9" instance="0" alias="snmpInNoSuchNames" type="Counter" />
This could be put into a new MIB group, snmp-stats or some such, directly without having to explore the MIB by hand.
I love this app.
There are some caveats. In using this I have sometimes seen errors where the MIB compiler could not find a referenced variable because it is defined in another MIB file. Simply list the MIB that defines it first in the list of MIBs to parse.
I have also come across some MIBs that define custom object types and the parser does not handle it all that well. It is often possible just to delete the offending line from the MIB file (after making a copy of course) and try it again.
OpenNMS can only handle numeric data types, or DisplayStrings that can be converted into numbers, so keep that in mind when choosing which values to collect.
We use RRDTool, and RRDTool has a 19 character limit on filenames (the part before .rrd). Since the "alias" field becomes the file name, you cannot have an alias longer than 19 characters. The parser will append "TOOLONG" to overlength aliases, and you can edit them by hand (it would be possible to truncate the name, but you cannot have duplicate aliases and that might occur).
Finally, OpenNMS can handle a numeric instance (0, 1, 2, etc.) or an instance of "ifIndex". So an instance of "tcpConnState" would cause an error.
Our goal is to make OpenNMS as pure Java as possible. However, for a variety of reasons we cannot do that yet. When OpenNMS was started, ant (the program used to build other Java programs) had limitations that had to be worked around. This resulted in a workable, but somewhat confusing, build system.
DJ Gregor (building on work started by Edwin Buck) rebuilt the build system, making it almost pure ant. This was great for those doing development, so hats off to Deej and Edwin.
Mike Huot has written a new NTP poller. You will notice it in capsd-configuration.xml and poller-configuration.xml. Hat off to Mike.
Small additions that deserve mention:
APC data collection was added.
Added "maxval" and "minval" attributed to the
mibObj
definition in
datacollection-config.xml
to help eliminate
spikes.
Started improving start up times on large systems.
Added a sort to KSC reports.
Added an initial delay to notification paths.
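Regarding the maxval/minval item above, a rough idea of what such a definition might look like is below. The OID is the standard ifInOctets, but the bound values are made-up placeholders, so treat the exact numbers as assumptions:

<!-- minval/maxval bound the accepted values; the numbers here are placeholders -->
<mibObj oid=".1.3.6.1.2.1.2.2.1.10" instance="ifIndex"
        alias="ifInOctets" type="counter" minval="0" maxval="1250000000"/>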
We have lots of bug fixes. I am really tired. Check out the CHANGELOG.
The following features were added in 1.1.2:
There are three new pollers available:
Ssh: Previously, the SSH service was polled and discovered using the generic TCP class. This worked fine, except that SSH expects a version string to be sent with the query, and its absence caused numerous log messages. Thus the TCP class was modified into an SSH class that sends the correct version string.
JDBC: The database pollers also use the TCP class to connect to well known ports. Jose Nunez Vicente Zuleta created a poller that uses the particular JDBC database driver to make a connection, get the system catalogs, and if successful, mark the database service as "up". Since this requires a valid username and password that can access the database, it is not the default class, but it is pretty simple to set up.
In order to automatically detect and monitor databases, a few changes need to be made to both the network and the OpenNMS configuration. First, be sure that the username and password you plan to use actually work from the OpenNMS server. This will involve changes to pg_hba.conf for PostgreSQL, and I am not sure about others.
Second, you will need to ensure that you have a jar file with the JDBC driver for your particular database. Copy it to $OPENNMS_HOME/lib (the one for PostgreSQL is already included).
Okay, now you need to modify the capsd configuration to discover the service and modify the poller configuration to poll the service.
capsd: Here is an example for Sybase:
<protocol-plugin protocol="Sybase-JDBC" class-name="org.opennms.netmgt.capsd.JDBCPlugin" scan="on">
  <property key="user" value="sa"/>
  <property key="password" value="XXXX"/>
  <property key="retry" value="3"/>
  <property key="timeout" value="5000"/>
  <property key="driver" value="com.sybase.jdbc2.jdbc.SybDriver"/>
  <!-- jdbc:sybase:Tds::/ -->
  <property key="url" value="jdbc:sybase:Tds:OPENNMS_JDBC_HOSTNAME:4100/tempdb"/>
</protocol-plugin>
and one for MySql:
<protocol-plugin protocol="MySQL-JDBC" class-name="org.opennms.netmgt.capsd.JDBCPlugin" scan="on">
  <property key="user" value="root"/>
  <property key="password" value="XXXX"/>
  <property key="retry" value="3"/>
  <property key="timeout" value="5000"/>
  <property key="driver" value="org.gjt.mm.mysql.Driver"/>
  <!-- jdbc:mysql://[<:3306>]/ -->
  <property key="url" value="jdbc:mysql://OPENNMS_JDBC_HOSTNAME:3306/mysql"/>
</protocol-plugin>
and one for PostgreSQL:
<protocol-plugin protocol="PostgreSQL-JDBC" class-name="org.opennms.netmgt.capsd.JDBCPlugin" scan="on">
  <property key="user" value="opennms"/>
  <property key="password" value="opennms"/>
  <property key="retry" value="3"/>
  <property key="timeout" value="5000"/>
  <property key="driver" value="org.postgresql.Driver"/>
  <!-- jdbc:postgresql:[[:<5432>/]] -->
  <property key="url" value="jdbc:postgresql://OPENNMS_JDBC_HOSTNAME:5432/opennms"/>
</protocol-plugin>
Note that the service names for all three of these examples have "-JDBC" added to the end of their names. This means you can run them separately from the standard database protocols, or if you like, you can completely replace the standard protocols. In fact, if you wish, you can use the standard port check in capsd, and then use the JDBC poller configuration to do the actual polling.
Here are the poller configuration examples:
<service name="Sybase-JDBC" user-defined="false" interval="6000" status="on"> <parameter key="user" value="sa"/> <parameter key="password" value="XXXX"/> <parameter key="timeout" value="3000"/> <parameter key="driver" value="com.sybase.jdbc2.jdbc.SybDriver"/> <!-- jdbc:sybase:Tds::/ --> <parameter key="url" value="jdbc:sybase:Tds:OPENNMS_JDBC_HOSTNAME:4100/tempdb"/> </service>
<service name="MySQL-JDBC" user-defined="false" interval="6000" status="on"> <parameter key="user" value="root"/> <parameter key="password" value="XXXX"/> <parameter key="timeout" value="3000"/> <parameter key="driver" value="org.gjt.mm.mysql.Driver"/> <!-- jdbc:mysql://[<:3306>]/ --> <parameter key="url" value="jdbc:mysql:// OPENNMS_JDBC_HOSTNAME:3306/mysql"/> </service>
<service name="PostgreSQL-JDBC" user-defined="false" interval="9000" status="on"> <parameter key="user" value="opennms"/> <parameter key="password" value="opennms"/> <parameter key="timeout" value="9000"/> <parameter key="driver" value="org.postgresql.Driver"/> <!-- jdbc:postgresql:[[:<5432>/]] --> <parameter key="url" value="jdbc:postgresql://OPENNMS_JDBC_HOSTNAME:5432/opennms"/> </service>
One more thing in the poller-configuration file, you will need to add <monitor> tags at the bottom:
<monitor service="Sybase-JDBC" class-name="org.opennms.netmgt.poller.monitors.JDBCMonitor"/>
<monitor service="MySQL-JDBC" class-name="org.opennms.netmgt.poller.monitors.JDBCMonitor"/>
<monitor service="PostgreSQL-JDBC" class-name="org.opennms.netmgt.poller.monitors.JDBCMonitor"/>
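As noted above, you can also let capsd do a simple port check and leave the real database query to the JDBC poller. A minimal sketch of that arrangement for PostgreSQL is shown below; the service name, port, credentials and the TcpPlugin entry are illustrative only, so check them against the capsd-configuration.xml and poller-configuration.xml shipped with your release. The key point is simply that the capsd protocol name and the poller service name must match.

<!-- capsd: detect the service with a plain TCP port check (sketch) -->
<protocol-plugin protocol="PostgreSQL" class-name="org.opennms.netmgt.capsd.TcpPlugin" scan="on">
    <property key="port" value="5432"/>
    <property key="timeout" value="3000"/>
    <property key="retry" value="1"/>
</protocol-plugin>

<!-- poller: poll the same service name with the JDBC monitor (sketch) -->
<service name="PostgreSQL" user-defined="false" interval="300000" status="on">
    <parameter key="user" value="opennms"/>
    <parameter key="password" value="opennms"/>
    <parameter key="timeout" value="3000"/>
    <parameter key="driver" value="org.postgresql.Driver"/>
    <parameter key="url" value="jdbc:postgresql://OPENNMS_JDBC_HOSTNAME:5432/opennms"/>
</service>

<monitor service="PostgreSQL" class-name="org.opennms.netmgt.poller.monitors.JDBCMonitor"/>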
Hats off to Jose for this work.
General Purpose Script Poller: Bill Ayres has written a poller, called the "General Purpose" or "Gp" poller, that will execute a script and, based on the response from that script, mark the service as "up" or "down". He has used it to monitor RADIUS servers, for example.
GpPlugin and GpMonitor work much like TcpPlugin and TcpMonitor in that you can use them to define as many custom services as you need, each with a unique service name.
GpPlugin and GpMonitor call an external script or program to test a particular service. The script will be passed the IP address of the interface OpenNMS is testing (as --hostname [IP Address]), followed by the timeout (as --timeout [timeout]), followed by any optional arguments that may need to be passed.
The script is expected to return a string as standard output which is then compared to the banner property or parameter to determine success or failure of the test.
The timeout is implemented in GpPlugin and GpMonitor. However, some scripts may want to know how long OpenNMS is going to wait for a reply, so the timeout value is passed to the script, and can be ignored by the script if it is not needed.
GpPlugin and GpMonitor also check the exit status of the script or program. If it is not zero, then the test fails. They will also gather and log any standard error output from the script, but the presence of error output does not prevent the test from succeeding if the banner matches the standard output.
Example plugin properties and poller parameters are shown below. All of them are optional except script, which is required; if it is missing, an exception will be logged.
These programs use the exec method from Java's Runtime class. Exec is known to have pitfalls (see the article "When Runtime.exec() won't"), and it does not have a built-in timeout feature. In deciding what to do about these shortcomings, Bill discovered that Scott McCrory had already addressed them with his ExecRunner class. ExecRunner and StreamGobbler are available at SourceForge as part of Spumoni.
One more word about the timeout. ExecRunner expects the timeout in integer seconds, not milliseconds, and a value of zero means wait indefinitely. To avoid confusion, Bill maintained the OpenNMS practice of specifying the timeout in milliseconds. Before passing it on to ExecRunner, it gets converted to seconds in the following manner: zero remains zero; 1 through 1999 becomes 1 second; 2000 through 2999 becomes 2 seconds; 3000 through 3999 becomes 3 seconds; and so on.
Included in contrib is a simple perl test script, gptest.pl, that is handy for testing, since it is easy to edit and change its behaviour.
To implement Gp, add the following entries, substituting your information as needed.
For capsd configuration:
<protocol-plugin protocol="GPtest" class-name="org.opennms.netmgt.capsd.GpPlugin" scan="on" user-defined="true">
    <property key="script" value="/opt/OpenNMS/contrib/gptest.pl"/>
    <property key="banner" value="success"/>
    <property key="args" value="caps-arg1 caps-arg2"/>
    <property key="timeout" value="3000"/>
    <property key="retry" value="1"/>
</protocol-plugin>
And for poller configuration:
<service name="GPtest" interval="300000" user-defined="false" status="on">
    <parameter key="script" value="/opt/OpenNMS/contrib/gptest.pl"/>
    <parameter key="banner" value="successful"/>
    <parameter key="args" value="poll-arg1 poll-arg2"/>
    <parameter key="retry" value="1"/>
    <parameter key="timeout" value="2000"/>
    <parameter key="rrd-repository" value="/var/opennms/rrd/response"/>
    <parameter key="ds-name" value="GPtest"/>
</service>
and the monitor service entry:
<monitor service="GPtest" class-name="org.opennms.netmgt.poller.monitors.GpMonitor"/>
Hats off to Bill for this work, and to Scott for ExecRunner.
Speaking of scripts, Jim Doble has written a daemon, called ScriptD, that will execute scripts based on events received or generated by OpenNMS. This process, governed as usual by a configuration file, allows one to execute actions either for all events or only for specific ones.
The scripting language, as I understand it, is BeanShell. As Jim writes: “You will notice that BeanShell is a lot like Java, but with some relaxed syntax. For example you do not have to define types for your variables, and attributes for which there are simple get methods can be accessed as properties (i.e. you can say event.uei or event.getUei() interchangeably).”
There are four types of scripts that can be run: start-script, reload-script, stop-script, and event-script. When ScriptD starts, it will run all of the commands in the <start-script> tags. Likewise, when ScriptD stops, it will run all of the commands in the <stop-script> tags. There is also a new event, uei.opennms.org/internal/reloadScriptConfig, which, when received, will run all of the <reload-script> tags.
The final script type, <event-script>, gets run when events are received. Event scripts can have one or more <uei> elements, which specify the UEIs for which that script should run. If no <uei> element is present, the script will run for all events.
The scripts can make use of the SnmpTrapHelper class, which is a utility to make it easier to manipulate traps from a script.
There is an example scriptd-configuration.xml file included in the $OPENNMS_HOME/etc directory.
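As a rough sketch of how the four script types fit together (this is not a copy of the shipped file; the engine declaration and the trivial script bodies are assumptions for illustration), a scriptd-configuration.xml might be laid out like this:

<scriptd-configuration>
    <!-- declare the BeanShell engine for the scripting framework (assumed attribute names) -->
    <engine language="beanshell" className="bsh.util.BeanShellBSFEngine" extensions="bsh"/>

    <!-- run once when ScriptD starts -->
    <start-script language="beanshell">
        org.opennms.core.utils.ThreadCategory.getInstance().debug("scriptd started");
    </start-script>

    <!-- run when the uei.opennms.org/internal/reloadScriptConfig event is received -->
    <reload-script language="beanshell">
        org.opennms.core.utils.ThreadCategory.getInstance().debug("scriptd configuration reloaded");
    </reload-script>

    <!-- run once when ScriptD stops -->
    <stop-script language="beanshell">
        org.opennms.core.utils.ThreadCategory.getInstance().debug("scriptd stopped");
    </stop-script>

    <!-- run for matching events; with no uei elements it would run for every event -->
    <event-script language="beanshell">
        <uei name="uei.opennms.org/nodes/nodeDown"/>
        event = bsf.lookupBean("event");
        org.opennms.core.utils.ThreadCategory.getInstance().debug("saw event " + event.uei);
    </event-script>
</scriptd-configuration>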
If you want to forward all SNMP traps to another machine as an SNMP trap, you would use the following event script:
<event-script language="beanshell">
    import org.opennms.core.utils.ThreadCategory;
    import org.apache.log4j.Category;

    ThreadCategory.setPrefix("scriptd-event");
    Category log = ThreadCategory.getInstance();

    event = bsf.lookupBean("event");

    if (event.snmp != null) {
        log.debug("forwarding a trap");
        snmpTrapHelper.forwardTrap(event, "10.1.1.1", 162);
    }
</event-script>
This will forward the trap to 10.1.1.1, port 162. Note that the event will have SNMP information only if the event is indeed an SNMP trap. Since internal OpenNMS events do not, you can use that distinction to forward OpenNMS events as SNMP traps to another system:
<event-script language="beanshell"> import org.opennms.core.utils.ThreadCategory; import org.apache.log4j.Category; ThreadCategory.setPrefix("scriptd-event"); Category log = ThreadCategory.getInstance(); event = bsf.lookupBean("event"); if (event.snmp == null) { try { log.debug("Forwarding an OpenNMS event."); SnmpPduTrap trap = snmpTrapHelper.createV1Trap(".1.3.6.1.4.1.5813.1", "10.1.1.16", 6, 1, 0); t_dbid = new Integer(event.dbid).toString(); if (t_dbid != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.1", "OctetString", "text", t_dbid); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.1", "OctetString", "text", "null"); if (event.distPoller != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.2", "OctetString", "text", event.distPoller); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.2", "OctetString", "text", "null"); if (event.creationTime != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.3", "OctetString", "text", event.creationTime); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.3", "OctetString", "text", "null"); if (event.masterStation != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.4", "OctetString", "text", event.masterStation); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.4", "OctetString", "text", "null"); if (event.uei != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.6", "OctetString", "text", event.uei); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.6", "OctetString", "text", "null"); if (event.source != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.7", "OctetString", "text", event.source); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.7", "OctetString", "text", "null"); t_nodeid = new Long(event.nodeid).toString(); if (t_nodeid != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.8", "OctetString", "text", t_nodeid); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.8", "OctetString", "text", "null"); if (event.time != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.9", "OctetString", "text", event.time); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.9", "OctetString", "text", "null"); if (event.host != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.10", "OctetString", "text", event.host); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.10", "OctetString", "text", "null"); t_interface = event.getInterface(); if (t_interface != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.11", "OctetString", "text", t_interface); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.11", "OctetString", "text", "null"); if (event.snmphost != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.12", "OctetString", "text", event.snmphost); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.12", "OctetString", "text", "forge.opennms.com"); if (event.service != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.13", "OctetString", "text", event.service); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.13", "OctetString", "text", "null"); if (event.descr != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.16", "OctetString", "text", event.descr); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.16", "OctetString", "text", "null"); if (event.severity != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.18", "OctetString", "text", 
event.severity); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.18", "OctetString", "text", "null"); if (event.pathoutage != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.19", "OctetString", "text", event.pathoutage); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.19", "OctetString", "text", "null"); if (event.operinstruct != null) snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.20", "OctetString", "text", event.operinstruct); else snmpTrapHelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.2.20", "OctetString", "text", "null"); snmpTrapHelper.sendTrap("public", trap, "10.1.1.15", 162); } catch (e) { sw = new StringWriter(); pw = new PrintWriter(sw); e.printStackTrace(pw); log.debug(sw.toString()); } } </event-script>
This will send a newly defined OpenNMS trap with the important event information embedded as varbinds.
If you want to limit the forwarded OpenNMS events to nodeLostService and nodeRegainedService, you can add <uei> tags:
<event-script language="beanshell">
    <uei name="uei.opennms.org/nodes/nodeLostService"/>
    <uei name="uei.opennms.org/nodes/nodeRegainedService"/>
to the first part of the <event-script> tag.
Hats off to Jim for this work.
Okay, let us get this out in the open. I do not care for maps in network management. Yes, they are nice looking, but truly useful maps cannot be automated, and the manual process of generating maps takes more time than they are worth.
That said, my opinions do not mean much in this project (grin) and if someone is willing to put in some work and write solid code I am more than willing to accept it. Thus, Derek Glidden decided to go and write a mapping system for OpenNMS.
This will display the nodes as icons, with the current availability displayed in color underneath each one. You can view it in a tree mode or just as a list of icons, and the image will automatically refresh. The parenting relationships have to be set manually.
The image is built and displayed using Scalable Vector Graphics (SVG). I think this is a great decision, but the downside is that the only SVG viewer I was able to get to work was from Adobe for Internet Explorer on Windows. I was not able to get SVG to work with Mozilla or Safari (on Mac). Using the system on IE was very clean and fast.
There is the option to convert the SVG image to a PNG image. This is extremely processor intensive, takes a long time on a network of any size, and often fails. It is not recommended.
Scared yet? (grin)
For these reasons I am treating the current map implementation as contributed code (i.e. not supported). It is hoped, however, that Derek and others will work to make more improvements to the system.
Okay, to get started, read the map.disable file in $OPENNMS_HOME/etc. You will need to copy this file to map.enable. This will add a "Map" menu item in the WebUI.
You will also have to make some changes to the tomcat4 configuration. First, you need to set headless equal to true, and second you should probably increase the memory available to Tomcat (especially if you are trying to use the SVG to PNG transcoder). OutOfMemory exceptions in Tomcat are indicative of a too-small memory setting when trying to render the map.
I know this whole thing sounds a bit negative, but that is no reflection on Derek's work. He wrote very clean code and I like the architecture (SVG especially) that he came up with. The icons are cool, too. So hats off to Derek.
But I am bracing myself for the onslaught of questions like “Can I add a background?”, “Can I change the icons based on systemOID?”, and “Can I make submaps?”. Patience, please.
The following little changes and improvements have been made:
Added RFC2325 to the data collection configuration
Added a "bits" report (to replace bytes) and made it the default report for KSC reports.
Added the ability to define a "null" filter (can speed up OpenNMS starting)
Added new Cisco and UCD-SNMP reports (Thanks Tony and Stuart)
Added new trap definitions for IBM and Intel
As we move toward the next stable release of OpenNMS, a number of bugs have been fixed, including:
Added a check to handle null terminated strings in traps (Thanks Dave W.)
Corrected issues with day/week/month/year buttons in WebUI on various browsers
Changed the open count in notifications to reflect those for the user instead of the system
Fixed a typo in mail.pl in the contrib directory
Added a small fix to the HTTP and HTTPS monitors that could cause a ClassCastException (Thanks Jim)
Added an ORDER BY statement to ensure that categories reflect the correct values.
Added code to explicitly close sockets in plugins and monitors (see the Known Issues below for Java 1.4.2)
Bug 708: Fixed issues with viewing events when nodes are deleted
Bug 715: Added security roles to web.xml (Thanks DJ)
Bug 741: Fixed issues with the SNMP admin page and null issnmpprimary values.
Bug 748: Added code to catch rrdUpdate exceptions that could cause false nodeDown events
Bug 752: Fixed a bug that caused certain rules to match all events
The following features were added in 1.1.1:
SNMP Traps will now be associated with nodes if the IP address in the trap matches a known IP address in the database.
If the IP Address is not known, OpenNMS will generate a newSuspect event to attempt to discover the device. This behavior can be disabled in the trapd-configuration.xml file.
Added new trap definitions for Dell OpenManage, Foundry Networks and ADIC. Also added an updated mib2opennms program which improves the look of the output.
Added a new custom reporting module which allows one to create and save custom performance reports. It is called the Key SNMP Custom (KSC) Reporting Tool.
Added buttons on the standard Performance and Response Time pages to allow the range to be changed between the last Day/Week/Month/Year.
Added the ability to collect response time on the following pollers: Citrix, FTP, HTTPS, IMAP, POP3, SMTP and TCP.
The RRAs for Response Time data are now part of the poller configuration file.
There is now a Response Time link on the node and interface pages.
If a node or interface supports HTTP, there is now a link to that service.
Added a two minute refresh to the event listing page.
Added non-blocking I/O to the HTTPS service. Now all monitors and plug-ins should be non-blocking.
If you set the IP Address in a poll-outages calendar to "match-any", it will match all addresses in the poller package that uses that calendar (see the sketch after this list).
Increased the size of the contactinfo field in the usersnotified table, and changed create.sql to make this easier.
Fixed numerous bugs, including bug 650, where "down" events could be written to the database after the corresponding "up" event. See the CHANGELOG for a full list.
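For the "match-any" poll-outages feature mentioned in the list above, a calendar entry might look roughly like the following. This is only a sketch: the outage name, schedule and attribute spellings are assumptions, so compare it against the poll-outages.xml shipped with your release.

<outage name="weekly-backup-window" type="weekly">
    <!-- the time window during which polling is suppressed (example values) -->
    <time day="sunday" begins="01:00:00" ends="03:00:00"/>
    <!-- match-any applies the outage to every address in the package that uses this calendar -->
    <interface address="match-any"/>
</outage>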
For a variety of reasons, OpenNMS 1.1.1 and beyond will require Tomcat4 version 4.1.18 or higher.
There were many changes to OpenNMS between 1.0 and 1.1. Here are a few listed by functional area.
The events and notifications part of OpenNMS saw the most changes with 1.1.0. First, there was a new tag added to the eventconf.xml file called <event-file>. This allows for external files to be included in the event configuration.
Also, the order in which events appear is now strictly enforced. When trying to match an event with an event definition, OpenNMS takes the first match. The events in eventconf.xml are read first, followed by the files identified by <event-file> tags (in the order in which they are listed). In the configuration that ships with OpenNMS, the file with the default events is loaded last. Be sure to add any custom files before that one.
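As a sketch of that ordering rule (the custom file name here is hypothetical, and the default file name should be checked against your installation), the bottom of eventconf.xml would list custom event files before the default one:

<!-- ... event definitions in eventconf.xml itself are matched first ... -->

<!-- included files are matched in the order listed, so custom files go before the defaults -->
<event-file>events/MyVendor.custom.events.xml</event-file>
<event-file>events/default.events.xml</event-file>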
Prior to this release, the SNMP generic traps 0-5 (coldStart, warmStart, linkDown, etc.) were hard-coded. Now they must be defined (and that definition is included in the default events file), but this allows for generic traps other than type 6 to be configured differently for, say, different hosts.
Speaking of event files, over 2750 events were added out of the box, including those from vendors such as Cisco, HP and 3Com. Please let us know if anything is misconfigured or if we need to add some events.
The ability to configure events based on parameters (varbinds) was also added. This is best demonstrated with an example. In the new HP event definitions there is an event called hpicfFaultFinderTrap. It is defined as:
<event> <mask> <maskelement> <mename>id</mename> <mevalue>.1.3.6.1.4.1.11.2.14.12.1</mevalue> </maskelement> <maskelement> <mename>generic</mename> <mevalue>6</mevalue> </maskelement> <maskelement> <mename>specific</mename> <mevalue>5</mevalue> </maskelement> </mask> <uei>uei.opennms.org/vendor/HP/traps/hpicfFaultFinderTrap</uei> <event-label>HP-ICF-FAULT-FINDER-MIB defined trap event: hpicfFaultFinderTrap</event-label> <descr> <p>This notification is sent whenever the Fault Finder creates an entry in the hpicfFfLogTable.</p> <table> <tr> <td><b>hpicfFfLogFaultType</b></td> <td>%parm[#1]%</td> <td><p> badDriver(1) badXcvr(2) badCable(3) tooLongCable(4) overBandwidth(5) bcastStorm(6) partition(7) misconfiguredSQE(8) polarityReversal(9) networkLoop(10) lossOfLink(11) portSecurityViolation(12) backupLinkTransition(13) meshingFault(14) fanFault(15) rpsFault(16) stuck10MbFault(17) lossOfStackMember(18) hotSwapReboot(19) </p></td> </tr> <tr> <td><b> hpicfFfLogAction</b></td> <td>%parm[#2]% </td> <td><p;> none(1) warn(2) warnAndDisable(3) warnAndSpeedReduce(4) warnAndSpeedReduceAndDisable(5) </p></td;> </tr> <tr> <td><b>hpicfFfLogSeverity</b></td> <td>%parm[#3]%</td> <td><p> informational(1) medium(2) critical(3) </p></td;> </tr> <tr> <td><b> hpicfFfFaultInfoURL</b></td> <td>%parm[#4]%</td> <td><p;></p></td;> </tr> </table> </descr> <logmsg dest='logndisplay'><p>HP Event: ICF Hub Fault Found.</p></logmsg> <severity>Warning</severity> </event>
Note that the third parameter denotes the severity of the event. By default this event has a severity of Warning, but what if you wanted to give the "critical" case a severity of Major? Using the new varbind extension to the mask tag:
<event> <mask> <maskelement> <mename>id</mename> <mevalue>.1.3.6.1.4.1.11.2.14.12.1</mevalue> </maskelement> <maskelement> <mename>generic</mename> <mevalue>6</mevalue> </maskelement> <maskelement> <mename>specific</mename> <mevalue>5</mevalue> </maskelement> <varbind> <vbnumber>specific</vbnumber> <vbvalue>5</vbvalue> </varbind> </mask> <uei>uei.opennms.org/vendor/HP/traps/hpicfFaultFinderTrap</uei> <event-label>HP-ICF-FAULT-FINDER-MIB defined trap event: hpicfFaultFinderTrap</event-label> <descr> <p>This notification is sent whenever the Fault Finder creates an entry in the hpicfFfLogTable.</p> <table> <tr> <td><b>hpicfFfLogFaultType</b></td> <td>%parm[#1]%</td> <td><p> badDriver(1) badXcvr(2) badCable(3) tooLongCable(4) overBandwidth(5) bcastStorm(6) partition(7) misconfiguredSQE(8) polarityReversal(9) networkLoop(10) lossOfLink(11) portSecurityViolation(12) backupLinkTransition(13) meshingFault(14) fanFault(15) rpsFault(16) stuck10MbFault(17) lossOfStackMember(18) hotSwapReboot(19)</p></td> </tr> <tr> <td><b>hpicfFfLogAction</b></td> <td>%parm[#2]%</td> <td><p> none(1) warn(2) warnAndDisable(3) warnAndSpeedReduce(4) warnAndSpeedReduceAndDisable(5)</p></td> </tr> <tr> <td><b>hpicfFfLogSeverity</b></td> <td>%parm[#3]%</td> <td><p;> informational(1) medium(2) critical(3)</p></td;> </tr> <tr> <td><b>hpicfFfFaultInfoURL</b></td> <td>%parm[#4]%</td> <td><p;></p></td;> </tr> </table> </descr> <logmsg dest='logndisplay'><p>HP Event: ICF Hub Fault Found.</p></logmsg> <severity>Major</severity> </event>
This event, when added before the previous event since it is more specific, will try to match on the enterprise id, the generic trap value of 6, the specific trap value of 5 and the value of the third parameter, or varbind, of 3.
There was also the addition of low and high threshold rearm events. When a threshold is exceeded in consecutive polls equal to the trigger number, the threshold event is generated. Another event will not be generated until the polled value drops below the rearm number. The rearm event is thus similar to a "cleared" event. Since the first parameter passed with the threshold event is the data source name, using the "varbind" tag above, each data source can now have its own event.
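As an illustration of the trigger/rearm behaviour described above, a threshold definition looks roughly like this; the group name, data source and numbers are examples only, so check the thresholds.xml shipped with your release for the exact format.

<group name="default-snmp" rrdRepository="/var/opennms/rrd/snmp/">
    <!-- raise the high threshold event after 2 consecutive polls over 90,
         and do not raise it again until the value drops below the rearm value of 75 -->
    <threshold type="high" ds-name="avgBusy5" ds-type="node"
               value="90.0" rearm="75.0" trigger="2"/>
</group>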
One of the more noticeable changes is that the Unique Event Identifier no longer contains "http://". The original intent was that the UEI would act something like an XML namespace, but in practice it is just a label, so the "http://" was removed to avoid confusion.
Notifications also received some attention with this release. Due to popular demand, the tags %nodelabel% and %interfaceresolve% are now available. The former will display the label of the nodeid associated with the event, and the latter will attempt to resolve the name associated with the IP Address of the interface of the event.
In notifd-configuration.xml there are now two new attributes. In the global properties, there is "match-all". By default, this is set to false, which means that the first notification that matches an event will be the only notification sent. If it is set to true, then all notifications that match a given event will be sent. (Thanks Nick) In the auto-acknowledge section, there is a new attribute called "clear". By adding "clear=true" to the auto-acknowledge tag, both the event being auto-acknowledged and the event that caused the acknowledgement will be acknowledged. Thus the "up" event that clears a "down" will also be cleared.
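Stripped down to just the parts discussed here, the two attributes sit in notifd-configuration.xml roughly as shown below. The surrounding attributes and the match elements are a guess at a typical configuration, not a copy of the shipped file, so adjust them to what your notifd-configuration.xml actually contains.

<notifd-configuration status="on" match-all="true">
    <!-- clear="true" acknowledges both the "down" event and the "up" event that resolved it -->
    <auto-acknowledge uei="uei.opennms.org/nodes/nodeRegainedService"
                      acknowledge="uei.opennms.org/nodes/nodeLostService" clear="true">
        <match>nodeid</match>
        <match>interfaceid</match>
        <match>serviceid</match>
    </auto-acknowledge>
</notifd-configuration>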
In addition to these enhancements, various bugs were fixed. Notification rules now actually work, and you can filter node level events via IP address. Also, threshold events can now generate notifications.
The biggest change to polling would have to be the addition of response time information for DHCP, DNS, HTTP and ICMP based pollers. Similar to data collection, the response time information can be graphed and it can have threshold alarms placed on it.
Also, all of the plugins and monitors (except HTTPS) have been re-written to use the non-blocking I/O available in the 1.4 JDK.
There has been some discussion on how OpenNMS determines node labels. Currently, this is set to the resolved SNMP Primary Interface IP Address. However, it is common practice on routers to have a software-loopback address. OpenNMS will now discover such interfaces (as long as they do not have an address that starts with 127) and mark them as the primary SNMP Interface. Note that no services will be polled on such interfaces.
A few changes were made to the WebUI. There is now a webui-colors.xml file that will allow for dynamic changes to the background colors used in the categories list on the main page (more pages to follow). Also under "Admin" the ability to delete nodes was added.
In addition, there is a new Admin page that will allow one to choose which non-IP interfaces will be used in data collection. By setting the snmpStorageFlag in datacollection-config.xml to "select" (now the default), OpenNMS will only store data from those interfaces that could serve as a primary SNMP interface. One can then select which other interfaces to collect on using the GUI. The previous values of snmpStorageFlag ("primary" and "all") still work.
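For reference, the flag is an attribute on the snmp-collection element in datacollection-config.xml. A trimmed sketch is shown below; other attributes and the collection contents are omitted or assumed, so use it only as a pointer to where the flag lives.

<snmp-collection name="default" snmpStorageFlag="select">
    <!-- "select" (the new default) stores data only for interfaces that could be the
         primary SNMP interface, plus any interfaces chosen on the new Admin page;
         "primary" and "all" behave as before -->
    <!-- rrd, groups and systems definitions unchanged -->
</snmp-collection>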
Also, the "Destination Path" interface now has the ability to choose NOT to include a service (thanks Nick), which will create a rule like "match the events where the service is NOT FTP". In addition, placing the mouse over the categories on the main page will now display the last time the category was updated.
The poller downtime model allows for a service to be deleted if it has been down for a certain amount of time. This did not work correctly and has been fixed.
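For reference, the downtime model is expressed per package in poller-configuration.xml. A typical set of entries looks like the sketch below (the intervals here are example values, not the shipped defaults); the final entry tells the poller to delete the service once it has been down that long.

<!-- poll every 30 seconds for the first 5 minutes of an outage -->
<downtime begin="0" end="300000" interval="30000"/>
<!-- then every 5 minutes up to 12 hours -->
<downtime begin="300000" end="43200000" interval="300000"/>
<!-- then every 10 minutes up to 5 days -->
<downtime begin="43200000" end="432000000" interval="600000"/>
<!-- after 5 days down, delete the service -->
<downtime begin="432000000" delete="true"/>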
During discovery, the ifTable is collected from each device that is found to support SNMP. On some HP switches, this would fail due to a limitation on the SNMP maximum packet size. All non-essential ifTable elements were removed from the request, which appears to resolve the problem.
Spaces in Notification Path names have been known to cause problems. The Web UI was modified to disallow spaces in path names. Bug 657.
In the Custom Performance Report Web UI, 11 PM was followed by 12 PM, when it should have been 12 AM. This has been corrected. Bug 515.
The "contrib" directory now contains code, such as nifty utilities, that exists outside of the main OpenNMS source but may prove useful. One such example is Tomas Carlsson's "mib2opennms" program. These programs are not supported.
In capsd-configuration.xml, both the LDAP and Citrix protocol plug-ins were listed twice. This would slow down the capabilities scan considerably.
Added new entries to datacollection-config.xml and snmp-graph.properties.
Many bugfixes, including allowing Threshold events to generate notifications, AdminStatus and OperStatus values causing exceptions, and rescans with certain devices.
The following major changes occurred between 0.9.9 and 1.0.0:
The OpenSSH service has been renamed to "SSH" and changed to detect common versions of SSH servers other than OpenSSH. Upgrades will retain the "OpenSSH" service as well for the sake of reports.
There is now the possibility of having a state between "up" and "down" that flags a service as being unresponsive. This state can be reached when the service's port can be connected to, but it does not respond in a reasonable amount of time.
Many small bugfixes, including the "Calculating..." problem if RTC has not come up yet when tomcat starts.
Here is the list of known issues in this release of OpenNMS.
On Mandrake 10, and perhaps others, the symbolic link created in /etc/init.d for opennms doesn't seem to work. Do the following:
# rm /etc/init.d/opennms
# ln -s /opt/OpenNMS/bin/opennms.sh /etc/init.d/opennms
I expect this to be the biggest question on the discussion lists. As of 1.1.5, the installer must be run manually after the packages are installed. Please see the documentation for more information.
With 1.1.5 a new format has been created for SNMP graphs in the snmp-graph.properties file. An attempt has been made to organize the different reports, and so default reports like "bits" are now prefixed by "mib2" (i.e. the report is now mib2.bits).
If you have KSC reports on bits, errors, etc., you will need to edit the ksc-performance-reports.xml file and replace the old names with the new ones.
In 1.1.4 the way OpenNMS configures Tomcat4 has changed. All of the necessary configuration information is now in opennms.xml in the Tomcat4 webapps directory.
Unfortunately, if you are upgrading, there are two things you will have to do manually in order to get OpenNMS to work correctly.
Remove the opennms directory (or symbolic link) from the $TOMCAT4_HOME/webapps directory. Depending on your operating system, you will need to find the webapps directory (usually /var/tomcat4/webapps). In that directory there should be an opennms.xml file and an opennms directory (or symlink). Remove the directory (or symlink) and leave just the opennms.xml file.
Locate the server.xml file, usually in the $TOMCAT4_HOME/conf directory (/var/tomcat4/conf for many distributions). Remove the four <Context> lines that refer to OpenNMS:
<Context path="/opennms" docBase="opennms" debug="0" reloadable="true">
    <Logger className="org.opennms.web.log.Log4JLogger" homeDir="/opt/OpenNMS"/>
    <Realm className="org.opennms.web.authenticate.OpenNMSTomcatRealm" homeDir="/opt/OpenNMS"/>
</Context>
You should then be able to restart Tomcat and everything should work just fine. Note that this does not apply to new installs, just upgrades.
Version 1.1.2 and beyond of OpenNMS will require at a minimum Tomcat version 4.1.18 and PostgreSQL 7.2. OpenNMS will no longer supply "onms" versions of these applications, and instead will use main distributions from their maintainers.
Note that upgrading these programs is not simple.
For Tomcat, the best thing to do is uninstall version 4.0 and then install 4.1. Version 4.1 is not seen as an upgrade to 4.0, but is instead seen as a separate product by rpm and apt. You will need to make the following changes to the tomcat4.conf file (located in /etc/tomcat4 on Red Hat):
# you could also override JAVA_HOME here
# Where your java installation lives
# JAVA_HOME="/usr/java/jdk"
# JAVA_HOME="/opt/IBMJava2-131"
JAVA_HOME="[location of your Java Home dir]"

# What user should run tomcat
TOMCAT_USER="root"
You do not have to run Tomcat as root if you change the permissions on $OPENNMS/logs and $OPENNMS/etc so that the Tomcat user can write to them.
Two main changes need to be made to the Postgres configuration in order to allow OpenNMS to access it properly. Postgres needs to have been started at least once to create the "data" directory that will contain the configuration files.
Edit postgresql.conf (located in /var/lib/pgsql/data on Red Hat) and ensure the following values exist:
tcpip_socket = true
max_connections = 256
shared_buffers = 1024
Edit pg_hba.conf (host-based authentication) to allow all users to access the database from the local host by uncommenting and/or adding these lines (do not add the last line if your system does not support IPv6):
# TYPE  DATABASE  USER  IP-ADDRESS  IP-MASK          METHOD
local   all       all                                trust
host    all       all   127.0.0.1   255.255.255.255  trust
host    all       all   ::1         ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff  trust
and you may need to uncomment:
# Using sockets credentials for improved security. Not available everywhere,
# but works on Linux, *BSD (and probably some others)
# local all all ident sameuser
Note that this opens up Postgres to all users on the system (as long as they know the database password). Contact your database administrator if you want to limit this to a specific user, like root.
OpenNMS is written almost entirely in Java, and should be able to run on any system that supports the Java 1.4 Virtual Machine. There are requirements for other programs such as PostgreSQL, Perl, RRDTool and Tomcat4, but the 1.4 JDK is the key requirement (as most of the other packages can be compiled from source).
The following are the systems that support or are known to run OpenNMS.
The following Linux distributions and other unix-like systems are supported out-of-the-box with native installation packages.
PostgreSQL 7.2 and later has shipped with Red Hat Linux since version 7.3. Be sure to follow the above instructions.
Fedora Core 1, Fedora Core 2 and Fedora Core 3. OpenNMS is known to build and run on all three.
Debian Woody, Sarge and Sid. Debian packages should be available on ftp.opennms.org, and at the following apt-repository:
deb http://debian.opennms.org/apt debian/opennms stable
Special Tomcat 4.1 packages were created, since only 4.0 is supported in stable. For instructions see this How-To written by Ian MacDonald.
Solaris 8 and Solaris 9 on x86 and SPARC. Packages are available at ftp.opennms.org for Solaris 8 and Solaris 9 running on SPARC and Solaris 9 running on x86.
Mandrake 8, Mandrake 9 and Mandrake 10. Please note that while we build packages for Mandrake 8.x, we do not do any formal testing on it. Packages are provided as a convenience.
SuSE 8 and SuSE 9. OpenNMS is known to build and run on SuSE.
MacOSX 10.3 and 10.4. On MacOSX, the Fink distribution packages of OpenNMS are supported. See the Fink web site for more information on installing and using Fink.
Also note that on MacOSX, PostgreSQL must be configured in the same manner as above for Linux. However, to do so you will need to update the SHM settings so that the OS allows enough resources for PostgreSQL to run with larger buffers.
To do so, you must edit /System/Library/StartupItems/SystemTuning/SystemTuning so that the sysctl lines look like the following (at a minimum):
sysctl -w kern.sysv.shmmax=16777216
sysctl -w kern.sysv.shmmin=1
sysctl -w kern.sysv.shmmni=128
sysctl -w kern.sysv.shmseg=32
sysctl -w kern.sysv.shmall=4096