Configure Alarms

Since Alarmd instantiates alarms from events, defining alarms in Horizon entails defining an additional XML element of an event indicating a problem or resolution in the Network. This additional element is the "alarm-data" element.

Any Event that is marked as donotpersist in the logmsg element’s dest attribute, will not be processed as an alarm.
alarm-data Schema Definition (XSD)
<element name="alarm-data">
  <annotation>
    <documentation>This element is used for converting events into alarms.</documentation>
  </annotation>
  <complexType>
    <sequence>
      <element ref="this:update-field" minOccurs="0" maxOccurs="unbounded" />
    </sequence>
    <attribute name="reduction-key" type="string" use="required" />
    <attribute name="alarm-type" use="required" >
      <simpleType>
        <restriction base="int">
          <minInclusive value="1"/>
        </restriction>
      </simpleType>
    </attribute>
    <attribute name="clear-key" type="string" use="optional" />
    <attribute name="auto-clean" type="boolean" use="optional" default="false" />
    <attribute name="x733-alarm-type" type="this:x733-alarm-type" use="optional" />
    <attribute name="x733-probable-cause" type="int" use="optional" />
  </complexType>
</element>

<element name="update-field">
  <complexType>
    <attribute name="field-name" type="string" use="required" />
    <attribute name="update-on-reduction" type="boolean" use="optional" default="true" />
    <attribute name="value-expression" type="string" use="optional" default="" />
  </complexType>
</element>

<simpleType name="x733-alarm-type">
  <restriction base="string" >
    <pattern value="CommunicationsAlarm|ProcessingErrorAlarm|EnvironmentalAlarm|QualityOfServiceAlarm|EquipmentAlarm|IntegrityViolation|SecurityViolation|TimeDomainViolation|OperationalViolation|PhysicalViolation" />
  </restriction>
</simpleType>

NOTE See also: anatomy of an event

The reduction-key

Alarmd is designed to reduce multiple occurrences of an alarm into a single alarm. The critical attribute when defining the alarm-data of an event is the reduction-key. This attribute can contain literal strings as well as references to properties (fields and parameters) of the event. The reduction-key uniquely identifies the signature of a problem and, as such, is used to reduce (de-duplicate) events so that only one problem is instantiated. Most commonly, the event’s identifier (UEI) is used as the left-most (least significant) portion of the reduction-key, followed by other properties of the event from least to most significant and, traditionally, separated with the literal :.

Example 1. Multi-part reduction-key
<event>
    <uei>uei.opennms.org/nodes/nodeDown</uei>
...
    <alarm-data reduction-key="%uei%:%dpname%:%nodeid%" alarm-type="1" auto-clean="false"/>
</event>

The $dpname% portion of the key refers to the "distributed poller name", which would be the name of the Minion that originated the event.

Example 2. Least Significant reduction-key Attribute

Decreasing the significance of the reduction-key is a way to aggregate, for example, all nodes into a single alarm. However, there are caveats:

<!-- Don't do this in production -->
<event>
  <uei>uei.opennms.org/nodes/nodeDown</uei>
  <alarm-data reduction-key="%uei%" alarm-type="1"/>
</event>

With this reduction-key, a single alarm would be instantiated for all nodes that were determined by the Poller to be down. There would be a single alarm with the count representing the number of nodes down. However, the UEI uei.opennms.org/nodes/nodeUp would not be a good 'pairwise' reduction-key for resolving this alarm as it would take only a single 'node up' to clear all nodes down tracked with this single alarm configuration.

Alarm-type attribute

Alarmd is designed to automatically match resolving events with an existing alarm. Alarms with matching resolutions with problems (Ups with Downs), should be indicated with the alarm-type attribute. There are currently three types of alarms:

  • alarm-type="1" (problem alarm)

  • alarm-type="2" (resolving alarm)

  • alarm-type="3" (notification alarm …​ alarm with no resolution, such as SNMP authentication failures)

The alarm-type attribute helps Alarmd with pairwise resolution, matching the resolution events to problem events.

If an alarm transitions from an alarm-type 2 back to alarm-type 1, the severity will be set to the most recent event’s value.

Clear-key attribute

This attribute is used in the pairwise correlation feature of Alarmd. When configuring a resolution alarm, set this attribute to match the reduction-key of the corresponding problem alarm.

Example of an interfaceUp event clearing an interfaceDown alarm
   <event>
      <uei>uei.opennms.org/nodes/interfaceDown</uei>
      ...
      <alarm-data reduction-key="%uei%:%dpname%:%nodeid%:%interface%"(1)
                  alarm-type="1"
                  auto-clean="false"/>
   </event>

    <event>
      <uei>uei.opennms.org/nodes/interfaceUp</uei>
      ...
      <alarm-data reduction-key="%uei%:%dpname%:%nodeid%:%interface%"
                  alarm-type="2"
                  clear-key="uei.opennms.org/nodes/interfaceDown:%dpname%:%nodeid%:%interface%"(2)
                  auto-clean="false"/>
   </event>
1 The interfaceDown event sets a reduction-key that includes enough information to identify a specific interface on a specific node.
2 The interfaceUp has a clear-key that matches the reduction-key of an interfaceDown alarm, allowing a match to automatically clear the previous alarm.

Auto-clean attribute

This attribute instructs Alarmd to retain only the most recent event reduced into an alarm. For alarms that are super chatty, this is a way to reduce the size of the most recent events in the database.

Do not use this feature with alarms that have pairwise correlation (matching problems with resolutions).

Update-field element

Use this element to override Alarmd’s default behavior for which some fields are updated during reduction. The Alarm fields that are currently allowed to be controlled this way are:

  • distpoller

  • ipaddr

  • mouseover

  • operinstruct

  • severity

  • descr

  • acktime

  • ackuser

Instantiate new alarms for existing cleared problem

Alarmd includes a global property setting that controls the behavior of alarm reduction of currently cleared alarms.

Create or edit the alarmd.properties file in the ${OPENNMS_HOME}/etc/opennms.properties.d/ folder and add the following property set to true:

###### Alarmd Properties ######
#
# Enable this property to force Alarmd to create new alarms when an problem re-occurs and the
# existing Alarm is in a "Cleared" state.
#
# Default: false
#org.opennms.alarmd.newIfClearedAlarmExists = false
org.opennms.alarmd.newIfClearedAlarmExists = true

Now, with this property set, when a repeat incident occurs and the current state of the alarm tracking the problem is "Cleared", instead of restating the current alarm to its default severity and incrementing the counter, a new instance of the alarm will be created.

new after clear 3
Figure 1. New node-down alarm with existing cleared alarm

What happens is that Alarmd will alter the existing alarm’s reductionKey to be unique. Thus preventing it from ever again being reused for a reoccurring problem in the Network (the literal ":ID:" and the alarm ID is appended to the reductionKey).

new after clear 4
Figure 2. Altered reductionKey

Re-enable legacy dual alarm state behavior

You can set a global property setting to re-enable the legacy dual alarm behavior as it was prior to version 23.

Create or edit the alarmd.properties file in the ${OPENNMS_HOME}/etc/opennms.properties.d/ folder and add the following property set to true:

###### Alarmd Properties ######
# Enable this property to have the traditional dual alarm handling of alarms state
# for alarm pairwise correlation.
# Default: false
org.opennms.alarmd.legacyAlarmState = true
Setting org.opennms.alarmd.legacyAlarmState nullifies org.opennms.alarmd.newIfClearedAlarmExists, if also configured.