Chapter 1
Theory of Operation
1.1 Fundamental Tenets

Bluebird is a service and network management platform for automatic node discovery, network service monitoring, operator notification of problems, event consolidation, automatic action launching and service-level performance monitoring.

Scalability

The architecture must scale from small and medium-sized companies running everything on a single computer to large international environments that need multiple pollers. To accomplish this, Bluebird uses a distributed architecture consisting of a master station and one or more distributed pollers. A distributed poller can reside on the same computer as the master station.

Resilient and Autonomous

The architecture aims to let a distributed poller communicate with the master station as rarely as once a day. Though most people won't run it this way, designing for it makes the architecture resilient to long-term disconnects between the master station and a distributed poller.

For example, assume a master station process requests new nodes from a distributed poller every 5 minutes. Under normal conditions, each request asks for 5 minutes of data. If the connection from the master station to the distributed poller fails, the next attempt asks for 10 minutes of new nodes, the attempt after that asks for 15 minutes, and so on. When the connection between the two is re-established, the master station downloads all the missing data and the two are back in sync.
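The growing request window in this example can be sketched as a small helper that remembers the last successful transfer. This is only an illustration: the class and method names are hypothetical, and the source does not say how Bluebird tracks this state.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: tracks the last successful sync with one distributed poller. */
class CatchUpWindow {
    private Instant lastSuccess;

    CatchUpWindow(Instant lastSuccess) {
        this.lastSuccess = lastSuccess;
    }

    /** Span of data to request now: everything since the last successful transfer. */
    Duration windowAt(Instant now) {
        return Duration.between(lastSuccess, now);
    }

    /** Call only after the poller has answered; the next window starts from here. */
    void markSuccess(Instant now) {
        lastSuccess = now;
    }
}
```

With a 5-minute cycle, failed attempts never call markSuccess, so successive calls to windowAt return 5, 10 and 15 minutes until a transfer succeeds and the window drops back to 5 minutes.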

Open Architecture

Using Java, XML, SOAP, JSDT and servlets to communicate between the various components allows an open approach in which any application that understands XML can communicate with the most intimate areas of the product. For example, if an application would like to get events from a distributed poller, it merely queries the servlet on the distributed poller using a well-defined XML data stream. The poller returns the requested events in XML. This approach opens the architecture to non-programmers and scripters; the only requirement is an XML parser.
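As a rough illustration of the "only requirement is an XML parser" point, the sketch below reads a reply with the JDK's built-in DOM parser. The element and attribute names (events, event, node) are invented for the example; the actual Bluebird XML schema is not given in this chapter.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class EventXmlReader {
    /** Returns "node: text" for every <event> element (hypothetical schema). */
    static List<String> summarize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList events = doc.getElementsByTagName("event");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < events.getLength(); i++) {
                Element e = (Element) events.item(i);
                out.add(e.getAttribute("node") + ": " + e.getTextContent().trim());
            }
            return out;
        } catch (Exception e) {
            // Malformed XML or parser setup problems end up here.
            throw new IllegalArgumentException("could not parse event XML", e);
        }
    }
}
```

A client would fetch the XML from the poller's servlet over HTTP and pass the response body to summarize; the transport is left out because the servlet URL and request format are not specified here.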

Service Management

Rather than focus solely on ICMP for determining availability of the network, Bluebird uses "synthetic transactions" to probe the various services on a device. Bluebird views a network device as a series of services, such as ICMP, SMTP, DNS and HTTP.
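The simplest possible service probe is a TCP connect to the service's port, sketched below. The interface and class names are hypothetical, and a real synthetic transaction would go further than this (for example, sending an HTTP request and checking the response); the sketch only shows the idea of testing a service rather than pinging the host.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical contract for one service check against one host. */
interface ServiceProbe {
    boolean isAvailable(String host, int timeoutMillis);
}

/** Treats the service as available if a TCP connection to its port succeeds. */
class TcpConnectProbe implements ServiceProbe {
    private final int port;

    TcpConnectProbe(int port) {
        this.port = port;
    }

    @Override
    public boolean isAvailable(String host, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: report the service as down.
            return false;
        }
    }
}
```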

Service Level Orientation

Bluebird eschews a topological presentation of the network in favor of a rule-based, statistical view. Rather than watching red and green icons blink on and off, the operator sees problems as histograms of their effect on service levels. If a machine fails but is not of interest, the service levels are not affected and notification is suppressed.
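One way to picture the rule-based view is an availability figure computed only over the nodes a rule selects. The sketch below assumes downtime is already known per node; the method name, the inputs and the formula are illustrative and are not taken from Bluebird.

```java
import java.util.Map;
import java.util.Set;

class ServiceLevel {
    /**
     * Percent availability over a window, counting only nodes of interest.
     * Downtime recorded for any other node does not change the result.
     */
    static double availabilityPercent(Map<String, Long> downtimeSecondsByNode,
                                      Set<String> nodesOfInterest,
                                      long windowSeconds) {
        if (nodesOfInterest.isEmpty() || windowSeconds <= 0) {
            throw new IllegalArgumentException("need at least one node and a positive window");
        }
        long down = 0;
        for (String node : nodesOfInterest) {
            Long seconds = downtimeSecondsByNode.get(node);
            if (seconds != null) {
                down += Math.min(seconds, windowSeconds);
            }
        }
        double total = (double) windowSeconds * nodesOfInterest.size();
        return 100.0 * (total - down) / total;
    }
}
```

For two nodes of interest over a 1000-second window, 100 seconds of downtime on one of them gives 95 percent, regardless of how long an uninteresting lab machine was down.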

Deterministic

To limit the traffic generated by the network management system itself, Bluebird deploys "Bandwidth Trolls" to provide deterministic control of critical polling functions. Bandwidth Trolls throttle back Bluebird processes to predetermined, user-defined traffic levels.
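The source does not describe how a Bandwidth Troll throttles a process. A token bucket is one common technique for this kind of limit, and the sketch below uses it purely as an illustration; the class shape and the bytes-per-second unit are assumptions. The caller passes in the current time so the behavior is easy to reason about.

```java
/** Illustrative token bucket; not Bluebird's actual throttling algorithm. */
class BandwidthTroll {
    private final long bytesPerSecond;
    private double available;
    private long lastNanos;

    BandwidthTroll(long bytesPerSecond, long nowNanos) {
        this.bytesPerSecond = bytesPerSecond;
        this.available = bytesPerSecond; // start with one second of budget
        this.lastNanos = nowNanos;
    }

    /** Returns true and spends the budget if the given number of bytes may be sent now. */
    synchronized boolean tryAcquire(long bytes, long nowNanos) {
        double refill = (nowNanos - lastNanos) / 1e9 * bytesPerSecond;
        available = Math.min(bytesPerSecond, available + refill); // cap at one second
        lastNanos = nowNanos;
        if (bytes > available) {
            return false; // caller should wait and retry
        }
        available -= bytes;
        return true;
    }
}
```

A poller would call tryAcquire with System.nanoTime() before sending each probe and back off when it returns false.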

Discovery

Network devices are automatically discovered using IP (or SNMP or other protocols), queried for applicability, and added to an object database.

Thread Technology

Pollers and other real-time components use multithreading to eliminate queuing overhead, improve responsiveness and take advantage of multiprocessor hardware.

Graphical

Bluebird administration utilizes graphical metaphors for all configuration and maintenance tasks to reduce the time to learn and understand the product.

Filters

Bluebird administration is rule-based, which reduces the effort needed to administer and maintain the platform.

1.2 Bluebird Functional Overview
Figure: Bluebird Overview

Each functional area of the architecture is described below:

Configuration Files

Configuration files used by the Bluebird system control the behavior and actions of the various parts of Bluebird. Configuration files are maintained and stored in XML format on the file system of the Master Station but are pushed to the Distributed Pollers as necessary.

Configuration files are assembled into bundles called poller packages. These packages are assigned to specific distributed pollers to control how the distributed poller operates. Packages are re-usable and allow for redundancy and ease of administration.

SCM (Service Control Manager)

SCM is responsible for starting, stopping and controlling the various Bluebird processes (services) on the Master Station and the distributed pollers. SCM uses the serviceconfig.xml file to determine the services to run, how to run them and dependencies between services.
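Honoring dependencies between services amounts to starting them in a topological order. The sketch below shows that idea with a depth-first traversal; the in-memory map stands in for whatever serviceconfig.xml actually contains, whose format is not shown in this chapter, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StartOrder {
    /** deps maps each service to the services that must be started before it. */
    static List<String> resolve(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String service : deps.keySet()) {
            visit(service, deps, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String service, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(service)) {
            return;
        }
        if (!inProgress.add(service)) {
            throw new IllegalStateException("dependency cycle at " + service);
        }
        List<String> before = deps.get(service);
        if (before != null) {
            for (String dependency : before) {
                visit(dependency, deps, done, inProgress, order);
            }
        }
        inProgress.remove(service);
        done.add(service);
        order.add(service); // every dependency is already in the list
    }
}
```

If, say, actiond were declared to depend on persistd and persistd on eventd, resolve would return eventd, persistd, actiond. Stopping services would walk the same list in reverse.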

JAVA Admin Tools

The Administrator tools are used to manipulate the configuration files graphically. One could edit the configuration files directly with a text editor or Perl, but the admin tools are designed to be a user-friendly way to configure the system.

Discovery

Discovery performs an advanced ICMP sweep of devices in a discovery range. All responding devices in the range are then tested against discovery filters to see if they are "interesting" to Bluebird. Discovery is a threaded system that allows pools of pollers. Discovery is limited by Bandwidth Trolls, which control how much traffic it consumes.
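A threaded sweep over an address range can be sketched with a fixed thread pool, as below. The class is hypothetical, and the "does this address respond" check is passed in as a predicate because raw ICMP is not directly available in standard Java; InetAddress.isReachable is one possible stand-in. Bandwidth throttling and the discovery filters are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

class DiscoverySweep {
    /** Expands an inclusive IPv4 range into individual dotted-quad addresses. */
    static List<String> range(String first, String last) {
        List<String> out = new ArrayList<>();
        long end = toLong(last);
        for (long a = toLong(first); a <= end; a++) {
            out.add(((a >> 24) & 255) + "." + ((a >> 16) & 255) + "."
                    + ((a >> 8) & 255) + "." + (a & 255));
        }
        return out;
    }

    private static long toLong(String ip) {
        long value = 0;
        for (String part : ip.split("\\.")) {
            value = value * 256 + Integer.parseInt(part);
        }
        return value;
    }

    /** Probes all addresses on a pool of worker threads; returns the responders in order. */
    static List<String> sweep(List<String> addresses, Predicate<String> responds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String address : addresses) {
                Callable<Boolean> task = () -> responds.test(address);
                results.add(pool.submit(task));
            }
            List<String> found = new ArrayList<>();
            for (int i = 0; i < addresses.size(); i++) {
                if (results.get(i).get()) {
                    found.add(addresses.get(i));
                }
            }
            return found;
        } catch (Exception e) {
            throw new IllegalStateException("sweep failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```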

capsd

capsd is the capability checker. When a device is found by discovery, capsd checks it against the discovery filter. If it passes through the filter, it is added to the object database and added to the known node list.

Service Poller/Poller Framework

Service pollers provide the actual probing of the service under test. Initial service pollers include ICMP, HTTP, SNMP, SMTP, DNS, FTP and others. Additional pollers can be bolted into the architecture as needed.
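"Bolting in" additional pollers suggests a plug-in contract plus a registry keyed by service name. The sketch below is one minimal way to express that; the interface, the registry class and their methods are hypothetical and do not reflect the actual Bluebird poller framework API.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical plug-in contract: one implementation per service type. */
interface ServicePoller {
    boolean poll(String host);
}

class PollerRegistry {
    private final Map<String, ServicePoller> pollers = new ConcurrentHashMap<>();

    /** Adds or replaces the poller for a service name such as "HTTP". */
    void register(String service, ServicePoller poller) {
        pollers.put(service, poller);
    }

    Set<String> services() {
        return pollers.keySet();
    }

    /** Dispatches to the poller registered for the service. */
    boolean poll(String service, String host) {
        ServicePoller poller = pollers.get(service);
        if (poller == null) {
            throw new IllegalArgumentException("no poller registered for " + service);
        }
        return poller.poll(host);
    }
}
```

Supporting a new service then means writing one ServicePoller implementation and registering it; the code that schedules polls does not change.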

EUI Extractors

When the Real-time Console (EUI) and event browser need information about the network, they register with an extractor channel to receive network updates. The extractors deliver a "tree" of information in XML format so that additional EUIs can be built and integrated.

Extractors are shared by multiple EUIs so that additional overhead is not incurred for users viewing similar information.

trapd

trapd is the SNMP trap listener, which waits on a UDP port for messages. When a trap is received, trapd converts it into an XML event and sends it to the eventd process.

eventd

Events are processed and expanded by eventd. An event is received by eventd from one of several sources: trapd, a third-party application, a Bluebird module, or a TCP/UDP port connected to a different computer. Once received, the event is "expanded", i.e. additional information for the event is appended. Additional information would typically be the event description, operator instructions, automatic actions, log groups and many other fields.
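Expansion can be pictured as merging a per-event-type template into the incoming event. The sketch below models an event as a map of fields; the field names (id, description, operatorInstructions) and the rule that incoming values win over template values are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EventExpander {
    // Extra fields to append, keyed by event id (hypothetical structure).
    private final Map<String, Map<String, String>> templates;

    EventExpander(Map<String, Map<String, String>> templates) {
        this.templates = templates;
    }

    /** Returns a copy of the event with template fields appended; existing fields win. */
    Map<String, String> expand(Map<String, String> event) {
        Map<String, String> expanded = new LinkedHashMap<>(event);
        Map<String, String> extra = templates.get(event.get("id"));
        if (extra != null) {
            for (Map.Entry<String, String> field : extra.entrySet()) {
                expanded.putIfAbsent(field.getKey(), field.getValue());
            }
        }
        return expanded;
    }
}
```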

After the event is expanded, it is sent to persistd for committing to the database.

persistd

persistd takes a fully expanded event from eventd, writes it to the database and broadcasts it to the various event listeners for processing. persistd ensures that the event is written to the database before any event listener receives it.
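The ordering guarantee can be sketched as "store first, then notify". In the hypothetical class below the database write is represented by a callback; if it throws, the broadcast loop is never reached, so no listener sees an event that was not stored.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

class EventPersister {
    private final Consumer<String> store; // stands in for the database write
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    EventPersister(Consumer<String> store) {
        this.store = store;
    }

    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    /** Persists the event, then broadcasts it. A failed write prevents the broadcast. */
    void handle(String eventXml) {
        store.accept(eventXml);
        for (Consumer<String> listener : listeners) {
            listener.accept(eventXml);
        }
    }
}
```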

actiond

actiond is one of the event listeners. actiond takes an event and looks for automatic actions. If an action is enabled for that event, it is launched and tracked. actiond uses threads so that actions can run in parallel.
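Launching and tracking actions in parallel maps naturally onto an executor that hands back a Future per action. The sketch below is an assumption about shape only: the class, the use of an integer exit status and the keying by event id are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ActionLauncher {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<String, Future<Integer>> running = new ConcurrentHashMap<>();

    /** Starts the action on a worker thread and tracks it under the event id. */
    void launch(String eventId, Callable<Integer> action) {
        running.put(eventId, pool.submit(action));
    }

    /** Blocks until the tracked action finishes and returns its exit status. */
    int await(String eventId) {
        Future<Integer> future = running.remove(eventId);
        if (future == null) {
            throw new IllegalArgumentException("no action tracked for " + eventId);
        }
        try {
            return future.get();
        } catch (Exception e) {
            throw new IllegalStateException("action failed for " + eventId, e);
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Because each action runs on its own worker thread, a slow action for one event does not delay the action for the next.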