Chapter 1
Theory of Operation
Bluebird is a service and network management platform for automatic node discovery, network service monitoring, operator notification of problems, event consolidation, automatic action launching, and service-level performance monitoring.
The architecture must scale from small to medium companies running on a single computer up to a multiple-poller deployment in a large international environment. To accomplish this, Bluebird uses a distributed architecture of a master station and one or more distributed pollers. Distributed pollers can reside on the same processor as the master station.
The architecture aspires to allow a distributed poller to communicate with the master station as infrequently as once a day. Though most installations won't operate this way, designing for it makes the architecture resilient to long-term disconnects between the master station and a distributed poller.
For example, assume a master station process requests new nodes from a distributed poller every 5 minutes. Should the connection from the master station to the distributed poller fail, the first request after the failure asks for 5 minutes of data, the next for 10 minutes, the third for 15 minutes, and so on. When the connection between the two is re-established, the master station downloads all the missing data and the two are back in sync.
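A minimal sketch of this catch-up behavior is shown below, assuming a hypothetical PollerClient interface; the class and method names are illustrative and not part of Bluebird:

```java
import java.util.List;

/**
 * Illustrative sketch of the master station's catch-up window.
 * The request window keeps growing while the poller is unreachable,
 * so a single successful request recovers all missed data.
 */
public class NodeSync {

    /** Hypothetical transport to a distributed poller. */
    interface PollerClient {
        /** Ask the poller for nodes discovered in the last 'minutes' minutes. */
        List<String> requestNewNodes(int minutes) throws Exception;
    }

    private static final int BASE_INTERVAL_MINUTES = 5;
    private int windowMinutes = BASE_INTERVAL_MINUTES;

    /** Called once per polling cycle (e.g. every 5 minutes). */
    public void poll(PollerClient poller) {
        try {
            List<String> nodes = poller.requestNewNodes(windowMinutes);
            process(nodes);
            windowMinutes = BASE_INTERVAL_MINUTES;   // back in sync
        } catch (Exception unreachable) {
            // Connection failed: widen the window so the next successful
            // request covers everything missed (5, 10, 15, ... minutes).
            windowMinutes += BASE_INTERVAL_MINUTES;
        }
    }

    private void process(List<String> nodes) {
        nodes.forEach(n -> System.out.println("new node: " + n));
    }
}
```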
Using Java, XML, SOAP, JSDT and servlets to communicate between the various components allows an open approach where any application that can understand XML can communicate with the most intimate areas of the product. For example, if an application would like to get events from a distributed poller, it merely queries the servlet on the distributed poller using a well-defined XML data stream. The poller returns the requested events in XML. This approach opens the architecture to non-programmers and scripters. The only requirement is an XML parser.
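As an illustration of the servlet approach, the sketch below posts an XML query for recent events to a poller's servlet over HTTP. The servlet URL and the XML element names are assumptions for the example, not the actual Bluebird schema:

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Illustrative client that asks a distributed poller's servlet for events.
 * The servlet path and XML element names are hypothetical; the real
 * request/response schema is defined by the poller's servlet.
 */
public class EventQuery {
    public static void main(String[] args) throws IOException {
        String request =
            "<event-query>\n" +
            "  <last-minutes>60</last-minutes>\n" +   // hypothetical query element
            "</event-query>\n";

        URL servlet = new URL("http://poller.example.com:8080/bluebird/events"); // hypothetical URL
        HttpURLConnection conn = (HttpURLConnection) servlet.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml");

        try (OutputStream out = conn.getOutputStream()) {
            out.write(request.getBytes("UTF-8"));
        }

        // The poller answers with an XML document listing the requested events;
        // any XML parser can consume it from here.
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```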
Rather than focus solely on ICMP to determine the availability of the network, Bluebird uses "synthetic transactions" to probe the various services on a device. Bluebird views a network device as a series of services, including ICMP, SMTP, DNS, HTTP, and so on.
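The sketch below shows what such a synthetic transaction might look like for SMTP: instead of only pinging the host, it opens a TCP connection to port 25 and checks for a valid greeting banner. The class name, host and timeout are illustrative, not Bluebird's actual poller code:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Illustrative "synthetic transaction": open a real TCP connection to the
 * SMTP service and confirm the server greets us with a 220 banner.
 */
public class SmtpServiceCheck {

    public static boolean smtpAvailable(String host, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, 25), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);
            BufferedReader reader = new BufferedReader(
                new InputStreamReader(socket.getInputStream()));
            String banner = reader.readLine();
            // A healthy SMTP service answers with "220 <hostname> ...".
            return banner != null && banner.startsWith("220");
        } catch (Exception e) {
            return false;   // connection refused, timed out, or bad banner
        }
    }

    public static void main(String[] args) {
        System.out.println("SMTP up: " + smtpAvailable("mail.example.com", 3000));
    }
}
```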
Bluebird eschews a topological presentation of the network in favor of a rule-based, statistical view. Rather than watching red and green icons blink on and off, Bluebird presents problems as histograms showing their effect on service levels. If a machine fails but is not of interest, the service levels are not affected and notification is suppressed.
To limit the traffic generated by a network management system, Bluebird deploys "Bandwidth Trolls" to provide deterministic control of critical polling functions. Bandwidth Trolls throttle Bluebird processes back to pre-determined, user-defined traffic levels.
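The following is a hedged sketch of how such a throttle could work, using a simple token bucket that polling code consults before sending traffic; the class name and mechanism are assumptions for illustration rather than Bluebird's actual implementation:

```java
/**
 * Illustrative bandwidth throttle in the spirit of a "Bandwidth Troll":
 * a token bucket that callers consult before sending polling traffic,
 * so total output stays near a configured rate.
 */
public class BandwidthTroll {

    private final long bytesPerSecond;   // user-defined traffic ceiling
    private double tokens;               // bytes we are currently allowed to send
    private long lastRefill = System.nanoTime();

    public BandwidthTroll(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
        this.tokens = bytesPerSecond;
    }

    /** Blocks the calling poller thread until 'bytes' may be sent. */
    public synchronized void acquire(long bytes) throws InterruptedException {
        while (true) {
            refill();
            if (tokens >= bytes) {
                tokens -= bytes;
                return;
            }
            wait(10);   // back off briefly, then re-check the bucket
        }
    }

    private void refill() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefill) / 1_000_000_000.0;
        tokens = Math.min(bytesPerSecond, tokens + elapsedSeconds * bytesPerSecond);
        lastRefill = now;
    }
}
```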
Network devices are automatically discovered using IP (or SNMP or other protocols), queried for applicability, and added to an object database.
Pollers and other real-time components use threads to eliminate queuing overhead, improve responsiveness, and take advantage of multi-processor hardware.
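As a rough illustration (not Bluebird's actual code), a threaded poller might submit individual service checks to a shared worker pool:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative threaded poller: service checks for many nodes are handed to
 * a fixed pool of worker threads instead of a single serial queue, so they
 * run in parallel on multi-processor machines.
 */
public class ThreadedPoller {

    /** A single service probe against one node (ICMP, HTTP, SMTP, ...). */
    interface ServiceCheck extends Runnable { }

    public static void pollAll(List<ServiceCheck> checks) throws InterruptedException {
        ExecutorService workers =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() * 4);

        for (ServiceCheck check : checks) {
            workers.submit(check);      // no per-check queuing logic needed
        }

        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.MINUTES);
    }
}
```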
Bluebird administration utilizes graphical metaphors for all configuration and maintenance tasks to reduce the time to learn and understand the product.
Bluebird administration is rule-based to reduce the administration and maintenance burden of the platform.
Figure: Bluebird Overview
Each functional area of the architecture is described below:
Configuration files used by the Bluebird system control the behavior and actions of the various parts of Bluebird. Configuration files are maintained and stored in XML format on the file system of the Master Station but are pushed to the Distributed Pollers as necessary.
Configuration files are assembled into bundles called poller packages. These packages are assigned to specific distributed pollers to control how the distributed poller operates. Packages are re-usable and allow for redundancy and ease of administration.
SCM is responsible for starting, stopping and controlling the various Bluebird processes (services) on the Master Station and the distributed pollers. SCM uses the serviceconfig.xml file to determine which services to run, how to run them, and the dependencies between services.
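A hedged sketch of the dependency rule SCM has to enforce is shown below: a service starts only after everything it depends on is already running. The service names and the in-memory representation are made up for illustration; the real definitions live in serviceconfig.xml:

```java
import java.util.*;

/**
 * Illustrative start-order calculation: a service is started only after the
 * services it depends on are running. The dependency relationships shown in
 * main() are hypothetical.
 */
public class ScmStartOrder {

    /** dependencies: service name -> names of services that must already be running. */
    static List<String> startOrder(Map<String, List<String>> dependencies) {
        List<String> order = new ArrayList<>();
        Set<String> started = new HashSet<>();
        while (started.size() < dependencies.size()) {
            boolean progress = false;
            for (Map.Entry<String, List<String>> e : dependencies.entrySet()) {
                if (!started.contains(e.getKey()) && started.containsAll(e.getValue())) {
                    order.add(e.getKey());
                    started.add(e.getKey());
                    progress = true;
                }
            }
            if (!progress) {
                throw new IllegalStateException("circular dependency in service config");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("eventd",   Collections.emptyList());
        deps.put("persistd", Arrays.asList("eventd"));
        deps.put("actiond",  Arrays.asList("persistd"));
        System.out.println(startOrder(deps));   // [eventd, persistd, actiond]
    }
}
```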
The Administrator tools are used to graphically manipulate the configuration files. One could edit the configuration files directly with editors or Perl, but the admin tools are designed to be user-friendly ways to configure the system.
Discovery performs an advanced ICMP sweep of devices in a discovery range. All responding devices in the range are then tested against discovery filters to see if they are "interesting" to Bluebird. Discovery is a threaded system that allows pools of pollers. Discovery is limited by Bandwidth Trolls, which control how much traffic discovery consumes.
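As an illustration of such a sweep (not the actual discovery implementation), the sketch below probes a small address range in parallel and collects the responders for the capability check:

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

/**
 * Illustrative discovery sweep: ping every address in a range in parallel
 * and collect the responders. The range and timeout are arbitrary;
 * isReachable() uses ICMP where the JVM is permitted to, otherwise a TCP
 * echo probe.
 */
public class DiscoverySweep {

    public static List<String> sweep(String prefix, int from, int to) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(32);
        List<Future<String>> probes = new ArrayList<>();

        for (int host = from; host <= to; host++) {
            final String address = prefix + "." + host;
            probes.add(pool.submit(() ->
                InetAddress.getByName(address).isReachable(2000) ? address : null));
        }

        List<String> responders = new ArrayList<>();
        for (Future<String> probe : probes) {
            String address = probe.get();
            if (address != null) {
                responders.add(address);   // handed to the discovery filters next
            }
        }
        pool.shutdown();
        return responders;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sweep("192.168.1", 1, 254));
    }
}
```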
capsd is the capability checker. When a device is found by discovery, capsd checks it against the discovery filter. If it passes the filter, it is added to the object database and to the known node list.
Service pollers provide the actual probing of the service under test. Initial service pollers include ICMP, HTTP, SNMP, SMTP, DNS, FTP and others. Additional pollers can be bolted into the architecture as needed.
When the Real-time Console (EUI) and event browser need information about the network, they register with an extractor channel to receive network updates. The extractors deliver a "tree" of information in XML format so that additional EUIs can be built and integrated.
Extractors are shared by multiple EUIs so that additional overhead is not incurred for users viewing similar information.
Trapd is the SNMP trap listener, which waits on a UDP port for messages. When a trap is received, trapd converts it into an XML event and sends it to the eventd process.
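A minimal sketch of this flow is shown below; real SNMP trap decoding is omitted and the XML element names are hypothetical:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;

/**
 * Illustrative trap listener: block on a UDP port, wrap whatever arrives in
 * a small XML event, and hand it on to eventd.
 */
public class TrapListener {

    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(162)) {   // standard SNMP trap port
            byte[] buffer = new byte[8192];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                socket.receive(packet);   // blocks until a trap arrives

                String event =
                    "<event>\n" +
                    "  <source>trapd</source>\n" +
                    "  <agent>" + packet.getAddress().getHostAddress() + "</agent>\n" +
                    "  <raw-length>" + packet.getLength() + "</raw-length>\n" +
                    "</event>";

                forwardToEventd(event);
            }
        }
    }

    /** Placeholder for the hand-off to eventd (e.g. over a TCP socket). */
    static void forwardToEventd(String xmlEvent) {
        System.out.println(xmlEvent);
    }
}
```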
Events are processed and expanded by eventd. An event is received by eventd from one of several sources: trapd, a third-party application, a Bluebird module, or a TCP/UDP port from a different computer. Once received, the event is "expanded", i.e. additional information for the event is appended. This additional information typically includes the event description, operator instructions, automatic actions, log groups and many other fields.
After the event is expanded, it is sent to persistd for committing to the database.
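A rough sketch of the expansion step, with a simple lookup table standing in for Bluebird's event configuration (the Event shape and field names are assumptions):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative event expansion: a bare event, identified here by a
 * hypothetical identifier string, is enriched with description, operator
 * instructions and automatic actions before being handed to persistd.
 */
public class EventExpander {

    static class Event {
        String id;                                  // event identifier from the source
        Map<String, String> fields = new HashMap<>();
    }

    /** Hypothetical configuration: extra fields keyed by event identifier. */
    private final Map<String, Map<String, String>> eventConfig = new HashMap<>();

    public Event expand(Event raw) {
        Map<String, String> extras = eventConfig.get(raw.id);
        if (extras != null) {
            raw.fields.putAll(extras);   // description, operator instructions, actions...
        }
        return raw;                      // next stop: persistd, then the database
    }
}
```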
persistd takes a fully expanded event from eventd, writes it to the database and broadcasts it to the various event listeners for processing. persistd ensures that the event is committed to the database before the event listeners receive it.
actiond is one of the event listeners. actiond takes an event and looks for automatic actions. If enabled for that event, the action is launched and tracked. actiond uses threads to allow processes to run in parallel.
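As an illustration of this pattern (not actiond's actual code), the sketch below runs a configured command in a worker thread when an event carries an automatic action; the event fields and the thread-pool size are assumptions:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Illustrative automatic-action launcher: when an event with an automatic
 * action enabled arrives, run the configured command in a worker thread so
 * long-running actions don't block later events.
 */
public class ActionLauncher {

    private final ExecutorService workers = Executors.newFixedThreadPool(8);

    public void onEvent(String eventId, String autoActionCommand) {
        if (autoActionCommand == null || autoActionCommand.isEmpty()) {
            return;                                   // no automatic action for this event
        }
        workers.submit(() -> {
            try {
                Process action = new ProcessBuilder(autoActionCommand.split("\\s+"))
                        .redirectErrorStream(true)
                        .start();
                int exit = action.waitFor();          // track the action to completion
                System.out.println(eventId + " action exited with " + exit);
            } catch (Exception e) {
                System.err.println(eventId + " action failed: " + e.getMessage());
            }
        });
    }
}
```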