SiteScope User's Guide


Monitoring SiteScope Server Health

The reliability of operations monitoring depends in part on the reliability of the monitoring application itself. SiteScope can monitor several key aspects of its own environment to help uncover monitor configuration problems as well as SiteScope server load. Optionally, SiteScope can also monitor its connectivity and related data events when connected to Mercury Application Management.


About the SiteScope Health Group

Beginning in SiteScope version 7.9.1.0, SiteScope Health is a specially designed group of monitors that display information about SiteScope's own health. This includes monitoring server resource usage, key processes, monitor load, and the integrity of key configuration files used by SiteScope. SiteScope Health monitoring data is also recorded in the daily monitor logs, by default, so you can create reports on SiteScope Health performance.

The Health monitor group is not displayed as a named link with other groups on the SiteScope main panel. You access the group detail page for the Health monitor group by clicking the Health button in the navigation menu at the top of each SiteScope screen.

The Health button graphic includes a status icon that indicates whether SiteScope Health monitoring has detected a problem that could be affecting monitoring performance.

The default SiteScope Health Monitors include:

Monitor Type                        Default Name                Description
MG Health Monitor                   MG files Checker            Checks the integrity of monitor group configuration files
History Health Monitor              History file Checker        Checks the integrity of the configuration file for reports
Master Health Monitor               Master file Checker         Checks the integrity of the SiteScope main configuration file
Log Event Health Monitor            Log Event Checker           Checks for certain events logged to the SiteScope error log
Monitor Load Monitor                Monitor Load Checker        Checks for data about the number of monitors being run or waiting to run
Health of SiteScope Server Monitor  Health of SiteScope Server  Checks a large number of server processes and resources for the server on which SiteScope is running

See the SiteScope Health Monitor Reference section for more information about the configuration of the individual SiteScope Health Monitors.

As with other monitors and groups, you may associate alerts and reports with individual Health monitors to be notified of problems and review SiteScope performance over time.

Health Group Actions

The group actions for the Health Monitor group are the same as the group actions for other monitor groups. This includes using the Manage Monitors and Groups page to update monitor settings. The following are group actions that are unique to the Health monitor group:

  • The Default Monitors option replaces the Monitor Set option for adding predefined monitors to the group. At present, you cannot deploy other Monitor Sets or Solution Templates into the Health group. You can add your own subgroups and other monitors individually.
  • The Health group includes a Disable Health Logging group action. You use this option to disable the logging of monitor data from the Health group into the SiteScope monitor data logs.

    Note: Disabling Health logging will interrupt reports for the monitors in the Health group or make them unavailable.

    Note: In earlier implementations of SiteScope Health monitoring, Health logging was disabled. Beginning with the SiteScope 7.9.1.0 release, the default is to log Health monitor data in the same way as data for other monitor types, unless logging is explicitly disabled by using this option.

Adding SiteScope Health Monitors

Adding SiteScope Health monitoring is simple. The monitors are deployed as a special monitor set template from within the Health monitor group. You use the following steps to deploy a set of SiteScope Health monitors.

To deploy SiteScope Health Monitors:

  1. Click the Health button on the SiteScope navigation menu. The Health monitor detail page opens.
  2. Below the monitor detail table, click the Default Monitors link in the Add to Health Monitoring section. SiteScope adds the six default SiteScope Health monitors to the group.

SiteScope Health monitoring is now active. The worst-case status reported by the monitors in the Health group will be displayed in the status icon embedded in the Health button in the SiteScope main navigation menu.

Understanding SiteScope Health Monitoring

SiteScope Configuration Integrity

The SiteScope Configuration Integrity section includes special monitors that check on the integrity of several key files that are essential to the correct operation of the SiteScope application. The following Health monitors check SiteScope configuration files.

  • MG files Checker
  • History file Checker
  • Master file Checker

In most installations and deployments, the integrity of the configuration files will be managed correctly and no errors will be detected. However, due to the high degree of flexibility in configuring and managing SiteScope, there are a number of situations where configuration files may be changed, copied, or created manually rather than by the SiteScope program itself. In these cases, the Configuration Integrity checks will help detect when errors have been introduced.

SiteScope Log Events

The Log Event Monitor is the equivalent of a SiteScope monitor group that watches the SiteScope Error Log (error.log) for certain events. These events include log entries indicating that:

  • a monitor run has been "skipped"
  • SiteScope has restarted itself unexpectedly
  • the process pool has reached its limit
  • there was a fault in the data transfer to Mercury Application Management

A SiteScope monitor will be reported as "skipped" if the monitor fails to complete its actions before it is scheduled to run again. This can occur with monitors that have complex actions to perform, such as querying databases, stepping through multi-page URL sequences, waiting for scripts to run, or waiting for an application that has hung. This can also happen if there are too many monitors waiting to run that require a process from the process pool.

For example, assume you have a URL Sequence Monitor that is configured to step through a series of eight Web pages. This sequence includes performing a search, which may have a slow response time. The monitor is set to run once every 60 seconds. When the system is responding well, the monitor can run to completion in 45 seconds. At times, however, the search request takes longer and the transaction takes up to 90 seconds to complete. In this case, the monitor will not have completed before SiteScope is scheduled to run it again. SiteScope will detect this and write a log event to the SiteScope error log. The SiteScope Log Event Monitor will detect this event and signal an error status.
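The timing in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SiteScope's actual scheduler: the function name count_skipped_runs and its parameters are hypothetical, and it assumes a scheduled run is skipped whenever the previous run is still in progress.

```python
def count_skipped_runs(interval_s, durations_s):
    """Count scheduled runs that are skipped because the previous run is still busy.

    interval_s  -- seconds between scheduled runs (the "Update every" value)
    durations_s -- how long each run that actually starts takes, in order
    (Hypothetical model for illustration; not SiteScope's scheduler.)
    """
    skipped = 0
    busy_until = 0              # time at which the run in progress finishes
    pending = list(durations_s)
    t = 0                       # time of the next scheduled run
    while pending:
        if t >= busy_until:
            # Previous run has finished, so this scheduled run starts.
            busy_until = t + pending.pop(0)
        else:
            # Previous run is still in progress, so this run is skipped.
            skipped += 1
        t += interval_s
    return skipped


# Runs of 45 seconds fit inside the 60-second interval.
fast = count_skipped_runs(60, [45, 45, 45])
# A 90-second run overlaps the next scheduled start.
slow = count_skipped_runs(60, [45, 90, 45])
```

With a 60-second interval, the all-45-second case produces no skips, while the single 90-second run causes the next scheduled run to be skipped. Lengthening the interval to 120 seconds in this model removes the skip, which mirrors the advice to adjust the run frequency.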

A monitor may also skip if it is a monitor type that requires a process from the process pool but the process pool limit has been reached. Generally, this is not likely to happen but may occur in some situations with high monitoring load. The SiteScope Health Log Event Monitor also watches for process pool events.

Skipped monitors cause a number of problems. One is the loss of data when a monitor run is suspended because a previous run has not completed or has been hung by an unresponsive application. Skipped monitors can also cause SiteScope to automatically stop and restart itself, an event that is also monitored by the SiteScope Health Log Event Monitor. A restart is done in an effort to clear problems and reset monitors. However, this can also lead to gaps in monitoring coverage and data. Adjusting the frequency (Update every) at which a monitor is set to run or specifying an applicable timeout value can often correct the problem of skipping monitors. Investigation of unresponsive systems that are being monitored may also be necessary.

Note: A Max Monitor Skipping setting has been added to allow monitors that are skipping to be disabled automatically. If this occurs, SiteScope is not restarted, but an e-mail about the skipping monitor is sent to the SiteScope administrator to signal the disable event. This optional functionality is disabled by default. To enable it, either remove the value of the _shutdownOnSkips setting in the master.config file or remove the setting entirely.

Note: The master.config file includes a setting that controls the maximum number of processes available in the process pool. The default is _processPoolMaxPerPool=50. You should change this setting only if adjustments to monitor configurations do not resolve the monitor performance problems.
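Before changing the process pool limit, it can help to confirm the value currently in effect. The following Python sketch reads settings from text laid out as name=value lines, the form shown for the default above. The function name read_settings and the sample text are hypothetical, and the sketch assumes one setting per line; check your own master.config before relying on that layout.

```python
def read_settings(config_text):
    """Parse name=value lines into a dict (assumed layout, one setting per line)."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue                      # skip blank or malformed lines
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Illustrative sample only; not the contents of a real master.config file.
sample = "_processPoolMaxPerPool=50\n"
settings = read_settings(sample)

# Fall back to the documented default of 50 if the setting is absent.
pool_limit = int(settings.get("_processPoolMaxPerPool", "50"))
```

To inspect a real installation, you would pass the text of the master.config file to read_settings instead of the sample string.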

The Log Event Monitor is also configured to report log events that indicate a problem with the transfer of SiteScope monitor and configuration data to a Mercury Application Management installation. See Troubleshooting Data Reporting to Application Management in the section on Integration with Mercury Application Management for more information.

SiteScope Monitor Load

The Monitor Load Monitor is the equivalent of a SiteScope monitor group that watches how many monitors are running and how many are waiting to be run. This information is taken from the SiteScope Progress Report page. Watching monitor load is important to help maintain monitoring performance and continuity. As noted in the section on SiteScope Progress Report: Understanding Monitor Load, if the number of monitors waiting approaches or exceeds the number of monitors running, adjustments should be made to monitor configurations to reduce the number of monitors waiting to run. Generally, this can be done by reducing the run frequency of some monitors.
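The rule of thumb above can be expressed as a simple comparison. The Python sketch below is an illustration only, not the logic used by the Monitor Load Checker. The function name monitor_load_status, the status labels, and the warn_ratio threshold of 0.8 (standing in for "approaches") are all assumptions made for this example.

```python
def monitor_load_status(running, waiting, warn_ratio=0.8):
    """Classify monitor load from counts of running and waiting monitors.

    warn_ratio is a hypothetical threshold for "approaches": a warning is
    raised when waiting monitors reach this fraction of running monitors.
    """
    if waiting == 0:
        return "good"                     # nothing is queued
    if waiting >= running:
        return "error"                    # waiting equals or exceeds running
    if waiting >= warn_ratio * running:
        return "warning"                  # waiting is approaching running
    return "good"


light = monitor_load_status(running=10, waiting=2)
near = monitor_load_status(running=10, waiting=8)
heavy = monitor_load_status(running=10, waiting=12)
```

In this model, 2 waiting against 10 running is acceptable, 8 waiting is close enough to warrant attention, and 12 waiting indicates that monitor run frequencies should be reduced.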

SiteScope Server Health

The Health of SiteScope Server Monitor is the equivalent of a SiteScope monitor group that monitors server resources on the server where SiteScope is running. This includes monitors for CPU, disk space, memory, and key processes. A problem with resource usage on the SiteScope server may be caused by monitors with configuration problems or may simply indicate that a particular SiteScope installation is reaching its performance capacity. For example, high CPU usage by SiteScope may indicate that the total number of monitors being run is reaching a limit. High disk space usage may indicate that the SiteScope monitor data logs are about to exceed the capacity of the local disk drives (see Log Preferences for SiteScope data logging options).