23. Monitoring Interface

The AMPS monitoring interface has two distinct components:

  1. A basic monitoring interface that provides statistics for the AMPS instance in common machine-readable formats. This interface also provides administrative functions, such as enabling and disabling transports, disconnecting clients, and upgrading and downgrading replication links.
  2. The AMPS Galvanometer, a browser-based monitoring tool that shows a graphical representation of the statistics for AMPS. The Galvanometer includes information about replication flow across the set of connected instances. It includes the ability to enter subscriptions and queries, and display the results in a grid.

Statistics Collection

Most data provided in the AMPS monitoring interface is collected at the interval specified in the statistics configuration.

This means that the statistics presented are not a continuous sample, or a sample when “all operations are complete”. Instead, the statistics reflect the point in time when the statistics are collected.

When monitoring statistics, consider not only the values reported, but the change in values over time and how those values compare to other values.

For example, the transport_rx_queue for a client connecting over the network indicates the number of bytes currently in the TCP buffer for a given connection. However, a single sample that shows a nonzero count for this metric simply means that, at the time statistics were collected, there were bytes in that buffer. Whether this indicates a problem or not depends on the context. If the next sample for that client shows that AMPS is consuming a large number of bytes for the client, then the fact that there are bytes in the queue simply means that the connection is active. On the other hand, if the bytes_in (total bytes consumed for this connection) and bytes_in_per_second (bytes per second consumed for this connection, averaged over the last statistics interval) indicate a slowdown, that might be cause for concern. Likewise, an empty transport_rx_queue does not mean that no messages are being received for the client if bytes_in and bytes_in_per_second show the expected traffic.

Similarly, a client that connects, runs a query, consumes the results, and disconnects in less time than the statistics interval may not be captured in the statistics database at all. If the client connection does not exist at any of the times that statistics are recorded, AMPS will not capture statistics for that client connection, so that client will not appear in the statistics database.
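The transport_rx_queue reasoning above can be sketched as a comparison of two consecutive samples. This is only an illustration: the ClientSample class, the assess_receive_queue function, and the slowdown_ratio threshold are hypothetical names chosen for this sketch and are not part of AMPS. Only the three statistic names come from the text.

```python
from dataclasses import dataclass


@dataclass
class ClientSample:
    """One statistics sample for a client connection (hypothetical container)."""
    transport_rx_queue: int     # bytes in the TCP buffer when the sample was taken
    bytes_in: int               # total bytes consumed for this connection
    bytes_in_per_second: float  # average over the last statistics interval


def assess_receive_queue(prev: ClientSample, curr: ClientSample,
                         slowdown_ratio: float = 0.5) -> str:
    """Classify a connection from two consecutive samples.

    slowdown_ratio is an assumed threshold: the current rate counts as a
    slowdown when it falls below this fraction of the previous rate.
    """
    consumed = curr.bytes_in - prev.bytes_in
    slowed = curr.bytes_in_per_second < prev.bytes_in_per_second * slowdown_ratio
    if curr.transport_rx_queue > 0 and (consumed == 0 or slowed):
        return "investigate"  # bytes are queued and consumption stalled or dropped
    if consumed > 0:
        return "active"       # traffic is flowing, whatever the queue depth was
    return "idle"             # nothing queued and nothing consumed


prev = ClientSample(transport_rx_queue=4096, bytes_in=1_000_000, bytes_in_per_second=50_000.0)
busy = ClientSample(transport_rx_queue=8192, bytes_in=1_500_000, bytes_in_per_second=50_000.0)
stuck = ClientSample(transport_rx_queue=8192, bytes_in=1_000_000, bytes_in_per_second=0.0)
print(assess_receive_queue(prev, busy))   # active
print(assess_receive_queue(prev, stuck))  # investigate
```

A nonzero queue alone never triggers "investigate" here; the classification depends on how the counters changed between samples, which mirrors the advice to look at change over time rather than a single value.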

Configuration

The AMPS monitoring interface is defined in the configuration file used at AMPS startup. Below is an example configuration of the Admin tag.

<!-- Configure the monitoring interface: this
     starts an http server in the AMPS process. -->

<Admin>
    <FileName>stats.db</FileName>
    <InetAddr>localhost:8085</InetAddr>
    <Interval>10s</Interval>
</Admin>

In this example, localhost is the hostname and 8085 is the port assigned to the monitoring interface. With this configuration:

http://localhost:8085/ Root URI for Galvanometer.
http://localhost:8085/amps Root URI for simple monitoring interface.

The Interval tag is used to set the update interval for the AMPS monitoring interface. In this example, statistics will be updated every 10 seconds.
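As a rough sketch of how the Admin block maps to the URIs above, the following Python reads the example configuration and derives the two root URIs and the sampling interval. The describe_admin function is a hypothetical helper written for this illustration. It assumes plain HTTP and only interprets an Interval written as a number followed by s, as in the example.

```python
import re
import xml.etree.ElementTree as ET

ADMIN_XML = """
<Admin>
    <FileName>stats.db</FileName>
    <InetAddr>localhost:8085</InetAddr>
    <Interval>10s</Interval>
</Admin>
"""


def describe_admin(admin_xml: str) -> dict:
    """Derive the monitoring URIs and sampling interval from an Admin block."""
    admin = ET.fromstring(admin_xml)
    inet_addr = admin.findtext("InetAddr", "").strip()
    interval = admin.findtext("Interval", "").strip()
    # Only the "<number>s" form shown in the example is interpreted here;
    # any other spelling is reported as None rather than guessed at.
    seconds = re.fullmatch(r"(\d+)s", interval)
    return {
        "galvanometer": f"http://{inet_addr}/",
        "monitoring": f"http://{inet_addr}/amps",
        "interval_seconds": int(seconds.group(1)) if seconds else None,
    }


info = describe_admin(ADMIN_XML)
print(info["galvanometer"])      # http://localhost:8085/
print(info["monitoring"])        # http://localhost:8085/amps
print(info["interval_seconds"])  # 10
```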

Tip

By default, AMPS will store the monitoring interface database information in system memory. If the AMPS instance is going to be up for a long time, or the monitoring interface statistics interval will be updated frequently, or if this is a production system where it is important to be able to troubleshoot problems, it is strongly recommended that the FileName setting be specified to allow persistence of the data to a local file. See the AMPS Configuration Reference for more information.

The basic monitoring interface is accessible through a web browser, but also follows a Representational State Transfer (RESTful) URI style for programmatic traversal of the directory structure of the monitoring interface.

Basic Monitoring Interface

The basic monitoring interface is useful for examining many important aspects of an AMPS instance. This includes health and monitoring information for the AMPS engine as well as for the host AMPS is running on. The information is designed to be easily accessible, which simplifies gathering performance and availability data from AMPS. The monitoring interface also provides access to administrative actions.

The root of the AMPS Monitoring interface URI contains the following child resources:

  • The host resource provides information about the current operating system state
  • The instance resource provides information about the instance of AMPS
  • The administration resource provides access to functions that modify the state of the instance (such as disconnecting a client)

The information in the monitoring interface is taken from the statistics database for the AMPS instance. AMPS provides actions for managing the statistics database, as described in the section on Actions, under Manage the Statistics Database.

The fields provided through the basic monitoring interface (and the statistics database) are described in the AMPS Monitoring Reference.

Historical Time Based Selection

AMPS keeps a history of the monitoring interface statistics, and allows that data to be queried. By selecting a leaf node of the monitoring interface resources, a time-based query can be constructed to view a historical report of the information.

A time-based query is created by appending one or both of the t0 and t1 query parameters to a URL of the admin REST interface.

For example, if an administrator wanted to see the number of messages per second consumed by all processors from midnight UTC on November 30, 2011 until 23:25:00 UTC on November 30, 2011, then pointing a browser to:

http://localhost:8085/amps/instance/processors/all/messages_received_per_sec?t0=20111130T0&t1=20111130T232500

will generate the report and output it in the following plain text format (the output shown here is truncated).

20111130T000000.000000Z,0
20111130T000010.000000Z,0
20111130T000020.000000Z,0
20111130T000030.000000Z,94244
20111130T000040.000000Z,304661
20111130T000050.000000Z,301078
20111130T000100.000000Z,308922
20111130T000110.000000Z,306177
20111130T000120.000000Z,302140
20111130T000130.000000Z,302390
20111130T000140.000000Z,307637
20111130T000150.000000Z,310109
20111130T000200.000000Z,309888
20111130T000210.000000Z,299993
20111130T000220.000000Z,310002
20111130T000230.000000Z,300612
20111130T000240.000000Z,299387

All times used for the report generation and presentation are ISO-8601 formatted. ISO-8601 formatting is of the following form: YYYYMMDDThhmmss, where YYYY is the year, MM is the month, DD is the day, T is a separator between the date and time, hh is the hours, mm is the minutes and ss is the seconds. Decimals are permitted after the ss units.

All times used for the report generation and presentation are stored and returned in UTC time.

Tip

As discussed in the following sections, the date-time range can be used with the plain text (HTML), comma-separated values (CSV), JSON, and XML formats.

Time Based Query Behavior

All times used for the t0 and t1 parameters must be ISO-8601 formatted. ISO-8601 formatting is of the following form: YYYYMMDDThhmmss, where YYYY is the year, MM is the month, DD is the day, T is a separator between the date and time, hh is the hours, mm is the minutes and ss is the seconds. Decimals are permitted after the ss units.

All times used for the t0 and t1 parameters must be in UTC time. All times in the admin interface are stored and compared in UTC time.

The behavior of time based queries is affected by the combination of the t0 and t1 query parameters. These behaviors are described in the table below:

Query Parameter Values | Behavior
Only t0 is set | The result set is the list of values recorded from time t0 until the latest recorded admin interval. The range is inclusive.
Only t1 is set | The result set is the list of values recorded from the first admin interval recorded in the stats.db of the AMPS instance until time t1. The range is inclusive.
Both t0 and t1 are set to different timestamp values | The result set is the list of values recorded starting from time t0 until time t1. This range is inclusive.
Both t0 and t1 are set to the same timestamp value | Returns a single value that represents the recorded value of that statistic at the specific admin interval set by t0 and t1.

Important

Unlike the other time query behaviors, when selecting a single data point by setting t0 and t1 to the same value, that value must be a timestamp that is present in the statistics. Valid timestamps can be obtained via the instance/timestamp admin API endpoint.
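The parameter combinations described in this section can be illustrated with a small URL builder. This is a sketch only: history_url and amps_timestamp are hypothetical helper names, not part of any AMPS client library. The helpers require timezone-aware datetime values and convert them to UTC before formatting them as YYYYMMDDThhmmss.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def amps_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYYMMDDThhmmss in UTC."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the UTC conversion is explicit")
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S")


def history_url(resource_url: str,
                t0: Optional[datetime] = None,
                t1: Optional[datetime] = None) -> str:
    """Append t0 and/or t1 to a monitoring resource URL."""
    params = {}
    if t0 is not None:
        params["t0"] = amps_timestamp(t0)
    if t1 is not None:
        params["t1"] = amps_timestamp(t1)
    return f"{resource_url}?{urlencode(params)}" if params else resource_url


leaf = "http://localhost:8085/amps/instance/processors/all/messages_received_per_sec"
start = datetime(2011, 11, 30, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2011, 11, 30, 23, 25, 0, tzinfo=timezone.utc)

print(history_url(leaf, t0=start))          # from t0 to the latest interval
print(history_url(leaf, t1=end))            # from the first interval to t1
print(history_url(leaf, t0=start, t1=end))  # inclusive range
print(history_url(leaf, t0=end, t1=end))    # single point; must be a recorded timestamp
```

The last call only returns data if the timestamp is one that is actually present in the statistics, as noted above.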

Leaf Nodes

A leaf node of the monitoring interface represents a single recorded statistic. Leaf nodes fully support time-range selections using the t0 and t1 parameters as described above.

Examples of leaf node endpoints include:

http://localhost:8085/amps/instance/processors/all/messages_received_per_sec

or

http://localhost:8085/amps/instance/transaction_log/write_latency

Non-Leaf Nodes

A non-leaf node represents an aggregate of related statistics. Non-leaf nodes do not support time-range selections.

Non-leaf nodes support historical queries for a specific valid historical admin interval timestamp. A query for a specific timestamp is achieved by setting the t0 and t1 parameters to the same timestamp value as described above.

Examples of non-leaf node endpoints include:

http://localhost:8085/amps/instance/processors/all

or

http://localhost:8085/amps/instance/transaction_log

Output Formatting

The AMPS monitoring interface offers several output formats to ease the consumption of monitoring data. The available options are XML, CSV, JSON, and RNC, each of which is discussed in more detail below.

XML Document Output

Any monitoring interface resource can return the current node, along with all of its child nodes, as an XML document when the .xml file extension is appended to the resource name. For example, if an administrator would like an XML document describing all of the currently running processors, including all the relevant statistics about those processors, then the following URI will generate that information:

http://localhost:8085/amps/instance/processors/all.xml

The document that is returned will be similar to the following:

<amps>
    <instance>
        <processors>
            <processor id='all'>
                <denied_reads>0</denied_reads>
                <denied_writes>0</denied_writes>
                <description>AMPS Aggregate Processor Stats</description>
                <last_active>1855</last_active>
                <matches_found>0</matches_found>
                <matches_found_per_sec>0</matches_found_per_sec>
                <messages_received>0</messages_received>
                <messages_received_per_sec>0</messages_received_per_sec>
                <throttle_count>0</throttle_count>
            </processor>
        </processors>
    </instance>
</amps>

Appending the .xml file extension to any AMPS monitoring interface resource will generate the corresponding XML document.
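A document like the one above can be read with any XML parser. The following sketch uses the Python standard library on a shortened copy of the sample. The processor_stats function is a hypothetical helper, and it assumes the element layout shown in the sample.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample document shown above.
SAMPLE_XML = """
<amps>
    <instance>
        <processors>
            <processor id='all'>
                <description>AMPS Aggregate Processor Stats</description>
                <last_active>1855</last_active>
                <messages_received>0</messages_received>
                <throttle_count>0</throttle_count>
            </processor>
        </processors>
    </instance>
</amps>
"""


def processor_stats(xml_text: str) -> dict:
    """Map each processor id to a dict of its statistics (values kept as text)."""
    root = ET.fromstring(xml_text)
    stats = {}
    for processor in root.iter("processor"):
        stats[processor.get("id")] = {child.tag: child.text for child in processor}
    return stats


stats = processor_stats(SAMPLE_XML)
print(stats["all"]["last_active"])  # 1855
```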

CSV Document Output

The .csv file extension can be appended to any leaf node resource to generate a CSV file containing the values of that statistic.

This can also be coupled with the time range selection to generate reports. See Historical Time Based Selection for more details on time range selection.

Below is a sample of the .csv output from the monitoring interface from the following URL:

http://localhost:8085/amps/instance/processors/all/matches_found_per_sec.csv?t0=20230830T0

This resource will create a file with the following contents:

20230830T000000.000000Z,94244
20230830T000010.000000Z,304661
20230830T000020.000000Z,301078
20230830T000030.000000Z,304661
20230830T000040.000000Z,0
20230830T000050.000000Z,0
20230830T000100.000000Z,0
20230830T000110.000000Z,0
20230830T000120.000000Z,302390
20230830T000130.000000Z,307637
20230830T000140.000000Z,0
20230830T000150.000000Z,0
20230830T000200.000000Z,0
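Output in this form is straightforward to load for further analysis. The sketch below parses the first few rows of the sample into timestamp and value pairs. The parse_history_csv function is a hypothetical helper, and it assumes every row has exactly the two columns shown above.

```python
import csv
import io
from datetime import datetime, timezone

# First rows of the sample output shown above.
SAMPLE_CSV = """\
20230830T000000.000000Z,94244
20230830T000010.000000Z,304661
20230830T000020.000000Z,301078
20230830T000030.000000Z,304661
20230830T000040.000000Z,0
"""


def parse_history_csv(text: str) -> list:
    """Return (UTC datetime, value) pairs from monitoring CSV output."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # tolerate blank lines
        stamp, value = row
        when = datetime.strptime(stamp, "%Y%m%dT%H%M%S.%fZ").replace(tzinfo=timezone.utc)
        rows.append((when, float(value)))
    return rows


history = parse_history_csv(SAMPLE_CSV)
peak_time, peak_value = max(history, key=lambda pair: pair[1])
print(len(history))            # 5
print(peak_time.isoformat())   # 2023-08-30T00:00:10+00:00
print(peak_value)              # 304661.0
```

When two samples share the peak value, max returns the earlier one, so the peak reported here is the sample at 00:00:10.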

Leaf Nodes

A leaf node of the monitoring interface represents a single recorded statistic. Leaf nodes do support CSV document output.

Examples of leaf node endpoints include:

http://localhost:8085/amps/instance/processors/all/messages_received_per_sec

or

http://localhost:8085/amps/instance/transaction_log/write_latency

Non-Leaf Nodes

A non-leaf node represents an aggregate of related statistics. Non-leaf nodes do not support CSV document output. Due to the limitations of the tabular format of CSV, there is no clear way to translate the hierarchical structure of a non-leaf node into a CSV document.

Examples of non-leaf node endpoints include:

http://localhost:8085/amps/instance/processors/all

or

http://localhost:8085/amps/instance/transaction_log

JSON Document Output

Any monitoring interface resource can return the current node, along with all of its child nodes, as a JSON document when the .json file extension is appended to the resource name. For example, if an administrator would like a JSON document describing all of the CPUs on the server, including all the relevant statistics about those CPUs, then the following URI will generate that information:

http://localhost:8085/amps/host/cpus.json

The document that is returned will be similar to the following:

{
    "amps": {
        "host": {
            "cpus": [
                {
                    "id":"all",
                    "idle_percent":"62.452316076294",
                    "iowait_percent":"0.490463215259",
                    "system_percent":"10.681198910082",
                    "user_percent":"26.376021798365"
                },
                {
                    "id":"cpu0",
                    "idle_percent":"75.417130144605",
                    "iowait_percent":"0.333704115684",
                    "system_percent":"7.563959955506",
                    "user_percent":"16.685205784205"
                },
                {
                    "id":"cpu1",
                    "idle_percent":"50.000000000000",
                    "iowait_percent":"0.642398286938",
                    "system_percent":"13.597430406852",
                    "user_percent":"35.760171306210"
                }
            ]
        }
    }
}

Appending the .json file extension to any AMPS monitoring interface resource will generate the corresponding JSON document.
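Note that the numeric values in the sample document are JSON strings, so a consumer needs to convert them before comparing them. The following sketch finds the least idle core in a shortened copy of the sample. The busiest_cpu function is a hypothetical helper, and it assumes the document layout shown above.

```python
import json

# A shortened copy of the sample document shown above.
SAMPLE_JSON = """
{
    "amps": {
        "host": {
            "cpus": [
                {"id": "all",  "idle_percent": "62.452316076294"},
                {"id": "cpu0", "idle_percent": "75.417130144605"},
                {"id": "cpu1", "idle_percent": "50.000000000000"}
            ]
        }
    }
}
"""


def busiest_cpu(document_text: str) -> str:
    """Return the id of the individual CPU with the lowest idle percentage."""
    cpus = json.loads(document_text)["amps"]["host"]["cpus"]
    per_core = [cpu for cpu in cpus if cpu["id"] != "all"]  # skip the aggregate entry
    # Values arrive as strings in the sample, so convert before comparing.
    return min(per_core, key=lambda cpu: float(cpu["idle_percent"]))["id"]


print(busiest_cpu(SAMPLE_JSON))  # cpu1
```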

RNC Document Output

AMPS supports generation of an XML schema via the RELAX NG Compact (RNC) specification language. To generate an RNC file, enter the following URL in a browser, replacing port with the port of the monitoring interface:

http://localhost:port/amps.rnc

AMPS will display the RNC schema.

To convert the RNC schema into an XML schema, first save the RNC output to a file:

%> wget http://localhost:8085/amps.rnc

The output can then be converted to an XML schema using Trang (available at http://code.google.com/p/jing-trang/) with:

trang -I rnc -O xsd amps.rnc amps.xsd

Galvanometer

The AMPS Galvanometer provides an extensive set of visualizations of the state of the instance. Galvanometer also provides the ability to query the instance and display the results.

Using TLS/SSL with Galvanometer

When the Admin interface is configured to use TLS/SSL, Galvanometer will also use TLS/SSL with the certificate and key file specified.

For the replication graph to be displayed correctly, the instances that replicate to each other must be consistent: either every instance uses TLS/SSL for the Admin interface, or none of them does.

If some of the instances in the replication graph use TLS/SSL for the Admin interface and some do not, the information shown in the replication graph will be incomplete.

Authorization and Entitlement in Galvanometer

When credentials are required to access AMPS monitoring information, the WWWAuthenticate option allows Galvanometer to provide those credentials to the AMPS instance. This option specifies how credentials will be provided to AMPS.

The option can have the following values:

  • Negotiate (Kerberos)
  • NTLM
  • Basic realm="<SECURITY_DOMAIN>" (Basic Auth)

When using Negotiate or NTLM, Galvanometer will automatically supply corresponding authorization tokens to AMPS. If Basic Auth is used for authorization, the Login/Password dialog will require a user to enter credentials.

<Admin>
    ...

    <WWWAuthenticate>Basic realm="AMPS Admin"</WWWAuthenticate>

    ...
</Admin>
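For scripted access to the basic monitoring interface when Basic Auth is configured, a client would typically send a standard HTTP Basic Authorization header. The sketch below only builds such a request and does not send it. The basic_auth_request function, the user name, and the password are hypothetical, and whether a given set of credentials is accepted depends on the authentication configured for the instance.

```python
import base64
import urllib.request


def basic_auth_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical credentials, for illustration only.
request = basic_auth_request("http://localhost:8085/amps/instance.json", "monitor", "secret")
print(request.get_header("Authorization").split()[0])  # Basic

# To actually send the request against a running instance:
#     with urllib.request.urlopen(request) as response:
#         body = response.read()
```

Basic Auth sends credentials in a reversible encoding, so it is usually paired with TLS/SSL on the Admin interface.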

Using Anonymous Paths

The AnonymousPaths option allows Galvanometer to bypass authentication and/or entitlement for Admin paths that match a regular expression. For resources that match the AnonymousPaths option, the Admin interface does not require authentication and does not check entitlements.

The most common use of AnonymousPaths is to allow Galvanometer to correctly display the replication graph when the instance is configured to use Negotiate or NTLM for authorization. Galvanometer determines the replication graph by polling the instances that participate in replication. Since most browsers disallow sending cross-domain authorization tokens, it is necessary to provide access to replication paths without requiring authorization for Galvanometer to be able to display the replication graph. For installations that use Negotiate or NTLM, Galvanometer may not be allowed to construct a replication graph if this option is not set.

AnonymousPaths can also be used to provide access to a specific resource, without allowing access to any other information in the Admin interface. For example, an instance might specify ^/amps$ for unauthenticated users to be able to verify that the instance is running and processing Admin requests, but without allowing those users to obtain any other data about the instance.

The following example shows how to add an AnonymousPaths directive that allows any connection to access replication information about the instance.

<Admin>
   <!-- ... other configuration here ... -->

   <!--  Specify anonymous paths. In this
         case, allow any user to access replication
         information -->

   <AnonymousPaths>^/amps/instance/replication</AnonymousPaths>
</Admin>

The AnonymousPaths option is disabled by default.
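The effect of the two patterns mentioned in this section can be previewed with an ordinary regular expression engine. This sketch uses Python's re module and assumes that AMPS applies the pattern as a search against the request path. The exact regular expression dialect and matching rules AMPS uses are not stated here, so treat the results as an approximation. The is_anonymous function and the sample paths other than /amps and /amps/instance/replication are hypothetical.

```python
import re


def is_anonymous(path: str, pattern: str) -> bool:
    """Return True when the request path matches the AnonymousPaths pattern."""
    return re.search(pattern, path) is not None


replication_only = r"^/amps/instance/replication"
root_only = r"^/amps$"

for path in ("/amps",
             "/amps/instance/replication",
             "/amps/instance/replication/some-destination",
             "/amps/instance/clients"):
    print(path, is_anonymous(path, replication_only), is_anonymous(path, root_only))
```

Because the replication pattern has no trailing $, it matches every path that begins with /amps/instance/replication, while ^/amps$ matches only the root of the basic monitoring interface.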

Make Replication Page Work with NTLM / Negotiate Authentication

When Negotiate or NTLM is used for authorization or entitlement, Galvanometer may be unable to display replication graphs correctly. Galvanometer retrieves replication data from the destination instances using AJAX requests from the browser, and most browsers do not send the required authorization tokens across domains, so those requests are refused. Configuring AnonymousPaths for the replication resources, as described in Using Anonymous Paths, allows the replication graph to be displayed.

Enabling Queries and Subscriptions in Galvanometer

Much of the functionality available in Galvanometer uses the basic monitoring interface.

Galvanometer submits queries and subscriptions to AMPS using the websocket protocol. To use these functions in Galvanometer, you must provide the name of a Transport of type websocket for Galvanometer to use.

For example, the following directive specifies that Galvanometer will use the Transport with the Name of websocket-any to submit commands to AMPS.

<Admin>
   <!-- ... existing configuration ... -->

   <!-- look up the transport named websocket-any in
        this config file, and make connections to
        that Transport for sending commands to AMPS -->
   <SQLTransport>websocket-any</SQLTransport>
</Admin>

The configuration block above requires that the AMPSConfig file contains a Transport with the Name of websocket-any of Type websocket.

When this configuration item is specified, Galvanometer will enable the query and subscription capabilities, and submit commands to AMPS over the specified Transport.

For example, the websocket-any transport referenced in the snippet above might be defined as follows:

<Transports>
   <!-- ...
        existing transports remain -->

    <Transport>
        <Name>websocket-any</Name>
        <Protocol>websocket</Protocol>
        <Type>tcp</Type>
        <InetAddr>9008</InetAddr>
    </Transport>

</Transports>

Notice that Galvanometer connects as a client using this Transport. There is no special transport or protocol for Galvanometer, and the security configured for the instance (or the Transport) applies to Galvanometer.
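Because SQLTransport refers to a Transport by name, a mismatch between the two blocks is an easy mistake to make. The following sketch checks a configuration file for that relationship. It is an illustration only: check_sql_transport is a hypothetical helper, the combined configuration is assembled from the snippets above, and the check assumes the Transport declares websocket directly in its Protocol element, as in the example.

```python
import xml.etree.ElementTree as ET

# Assembled from the snippets above for illustration.
CONFIG = """
<AMPSConfig>
    <Admin>
        <InetAddr>localhost:8085</InetAddr>
        <SQLTransport>websocket-any</SQLTransport>
    </Admin>
    <Transports>
        <Transport>
            <Name>websocket-any</Name>
            <Protocol>websocket</Protocol>
            <Type>tcp</Type>
            <InetAddr>9008</InetAddr>
        </Transport>
    </Transports>
</AMPSConfig>
"""


def check_sql_transport(config_xml: str) -> str:
    """Report whether Admin/SQLTransport names a websocket Transport."""
    root = ET.fromstring(config_xml)
    wanted = root.findtext("Admin/SQLTransport")
    if wanted is None:
        return "SQLTransport is not set"
    wanted = wanted.strip()
    for transport in root.iter("Transport"):
        if transport.findtext("Name", "").strip() == wanted:
            protocol = transport.findtext("Protocol", "").strip()
            if protocol == "websocket":
                return f"ok: {wanted}"
            return f"transport {wanted} uses protocol {protocol}, not websocket"
    return f"no Transport named {wanted}"


print(check_sql_transport(CONFIG))  # ok: websocket-any
```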

Caution

If the Transport is configured to use TLS/SSL, it must use certificates signed by a certificate authority (CA) known to the browser that will be used to access AMPS. For security reasons, browsers disallow self-signed certificates by default. This means that, although a client application may be able to connect, a browser will not allow a websocket connection to a transport that uses a self-signed certificate.

Queries and Subscriptions with Basic Auth in Galvanometer

When Basic Auth is used for authorization and entitlement, an additional option, TrustedAdmin, allows Galvanometer to use the session cookie created after successful authorization for queries and subscriptions. With this option enabled, AMPS reuses the credentials supplied by Galvanometer for the websocket connections that Galvanometer creates.

<Protocols>

    ...

    <Protocol>
        <Name>websocket-portal</Name>
        <Module>websocket</Module>

        <!-- disabled by default -->
        <TrustedAdmin>enabled</TrustedAdmin>

    </Protocol>

    ...

</Protocols>

TrustedAdmin is only supported by the websocket-based protocols and is disabled by default.

Disabling Galvanometer

Galvanometer is enabled in the monitoring interface by default. To disable Galvanometer, add the following directive to the Admin configuration block:

<Admin>
   <!-- ... existing configuration ... -->
   <Galvanometer>disabled</Galvanometer>
</Admin>

Disabling Galvanometer with this configuration item has no effect on the basic monitoring interface.