4. State of the World (SOW): The Message Database

One of the core features of AMPS is the ability to persist the most recent update for each distinct message published to a topic. To enable this for a topic, you add the topic to the State of the World (SOW).

You can think of the SOW as a database that maintains a specific set of topics. Each topic is equivalent to a table, and each distinct message published to the topic is equivalent to a row: publishing a message inserts or updates that row. AMPS allows applications to query the table for the current state of the topic.

SOW topics also provide full support for pub/sub messaging. Applications can use a combination of queries and subscriptions as necessary. AMPS also includes a set of commands that perform an atomic query and subscribe, allowing an application to query a SOW topic and register for updates to the topic in a single operation, without risk of missing messages or receiving duplicates.

The most common uses of SOW topics include:

  • Quickly loading initial state for an application. For example, an application that tracks open orders can quickly retrieve a snapshot of all of the orders that are currently open, without having to wait for updates to the orders to be published.
  • Queryable snapshots of data flows. For example, an application that monitors telemetry data may need to quickly determine if any telemetry source has not provided an update within a given period of time. With a SOW topic, the application can run a simple query over the current state of the topic.
  • NoSQL document stores. SOW topics are frequently used as high-performance key/value stores: an application can choose to explicitly provide a key and store a document in the SOW. Documents can be efficiently retrieved by key, queried over the full content of the document, or any combination. As mentioned above, a consumer can retrieve the document and be automatically notified when the content of the document changes.
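The telemetry use case above can be sketched in a few lines. This is a plain Python illustration of a query over current state, not AMPS code: the `current_state` dictionary stands in for a SOW topic, and the field names `source` and `last_update` are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical current state of a telemetry topic: one record per source,
# keyed by source id. Field names are assumptions for illustration.
current_state = {
    "sensor-a": {"source": "sensor-a", "last_update": datetime(2024, 1, 1, 12, 0, 0)},
    "sensor-b": {"source": "sensor-b", "last_update": datetime(2024, 1, 1, 12, 4, 30)},
}

def stale_sources(state, now, max_age):
    """Return the ids of sources whose latest record is older than max_age."""
    return sorted(key for key, rec in state.items()
                  if now - rec["last_update"] > max_age)

now = datetime(2024, 1, 1, 12, 5, 0)
print(stale_sources(current_state, now, timedelta(minutes=2)))
```

Because only the latest record per source is kept, the check is a single pass over the current state rather than a replay of the message flow.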

SOW topics are also the foundation of many of the more advanced capabilities of AMPS, including out-of-focus tracking, aggregation, and delta messaging. These are described later in this chapter.

Some applications are transitioning from topic-based routing and therefore need to maintain the last value per topic for a large number of topics (hundreds, thousands, or more). For these applications, AMPS can reduce the overhead of creating a large number of identical topics that each contain a single message. More details on the State of the World are available in the AMPS User Guide.

How Does the State of the World Work?

AMPS SOW topics persist the most recent update for each message, in the same way that a relational database stores the current state of each record. For performance, AMPS SOW topics store the full content of the message verbatim rather than storing a deserialized or “shredded” version of the message.

Each distinct record in a SOW topic is identified by a SOW key. AMPS treats the SOW key for a SOW topic the same way a relational database uses the primary key for a table: each distinct SOW key value is a unique message.

There are several ways to create a SOW key for a message. Each topic defines one of the following strategies:

  • Most applications specify that AMPS will calculate a SOW key based on the content of the message. The configuration of the topic specifies the field, or fields, to be used for the key.
  • A topic can also be configured to require that a publisher provide a SOW key for each message when publishing the message to AMPS. This strategy is less commonly used than determining the key from the message content. However, because it does not require any explicit configuration, AMPS defaults to this strategy if no other strategy is specified.
  • AMPS also supports the ability for custom SOW key generation logic to be defined in an AMPS module, which will be invoked to generate the SOW key for each message.

Although the SOW key is derived from the content of the message in many cases, the SOW key itself is metadata, distinct from the content of the message. Each record in a SOW topic has a distinct SOW key, which is stored with the record.

For example, the diagram below shows how AMPS computes the SOW key for a topic named ORDERS with a key definition of /orderId. For each publish to the topic, AMPS uses the value of the key fields (in this case, simply /orderId) to compute a SowKey, then uses that SowKey to insert or update the appropriate record.

[Figure: A SOW topic named ORDERS with a key definition of /orderId]
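The insert-or-update behavior can be modeled with a small sketch. This is a toy in-memory model, not how AMPS is implemented: the class name `SowTopicModel` is hypothetical, and a tuple of key field values stands in for the SowKey that AMPS computes.

```python
import json

class SowTopicModel:
    """Toy in-memory model of a SOW topic; not AMPS code."""

    def __init__(self, key_fields):
        self.key_fields = key_fields   # e.g. ["orderId"] for a Key of /orderId
        self.records = {}              # stand-in SOW key -> verbatim message text

    def sow_key(self, message_text):
        doc = json.loads(message_text)
        # Assumption: a tuple of the key field values stands in for the
        # SowKey value that AMPS computes.
        return tuple(doc[field] for field in self.key_fields)

    def publish(self, message_text):
        key = self.sow_key(message_text)
        is_update = key in self.records
        self.records[key] = message_text   # the message is stored verbatim
        return "update" if is_update else "insert"

orders = SowTopicModel(["orderId"])
print(orders.publish('{"orderId": 1, "qty": 10}'))  # new key
print(orders.publish('{"orderId": 2, "qty": 5}'))   # new key
print(orders.publish('{"orderId": 1, "qty": 7}'))   # same key as the first publish
print(len(orders.records))
```

The third publish reuses the key of the first, so the model ends with two records: the key is kept alongside the message rather than inside it, and the stored text is the message exactly as published.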

Configuration

To create a SOW topic, you configure the topic in the SOW section of the AMPS configuration file.

At a minimum, a SOW topic requires a Name and the MessageType of the messages to store in the SOW. If the SOW will be persistent, a FileName is also required. Most often, AMPS generates the SOW key from the message content; in that case, one or more Key elements are required to specify the fields that AMPS uses for the SOW key.

For example, the following configuration file fragment specifies a SOW topic named test-sow. The topic stores JSON-format messages, and uses the /id field of incoming messages to that topic to uniquely identify messages. Records in this topic will be both maintained in memory and persisted to a file in the ./sow/ directory, so the contents of the topic will be retained across restarts of the AMPS instance. Notice that the file name specification uses the special format character %n as a placeholder for the topic name and message type.

<SOW>
    <Topic>
        <Name>test-sow</Name>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
        <Key>/id</Key>
    </Topic>
</SOW>

The AMPS User Guide and AMPS Configuration Guide contain full details on configuring a SOW topic.
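As a rough aid when editing configuration by hand, the requirements described above can be checked with a short script. This is only a sketch using the Python standard library: the function name `review_sow_topics` is hypothetical, it covers only the elements mentioned in this section, and it is not a substitute for the validation AMPS itself performs at startup.

```python
import xml.etree.ElementTree as ET

CONFIG_FRAGMENT = """
<SOW>
    <Topic>
        <Name>test-sow</Name>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
        <Key>/id</Key>
    </Topic>
</SOW>
"""

def review_sow_topics(xml_text):
    """Return a list of notes about the Topic elements in a SOW fragment."""
    notes = []
    for topic in ET.fromstring(xml_text.strip()).findall("Topic"):
        name = topic.findtext("Name")
        label = name or "<unnamed>"
        if not name:
            notes.append("Topic is missing Name")
        if not topic.findtext("MessageType"):
            notes.append(f"{label}: missing MessageType")
        if not topic.findtext("FileName"):
            notes.append(f"{label}: no FileName, so the topic is not persisted")
        if not topic.findall("Key"):
            # Without Key elements, publishers must provide the SOW key.
            notes.append(f"{label}: no Key elements, publishers must supply SOW keys")
    return notes

print(review_sow_topics(CONFIG_FRAGMENT))
```

For the fragment shown in this section the list of notes is empty, since the topic has a Name, a MessageType, a FileName, and a Key.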

The practical examples later in this section use the configuration above.

Queries

At any point in time, applications can issue SOW queries to retrieve all of the messages that match a given topic and content filter. When a query is executed, AMPS will test each message in the SOW against the content filter specified and all messages matching the filter will be returned to the client. The topic can be a literal topic name or a regular expression pattern. For more information on issuing queries, please see SOW Queries in the AMPS User Guide.

A SOW query is atomic. Updates that occur while the query is running, or while a client is receiving results, are not returned as part of the query.
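The query behavior described above can be modeled as follows. This is an illustrative sketch rather than the AMPS query engine: a Python callable stands in for the content filter, the `sow` dictionary stands in for the stored topics, and the sketch assumes full-match semantics for the topic pattern (the User Guide describes the matching rules AMPS actually applies).

```python
import re

# Hypothetical current state: topic name -> {SOW key -> record}.
sow = {
    "test-sow": {1: {"id": 1, "note": "first"}, 2: {"id": 2, "note": "second"}},
    "test-other": {7: {"id": 7, "note": "third"}},
}

def sow_query(store, topic_pattern, predicate=lambda rec: True):
    """Return a point-in-time list of records matching the topic and filter."""
    topic_re = re.compile(topic_pattern)
    results = []
    for topic, records in store.items():
        if topic_re.fullmatch(topic):
            # Copy each match so that later updates cannot alter the results.
            results.extend(dict(rec) for rec in records.values() if predicate(rec))
    return results

snapshot = sow_query(sow, r"test-sow", lambda rec: rec["id"] > 1)

# An update that arrives after the query does not change the result set.
sow["test-sow"][2]["note"] = "changed after the query"
print(snapshot)
```

Copying each matching record is what gives the result its point-in-time character in this model: the later update changes the stored record but not the results already produced.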

Spark: Basic SOW Query Example

Here’s how to use spark to query the current state of an AMPS SOW topic.

This example assumes that:

  • You have configured a SOW topic named test-sow, with a message type of json, in the AMPS server.
  • The test-sow topic uses the /id field of the message to calculate the key for the topic.

To retrieve the current state of the topic, an application issues the sow command. Unlike a subscription, which stays active until it is explicitly stopped (or the application disconnects), the sow command provides results for a specific point in time. Once the results are returned, the command is over.

First, publish a message or two to the test-sow topic:

  1. Open a new terminal in your Linux environment.

  2. Use the following command (with AMPS_DIR set to the directory where you installed AMPS) to send a single message to AMPS:

    $ echo '{"id":1,"note":"Crank it up with a SOW!"}' | \
      $AMPS_DIR/bin/spark publish -server localhost:9007 \
      -type json -topic test-sow
    
  3. spark automatically connects to AMPS and sends a logon command with the default credentials (the current username and an empty password). With the publish command, spark reads the message from the standard input and publishes the message to the JSON topic test-sow. The command produces output similar to the following line (the rate calculation will likely be different):

    total messages published: 1 (333.33/s)
    
  4. When the publisher sends the message, AMPS parses the message to determine the value of the Key fields in the message, and then either inserts the message for that key, or overwrites the existing message with that key.

  5. You can publish any number of messages this way. Each distinct id value will create a distinct record in the topic.

Next, retrieve the current contents of the topic:

  1. Open a new terminal in your Linux environment.

  2. Use the following command (with AMPS_DIR set to the directory where you installed AMPS) to retrieve the contents of the topic:

    $ $AMPS_DIR/bin/spark sow -server localhost:9007 \
      -type json -topic test-sow
    
  3. spark automatically connects to AMPS and sends a logon command with the default credentials (the current username and an empty password). spark then sends the sow command to AMPS. This command requests the current contents of the test-sow topic. Since the command is finished once the query is complete, spark will exit when the query results are complete.

  4. spark shows the current contents of the topic. Notice that the output is strictly the message data, separated by newline characters. spark does not show any of the metadata for a message.
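The steps above can also be driven from a script. The following is a minimal sketch that wraps the same spark command lines using the Python standard library. The helper names are hypothetical, the `/opt/amps` fallback for AMPS_DIR is an assumption, and only the spark options shown in this section are used.

```python
import os
import subprocess

def spark_argv(command, topic, server="localhost:9007", message_type="json"):
    """Build the spark command line used in the steps above."""
    # Assumption: fall back to /opt/amps when AMPS_DIR is not set.
    amps_dir = os.environ.get("AMPS_DIR", "/opt/amps")
    return [os.path.join(amps_dir, "bin", "spark"), command,
            "-server", server, "-type", message_type, "-topic", topic]

def publish(topic, payload):
    """Pipe one message to `spark publish` on standard input."""
    return subprocess.run(spark_argv("publish", topic), input=payload,
                          text=True, capture_output=True, check=True).stdout

def query(topic):
    """Run `spark sow` and return whatever spark writes to standard output."""
    return subprocess.run(spark_argv("sow", topic), text=True,
                          capture_output=True, check=True).stdout

# Show the command line without running it.
print(" ".join(spark_argv("sow", "test-sow")))
```

With AMPS running and configured as described, calling `publish("test-sow", '{"id":1,"note":"Crank it up with a SOW!"}\n')` followed by `query("test-sow")` mirrors the manual steps.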

Atomic Query and Subscribe

When a topic is recorded in the SOW, an application can request the current state of the topic and subscribe to updates in a single command. As with a simple query, AMPS tests each message currently in the SOW against the specified content filter and returns every matching message to the client. When the query begins, AMPS also enters a subscription with the same filter. If a record is updated while the query is running, AMPS saves the update and delivers it immediately after the query completes. After that, AMPS delivers messages from the subscription as they arrive. AMPS guarantees that no updates are missed or duplicated between the query and the subscription.

As with a simple SOW query, the topic can be a literal topic name or a regular expression pattern. For more information on issuing queries, please see SOW Queries in the AMPS User Guide.
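The ordering guarantee can be illustrated with a toy model. This sketch is not AMPS code: the class `SowAndSubscribeModel` is hypothetical, content filtering is omitted for brevity, and the explicit `begin` and `finish` calls stand in for the start and end of the query.

```python
class SowAndSubscribeModel:
    """Toy model of the query-then-subscribe ordering; not AMPS code."""

    def __init__(self, records):
        self.records = dict(records)   # SOW key -> message
        self.snapshot = []             # state captured when the query begins
        self.pending = []              # updates that arrive while the query runs
        self.delivered = []            # what the client sees, in order
        self.query_running = False

    def begin(self):
        # The subscription is entered at the moment the query begins.
        self.query_running = True
        self.snapshot = list(self.records.items())

    def publish(self, key, message):
        self.records[key] = message
        if self.query_running:
            self.pending.append((key, message))   # held until the query completes
        else:
            self.delivered.append(("publish", key, message))

    def finish(self):
        # Query results first, then the updates saved during the query.
        self.delivered.extend(("sow", k, m) for k, m in self.snapshot)
        self.delivered.extend(("publish", k, m) for k, m in self.pending)
        self.pending.clear()
        self.query_running = False

model = SowAndSubscribeModel({1: "a"})
model.begin()
model.publish(1, "b")    # arrives while the query is running
model.finish()
model.publish(2, "c")    # arrives after the query completes
for item in model.delivered:
    print(item)
```

In this model the client sees the query result for key 1, then the update to key 1 that arrived during the query, then the later publish for key 2. Each update appears exactly once.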

Spark: Basic SOW Query and Subscribe Example

Here’s how to use spark to query the current state of an AMPS SOW topic and subscribe to updates.

This example assumes that:

  • You have configured a SOW topic named test-sow, with a message type of json, in the AMPS server.
  • The test-sow topic uses the /id field of the message to calculate the key for the topic.

To retrieve the current state of the topic and subscribe, an application issues the sow_and_subscribe command. Since the command includes a subscription, the command stays active until it is explicitly stopped (or the application disconnects).

First, publish a message or two to the test-sow topic:

  1. Open a new terminal in your Linux environment.

  2. Use the following command (with AMPS_DIR set to the directory where you installed AMPS) to send a single message to AMPS:

    $ echo '{"id":1,"note":"Crank it up with a SOW!"}' | \
      $AMPS_DIR/bin/spark publish -server localhost:9007 \
      -type json -topic test-sow
    
  3. spark automatically connects to AMPS and sends a logon command with the default credentials (the current username and an empty password). With the publish command, spark reads the message from the standard input and publishes the message to the JSON topic test-sow. The command produces output similar to the following line (the rate calculation will likely be different):

    total messages published: 1 (333.33/s)
    
  4. When the publisher sends the message, AMPS parses the message to determine the value of the Key fields in the message, and then either inserts the message for that key, or overwrites the existing message with that key.

  5. You can publish any number of messages this way. Each distinct id value will create a distinct record in the topic.

Next, retrieve the current contents of the topic and subscribe to updates:

  1. Open a new terminal in your Linux environment.

  2. Use the following command (with AMPS_DIR set to the directory where you installed AMPS) to retrieve the contents of the topic and subscribe to updates:

    $ $AMPS_DIR/bin/spark sow_and_subscribe -server localhost:9007 \
      -type json -topic test-sow
    
  3. spark automatically connects to AMPS and sends a logon command with the default credentials (the current username and an empty password). spark then sends the sow_and_subscribe command to AMPS. This command requests the current contents of the test-sow topic and creates a subscription to the topic.

  4. spark shows the current contents of the topic. Notice that the output is strictly the message data, separated by newline characters. spark does not show any of the metadata for a message.

  5. spark remains running after the query completes, waiting for new publishes to arrive.

Publish more messages (or updates to the existing messages) to the topic. In the terminal you opened to publish the first messages:

  1. Use the following command (with AMPS_DIR set to the directory where you installed AMPS) to send a message to AMPS:

    $ echo '{"id":1,"note":"Crank it up with a SOW!"}' | \
      $AMPS_DIR/bin/spark publish -server localhost:9007 \
      -type json -topic test-sow
    
  2. Notice that the subscription receives the message.

If you stop the subscriber and run it again, the query returns the updated messages, and the subscriber again waits for changes to arrive.

Advanced Messaging and the SOW

A SOW topic is the basis for many of the advanced messaging features in AMPS. While not all of these features are discussed in detail in this introduction, many features of AMPS are made possible because AMPS can retain the current state of each unique message.

The advanced messaging features that the SOW enables include:

  • Views and aggregations over topics (including joins between topics)
  • Publishing incremental updates to a message (called delta publishing in AMPS)
  • Receiving incremental updates to a message (called delta subscription in AMPS)
  • Determining when a message no longer matches a filter (called out-of-focus notification in AMPS)
  • Providing a snapshot of an update to a rapidly changing record at regular intervals, rather than providing every update (called conflation in AMPS)

These features can greatly simplify the processing an application needs to perform, making it easier to develop applications and increasing application performance. However, for a messaging system to provide these features, whenever a message arrives, the messaging system must have access to both the current message and the previous, saved state of the message. SOW topics provide that access for AMPS, and enable the advanced messaging features.
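To see why the saved state matters, consider a sketch of two of these features. The functions below are illustrative only and do not reflect how AMPS implements delta subscription or out-of-focus notification: the names `delta` and `focus_change` are hypothetical, and a Python callable stands in for a content filter.

```python
def delta(previous, current, key_fields=("id",)):
    """Return the key fields plus only the fields whose values changed."""
    changed = {k: v for k, v in current.items() if previous.get(k) != v}
    changed.update({k: current[k] for k in key_fields})
    return changed

def focus_change(previous, current, matches):
    """Classify an update relative to a subscriber's filter."""
    before = previous is not None and matches(previous)
    after = matches(current)
    if before and not after:
        return "out-of-focus"   # the record used to match and no longer does
    return "match" if after else "no-match"

prev = {"id": 1, "status": "open", "qty": 10}
curr = {"id": 1, "status": "closed", "qty": 10}

print(delta(prev, curr))
print(focus_change(prev, curr, lambda m: m["status"] == "open"))
```

Both functions take the previous message as an argument. Without the saved state there is nothing to compare the incoming message against, which is why these features depend on a SOW topic.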

When Should I Store a Topic in the SOW?

The following are common reasons to store a topic in the SOW, each with an example use case:

  • An application needs the current state of a record, but does not need to recreate the message flow that created that record:

    An order fulfillment system presents a view of all currently pending orders when the application starts up.

  • An application needs the current state of a record or set of records, even when the topic is high-volume or quickly changing:

    A warehouse management application locates the current inventory level for a product.

    A taxi dispatch company locates taxis currently within 10 blocks of an event.

  • An application wants to be able to publish incremental updates to a record:

    A customer updates her shipping address. All pending orders for the customer are automatically updated without affecting any other information in the order, and processors working with the orders are notified of the change.

  • An application wants to receive only the changed fields of a record:

    A mobile application displays the status of an order as the order progresses through the stages of validation: the application receives only the identifier for the record and the changed fields.

  • An application needs the AMPS server to calculate values based on the current values of a record or set of records:

    A management console constantly calculates the real-time value of pending orders. The console uses a view, calculated based on data saved in a topic in the SOW.

  • An application wants to store application state for quick retrieval:

    An order processing system publishes statistics on each step of the process: a separate process monitors and aggregates those statistics. The SOW also maintains historical state for the topic so the monitor can easily recreate a snapshot of the state at a point in time and compare day over day status.

Of course, the examples above are just a small sample of the ways the AMPS SOW can be used.