7. State of the World (SOW)

One of the core features of AMPS is the ability to persist the most recent update for each distinct message published to a given topic. The State of the World (SOW) can be thought of as a database where messages published to AMPS are filtered into topics, and where the topics store the latest update to each distinct message. The State of the World gives subscribers the ability to quickly resolve any differences between their data and updated data in the SOW by querying the current state of a topic, or any set of messages inside a topic. Topics recorded in the State of the World are also used for caching data, providing “point in time” snapshots of active data flows, providing key/value stores over data flows, and so on. Topics recorded in the State of the World are the underlying sources for AMPS aggregation and analytics capabilities, and the ability to store the previous state of a message is the foundation of advanced messaging features such as delta messaging and out of focus notifications.

AMPS also provides the ability to keep historical snapshots of the contents of the State of the World, which allows subscribers to query the contents of the SOW at a particular point in time and replay changes from that point in time.

AMPS can maintain the SOW for a topic in a persistent file, which will be available across restarts of the AMPS server. The SOW can also be transient, in which case the state of the SOW does not persist across server restarts.

Topics do not keep the current values in the SOW by default. To provide this capability for a topic, you must configure AMPS to maintain the topic in the State of the World by adding a definition for the Topic to the SOW section of the AMPS configuration file.

How Does the State of the World Work?

Much like tables in a relational database, topics in the AMPS State of the World persist the most recent update for each message. AMPS identifies a message by using a unique key for the message. The SOW key for a given message is similar to the primary key in a relational database: each value of the key identifies a unique message. The first time a message is received with a particular SOW key, AMPS adds the message to the SOW. Subsequent messages with the same SOW key value update the message.

There are several ways to create a SOW key for a message:

  • Most applications specify that AMPS assigns a SOW key based on the content of the message. The fields to use for the key are specified in the SOW topic definition, and consist of one or more XPath expressions. AMPS finds the specified fields in the message and computes a SOW key based on the name of the topic and the values in these fields. 60East recommends this approach unless an application has a specific need for a different approach.
  • A topic can also be configured to require that a publisher provide a SOW key for each message when publishing the message to AMPS.
  • AMPS also supports the ability for custom SOW key generation logic to be defined in an AMPS module, which will be invoked to generate the SOW key for each message. While these SOW keys are generated automatically by AMPS, rather than being provided by the publisher, the logic to generate these keys is provided by the module, and the configuration required (if any) is determined by the module.

The following diagrams demonstrate how the SOW works, using a SOW topic that is configured to have AMPS determine the SOW key based on the /orderId field within the message. As each message comes in, AMPS uses the contents of the /orderId field to generate a SOW key for the message. The SOW key is used to identify unique records in the SOW, so AMPS will store a distinct record for each distinct /orderId value published to this topic. The calculated SOW key will be returned in the SowKey header of messages received from the topic in the SOW.

../_images/sow_overview_1.svg

Figure 7.1: A SOW topic named ORDERS with a key definition of /orderId

In Figure 7.1, two messages are published. Neither message has a key that matches an existing record in the ORDERS topic, so both are inserted as new records. Some time after these messages are processed, an update comes in for the order with an orderId of 2. This message changes the price from 120 to 95. Since the incoming message has an orderId of 2, it matches an existing record and overwrites the existing message for the same SOW key, as seen in Figure 7.2. AMPS replaces the entire record with the contents of the update.

../_images/sow_overview_2.svg

Figure 7.2: Updating the IBM record by matching incoming message keys

Although the SOW key is derived from the content of the message in many cases, the SOW key is distinct from the content of the message. Each record in a SOW topic has a distinct SOW key, which is stored with the record.
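The insert-then-replace behavior described for the ORDERS topic can be sketched with a small in-memory model. This is a conceptual illustration in Python, not AMPS code: the SowTopic class, the symbol field, and the values for order 1 are assumptions made for the example, and a tuple of field values stands in for the real SOW key.

```python
# Conceptual model of a SOW topic keyed on /orderId (not AMPS code).
class SowTopic:
    def __init__(self, key_fields):
        self.key_fields = key_fields   # for example ["orderId"]
        self.records = {}              # SOW key -> latest message

    def sow_key(self, message):
        # Stand-in for the real SOW key: a tuple of the key field values.
        return tuple(message[field] for field in self.key_fields)

    def publish(self, message):
        key = self.sow_key(message)
        is_new = key not in self.records
        # The whole record is replaced, not merged field by field.
        self.records[key] = dict(message)
        return "insert" if is_new else "update"


orders = SowTopic(["orderId"])
results = [
    orders.publish({"orderId": 1, "symbol": "MSFT", "price": 30}),   # hypothetical values
    orders.publish({"orderId": 2, "symbol": "IBM", "price": 120}),
    orders.publish({"orderId": 2, "symbol": "IBM", "price": 95}),    # update to order 2
]
```

After these three publishes the model holds two records, and the record for order 2 carries the price of 95.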

By default, a topic recorded in the State of the World is persistent. For these topics, AMPS stores the contents of the state of the world for that topic in a dedicated, memory-mapped file. This means that the total state of the world does not need to fit into memory, and that the contents of the state of the world database are maintained across server restarts. You can also define a transient state of the world topic, which does not store the contents of the SOW to a persisted file.

The state of the world file is separate from the transaction log, and you do not need to configure a transaction log to use a SOW. When a transaction log is present that covers the SOW topic, on restart AMPS uses the transaction log to keep the SOW up to date. When the latest transaction in the SOW is more recent than the last transaction in the transaction log (for example, if the transaction log has been deleted), AMPS takes no action. If the transaction log has newer transactions than the SOW, AMPS replays those transactions into the SOW to bring the SOW file up to date. If the SOW file is missing or damaged, AMPS rebuilds the state of the world by replaying the transaction log from the beginning of the log.
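The restart rules in the previous paragraph amount to a small decision table. The sketch below restates them in Python for a persistent SOW topic. The function name, the return strings, and the use of integers as transaction identifiers are all assumptions made for illustration; they do not describe how AMPS represents transactions internally.

```python
def sow_recovery_action(sow_usable, sow_last_txn, log_last_txn):
    """Restate the restart rules for a persistent SOW topic.

    sow_usable   -- False when the SOW file is missing or damaged
    sow_last_txn -- latest transaction recorded in the SOW (hypothetical int)
    log_last_txn -- latest transaction in the log, or None when no
                    transaction log covers the topic
    """
    if log_last_txn is None:
        # No transaction log covers the topic: there is nothing to replay.
        return "no_action"
    if not sow_usable:
        # Missing or damaged SOW file: rebuild from the beginning of the log.
        return "rebuild_from_log_start"
    if log_last_txn > sow_last_txn:
        # The log is ahead of the SOW: replay only the newer transactions.
        return "replay_newer_transactions"
    # The SOW is at least as recent as the log.
    return "no_action"
```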

When the State of the World for a topic is transient, AMPS does not store the state of the world for this topic across restarts. In this case, by default, AMPS synchronizes the state of the world with the transaction log when the server starts. You can use the RecoveryPoint configuration option to specify that the topic should have only new publishes, or should recover from a specific point in time. For example, you could use an environment variable to provide a timestamp to the RecoveryPoint so that AMPS recovers only the last day’s worth of messages.

Queries

At any point in time, applications can issue SOW queries to retrieve all of the messages that match a given topic and content filter. When a query is executed, AMPS will test each message in the SOW against the content filter specified and all messages matching the filter will be returned to the client. The topic can be a literal topic name or a regular expression pattern. For more information on issuing queries, please see SOW Queries.

SOW Keys

This section describes AMPS SOW keys in detail, including information on how AMPS generates SOW keys and considerations for applications that generate SOW keys. An individual SOW topic may use either AMPS-generated SOW keys or user-generated SOW keys. Every message in a given SOW topic must use the same type of key generation.

Regardless of how the SOW key is generated, AMPS creates an opaque value from the SOW key and uses this value for efficient lookup internally. For SOW keys that AMPS generates, this opaque value is returned in the message header for SOW messages and is used in commands that reference SOW keys. When the SOW key is provided with a message, AMPS returns the original value in the SOW key header, and the original value is used in commands that reference SOW keys.

For topics that have a SOW key (including views and conflated topics), commands that directly use the SOW for a topic (for example, sow, sow_and_subscribe, sow_delete) can provide a SOW key, or a set of SOW keys with the command. When a set of SOW keys is provided with one of these commands, the command will only operate on messages that have a SOW key in the provided set.

AMPS-Generated SOW Keys

AMPS-generated SOW keys are often the easiest and most reliable way to define the SOW key for a message. The advantages of this approach are that AMPS handles all of the mechanics of generating the key, the key will always match the data in the message, and there is no need for a publisher to be concerned with how AMPS assigns the key. The publisher simply publishes messages, and AMPS handles all of the details.

AMPS generates SOW keys based on the message content when you define one or more Key fields in the SOW configuration. For example, if your SOW tracks unique orders that are identified by an orderId field in the message, you could provide the following Key element in your SOW configuration:

<Key>/orderId</Key>

This configuration item tells AMPS to use that field of the message to generate SOW keys. AMPS supports composite SOW keys when multiple Key elements are provided. For example, the following configuration specifies that every unique combination of /orderId and /customerId is a unique record in the SOW:

<Key>/orderId</Key>
<Key>/customerId</Key>

When AMPS generates a key, it creates the key based on the key domain (which is the name of the topic by default) and the values of the fields specified as SOW keys. AMPS concatenates these values together with a unique separator and then calculates a checksum over the value. This ensures that different values create different keys, and ensures that records in different topics have different keys.

In some cases, you may need AMPS to calculate consistent SOW key values for identical messages even when the messages are published to different topics. The SOW topic definition allows you to set an explicit key domain in the configuration, which AMPS will use instead of the topic name when generating SOW keys. For example, if your application uses the orderId field of a message as a SOW key in both a ShippingStatus topic and an OpenOrders topic, having AMPS generate a consistent key for the same orderId value may make it easier to correlate messages from those topics in your application. By setting the same KeyDomain value in the Topic configuration for those SOW topics, you can ensure that AMPS generates consistent SOW keys for the same order ID across topics.
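The effect of the key domain can be illustrated with a rough model of key generation. The Python sketch below joins the key domain and the key field values with a separator and hashes the result. The separator, the use of SHA-256, the truncation to 16 characters, and the function name are all assumptions chosen for the example; the actual separator and checksum AMPS uses are not described here, and applications should continue to treat real SOW keys as opaque.

```python
import hashlib

SEPARATOR = "\x1f"  # hypothetical separator, chosen only for this model


def model_sow_key(key_domain, message, key_fields):
    """Model a generated key: hash of key domain plus key field values."""
    parts = [key_domain] + [str(message[field]) for field in key_fields]
    joined = SEPARATOR.join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


order = {"orderId": 42, "customerId": "C7"}

# Default behavior: the key domain is the topic name, so the same
# orderId produces different keys in different topics.
shipping_key = model_sow_key("ShippingStatus", order, ["orderId"])
open_key = model_sow_key("OpenOrders", order, ["orderId"])

# Shared key domain: both topics produce the same key for the same order.
shared_a = model_sow_key("orders-domain", order, ["orderId"])
shared_b = model_sow_key("orders-domain", order, ["orderId"])

# Composite key: every combination of orderId and customerId is distinct.
composite = model_sow_key("OpenOrders", order, ["orderId", "customerId"])
```

In this model, shipping_key and open_key differ, while shared_a and shared_b are identical, which mirrors the reason for setting a common KeyDomain on related topics.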

An application should treat SOW keys generated with the AMPS default SOW key generator as opaque tokens. The value of a generated SOW key is guaranteed to be consistent for the same fields, values, and key domain. However, an application should not make assumptions as to the specific value that the AMPS default key generator will produce from a given set of values. If an application requires a specific value for the SOW key, the application should generate a SOW key, as described in the following section.

Using Enrichment with SOW Keys

The preprocessor phase of AMPS enrichment occurs before AMPS generates SOW keys for a message. You can use this phase of enrichment to construct fields that are then used to generate the SOW key for a message.

Customizing AMPS-Generated SOW Keys

AMPS allows you to customize how the server generates SOW keys for a topic. To customize SOW key generation, you implement a SOW key generator module and specify that the module should be used to generate keys for that SOW topic.

To use a custom SOW key generator, you first load the module in the Modules section of the configuration file, then specify the module as the KeyGenerator for the SOW topic.

<AMPSConfig>
    ...

    <!-- load the module -->
    <Modules>
        <Module>
            <Name>key-generator</Name>
            <Library>libmy_key_generator.so</Library>
        </Module>
    </Modules>

    <!-- use the module to generate keys -->
    <SOW>
        <Topic>
            <Name>custom-keyed-sow</Name>
            <FileName>./sow/%n.sow</FileName>
            <KeyGenerator>
                <Module>key-generator</Module>
                <Options>
                    <OptionOne>module-specific-option</OptionOne>
                    <OptionTwo>another-specific-option</OptionTwo>
                </Options>
            </KeyGenerator>
        </Topic>
    </SOW>

    ...
</AMPSConfig>

For information on implementing a custom SOW key generator, contact 60East support for the AMPS Server SDK.

User-Generated SOW Keys

AMPS allows applications to explicitly generate and assign SOW keys. In this case, the publisher calculates the SOW key for the message and includes that key on the message when it is published. AMPS does not interpret the data in the message to decide whether the message is unique: AMPS uses only the value of the SOW key.

When using a user-generated SOW key, applications should consider the following:

  • All publishers should use a consistent method for generating SOW Keys
  • SOW Keys must contain only characters that are valid in Base64 encoding
  • The application must ensure that messages intended to be logically different do not receive the same SOW key
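One way a publisher might satisfy these considerations is to derive the key deterministically from the fields that make a message logically unique, then encode the result with Base64. The Python sketch below is only an illustration: the make_sow_key function, the choice of SHA-256, the separator, and the 12-byte truncation are assumptions, not an AMPS requirement or API.

```python
import base64
import hashlib


def make_sow_key(*identity_parts):
    """Build a deterministic, Base64-only key from identifying values.

    Every publisher that calls this with the same values gets the same
    key, which keeps key generation consistent across publishers.
    """
    joined = "\x1f".join(str(part) for part in identity_parts)
    digest = hashlib.sha256(joined.encode("utf-8")).digest()
    # 12 bytes encode to exactly 16 Base64 characters with no padding.
    return base64.b64encode(digest[:12]).decode("ascii")


key_a = make_sow_key("blob-store", "invoice", 1001)
key_b = make_sow_key("blob-store", "invoice", 1001)  # same logical message
key_c = make_sow_key("blob-store", "invoice", 1002)  # different message
```

Because the key depends only on the identifying values, two publishers sending the same logical message produce the same key, and logically different messages receive different keys.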

User-generated SOW keys are particularly useful for the binary message type. For this message type, AMPS does not parse the message, so providing an explicit SOW key allows you to create a SOW that contains only binary messages.

To require publishers to provide the SOW key for each message published to a topic, define the Topic without any Key fields and without a KeyGenerator.

SOW Indexing

AMPS maintains indexes over SOW topics to improve query efficiency. There are two types of indexes available:

  • Memo indexes are created automatically when AMPS needs to use a particular field for a query. These indexes maintain the value of a key, and can be used for any type of query, including regular expression queries, range queries, and comparisons such as less than or greater than. You can also request that AMPS pre-create an index of this type with the Index directive of the SOW topic configuration.

  • Hash indexes are defined by the SOW configuration. These indexes maintain a hash derived from the values provided for the fields in the key. When the topic is configured so that AMPS generates the SOW key, AMPS automatically creates a hash index that contains all of the fields in the SOW Key. You can create any number of hash indexes for a SOW topic, with any combination of fields. Hash index queries are significantly faster than queries using memo indexes.

    The values of hash indexes are always evaluated as strings. Hash indexes are only used for exact matches on the value of the fields or with the IN operator, and only for queries that use the exact set of fields in the hash index. For example, if your configuration specifies a hash index that uses the fields /address/postalCode and /customerType, a filter such as /address/postalCode = '99705' AND /customerType = 'retail' will use the hash index. A filter such as /address/postalCode = '99705' AND /customerType LIKE 'retail|remainder' will not use the hash index, since this filter uses the LIKE operator rather than exact matching.

AMPS uses a hash index for filters wherever possible. If there is no hash index that includes exactly the keys specified in the filter, or if the filter uses operations other than equality comparison, AMPS uses a memo index if one is available. If no memo index is available, AMPS creates one during the query.
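The selection rule above can be restated as a short function. This Python sketch models only the rule as described in this section: a filter is represented as a list of (field, operator) pairs that are assumed to be joined by AND, and the function name and return values are hypothetical. It does not parse real AMPS filter expressions.

```python
def choose_index(filter_terms, hash_indexes):
    """Return "hash" or "memo" for a filter, following the rule above.

    filter_terms -- list of (field, operator) pairs, assumed joined by AND
    hash_indexes -- iterable of field collections, one per hash index
    """
    fields = frozenset(field for field, _ in filter_terms)
    exact_only = all(op in ("=", "IN") for _, op in filter_terms)
    has_matching_index = any(fields == frozenset(ix) for ix in hash_indexes)
    if exact_only and has_matching_index:
        return "hash"
    # Otherwise a memo index is used (and created if it does not exist).
    return "memo"


indexes = [("/address/postalCode", "/customerType")]

exact = choose_index(
    [("/address/postalCode", "="), ("/customerType", "=")], indexes)
with_like = choose_index(
    [("/address/postalCode", "="), ("/customerType", "LIKE")], indexes)
partial = choose_index([("/address/postalCode", "=")], indexes)
```

Here only the first filter qualifies for the hash index. The second uses LIKE, and the third does not use the exact set of fields in the index.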

If your application frequently uses queries for an exact match on a specific set of fields (for example, retrieving a set of customers by the /address/postalCode field), creating a hash index can significantly improve the speed of those queries.

Removing SOW Records

AMPS allows applications to explicitly remove records from a SOW topic using the sow_delete command.

When removing records from a SOW, there are three different ways to indicate which message, or messages, will be deleted:

  • Using a content filter. AMPS will delete all messages in the SOW that match the content filter. To delete every message in the SOW, use the special filter 1=1 to indicate that the filter is true for every message, regardless of the contents of the message.
  • Using the SOW key assigned to the message. AMPS accepts a list of SOW keys, and will remove the messages indicated by those SOW keys.
  • Using message data. The application provides message data with the sow_delete command. AMPS finds the record that would be updated if the command were a publish, and deletes that record.
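The three ways of selecting records can be compared with a small in-memory model. The Python sketch below is conceptual only: the records dictionary, the helper functions, and the key_of function (which stands in for whatever key generation the topic uses) are assumptions for illustration and are not part of any AMPS client API.

```python
def delete_by_filter(records, predicate):
    """Remove every record whose message matches the predicate."""
    doomed = [key for key, msg in records.items() if predicate(msg)]
    for key in doomed:
        del records[key]
    return len(doomed)


def delete_by_keys(records, sow_keys):
    """Remove the records named by a list of SOW keys."""
    removed = 0
    for key in sow_keys:
        if records.pop(key, None) is not None:
            removed += 1
    return removed


def delete_by_data(records, message, key_of):
    """Remove the record that a publish of this message would update."""
    return delete_by_keys(records, [key_of(message)])


def key_of(message):
    # Hypothetical stand-in for the topic's SOW key generation.
    return "order-%s" % message["orderId"]


records = {
    key_of({"orderId": i}): {"orderId": i, "status": status}
    for i, status in [(1, "open"), (2, "filled"), (3, "open"), (4, "filled")]
}

n_keys = delete_by_keys(records, ["order-1", "order-99"])
n_data = delete_by_data(records, {"orderId": 2}, key_of)
n_filter = delete_by_filter(records, lambda m: m["status"] == "open")
n_all = delete_by_filter(records, lambda m: True)  # plays the role of 1=1
```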

When a record is removed from the SOW, AMPS sends an out-of-focus (OOF) message to any subscriptions that have requested OOF notifications. AMPS also updates any views that use the SOW topic, and the record will be removed from conflated topics at the next conflation interval.

When the SOW is configured with the History option to enable historical queries, the sow_delete command removes the message from the current set of messages in the SOW. The command does not remove previously-saved versions of the message: the historical state of the SOW is unaffected by the sow_delete.

The most efficient way to delete a specific message or specific set of messages is to use the SOW key that AMPS assigns, when that key is available. You can provide these keys in the SowKeys header (a delete by keys), or by providing a filter expression that is an exact string match on the SOW key field.

SOW Message Expiration

By default, a topic in the SOW stores all distinct records until a record is explicitly deleted. For scenarios where message persistence needs to be limited in duration, AMPS provides the ability to set a time limit on the lifespan of SOW topic messages. This limit on duration is known as message expiration and can be thought of as a “Time to Live” feature for messages stored in a SOW topic.

Usage

Expiration on SOW topics is disabled by default. For AMPS to expire messages in a SOW topic, you must explicitly enable expiration on the SOW topic.

There are two ways message expiration time can be set. First, a topic recorded in the SOW can specify a default lifespan for all messages stored for that topic. Second, each message can provide an expiration as part of the message header.

AMPS stores the expiration time for each message individually, as a property of the message in the SOW. The expiration for a given message is first determined based on the message expiration specified in the message header. If a message has no expiration specified in the header, then the message inherits the expiration setting for the topic. If there is no message expiration and no topic expiration, the message does not expire. An expiration of 0 in the message header indicates that AMPS should not expire this message.
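The precedence between message-level and topic-level expiration can be summarized in a few lines. The Python sketch below is a restatement of the rules in this section, with lifetimes expressed in seconds and None meaning that the message does not expire. The function and parameter names are assumptions made for the example.

```python
def effective_lifetime(expiration_enabled, topic_default, message_expiration):
    """Return the lifetime in seconds, or None if the message never expires.

    expiration_enabled -- whether expiration is enabled on the SOW topic
    topic_default      -- default lifetime for the topic, or None
    message_expiration -- lifetime from the message header, or None
    """
    if not expiration_enabled:
        # AMPS still preserves the value set on the message, but does not
        # process expiration for the topic, so nothing expires.
        return None
    if message_expiration is not None:
        # The message header overrides the topic default; 0 means
        # "do not expire this message".
        return None if message_expiration == 0 else message_expiration
    # No header value: fall back to the topic default (possibly None).
    return topic_default
```

For example, with a topic default of 30 seconds, a message with no header value gets 30, a message with a header value of 5 gets 5, and a message with a header value of 0 never expires.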

Enabling Expiration for a Topic

AMPS configuration supports the ability to specify a default message expiration for all messages in a single SOW topic. Below is an example of a configuration section for a SOW topic definition with an expiration. The Configuration section of this chapter has more detail on how to configure the SOW topic.

<SOW>
    <Topic>
        <Name>ORDERS</Name>
        <FileName>sow/%n.sow</FileName>

        <Expiration>30s</Expiration>

        <Key>/55</Key>
        <Key>/109</Key>
        <MessageType>fix</MessageType>
    </Topic>
</SOW>

Example 7.1: Topic Expiration

In this case, messages with no lifetime specified on the message have a 30 second lifetime in the SOW. When a message arrives and that message has an expiration set, the message expiration overrides the default expiration for the topic.

AMPS also allows you to enable expiration on a SOW topic, but to only expire messages that have message-level expiration set:

<SOW>
    <Topic>
        <Name>ORDERS</Name>
        <FileName>sow/%n.sow</FileName>

        <Expiration>enabled</Expiration>

        <Key>/55</Key>
        <Key>/109</Key>
        <MessageType>fix</MessageType>
    </Topic>
</SOW>

Example 7.2: Topic Expiration

With this configuration file, expiration is enabled for the topic. The message lifetime is specified on each individual message. When expiration is disabled for a SOW topic, AMPS preserves any message expiration set on an individual message but does not expire messages.

AMPS processes expirations during startup when SOW expiration is enabled. This means that any record in the SOW which needs to be expired will be expired as AMPS starts. Notice that if expiration has been disabled in the configuration file, AMPS will not process expiration for the topic.

Setting Expiration for a Message

When expiration is enabled for a topic in the SOW, each message published to that topic expires at the configured time by default.

Individual messages have the ability to specify the expiration for that individual message. When an expiration time is provided on a message, that value overrides the default expiration set for the topic. For example, the SOW configuration for a topic might specify an expiration of 5 minutes for a pending order. For large orders, however, a publisher might explicitly prevent messages from expiring by providing a 0 for the expiration time when publishing the message.

AMPS does not process expiration for any messages in a topic recorded in the SOW unless expiration is enabled for the topic. When expiration is not configured for a topic, messages published to that topic do not expire, regardless of the expiration setting on an individual message.

Example Message Lifecycle

When a message arrives, AMPS calculates the expiration time for the message and stores a timestamp at which the message expires in the SOW with the message. When the message contains an expiration time, AMPS uses that time to create the timestamp. When the message does not include an expiration time, if the topic contains an expiration time, AMPS uses the topic expiration for the message. Otherwise, there is no expiration set on the message, and AMPS records a timestamp value that indicates no expiration.

Messages in the SOW topic can receive updates before expiration. When a message is updated, the message’s expiration lifespan is reset. For example, a message is first published to a SOW topic with an expiration of 45 seconds. The message is updated 15 seconds after publication of the initial message, and the update resets the expiration to a new 45 second lifespan. This process can continue for the entire lifespan of the message, causing a new 45 second lifespan renewal for the message with every update.

If a message expires, then the message is deleted from the SOW topic. This event will trigger delete processing to be executed for the message, similar to the process of executing a sow_delete command on a message stored in a SOW topic.
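The lifecycle above, including the reset on update, can be traced with a small model that stores an absolute expiration timestamp per record. This Python sketch is illustrative only: the ExpiringSow class, the explicit now argument, and the sweep method are assumptions for the example and do not describe how or when AMPS schedules expiration processing.

```python
class ExpiringSow:
    """Toy model: each record stores the timestamp at which it expires."""

    def __init__(self, lifetime):
        self.lifetime = lifetime   # seconds, applied on every publish
        self.expires_at = {}       # SOW key -> absolute expiration time

    def publish(self, key, now):
        # An insert or an update both compute a fresh timestamp,
        # so every update renews the full lifespan.
        self.expires_at[key] = now + self.lifetime

    def sweep(self, now):
        # Delete and return the keys whose timestamp has passed.
        expired = [k for k, t in self.expires_at.items() if t <= now]
        for k in expired:
            del self.expires_at[k]
        return expired


sow = ExpiringSow(lifetime=45)
sow.publish("order-1", now=0)     # would expire at t=45
sow.publish("order-1", now=15)    # update: now expires at t=60
at_50 = sow.sweep(now=50)         # original deadline has passed
at_60 = sow.sweep(now=60)         # renewed deadline reached
```

In this trace the record survives the sweep at 50 seconds, because the update at 15 seconds moved its deadline to 60 seconds, and it is removed by the sweep at 60 seconds.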

Recovery and Expiration

When using message expiration, one common scenario is that the message has an expiration set, but the AMPS instance is shut down during the lifetime of the message.

To handle such a scenario, AMPS calculates and stores a timestamp for the expiration, as described above. Therefore, if the AMPS instance is shut down, then upon recovery the engine checks which messages have expired since the shutdown. Any expired messages will be deleted as soon as possible.

Notice that, because the timestamp is stored with each message, changing the default expiration of a SOW topic does not affect the lifetime of messages already in the SOW. Those timestamps have already been calculated, and AMPS does not recalculate them when the instance is restarted or when the defaults on the SOW topic change. If expiration is not enabled for the topic after the configuration change, AMPS does not process expirations for that topic and messages will not expire.

SOW Maintenance

Applications that store topics in the SOW must consider the ongoing storage needs and file management for the SOW.

There are two aspects to SOW maintenance:

  • Ensuring that the host system has enough capacity to efficiently store and manage the topics in the SOW. Capacity planning guidelines are discussed in Chapter 25 Capacity Planning in the operations section of this Guide.
  • Setting and implementing a data retention policy for the contents of each topic in the SOW.

The data retention policy for a topic in the SOW is determined by the needs of your application. Consider the following questions:

  1. Does the topic have a data set that tends to stay at a consistent size? If so, there may be no need to explicitly manage data retention. Many AMPS applications have topics that fall into this category.

    For example, an application that uses a SOW topic to track the current price of a specific set of ticker symbols has little need to set a data retention policy. The SOW will always contain the same number of records (one for each ticker symbol), and those records will always contain data of a consistent size. The application may choose to remove a record when a symbol is removed from the set, but otherwise rely on publishers to keep the data current.

  2. Is the data only valid for a specific period of time after the data is published? If so, SOW expiration may be a good way to manage the SOW.

    For example, an application that needs to ensure that quotes are removed from the system after 10 minutes from the time the quote is published could use SOW expiration to remove records after 10 minutes.

  3. Is the data valid until a certain condition becomes true? If so, having the application remove records from the SOW or using AMPS actions may be a good way to manage the SOW.

    For example, an application that needs to clear the state of the SOW every 24 hours during a maintenance window could use an action to remove those records. An application that can determine when a record is no longer needed can remove the record immediately, which means that the topic only contains data that the application needs at any given time.

Regardless of the approach an application takes, 60East recommends that every application that uses a SOW consider capacity and explicitly consider the data retention needs of each topic and of the application.

Configuration

Topics where SOW persistence is desired are individually configured within the SOW section of the configuration file. Each topic will be defined with a Topic section enclosed within SOW. The AMPS Configuration Reference contains a description of the attributes that can be configured per topic. TopicMetaData is a synonym for SOW provided for compatibility with previous versions of AMPS. Likewise, TopicDefinition is a synonym for the Topic element of the SOW section, provided for compatibility with versions of AMPS prior to 5.0.

For the set of configuration options available in a SOW topic, see SOW/Topic in the AMPS Configuration Reference.

The listing in Example 7.3 is an example of using Topic to add a SOW topic to the AMPS configuration. One topic named ORDERS is defined with the keys /invoice and /customerId and a MessageType of json. The persistence file for this topic will be saved in the sow/ORDERS.json.sow file. For every message published to the ORDERS topic, a unique key will be assigned to each record with a unique combination of the fields /invoice and /customerId. A second topic named ALERTS is also defined with a MessageType of xml, keyed off of /alert/id. The SOW persistence file for ALERTS is saved in the sow/ALERTS.xml.sow file.

<SOW>
    <Topic>
        <Name>ORDERS</Name>
        <FileName>sow/%n.sow</FileName>
        <Key>/invoice</Key>
        <Key>/customerId</Key>
        <MessageType>json</MessageType>
        <SlabSize>1MB</SlabSize>
        <HashIndex>
            <Key>/region</Key>
        </HashIndex>
    </Topic>

    <Topic>
        <Name>ALERTS</Name>
        <FileName>sow/%n.sow</FileName>
        <Key>/alert/id</Key>
        <MessageType>xml</MessageType>
        <!-- Pregenerate an index for the /alert/type element. This is seldom necessary,
             since AMPS will generate the index when it is needed, but the directive is included here
             for example purposes. -->
        <Index>/alert/type</Index>
    </Topic>
</SOW>

Example 7.3: Sample SOW Configuration

tip

Topics are scoped by their message type.

For example, two topics named Orders can be created: one that supports a MessageType of json and another that supports a MessageType of xml.

Each MessageType defined for the Orders topic requires a Transport in the configuration file that can accept messages of that type. Otherwise, there is no way for a publisher to publish messages of that type to this instance or for a subscriber to receive messages of that type from this instance.

This means that a publisher to the Orders topic must know the type of message it is sending (json or xml) and the port defined by the corresponding transport.