4. State of the World (SOW)
One of the core features of AMPS is the ability to persist the most recent update for each message matching a topic. The State of the World can be thought of as a database where messages published to AMPS are filtered into topics, and where the topics store the latest update to a message. Since AMPS subscriptions are based on the combination of topics and filters, the State of the World (SOW) gives subscribers the ability to quickly resolve any differences between their data and updated data in the SOW by querying the current state of a topic, or a set of messages inside a topic.
How Does the State of the World Work?
Much like tables in a relational database, topics in the AMPS State of the World persist the most recent update for each message. AMPS identifies a message by using a unique key for the message. The SOW key for a given message is similar to the primary key in a relational database: each value of the key is a unique message. The first time a message is received with a particular SOW key, AMPS adds the message to the SOW. Subsequent messages with the same SOW key value update the message.
There are several ways to create a SOW key for a message:
- Most applications specify that AMPS assigns a SOW key based on the content of the message. The fields to use for the key are specified in the SOW topic definition, and consist of one or more XPath expressions. AMPS finds the specified fields in the message and computes a SOW key based on the name of the topic and the values in these fields. 60East recommends this approach unless an application has a specific need for a different approach.
- A topic can also be configured to require that a publisher provide a SOW key for each message when publishing the message to AMPS.
- AMPS also supports the ability for custom SOW key generation logic to be defined in an AMPS module, which will be invoked to generate the SOW key for each message. While these SOW keys are generated automatically by AMPS, rather than being provided by the publisher, the logic to generate these keys is provided by the module, and the configuration required (if any) is determined by the module.
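To make the first approach concrete, here is a minimal sketch of deriving a key from the topic name and the configured key fields. It is a conceptual model only: the helper names `extract` and `sow_key` are hypothetical, the path handling covers only simple `/a/b` paths rather than full XPath, and SHA-256 is an arbitrary stand-in because the hash AMPS actually uses is not described here.

```python
import hashlib
import json


def extract(message: dict, path: str):
    """Resolve a simple path such as /alert/id against a nested dict (not full XPath)."""
    value = message
    for part in path.strip("/").split("/"):
        value = value[part]
    return value


def sow_key(topic: str, message: dict, key_fields: list) -> str:
    """Derive a stable key from the topic name and the values of the key fields.

    SHA-256 is an assumption made for illustration, not the AMPS algorithm.
    """
    parts = [topic] + [json.dumps(extract(message, f)) for f in key_fields]
    digest = hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


fields = ["/invoice", "/customerId"]
a = sow_key("ORDERS", {"invoice": 7, "customerId": "c1", "price": 120}, fields)
b = sow_key("ORDERS", {"invoice": 7, "customerId": "c1", "price": 95}, fields)
c = sow_key("ORDERS", {"invoice": 8, "customerId": "c1", "price": 95}, fields)
```

In this model `a` and `b` are equal, because only the key fields and the topic name contribute to the key, while `c` differs because the `/invoice` value differs. Fields outside the key definition, such as the price, never affect the key.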
The following diagrams demonstrate how the SOW works, using a SOW topic that is configured to have AMPS determine the SOW key based on the /orderId field within the message. As each message comes in, AMPS uses the contents of the /orderId field to generate a SOW key for the message. The SOW key is used to identify unique records in the SOW, so AMPS will store a distinct record for each distinct /orderId value published to this topic. The calculated SOW key will be returned in the SowKey header of messages received from the topic in the SOW.
Figure 4.1: A SOW topic named ORDERS with a key definition of /orderId
In Figure 4.1, two messages are published. Neither message has a key that matches an existing record in the ORDERS topic, so both are inserted as new messages. Some time after these messages are processed, an update arrives for the order with an orderId of 2. This message changes the price from 120 to 95. Since the incoming message has an orderId of 2, it matches an existing record and overwrites the existing message for the same SOW key, as seen in Figure 4.2. AMPS replaces the entire record with the contents of the update.
Figure 4.2: Updating the IBM record by matching incoming message keys
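The insert-then-replace behavior shown in the figures can be modeled with a dictionary keyed by the SOW key. This is a rough sketch, not AMPS code: the `SowTopic` class is hypothetical, the raw /orderId value is used directly as the key for simplicity, and every field value other than the IBM price change from 120 to 95 is invented for illustration.

```python
class SowTopic:
    """Toy model of a SOW topic: one stored record per distinct key value."""

    def __init__(self, key_field):
        self.key_field = key_field
        self.records = {}

    def publish(self, message):
        key = message[self.key_field]
        is_new = key not in self.records
        # The whole record is replaced; fields are not merged with the old record.
        self.records[key] = dict(message)
        return "insert" if is_new else "update"


orders = SowTopic("orderId")
r1 = orders.publish({"orderId": 1, "symbol": "MSFT", "price": 30, "qty": 100})
r2 = orders.publish({"orderId": 2, "symbol": "IBM", "price": 120, "qty": 50})
r3 = orders.publish({"orderId": 2, "symbol": "IBM", "price": 95})
```

After the third publish the model still holds two records. The record for orderId 2 has a price of 95 and no longer carries the `qty` field, because the update replaced the entire record rather than patching the changed field.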
Although the SOW key is derived from the content of the message in many cases, the SOW key is distinct from the content of the message. Each record in a SOW topic has a distinct SOW key, which is stored with the record.
By default, a topic recorded in the State of the World is persistent. For these topics, AMPS stores the contents of the state of the world for that topic in a dedicated, memory-mapped file. This means that the total state of the world does not need to fit into memory, and that the contents of the state of the world database are maintained across server restarts. You can also define a transient state of the world topic, which does not store the contents of the SOW to a persisted file.
The state of the world file is separate from the transaction log, and you do not need to configure a transaction log to use a SOW. When a transaction log is present that covers the SOW topic, on restart AMPS uses the transaction log to keep the SOW up to date. When the latest transaction in the SOW is more recent than the last transaction in the transaction log (for example, if the transaction log has been deleted), AMPS takes no action. If the transaction log has newer transactions than the SOW, AMPS replays those transactions into the SOW to bring the SOW file up to date. If the SOW file is missing or damaged, AMPS rebuilds the state of the world by replaying the transaction log from the beginning of the log.
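The restart rules in the paragraph above amount to a small decision table, sketched below. The function name, the use of integer transaction IDs, and the returned labels are all assumptions made for illustration. Two cases are not spelled out in the text and are treated here as "no action": a SOW and log that are exactly in step, and a topic with no transaction log at all.

```python
def recovery_action(sow_last_txid, log_last_txid):
    """Return what the model does on restart.

    sow_last_txid: latest transaction in the SOW file, or None if the file
                   is missing or damaged.
    log_last_txid: latest transaction in the transaction log, or None if no
                   transaction log covers the topic.
    """
    if log_last_txid is None:
        return "none"  # assumption: nothing to replay without a log
    if sow_last_txid is None:
        return "rebuild_from_log_start"
    if log_last_txid > sow_last_txid:
        return "replay_newer_transactions"
    return "none"  # the SOW is as new as, or newer than, the log


actions = [
    recovery_action(100, 150),   # log is ahead of the SOW
    recovery_action(150, 100),   # SOW is ahead, e.g. the log was deleted and recreated
    recovery_action(None, 150),  # SOW file missing or damaged
]
```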
Important
When a SOW topic is persistent, each topic must be stored in a separate file. Only one instance of AMPS can access a given file; the same copy of the SOW file cannot be used by multiple instances of AMPS.
When the State of the World for a topic is transient, AMPS does not store the state of the world for this topic across restarts. In this case, by default, AMPS synchronizes the state of the world with the transaction log when the server starts. You can use the RecoveryPoint configuration option to specify that the topic should have only new publishes, or should recover from a specific point in time. For example, you could use an environment variable to provide a timestamp to the RecoveryPoint so that AMPS recovers only the last day’s worth of messages.
Queries
At any point in time, applications can issue SOW queries to retrieve all of the messages that match a given topic and content filter. When a query is executed, AMPS will test each message in the SOW against the content filter specified and all messages matching the filter will be returned to the client. The topic can be a literal topic name or a regular expression pattern. For more information on issuing queries, please see SOW Queries in the AMPS User Guide.
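As a rough illustration of how a query resolves, the sketch below selects topics by pattern and then tests every stored message against a filter. It is a model, not the AMPS client API: `sow_query` is a hypothetical helper, the content filter is a Python callable standing in for an AMPS filter expression, and matching the pattern against the whole topic name is an assumption. The sample data is invented.

```python
import re


def sow_query(sow, topic_pattern, content_filter=lambda message: True):
    """Return every stored message whose topic matches and whose content passes the filter."""
    pattern = re.compile(topic_pattern)
    return [
        message
        for topic, records in sow.items()
        if pattern.fullmatch(topic)  # assumption: the pattern must match the full name
        for message in records.values()
        if content_filter(message)
    ]


sow = {
    "ORDERS": {1: {"orderId": 1, "price": 30}, 2: {"orderId": 2, "price": 95}},
    "ORDERS_EU": {7: {"orderId": 7, "price": 200}},
    "ALERTS": {1: {"id": 1, "type": "warn"}},
}

literal = sow_query(sow, "ORDERS", lambda m: m["price"] > 50)
by_regex = sow_query(sow, "ORDERS.*", lambda m: m["price"] > 50)
```

Here `literal` contains only the record for orderId 2, while `by_regex` also picks up the record for orderId 7 from the second matching topic.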
Configuration
Topics where SOW persistence is desired are individually configured within the SOW section of the configuration file. Each topic will be defined with a Topic section enclosed within SOW. The AMPS Configuration Reference contains a description of the attributes that can be configured per topic. TopicMetaData is a synonym for SOW provided for compatibility with previous versions of AMPS. Likewise, TopicDefinition is a synonym for the Topic element of the SOW section, provided for compatibility with versions of AMPS prior to 5.0.
For the set of configuration options available in a SOW topic, see SOW/Topic in the AMPS Configuration Reference.
The listing in Example 4.1 is an example of using Topic to add SOW topics to the AMPS configuration. One topic named ORDERS is defined as having the keys /invoice and /customerId and a MessageType of json. The persistence file for this topic will be saved in the sow/ORDERS.json.sow file. For every message published to the ORDERS topic, AMPS computes the SOW key from the /invoice and /customerId fields, so each unique combination of those fields identifies a distinct record. A second topic named ALERTS is also defined, with a MessageType of xml, keyed off of /alert/id. The SOW persistence file for ALERTS is saved in the sow/ALERTS.xml.sow file.
<SOW>
  <Topic>
    <Name>ORDERS</Name>
    <FileName>sow/%n.sow</FileName>
    <Key>/invoice</Key>
    <Key>/customerId</Key>
    <MessageType>json</MessageType>
    <SlabSize>1MB</SlabSize>
    <HashIndex>
      <Key>/region</Key>
    </HashIndex>
  </Topic>
  <Topic>
    <Name>ALERTS</Name>
    <FileName>sow/%n.sow</FileName>
    <Key>/alert/id</Key>
    <MessageType>xml</MessageType>
    <!-- Pregenerate an index for the /alert/type element. This is seldom necessary,
         since AMPS will generate the index when it is needed, but the directive is included here
         for example purposes. -->
    <Index>/alert/type</Index>
  </Topic>
</SOW>
Example 4.1: Sample SOW Configuration
Tip
Topics are scoped by their message type. For example, two topics named Orders can be created: one which supports a MessageType of json and another which supports a MessageType of xml.
Each of the MessageType entries that are defined for the Orders topic will require that a Transport in the configuration file can accept messages of that type. Otherwise, there is no way for a publisher to publish messages of that type to this instance or for a subscriber to receive messages of that type from this instance.
This means that publishers to the Orders topic must know the type of message they are sending (json or xml) and the port defined by the corresponding transport.
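One way to picture this scoping is a registry keyed by the pair of topic name and message type, so that the two Orders topics hold separate records. The sketch below is a conceptual model with hypothetical names (`topics`, `define_topic`, `publish`), not a description of how AMPS stores topics.

```python
topics = {}


def define_topic(name, message_type):
    """Register a topic; the (name, message type) pair is its identity in this model."""
    return topics.setdefault((name, message_type), {})


def publish(name, message_type, key, payload):
    """Store a payload in the topic that matches both the name and the message type."""
    topics[(name, message_type)][key] = payload


define_topic("Orders", "json")
define_topic("Orders", "xml")

publish("Orders", "json", 1, '{"orderId": 1}')
publish("Orders", "xml", 1, "<order><orderId>1</orderId></order>")
```

Both publishes use the same topic name and key, yet neither overwrites the other, because each lands in the topic for its own message type.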