8. SOW Queries¶

When SOW topics are configured inside an AMPS instance, clients can issue SOW queries to AMPS to retrieve all of the messages matching a given topic and content filter. When a query is executed, AMPS will test each message in the SOW against the content filter specified and all messages matching the filter will be returned to the client. The topic can be a straight topic or a regular expression pattern.

SOW Queries¶

A client issues a query by sending AMPS a sow command that specifies an AMPS topic. Optionally, the command can include a filter to further refine the query results. AMPS also allows you to restrict the query to a specific set of messages identified by a set of SowKeys. When AMPS receives the sow command, it validates the filter and starts executing the query.

AMPS packages the query results into a SOW record group: it first sends a group_begin message, then the matching SOW records (if any), and finally a group_end message to indicate that all records have been sent. AMPS returns the results for a SOW query in a single, atomic operation. Any messages for the client that arrive during the SOW query are delivered after the SOW results.

The message flow for a SOW query is provided as a sequence diagram in Figure 8.1.

To correlate a query request with its result, each query command can specify a QueryId. AMPS returns the QueryId as part of the response delivered to the client: the group_begin and group_end messages have the QueryId attribute set to the value the client provided, so the client can match responses from the AMPS engine to the query commands that produced them.
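The record group and QueryId correlation described above can be sketched in a few lines of Python. This is an illustrative model only, not the AMPS client API: the dictionary message shape, the `collect_query_results` helper, and the assumption that each SOW record also carries the QueryId are hypothetical.

```python
def collect_query_results(messages, query_id):
    """Return the SOW records delivered between group_begin and group_end
    for the given QueryId, skipping messages tagged with any other QueryId."""
    records = []
    in_group = False
    for message in messages:
        if message.get("query_id") != query_id:
            continue  # belongs to another query, or is not a query result
        command = message["command"]
        if command == "group_begin":
            in_group = True
        elif command == "group_end":
            return records  # all records for this query have been sent
        elif command == "sow" and in_group:
            records.append(message["data"])
    raise ValueError(f"no group_end received for query {query_id!r}")


# Hypothetical stream: the results of two queries arriving on one connection.
stream = [
    {"command": "group_begin", "query_id": "q1"},
    {"command": "sow", "query_id": "q1", "data": {"id": 1}},
    {"command": "sow", "query_id": "q1", "data": {"id": 2}},
    {"command": "group_end", "query_id": "q1"},
    {"command": "group_begin", "query_id": "q2"},
    {"command": "group_end", "query_id": "q2"},
]

q1_results = collect_query_results(stream, "q1")
q2_results = collect_query_results(stream, "q2")  # empty group: no matches
```

A query with no matching records still produces a group_begin and group_end pair, so the helper returns an empty list rather than raising.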

AMPS does not allow a sow command on topics that do not have a SOW enabled. If a client queries a topic that does not have a SOW enabled, AMPS returns an error.

Caution

The ordering of records returned by a SOW query is undefined by default. To request a particular ordering based on the contents of the messages, include an OrderBy parameter on the query.

Figure 8.1: SOW Query Sequence Diagram

Historical SOW Queries¶

Topics in the State of the World can also be configured to include historical snapshots of messages, which allows subscribers to retrieve the contents of the topic at a particular point in time.

As with simple queries, a client can issue a query by sending AMPS a sow command and specifying an AMPS topic. For a historical query, the client also adds a timestamp that includes the point in time for the query in the Bookmark header of the command. A filter can be used to further refine the query results based on the message content.

When To Use a Historical SOW Query¶

Use a historical SOW query when it is important to get a snapshot of the state of messages in a topic as they existed at a specific point in time (that is, if it is important for an application to be able to query the state of the world at a point in time).

If an application needs to replay the exact sequence of messages delivered to a topic, but does not need to be able to query the values that were current at a specific point in time, record the topic in the transaction log and replay from the transaction log.

If an application needs to both retrieve a snapshot of the values that were current at a specific point in time and replay the exact sequence of messages from that point forward, use a historical SOW query and record the topic in the transaction log.

Configuring the Topic: Window and Granularity¶

By default, AMPS does not maintain history for a topic in the State of the World. To enable history (and historical query) for the topic, add the History element to the Topic configuration. This element configures how much information AMPS stores for enabling historical queries.

There are two options that control how AMPS stores data for historical queries:

The Window option sets the amount of time that AMPS will retain historical versions of messages. AMPS will remove the historical state of the message from the SOW topic once that historical state is older than the specified window. (If the message has been deleted, and the delete command is older than the specified window, AMPS may remove the message from the SOW topic entirely.) AMPS always retains the most current state of a message, even if that state was published earlier than the specified Window.
The Granularity option sets the interval at which AMPS retains a historical copy of a message in the SOW. For example, if the Granularity is set to 10m, AMPS stores a historical copy of the message no more frequently than every 10 minutes, regardless of how many times the message is updated in that 10-minute interval. AMPS stores the copies when a new message arrives to update the SOW. This means that AMPS always returns a valid SOW state that reflects a published message but, as with a conflated topic, the SOW may not reflect all of the states that a message passes through. This also means that AMPS uses SOW space efficiently: if no updates have arrived for a message since the last time a historical message was saved, AMPS has no need to save another copy of the message.

When a message is deleted from a topic that maintains history, AMPS saves the fact that the message has been deleted, and queries as of that point in time will not return the message. However, previously saved states of the message within the Window are still present, and can still be queried.

Likewise, if an application queries at a point in time earlier than the Window, AMPS will return an empty result set (even if messages had actually been present in the topic at that point), since the SOW state is only retained for the period in the Window.

Tip

The Granularity for a topic is always specified as a duration. If your application requires that a query return the state of the SOW exactly as AMPS would have represented it at that time (with no tolerance for the granularity), you can specify that AMPS keep every message during the Window by specifying a Granularity of 0s. Notice that this is not required to replay every message after a point-in-time query, since replay is delivered from the transaction log rather than the stored State of the World.
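To make the effect of Granularity concrete, the following Python sketch models which updates to a single message would be kept as historical copies, and what a point-in-time lookup would then see. It is a simplified model under stated assumptions, not how AMPS stores history: timestamps are plain seconds, the `retained_copies` and `state_as_of` helpers are hypothetical, and Window expiry is not modeled.

```python
from bisect import bisect_right


def retained_copies(updates, granularity):
    """Keep an update as a historical copy only if at least `granularity`
    seconds have passed since the previously kept copy."""
    kept = []
    for timestamp, value in updates:
        if not kept or timestamp - kept[-1][0] >= granularity:
            kept.append((timestamp, value))
    return kept


def state_as_of(kept, query_time):
    """Return the newest kept copy at or before query_time, or None."""
    times = [timestamp for timestamp, _ in kept]
    index = bisect_right(times, query_time)
    return kept[index - 1][1] if index else None


# Hypothetical updates to one message: (seconds since start, value).
updates = [(0, "v1"), (120, "v2"), (300, "v3"), (600, "v4"), (660, "v5")]

coarse = retained_copies(updates, granularity=600)  # modeled on a 10m Granularity
exact = retained_copies(updates, granularity=0)     # modeled on a 0s Granularity

# The current state of the message (v5 here) is always retained by AMPS;
# this sketch only models the historical copies.
coarse_view = state_as_of(coarse, 450)
exact_view = state_as_of(exact, 450)
```

With the coarser setting, a lookup at 450 seconds sees the copy saved at time 0, which is a valid published state but not the most recent one at that moment. With a granularity of 0, every update is kept and the lookup sees the update from 300 seconds.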

When a historical SOW and Subscribe query is entered, and the topic is covered by a transaction log, AMPS returns the state of the SOW adjusted to the next oldest granularity, then replays messages from that point. In other words, AMPS returns the same results as a historical SOW query, then replays the full sequence of messages from that point forward.

The transaction log and the SOW topic are maintained separately, and have separate views of history. When a version of the message is removed from the SOW topic (because it is older than the specified Window), the message remains in the transaction log, but will not be returned by a SOW query.

Message Sequence Flow¶

The message sequence flow is the same as for a simple SOW query. Once AMPS has transmitted the messages that were in the SOW as of the timestamp of the query, the query ends. Notice that the query will include messages that have been subsequently deleted from the SOW, but which were the current state of the message as of that timestamp.

Pagination with Historical SOW Queries¶

Topics that maintain History in the SOW support paginated queries from a point in time. When the topic is also covered by the transaction log, the sow_and_subscribe command also supports paginated subscriptions from a point in time. See Paginated SOW and Subscribe for details.

SOW Query-and-Subscribe¶

AMPS has a special command that will execute a query and place a subscription at the same time to prevent a gap between the query and subscription where messages can be lost. Without a command like this, it is difficult to reproduce the SOW state locally on a client without creating complex code to reconcile incoming messages and state.

For example, this command is useful for recreating part of the SOW in a local cache and keeping it up to date. Without a special command to place the query and subscription at the same moment, a client is left with two options:

Issue the query request, process the query results, and then place the subscription, which misses any records published between the time when the query and subscription were placed; or
Place the subscription and then issue the query request, which could send messages placed between the subscription and query twice.

Instead of requiring every program to work around these options, the AMPS sow_and_subscribe command allows clients to place a query and get the streaming updates to matching messages in a single command.

In a sow_and_subscribe command, AMPS behaves as if the SOW command and subscription are placed at the exact same moment. The SOW query results will be sent before any messages from the subscription are sent to the client. Additionally, any new publishes that match the sow_and_subscribe filtering criteria and arrive after the query started will be sent after the query finishes (and the query results will not include those messages). As with a simple SOW query, any other messages that arrive for the client while the SOW query is running will also be delivered after the query results.
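Because the query results always arrive before the subscription updates, a client can build a local cache by applying messages in the order received, with no reconciliation step. The Python sketch below illustrates that idea with a hypothetical message shape (dictionaries with `command` and `data` entries) and a hypothetical `apply_to_cache` helper; it does not use the AMPS client library.

```python
def apply_to_cache(cache, message, key_field="id"):
    """Apply one message from a sow_and_subscribe stream to a local cache."""
    command = message["command"]
    if command in ("sow", "publish"):
        # A SOW record and a later publish are handled the same way:
        # the newest record for a key replaces the cached one.
        record = message["data"]
        cache[record[key_field]] = record
    # group_begin and group_end only delimit the query results.
    return cache


# Hypothetical stream: query results first, then live updates.
stream = [
    {"command": "group_begin"},
    {"command": "sow", "data": {"id": 1, "price": 10}},
    {"command": "sow", "data": {"id": 2, "price": 20}},
    {"command": "group_end"},
    {"command": "publish", "data": {"id": 2, "price": 21}},
    {"command": "publish", "data": {"id": 3, "price": 30}},
]

cache = {}
for message in stream:
    apply_to_cache(cache, message)
```

After the loop, the cache holds the queried state of record 1, the updated state of record 2, and the newly published record 3.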

AMPS allows a sow_and_subscribe command on topics that do not have a SOW enabled. In this case, AMPS simply returns no messages between group_begin and group_end.

The message flow as a sequence diagram for sow_and_subscribe commands is contained in Figure 8.2.

Figure 8.2: SOW-And-Subscribe Query Sequence Diagram

SOW Query Response Batching¶

When processing a SOW query, AMPS can combine messages into batches for more efficient network usage. The BatchSize parameter on the SOW query command sets the maximum number of records returned within a single response payload. AMPS defaults to a BatchSize value of 1, meaning AMPS sends one message per batch in the response. Each AMPS response for the query contains a BatchSize value in its header to indicate the number of messages in that batch. This number will be anywhere from 1 to the requested BatchSize.
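As a rough illustration of how BatchSize shapes the response, the following Python sketch splits a result set into batches of at most the requested size. The `split_into_batches` helper is hypothetical and only models the counts described above; it says nothing about how AMPS builds batches internally.

```python
def split_into_batches(records, batch_size=1):
    """Split records into consecutive batches of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


# 25 matching records with a requested BatchSize of 10.
batches = split_into_batches(list(range(25)), batch_size=10)
sizes = [len(batch) for batch in batches]

# With the server default of 1, every record travels in its own batch.
default_batches = split_into_batches(list(range(25)))
```

Here the last batch is smaller than the requested size, which is why each response reports the number of messages it actually contains.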

Current versions of the AMPS client libraries set a batch size of 10 when using the named convenience methods (for example, sowAndSubscribe) if no other batch size is specified.

Notice that the format of messages returned from AMPS may be different depending on the message type requested. However, the information contained in the messages is the same for all message types.

Tip

When a client issues a sow_and_subscribe command, AMPS returns a group_begin and group_end segment of messages before beginning the live subscription sequence of the query. This is also true when a sow_and_subscribe command is issued against a non-SOW topic. When the topic is not in the State of the World, no messages will be delivered between the group_begin and group_end messages.

Using a BatchSize greater than 1 can yield greater performance, particularly when querying a large number of small records. In general, 60East recommends using a BatchSize that provides good network utilization without consuming excessive server memory. Most applications that use small messages set a batch size designed to create batches that fit well into the maximum transmission unit (MTU) for the network. AMPS reports an error if an application requests a batch size larger than 10,000 records (this value is orders of magnitude larger than the typical BatchSize used by applications).
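One way to pick a starting point for testing with small messages is to estimate how many messages fit in a single MTU. The Python sketch below is a hypothetical heuristic, not a 60East formula: the `suggest_batch_size` helper, the 1500-byte default MTU (typical for Ethernet), and the 100-byte allowance for headers are all assumptions to adjust for your network and message type.

```python
MAX_BATCH_SIZE = 10_000  # AMPS reports an error above this many records


def suggest_batch_size(avg_message_bytes, mtu_bytes=1500, overhead_bytes=100):
    """Estimate how many average-sized messages fit in one MTU.

    overhead_bytes is an assumed allowance for protocol and AMPS headers.
    Returns 1 when even a single message does not fit; for such messages
    this heuristic does not apply and batch sizes should be found by testing.
    """
    if avg_message_bytes < 1:
        raise ValueError("avg_message_bytes must be positive")
    usable = mtu_bytes - overhead_bytes
    fit = usable // avg_message_bytes
    return max(1, min(fit, MAX_BATCH_SIZE))


small = suggest_batch_size(64)      # many small messages per MTU
large = suggest_batch_size(4000)    # larger than the MTU: falls back to 1
capped = suggest_batch_size(1, mtu_bytes=65_000)  # never exceeds the limit
```

Treat the result as an initial value for performance testing rather than a final setting.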

For applications that return a large number of messages that are larger than the MTU, 60East recommends testing performance with a variety of batch sizes. Because the client libraries parse the AMPS headers common to each message once per batch, a batch size larger than 1 can improve processing performance on the client side, particularly if the client message handling is efficient. Likewise, because the AMPS server only has to serialize the common headers once per batch, a batch size larger than 1 can improve performance at the server side (as well as reduce the overall bandwidth for a group of messages). At the same time, the server will hold a batch of messages until the batch can be transmitted together (or until the query is complete), so providing large values for the batch size can introduce latency in receiving results, and reduce performance if the total size of the batch is very large.

In general, the default client value is a good compromise for many application patterns if the messages are larger than will fit into the MTU of the network. For smaller messages, or if it is important to tune performance, 60East recommends testing with a variety of batch sizes.

Tip

Using an appropriate BatchSize parameter is critical to achieve the maximum query performance with a large number of messages when many messages will fit into the MTU for your network. For larger messages, tune the batch size based on performance testing with a variety of batch sizes.

Caution

AMPS treats queries as a single, atomic operation. All results from a query are sent to a client before the results of any subsequent commands. Use care when issuing queries that return a result set large enough to take several seconds or more to transmit over the network.

When planning for large queries, see the section called Slow Clients, which describes how AMPS handles a situation where messages are produced faster than the client or network can consume them.

For more information on executing queries, please see the Developer Guide for the AMPS client of your choice.

Configuring SOW Query Result Sets¶

AMPS allows you to control the results returned by a SOW query by including the following options and header on the query:

Option / Header Result

top_n (option)

Limits the results returned to the number of messages specified.

When a skip_n option is also provided for a subscription, AMPS creates a paginated subscription. Otherwise, this option applies only to the SOW query part of a sow_and_subscribe or sow_and_delta_subscribe command.

skip_n (option)

Skips the number of messages specified before returning results. A command that provides this option must also provide a top_n option.

OrderBy (header)

Orders the results returned as specified. Requires a comma-separated list of identifiers of the form:

/field [ASC | DESC]

For example, to sort in descending order by orderDate so that the most recent orders are first, and ascending order by customerName for orders with the same date, you might use a specifier such as:

/orderDate DESC, /customerName ASC

If no sort order is specified for an identifier, AMPS defaults to ascending order.

Table 8.1: SOW Query Options
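The combined effect of OrderBy, skip_n, and top_n on a result set can be modeled in Python as follows. This is a sketch of the semantics in the table above, not AMPS code: the `parse_order_by` and `apply_result_options` helpers are hypothetical, and the model only handles top-level fields such as /orderDate.

```python
def parse_order_by(order_by):
    """Parse '/field [ASC | DESC], ...' into (field_name, descending) pairs."""
    keys = []
    for part in order_by.split(","):
        tokens = part.split()
        field = tokens[0].lstrip("/")
        # Ascending order is the default when no direction is given.
        direction = tokens[1].upper() if len(tokens) > 1 else "ASC"
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"unknown sort direction: {tokens[1]!r}")
        keys.append((field, direction == "DESC"))
    return keys


def apply_result_options(records, order_by=None, skip_n=0, top_n=None):
    """Order the records, skip skip_n of them, then keep at most top_n."""
    results = list(records)
    if order_by:
        # Sort by the least significant key first; Python's sort is stable,
        # so earlier keys in the specifier take precedence.
        for field, descending in reversed(parse_order_by(order_by)):
            results.sort(key=lambda record: record[field], reverse=descending)
    results = results[skip_n:]
    return results if top_n is None else results[:top_n]


# Hypothetical order records.
orders = [
    {"orderDate": "2024-01-02", "customerName": "Zed"},
    {"orderDate": "2024-01-03", "customerName": "Bob"},
    {"orderDate": "2024-01-03", "customerName": "Ann"},
    {"orderDate": "2024-01-01", "customerName": "Cat"},
]

ordered = apply_result_options(orders, "/orderDate DESC, /customerName ASC")
page = apply_result_options(
    orders, "/orderDate DESC, /customerName ASC", skip_n=1, top_n=2
)
```

The most recent orders come first, and the two orders that share a date are ordered by customer name.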

For details on how to submit these options with a SOW query, see the documentation for the AMPS client library your application uses.

When replacing a subscription that uses top_n, skip_n, or OrderBy, any of these options specified on the original command must be provided on the replacement command. In other words, a sow_and_subscribe command that specifies top_n=10,skip_n=20 must provide both top_n and skip_n on a replacement command.

Paginated SOW and Subscribe¶

When top_n and skip_n are specified on a sow_and_subscribe command, AMPS creates a paginated subscription. (Both top_n and skip_n must be provided to create a paginated subscription.)

With a paginated subscription, AMPS maintains a list of the set of results for the SOW query, and delivers only the results in the pagination window: the window starts at the first record after the number of records given by skip_n and contains at most the number of records given by top_n. This allows applications that only need a subset of the results returned by a filter to work with only those results. This is commonly used for interactive applications, where a user interface shows a small number of records at a time.

When the subscription specifies an OrderBy, that header specifies the order in which records are sorted within the paginated subscription. If no OrderBy is specified, the results are sorted by the SowKey generated by AMPS (effectively, an arbitrary but stable order).

From a subscriber point of view, paginated subscriptions behave as though only the messages in the pagination window are present in AMPS. For example, when out-of-focus notifications are enabled and a message in the topic is deleted, subscribers receive an oof notification only if the deleted message was in the pagination window. Likewise, if a message that was previously in the pagination window falls outside of the window due to an insert or delete, the message that is now outside of the window will be out of focus, and will generate an oof notification.

For example, consider the following topic in the SOW, where the topic uses the /id field as a key.

With a top_n of 2, a skip_n of 1, and an OrderBy of /id, the results for the subscription will include the records with id of 2 and id of 5.

Now a new message is published with an id of 4, as shown below:

Because the new message falls within the pagination window, the message is published to the subscriber. Because the message with the id of 5 is no longer within the pagination window, the subscriber will receive an oof message for the message with an id of 5 if the subscriber has requested out-of-focus notifications.
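The example above can be reproduced with a small Python model of the pagination window. The topic contents are an assumption: the ids 1, 2, 5, and 7 are hypothetical values chosen so that a top_n of 2 and a skip_n of 1 select ids 2 and 5, as described. The `pagination_window` and `window_changes` helpers are likewise hypothetical and do not reflect how AMPS tracks the window internally.

```python
def pagination_window(records, top_n, skip_n, key="id"):
    """Return the records in the window, ordered by the key field."""
    ordered = sorted(records, key=lambda record: record[key])
    return ordered[skip_n:skip_n + top_n]


def window_changes(before, after, key="id"):
    """Return (ids that entered the window, ids that left the window)."""
    before_ids = {record[key] for record in before}
    after_ids = {record[key] for record in after}
    return sorted(after_ids - before_ids), sorted(before_ids - after_ids)


# Hypothetical topic contents, keyed on /id.
topic = [{"id": 1}, {"id": 2}, {"id": 5}, {"id": 7}]
before = pagination_window(topic, top_n=2, skip_n=1)

# A new message with an id of 4 is published.
topic.append({"id": 4})
after = pagination_window(topic, top_n=2, skip_n=1)

# entered: delivered to the subscriber; left: oof notification if requested.
entered, left = window_changes(before, after)
```

In this model the new record enters the window and the record with an id of 5 leaves it, which corresponds to the publish and the oof notification described above.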

While a paginated subscription is active, AMPS maintains a list of the messages that match the subscription in memory (but does not, as of version 5.3.2, maintain the entire sorted result set in memory). For efficiency, when more than one subscription uses the same topic, these subscriptions will use the same result set in memory. The memory used counts as part of the configured MessageMemoryLimit. Each connection that uses the result set is counted as consuming a portion of the memory retained. For example, if 5 connections use the same result set, each of those connections is counted as using 1/5 of the memory for the result set.

In addition, each paginated subscription requires that AMPS maintain state for the window for that subscription: this memory is not shared, and is counted for that client.

Aggregated SOW Queries¶

AMPS provides the ability to aggregate the results of a SOW query. The results of an aggregated SOW query are the same as the results of querying a View with the same definition.

To request an aggregated SOW query, provide the grouping and projection options with the sow query.

Option Description

grouping=[keys]

For use with aggregated SOW queries.

The format of this option is a comma-delimited list of XPath identifiers within brackets. For example, to aggregate entries based on their /description (producing one record in the aggregation for each distinct value in /description), you would use the following option:

grouping=[/description]

When this option is provided, a projection must also be provided.

When the topic has History enabled, this option can be used with a bookmark to aggregate the historical state of the SOW.

projection=[fields]

For use with aggregated SOW queries.

Specifies a comma-delimited set of fields to project, within brackets. Each entry has the format described in the AMPS User Guide.

This option must contain an entry for every field in the aggregated message. If there is no entry for a field in this option, that field will not appear in the aggregated message, even if the field is in the underlying message.

There is no default for this option. When this option is provided, a grouping must also be provided.

When the topic has History enabled, this option can be used with a bookmark to aggregate the historical state of the SOW.

Table 8.2: Aggregated SOW Query Options
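The relationship between grouping and projection can be illustrated with a short Python model. This sketch does not use AMPS projection syntax: the `aggregate` helper is hypothetical, and each projected field is expressed as a Python function over the records in a group rather than as an AMPS expression.

```python
from collections import defaultdict


def aggregate(records, grouping, projection):
    """Produce one output record per distinct combination of grouping fields.

    projection maps each output field name to a function of the group's
    records; fields without an entry do not appear in the output.
    """
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[field] for field in grouping)].append(record)
    return [
        {name: compute(rows) for name, compute in projection.items()}
        for rows in groups.values()
    ]


# Hypothetical SOW records.
records = [
    {"description": "widget", "qty": 2, "price": 1.5},
    {"description": "gadget", "qty": 1, "price": 9.0},
    {"description": "widget", "qty": 3, "price": 1.5},
]

result = aggregate(
    records,
    grouping=["description"],
    projection={
        "description": lambda rows: rows[0]["description"],
        "total_qty": lambda rows: sum(row["qty"] for row in rows),
    },
)
```

The output contains one record per distinct description. The price field is absent from the output because the projection has no entry for it, even though it is present in the underlying records.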
