8. SOW Queries

When SOW topics are configured inside an AMPS instance, clients can issue SOW queries to AMPS to retrieve all of the messages matching a given topic and content filter. When a query is executed, AMPS tests each message in the SOW against the specified content filter and returns every matching message to the client. The topic can be an exact topic name or a regular expression pattern.

SOW Queries

A client issues a query by sending AMPS a sow command that specifies an AMPS topic. Optionally, a filter can be used to further refine the query results. AMPS also allows you to restrict the query to a specific set of messages identified by a set of SowKeys. When AMPS receives the sow command, it validates the filter and starts executing the query. AMPS packages the results into a SOW record group: it first sends a group_begin message, then the matching SOW records (if any), and finally a group_end message to indicate that all records have been sent. AMPS returns the results for a SOW query in a single, atomic operation. Any messages for the client that arrive during the SOW query are delivered after the SOW results.
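The framing of a query result can be sketched in a few lines of Python. This is a simulation only, not the AMPS client API: the dictionary message shape and the function names `sow_response` and `collect_group` are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def sow_response(records: Iterable[dict]) -> Iterator[dict]:
    """Simulate the server side: group_begin, each matching record, group_end."""
    yield {"command": "group_begin"}
    for record in records:
        yield {"command": "sow", "data": record}
    yield {"command": "group_end"}


def collect_group(messages: Iterable[dict]) -> List[dict]:
    """Simulate the client side: gather records between group_begin and group_end."""
    results: List[dict] = []
    in_group = False
    for message in messages:
        if message["command"] == "group_begin":
            in_group = True
        elif message["command"] == "group_end":
            return results  # all records for the query have been sent
        elif in_group and message["command"] == "sow":
            results.append(message["data"])
    raise ValueError("stream ended before group_end")


orders = [{"id": 1, "symbol": "IBM"}, {"id": 2, "symbol": "MSFT"}]
received = collect_group(sow_response(orders))
no_matches = collect_group(sow_response([]))  # an empty group is still framed
```

A query with no matching records still produces a group_begin and group_end pair, so the client can always tell when the result is complete.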

The message flow for a SOW query is provided as a sequence diagram in Figure 8.1.

To correlate a query request with its result, each query command can specify a QueryId. AMPS returns the QueryId as part of the response delivered to the client. The group_begin and group_end messages have the QueryId attribute set to the value the client provided, so the client can use this value to match responses from the AMPS engine to the commands that produced them.
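As a rough sketch of how a client might use the QueryId, the following Python groups incoming records under the request that produced them. The message dictionaries, the `pending` map, and the function name `correlate` are hypothetical, and the sketch relies only on the QueryId carried by group_begin and group_end.

```python
def correlate(messages, pending):
    """Map each request description to the records returned for it.

    `pending` maps a client-chosen QueryId to a description of the request.
    Records are attributed to the most recent group_begin.
    """
    results = {}
    current = None
    for message in messages:
        if message["command"] == "group_begin":
            current = message["query_id"]
            results[current] = []
        elif message["command"] == "group_end":
            current = None
        elif current is not None:
            results[current].append(message["data"])
    return {pending[qid]: recs for qid, recs in results.items() if qid in pending}


pending = {"q-1": "open orders", "q-2": "cancelled orders"}
stream = [
    {"command": "group_begin", "query_id": "q-1"},
    {"command": "sow", "data": {"id": 1}},
    {"command": "sow", "data": {"id": 2}},
    {"command": "group_end", "query_id": "q-1"},
    {"command": "group_begin", "query_id": "q-2"},
    {"command": "group_end", "query_id": "q-2"},
]
by_request = correlate(stream, pending)
```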

AMPS does not allow a sow command on topics that do not have a SOW enabled. If a client queries a topic that does not have a SOW enabled, AMPS returns an error.

caution The ordering of records returned by a SOW query is undefined by default. To request a particular ordering based on the contents of the messages, include an OrderBy parameter on the query.

Figure 8.1: SOW Query Sequence Diagram

Historical SOW Queries

SOW topics can also be configured to include historical snapshots of messages, which allows subscribers to retrieve the contents of the SOW that reflect a particular point in time.

As with simple queries, a client can issue a query by sending AMPS a sow command and specifying an AMPS topic. For a historical query, the client also adds a timestamp that includes the point in time for the query. A filter can be used to further refine the query results based on the message content.

Window and Granularity

AMPS allows you to control the amount of storage to devote to historical SOW queries through the Window and Granularity configuration options.

The Window option sets the amount of time that AMPS will retain historical copies of messages. After the amount of time set by the Window, AMPS may discard copies of the messages.

The Granularity option sets the interval at which AMPS retains a historical copy of a message in the SOW. For example, if the Granularity is set to 10m, AMPS stores a historical copy of the message no more frequently than every 10 minutes, regardless of how many times the message is updated in that 10-minute interval. AMPS stores the copies when a new message arrives to update the SOW. This means that AMPS always returns a valid SOW state that reflects a published message but, as with a conflated topic, the SOW may not reflect all of the states that a message passes through. This also means that AMPS uses SOW space efficiently: if no updates have arrived for a message since the last time a historical copy was saved, AMPS has no need to save another copy of the message.
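The effect of these two options can be approximated with a small model. This is a deliberately simplified sketch, not the algorithm AMPS uses internally: it assumes a copy is kept whenever an update arrives at least one granularity interval after the last retained copy, and it treats the Window as a simple age cutoff. The function names are hypothetical and times are in seconds.

```python
def retained_copies(update_times, granularity):
    """Timestamps of updates for which a historical copy is kept (simplified)."""
    kept = []
    for t in sorted(update_times):
        # Copies are only made when an update arrives, and no more often
        # than once per granularity interval.
        if not kept or t - kept[-1] >= granularity:
            kept.append(t)
    return kept


def within_window(kept, now, window):
    """Copies still inside the retention window; older ones may be discarded."""
    return [t for t in kept if now - t <= window]


updates = [0, 120, 300, 660, 700, 1500]          # one message, updated six times
kept = retained_copies(updates, granularity=600)  # 10-minute granularity
still_held = within_window(kept, now=2000, window=1000)
```

With these inputs, six updates produce three retained copies, and only the most recent copy is still inside the window at time 2000.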

When a historical SOW and Subscribe query is entered, and the topic is covered by a transaction log, AMPS returns the state of the SOW adjusted to the next oldest granularity, then replays messages from that point. In other words, AMPS returns the same results as a historical SOW query, then replays the full sequence of messages from that point forward.

Message Sequence Flow

The message sequence flow is the same as for a simple SOW query. Once AMPS has transmitted the messages that were in the SOW as of the timestamp of the query, the query ends. Notice that this replay includes messages that have been subsequently deleted from the SOW.

Pagination with Historical SOW Queries

Topics that maintain History in the SOW support paginated queries from a point in time. When the topic is also covered by the transaction log, the sow_and_subscribe command also supports paginated subscriptions from a point in time. See Paginated SOW and Subscribe for details.

SOW Query-and-Subscribe

AMPS has a special command that will execute a query and place a subscription at the same time to prevent a gap between the query and subscription where messages can be lost. Without a command like this, it is difficult to reproduce the SOW state locally on a client without creating complex code to reconcile incoming messages and state.

For example, this command is useful for recreating part of the SOW in a local cache and keeping it up to date. Without a special command to place the query and subscription at the same moment, a client is left with two options:

  1. Issue the query request, process the query results, and then place the subscription, which misses any records published between the time when the query and subscription were placed; or
  2. Place the subscription and then issue the query request, which could send messages placed between the subscription and query twice.

Instead of requiring every program to work around these options, the AMPS sow_and_subscribe command allows clients to place a query and get the streaming updates to matching messages in a single command.

In a sow_and_subscribe command, AMPS behaves as if the SOW query and subscription are placed at the exact same moment. The SOW query results will be sent before any messages from the subscription are sent to the client. Additionally, any new publishes that match the sow_and_subscribe filtering criteria and arrive after the query started will be sent after the query finishes (and the query results will not include those messages). As with a simple SOW query, any other messages that arrive for the client while the SOW query is running will also be delivered after the query results.
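The difference between the two workarounds and the atomic command can be seen in a toy simulation. The publish log, the positions at which the query and subscription take effect, and the helper names below are all invented for illustration; nothing here calls AMPS.

```python
def sow_at(log, point):
    """State of the SOW after the first `point` publishes in `log`."""
    state = {}
    for key, value in log[:point]:
        state[key] = value
    return state


def apply(cache, messages):
    """Apply (key, value) messages to a local cache, in order."""
    for key, value in messages:
        cache[key] = value
    return cache


log = [("a", 1), ("b", 1), ("a", 2), ("c", 1), ("b", 2)]
true_state = sow_at(log, len(log))

# Option 1: query sees the first 2 publishes, subscription starts after 4.
option1_cache = apply({}, list(sow_at(log, 2).items()) + log[4:])
missed = log[2:4]

# Option 2: subscription starts after 2 publishes, query runs after 4.
subscription_msgs = log[2:]
query_msgs = list(sow_at(log, 4).items())
duplicates = [m for m in subscription_msgs if m in query_msgs]

# Atomic: snapshot and subscription begin at the same point.
atomic_cache = apply({}, list(sow_at(log, 2).items()) + log[2:])
```

Only the atomic case reproduces the true state without extra reconciliation logic: option 1 leaves the cache stale, and option 2 delivers some updates twice.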

AMPS allows a sow_and_subscribe command on topics that do not have a SOW enabled. In this case, AMPS simply returns no messages between group_begin and group_end.

Figure 8.2 shows the message flow for sow_and_subscribe commands as a sequence diagram.

Figure 8.2: SOW-And-Subscribe Query Sequence Diagram

Historical SOW Query and Subscribe

AMPS SOW Query and Subscribe also allows you to begin the subscription with a historical SOW query. For historical SOW queries, the subscription begins at the point of the query with the results of the SOW query. The subscription then replays messages from the transaction log. Once messages from the transaction log have been replayed, the subscription then provides messages as AMPS publishes them.

In effect, a SOW Query and Subscribe with a historical query allows you to recreate the client state and processing as though the client had issued a SOW Query and Subscribe at the point in time of the historical query.

A historical SOW and subscribe requires that the SOW topic is recorded in the transaction log and that history is enabled on the SOW. If history is not enabled for the topic, a SOW and subscribe command returns the current state of the SOW and the subscription begins atomically at the point in time when AMPS processes the command.
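A minimal model of the historical case, assuming a journal of timestamped publishes stands in for the transaction log. The sketch ignores Granularity and deletes, and the function name and tuple layout are assumptions made for the example.

```python
def historical_sow_and_subscribe(journal, as_of):
    """Split a journal into the SOW snapshot at `as_of` and the replay after it.

    `journal` is a list of (timestamp, key, value) tuples in publish order.
    """
    snapshot = {}
    replay = []
    for timestamp, key, value in journal:
        if timestamp <= as_of:
            snapshot[key] = value                   # state as of the query time
        else:
            replay.append((timestamp, key, value))  # replayed after the snapshot
    return snapshot, replay


journal = [
    (100, "a", "v1"),
    (200, "b", "v1"),
    (300, "a", "v2"),
    (400, "c", "v1"),
]
snapshot, replay = historical_sow_and_subscribe(journal, as_of=250)
```

The client first receives the snapshot, then the replayed messages in order, which is the same sequence it would have seen had it subscribed at time 250.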

Conflated Subscriptions with SOW and Subscribe

A sow_and_subscribe command can include options for server side conflation (as described in Chapter 3 Conflated Subscriptions), just as a regular subscription can. When the command requests conflation, the results of the SOW query are not conflated, and the conflation interval and key apply to the subscription.

Replacing Subscriptions with SOW and Subscribe

As described in Chapter 3 Replacing Subscriptions, AMPS allows you to replace an existing subscription. When the subscription was entered with the sow_and_subscribe command, AMPS will re-run the SOW query, delivering the messages that are in scope with the new filter but were not previously delivered. If the subscription requests out-of-focus (OOF) messages, AMPS will deliver out-of-focus messages for messages that matched the previous filter but do not match the new filter. As with the initial query and subscribe, AMPS guarantees to deliver any changes to the SOW that match the filter and occur after the point of the query.

Figure 8.3: SOW And Subscribe Replace Sequence Diagram
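The set arithmetic behind a replacement can be sketched as follows. Filters are modeled as plain Python predicates rather than AMPS filter expressions, and `replace_filter` is a hypothetical helper name.

```python
def replace_filter(sow, old_filter, new_filter):
    """Return (keys newly delivered, keys reported out of focus) for a replace."""
    previously = {key for key, record in sow.items() if old_filter(record)}
    now = {key for key, record in sow.items() if new_filter(record)}
    newly_delivered = sorted(now - previously)  # in scope now, not sent before
    out_of_focus = sorted(previously - now)     # matched before, no longer match
    return newly_delivered, out_of_focus


sow = {1: {"price": 5}, 2: {"price": 15}, 3: {"price": 25}}
delivered, oof = replace_filter(
    sow,
    old_filter=lambda r: r["price"] > 10,  # matched records 2 and 3
    new_filter=lambda r: r["price"] < 20,  # matches records 1 and 2
)
```

Record 2 matches both filters, so it is neither redelivered nor reported as out of focus.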

SOW Query Response Batching

When processing a SOW query, AMPS can combine messages into batches for more efficient network usage. The BatchSize parameter on the SOW query command sets the maximum number of records returned within a single response payload. AMPS defaults to a BatchSize value of 1, meaning AMPS sends one message per batch in the response. Each AMPS response for the query contains a BatchSize value in its header to indicate the number of messages in that batch. This number will be anywhere from 1 to the requested BatchSize.
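To make the batching behavior concrete, here is a small Python sketch that splits a result set into batches. The dictionary layout and the function name `batches` are assumptions for illustration and do not reflect the AMPS wire format.

```python
def batches(records, batch_size=1):
    """Yield responses that each carry at most `batch_size` records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        # The header value reports how many records this batch actually holds.
        yield {"batch_size": len(chunk), "records": chunk}


records = [{"id": n} for n in range(25)]
sizes = [response["batch_size"] for response in batches(records, batch_size=10)]
default_sizes = [response["batch_size"] for response in batches(records[:3])]
```

With 25 records and a requested size of 10, the final batch holds only 5 records, which is why the per-response value can be smaller than the requested BatchSize.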

Current versions of the AMPS client libraries set a batch size of 10 when using the named convenience methods (for example, sowAndSubscribe) if no other batch size is specified.

Notice that the format of messages returned from AMPS may be different depending on the message type requested. However, the information contained in the messages is the same for all message types.

tip When a client issues a sow_and_subscribe command, AMPS returns a group_begin and group_end segment of messages before beginning the live subscription sequence of the query. This is also true when a sow_and_subscribe command is issued against a non-SOW topic. In this latter case, the group_begin and group_end will contain no messages.

Using a BatchSize greater than 1 can yield greater performance, particularly when querying a large number of small records. In general, 60East recommends using a BatchSize that provides good network utilization without consuming excessive server memory. Most applications use a batch size designed to create batches that fit well into the maximum transmission unit (MTU) for the network. AMPS reports an error if an application requests a batch size larger than 10,000 records (this value is orders of magnitude larger than the typical BatchSize used by applications).

For applications where the average message size is close to, or larger than, the MTU for the network, 60East recommends using a smaller BatchSize. For messages that are many times the MTU, 60East recommends a BatchSize of 1.
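One way to turn this guidance into a starting value is a simple division of the MTU by the average message size. The heuristic below is an assumption made for illustration, not a formula published by 60East: it ignores protocol header overhead, and the 1500-byte default MTU is a typical Ethernet value that may not match your network. Only the 10,000-record ceiling comes from the text above.

```python
MAX_BATCH_SIZE = 10_000  # AMPS reports an error for larger requests


def suggest_batch_size(avg_message_bytes, mtu_bytes=1500):
    """Rough starting point: how many average-sized messages fit in one MTU."""
    if avg_message_bytes <= 0:
        raise ValueError("avg_message_bytes must be positive")
    fit = mtu_bytes // avg_message_bytes
    # Messages near or above the MTU fall back to a batch size of 1.
    return max(1, min(MAX_BATCH_SIZE, fit))


small = suggest_batch_size(100)    # many small records per packet
near_mtu = suggest_batch_size(1400)
large = suggest_batch_size(6000)   # several times the MTU
```

Treat the result as a first guess and measure with realistic data before settling on a value.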

tip Using an appropriate BatchSize parameter is critical to achieving the maximum query performance with a large number of messages when many messages will fit into the MTU for your network. For larger messages, reducing the BatchSize below the default that the AMPS clients specify may produce better performance.

caution Care should be taken when issuing queries that return large results. When contemplating the usage of large queries and how that impacts system reliability and performance, please see the section called Slow Clients for more information.

For more information on executing queries, please see the Developer Guide for the AMPS client of your choice.

Configuring SOW Query Result Sets

AMPS allows you to control the results returned by a SOW query by including the following options and header on the query:

Option / Header Result
top_n (option)

Limits the results returned to the number of messages specified.

skip_n (option)

Skips the number of messages specified before returning results. A command that provides this option must also provide a top_n option.

OrderBy (header)

Orders the results returned as specified. Requires a comma-separated list of identifiers of the form:

/field [ASC | DESC]

For example, to sort in descending order by orderDate so that the most recent orders are first, and ascending order by customerName for orders with the same date, you might use a specifier such as:

/orderDate DESC, /customerName ASC

If no sort order is specified for an identifier, AMPS defaults to ascending order.

Table 8.1: SOW Query Options
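The combined effect of these options can be illustrated with a client-side simulation in Python. This is a sketch of the semantics described in the table, not of how AMPS evaluates queries: it handles only top-level fields, and the function names `parse_order_by` and `query` are hypothetical.

```python
def parse_order_by(spec):
    """Parse '/field [ASC | DESC], ...' into (field, descending) pairs."""
    keys = []
    for part in spec.split(","):
        tokens = part.split()
        if not tokens:
            raise ValueError("empty OrderBy entry")
        direction = tokens[1].upper() if len(tokens) > 1 else "ASC"  # default
        if direction not in ("ASC", "DESC"):
            raise ValueError("unknown sort direction: " + direction)
        keys.append((tokens[0].lstrip("/"), direction == "DESC"))
    return keys


def query(records, order_by=None, top_n=None, skip_n=0):
    """Order the records, skip skip_n of them, then keep at most top_n."""
    if skip_n and top_n is None:
        raise ValueError("skip_n requires top_n")
    result = list(records)
    if order_by:
        # Stable sorts applied from the least significant key to the most.
        for field, descending in reversed(parse_order_by(order_by)):
            result.sort(key=lambda r: r[field], reverse=descending)
    result = result[skip_n:]
    return result if top_n is None else result[:top_n]


orders = [
    {"orderDate": "2024-03-01", "customerName": "Zed"},
    {"orderDate": "2024-03-02", "customerName": "Bob"},
    {"orderDate": "2024-03-02", "customerName": "Amy"},
    {"orderDate": "2024-02-27", "customerName": "Cal"},
]
spec = "/orderDate DESC, /customerName ASC"
names = [o["customerName"] for o in query(orders, spec)]
page = [o["customerName"] for o in query(orders, spec, top_n=2, skip_n=1)]
```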

For details on how to submit these options with a SOW query, see the documentation for the AMPS client library your application uses.

When replacing a subscription that uses top_n, skip_n, or OrderBy, any of these options specified on the original command must be provided on the replacement command. For example, a sow_and_subscribe command that specifies top_n=10,skip_n=20 must provide both top_n and skip_n on a replacement command.

Paginated SOW and Subscribe

When top_n and skip_n are specified on a sow_and_subscribe command, AMPS creates a paginated subscription.

With a paginated subscription, AMPS maintains the set of results for the SOW query and delivers only the records in the pagination window: AMPS skips the first skip_n records, then delivers at most the next top_n records. This allows applications that only need a subset of the results returned by a filter to work with only those results. This is commonly used for interactive applications, where the user interface shows a small number of records at a time.

When the subscription specifies an OrderBy, that header specifies the order in which records are sorted within the paginated subscription. If no OrderBy is specified, the results are sorted by the SowKey generated by AMPS (effectively, an arbitrary but stable order).

From a subscriber's point of view, paginated subscriptions behave as though only the messages in the pagination window are present in AMPS. For example, when out-of-focus notifications are enabled and a message in the topic is deleted, subscribers receive an oof notification only if the deleted message was in the pagination window. Likewise, if a message that was previously in the pagination window falls outside of the window due to an insert or delete, the message that is now outside of the window will be out of focus, and will generate an oof notification.

For example, consider the following topic in the SOW, where the topic uses the /id field as a key.

[Image: contents of the example SOW topic before the publish]

With a top_n of 2, a skip_n of 1, and an OrderBy of /id, the results for the subscription will include the records with id of 2 and id of 5.

Now a new message is published with an id of 4, as shown below:

[Image: contents of the example SOW topic after the message with an id of 4 is published]

Because the new message falls within the pagination window, the message is published to the subscriber. Because the message with the id of 5 is no longer within the pagination window, the subscriber will receive an oof message for the message with an id of 5 if the subscriber has requested out-of-focus notifications.
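The example can be reproduced with a short simulation. The ids 2, 4, and 5 come from the text above; the ids 1 and 7 are assumed here to fill out the topic, and the function names are hypothetical.

```python
def window(records, skip_n, top_n, key="id"):
    """Keys of the records inside the pagination window, ordered by `key`."""
    ordered = sorted(records, key=lambda r: r[key])
    return [r[key] for r in ordered[skip_n:skip_n + top_n]]


def notifications(before, after):
    """Which keys are published to the subscriber and which go out of focus."""
    return {
        "publish": [k for k in after if k not in before],
        "oof": [k for k in before if k not in after],
    }


topic = [{"id": 1}, {"id": 2}, {"id": 5}, {"id": 7}]  # ids 1 and 7 are assumed
before = window(topic, skip_n=1, top_n=2)

topic.append({"id": 4})                                # the new publish
after = window(topic, skip_n=1, top_n=2)
events = notifications(before, after)
```

The oof entry is only delivered if the subscriber requested out-of-focus notifications.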

While a paginated subscription is active, AMPS maintains the entire, sorted result set for that subscription in memory. For efficiency, when more than one subscription uses the same topic these subscriptions will use the same result set in memory. The memory used counts as part of the configured MessageMemoryLimit. Each connection that uses the result set is counted as consuming a portion of the memory retained. For example, if 5 connections use the same result set, each of those connections is counted as using 1/5 of the memory for the result set.

In addition, each paginated subscription requires that AMPS maintain state for the window for that subscription: this memory is not shared, and is counted for that client.

Aggregated SOW Queries

AMPS provides the ability to aggregate the results of a SOW query. The results of an aggregated SOW query are the same as the results of querying a View with the same definition.

To request an aggregated SOW query, provide the grouping and projection options with the sow query.

Option Description
grouping=[keys]

For use with aggregated SOW queries.

The format of this option is a comma-delimited list of XPath identifiers within brackets. For example, to aggregate entries based on their /description (producing one record in the aggregation for each distinct value in /description), you would use the following option:

grouping=[/description]

When this option is provided, a projection must also be provided.

When the topic has History enabled, this option can be used with a bookmark to aggregate the historical state of the SOW.

projection=[fields]

For use with aggregated SOW queries.

Specifies a comma-delimited set of fields to project, within brackets. Each entry has the format described in the AMPS User Guide.

This option must contain an entry for every field in the aggregated message. If there is no entry for a field in this option, that field will not appear in the aggregated message, even if the field is in the underlying message.

There is no default for this option. When this option is provided, a grouping must also be provided.

When the topic has History enabled, this option can be used with a bookmark to aggregate the historical state of the SOW.

Table 8.2: Aggregated SOW Query Options
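Conceptually, an aggregated query groups the matching records by the grouping keys and computes each projected field per group. The Python below sketches that idea for a single grouping key and a single sum. The field names `/qty` and `/total`, the function name `aggregate`, and the option string in the comment are illustrative assumptions, so check the projection syntax against the AMPS User Guide before using it.

```python
from collections import defaultdict

# Illustrative option string (assumed form, not verified against AMPS):
#   grouping=[/description],projection=[/description,SUM(/qty) AS /total]


def aggregate(records, group_key, sum_field, out_field):
    """Produce one output record per distinct value of `group_key`."""
    totals = defaultdict(int)
    for record in records:
        totals[record[group_key]] += record[sum_field]
    # Only projected fields appear in the output; other fields are dropped.
    return [{group_key: key, out_field: total} for key, total in totals.items()]


items = [
    {"description": "widget", "qty": 3, "warehouse": "east"},
    {"description": "gadget", "qty": 5, "warehouse": "west"},
    {"description": "widget", "qty": 4, "warehouse": "west"},
]
summary = aggregate(items, "description", "qty", "total")
```

Note that the `warehouse` field does not appear in the output because it has no entry in the projection.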