8. SOW Queries
When SOW topics are configured inside an AMPS instance, clients can issue SOW queries to AMPS to retrieve all of the messages matching a given topic and content filter. When a query is executed, AMPS will test each message in the SOW against the content filter specified and all messages matching the filter will be returned to the client. The topic can be a straight topic or a regular expression pattern.
SOW Queries
A client can issue a query by sending AMPS a sow command and specifying an AMPS topic. Optionally, a filter can be used to further refine the query results. AMPS also allows you to restrict the query to a specific set of messages identified by a set of SowKeys. When AMPS receives the sow command, it validates the filter and starts executing the query. When returning a query result to the client, AMPS packages the results into a record group: it first sends a group_begin message, then the matching SOW records (if any), and finally a group_end message to indicate that all records have been sent. AMPS returns the results for a SOW query in a single, atomic operation. Any messages for the client that arrive during the SOW query are delivered after the SOW results.
The message flow for a SOW query is provided as a sequence diagram in Figure 8.1.
To correlate a query request with its result, each query command can specify a QueryId. AMPS returns the specified QueryId as part of the response delivered to the client: the group_begin and group_end messages have the QueryId attribute set to the value provided by the client. The client can use this QueryId to correlate query commands with the responses coming from the AMPS engine.
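The record group and QueryId correlation described above can be sketched on the client side as follows. This is a minimal illustration that does not use the AMPS client library: messages are modeled as plain dictionaries, the field names `command`, `query_id`, and `data` are hypothetical, and the sketch assumes each record in the group carries the same query id as its group_begin and group_end messages.

```python
def collect_query_results(messages):
    """Group SOW records by query id, using group_begin/group_end as delimiters."""
    open_groups = {}   # query_id -> records received so far
    completed = {}     # query_id -> complete list of records
    for msg in messages:
        command = msg["command"]
        query_id = msg.get("query_id")
        if command == "group_begin":
            open_groups[query_id] = []
        elif command == "sow":
            open_groups[query_id].append(msg["data"])
        elif command == "group_end":
            # The group is complete only once group_end arrives.
            completed[query_id] = open_groups.pop(query_id)
    return completed


# Hypothetical message stream for one query with QueryId "q-1".
stream = [
    {"command": "group_begin", "query_id": "q-1"},
    {"command": "sow", "query_id": "q-1", "data": {"id": 1}},
    {"command": "sow", "query_id": "q-1", "data": {"id": 2}},
    {"command": "group_end", "query_id": "q-1"},
]
results = collect_query_results(stream)
print(results["q-1"])
```

Treating the result set as complete only at group_end mirrors the delimiting role that the group_begin and group_end messages play in the flow above.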
AMPS does not allow a sow command on topics that do not have a SOW enabled. If a client queries a topic that does not have a SOW enabled, AMPS returns an error.
Caution
The ordering of records returned by a SOW query is undefined by default. To receive results in a particular order, include an OrderBy parameter on the query to specify an ordering based on the contents of the messages.
Figure 8.1: SOW Query Sequence Diagram
Historical SOW Queries
Topics in the State of the World can also be configured to include historical snapshots of messages, which allows subscribers to retrieve the contents of the topic at a particular point in time.
As with simple queries, a client can issue a query by sending AMPS a sow command and specifying an AMPS topic. For a historical query, the client also adds a timestamp that specifies the point in time for the query in the Bookmark header of the command. A filter can be used to further refine the query results based on the message content.
When To Use a Historical SOW Query
Use a historical SOW query when it is important to get a snapshot of the state of messages in a topic as they existed at a specific point in time (that is, if it is important for an application to be able to query the state of the world at a point in time).
If an application needs to replay the exact sequence of messages delivered to a topic, but does not need to be able to query the values that were current at a specific point in time, record the topic in the transaction log and replay from the transaction log.
If an application needs to both retrieve a snapshot of the values that were current at a specific point in time and replay the exact sequence of messages from that point forward, use a historical SOW query and record the topic in the transaction log.
Configuring the Topic: Window and Granularity
By default, AMPS does not maintain history for a topic in the State of the World. To enable history (and historical query) for the topic, add the History element to the Topic configuration. This element configures how much information AMPS stores for enabling historical queries.
There are two options that control how AMPS stores data for historical queries:
- The Window option sets the amount of time that AMPS will retain historical versions of messages. AMPS will remove the historical state of the message from the SOW topic once that historical state is older than the specified window. (If the message has been deleted, and the delete command is older than the specified window, AMPS may remove the message from the SOW topic entirely.) AMPS always retains the most current state of a message, even if that state was published earlier than the specified Window.
- The Granularity option sets the interval at which AMPS retains a historical copy of a message in the SOW. For example, if the Granularity is set to 10m, AMPS stores a historical copy of the message no more frequently than every 10 minutes, regardless of how many times the message is updated in that 10 minute interval. AMPS stores the copies when a new message arrives to update the SOW. This means that AMPS always returns a valid SOW state that reflects a published message but, as with a conflated topic, the SOW may not reflect all of the states that a message passes through. This also means that AMPS uses SOW space efficiently. If no updates have arrived for a message since the last time a historical message was saved, AMPS has no need to save another copy of the message.
When a message is deleted from a topic that maintains history, AMPS saves the fact that the message has been deleted, and queries as of that point in time will not return the message. However, previously saved states of the message within the Window are still present, and can still be queried.
Likewise, if an application queries at a point in time earlier than the Window, AMPS will return an empty result set (even if messages had actually been present in the topic at that point), since the SOW state is only retained for the period in the Window.
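The behavior described above for deletes and for queries earlier than the Window can be modeled with a short sketch. This is a simplified illustration rather than the AMPS implementation: timestamps are plain integers, the `history` structure and the `query_as_of` function are hypothetical, and the model ignores Granularity and the trimming of old versions.

```python
def query_as_of(history, as_of, now, window):
    """Return {key: state} as of `as_of`, or {} if `as_of` is earlier than the window."""
    if as_of < now - window:
        # State older than the window is not retained, so nothing is returned.
        return {}
    result = {}
    for key, versions in history.items():
        # versions: (timestamp, state) pairs sorted by timestamp; state None marks a delete.
        current = None
        for timestamp, state in versions:
            if timestamp <= as_of:
                current = state
            else:
                break
        if current is not None:
            result[key] = current
    return result


history = {
    "order-1": [(100, {"qty": 5}), (200, {"qty": 7}), (300, None)],  # deleted at t=300
    "order-2": [(150, {"qty": 1})],
}
now, window = 400, 350  # in this model, history is retained back to t=50

print(query_as_of(history, 250, now, window))  # before the delete
print(query_as_of(history, 320, now, window))  # after the delete
print(query_as_of(history, 10, now, window))   # earlier than the window
```

In this model, a query at t=250 still sees order-1 even though it is deleted later, a query at t=320 does not, and a query earlier than the retained window returns an empty result.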
Tip
The Granularity for a topic is always specified as a duration. If your application requires that a query be able to return the exact state of the SOW exactly as AMPS would have represented it at that time (with no tolerance for the granularity), you can specify that AMPS keep every message during the Window by specifying a Granularity of 0s. Notice that this is not required to replay every message after a point-in-time query, since replay is delivered from the transaction log rather than the stored State of the World.
When a historical SOW and Subscribe query is entered, and the topic is covered by a transaction log, AMPS returns the state of the SOW adjusted to the next oldest granularity, then replays messages from that point. In other words, AMPS returns the same results as a historical SOW query, then replays the full sequence of messages from that point forward.
The transaction log and the SOW topic are maintained separately, and have separate views of history. When a version of the message is removed from the SOW topic (because it is older than the specified Window), the message remains in the transaction log, but will not be returned by a SOW query.
Tip
The length of time that messages remain in the topic is specified by the Window. A SOW topic that retains history does not support SOW expiration.
If it is necessary to delete messages after they have been active for a certain period of time in a topic that maintains history, use an explicit delete from an application or use a scheduled action. Configuring actions is described in 24. Automating AMPS With Actions. An installation of AMPS would typically set a schedule as described in the section Running an Action on a Schedule and configure that action to delete messages as described in the section Deleting Messages from SOW.
Message Sequence Flow
The message sequence flow is the same as for a simple SOW query. Once AMPS has transmitted the messages that were in the SOW as of the timestamp of the query, the query ends. Notice that the query will include messages that have been subsequently deleted from the SOW, but which were the current state of the message as of that timestamp.
Pagination with Historical SOW Queries
Topics that maintain History in the SOW support paginated queries from a point in time. When the topic is also covered by the transaction log, the sow_and_subscribe command also supports paginated subscriptions from a point in time. See Paginated SOW and Subscribe for details.
SOW Query-and-Subscribe
AMPS has a special command that will execute a query and place a subscription at the same time to prevent a gap between the query and subscription where messages can be lost. Without a command like this, it is difficult to reproduce the SOW state locally on a client without creating complex code to reconcile incoming messages and state.
For example, this command is useful for recreating part of the SOW in a local cache and keeping it up to date. Without a special command to place the query and subscription at the same moment, a client is left with two options:
- Issue the query request, process the query results, and then place the subscription, which misses any records published between the time when the query and subscription were placed; or
- Place the subscription and then issue the query request, which could send messages placed between the subscription and query twice.
Instead of requiring every program to work around these options, the AMPS sow_and_subscribe command allows clients to place a query and get the streaming updates to matching messages in a single command.
In a sow_and_subscribe command, AMPS behaves as if the SOW command and subscription are placed at the exact same moment. The SOW query results will be sent before any messages from the subscription are sent to the client. Additionally, any new publishes that come into AMPS that match the sow_and_subscribe filtering criteria and come in after the query started will be sent after the query finishes (and the query will not include those messages). As with a simple SOW query, any other messages that arrive for the client while the SOW query is running will also be delivered after the query results.
AMPS allows a sow_and_subscribe command on topics that do not have a SOW enabled. In this case, AMPS simply returns no messages between group_begin and group_end.
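The local cache use case mentioned earlier in this section can be sketched as a handler that applies each message from a sow_and_subscribe stream to a dictionary. This is an illustrative model, not the AMPS client API: messages are plain dictionaries, the field names `command` and `data` and the `/id` key are assumptions, and the oof handling applies only if the subscription requested out-of-focus notifications.

```python
def apply_message(cache, msg, key_field="id"):
    """Apply one message from a sow_and_subscribe stream to a local cache."""
    command = msg["command"]
    if command in ("sow", "publish"):
        # Query results and later updates both replace the cached record.
        record = msg["data"]
        cache[record[key_field]] = record
    elif command == "oof":
        # The record no longer matches the subscription, so drop it.
        cache.pop(msg["data"][key_field], None)
    # group_begin and group_end only delimit the query results.
    return cache


stream = [
    {"command": "group_begin"},
    {"command": "sow", "data": {"id": 1, "status": "open"}},
    {"command": "sow", "data": {"id": 2, "status": "open"}},
    {"command": "group_end"},
    {"command": "publish", "data": {"id": 2, "status": "filled"}},
    {"command": "publish", "data": {"id": 3, "status": "open"}},
    {"command": "oof", "data": {"id": 1}},
]

cache = {}
for message in stream:
    apply_message(cache, message)
print(cache)
```

Because the query results always arrive before the subscription updates, the handler needs no reconciliation logic: applying messages in arrival order is enough to keep the cache current in this model.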
The message flow for sow_and_subscribe commands is provided as a sequence diagram in Figure 8.2.
Figure 8.2: SOW-And-Subscribe Query Sequence Diagram
Historical SOW Query and Subscribe
For topics that have History configured, AMPS SOW Query and Subscribe also allows you to begin the subscription with a historical SOW query. For historical SOW queries, the subscription begins at the point of the query with the results of the SOW query. The subscription then replays messages from the transaction log. Once messages from the transaction log have been replayed, the subscription then provides messages as AMPS publishes them.
In effect, a SOW Query and Subscribe with a historical query allows you to recreate the client state and processing as though the client had issued a SOW Query and Subscribe at the point in time of the historical query.
A historical SOW and subscribe requires that the SOW topic is recorded in the transaction log and that history is enabled on the SOW. If history is not enabled for the topic, a SOW and subscribe command returns the current state of the SOW and the subscription begins atomically at the point in time when AMPS processes the command.
Conflated Subscriptions with SOW and Subscribe
A sow_and_subscribe command can include options for server side conflation (as described in Chapter 3 Conflated Subscriptions), just as a regular subscription can. When the command requests conflation, the results of the SOW query are not conflated, and the conflation interval and key apply to the subscription.
Replacing Subscriptions with SOW and Subscribe
As described in Chapter 3 Replacing Subscriptions, AMPS allows you to replace an existing subscription. When the subscription was entered with the sow_and_subscribe command, AMPS re-runs the SOW query, delivering the messages that are in scope with the new filter but which were not previously delivered. If the subscription requests out-of-focus (OOF) messages, AMPS will deliver out-of-focus messages for messages that matched the previous filter but do not match the new filter. As with the initial query and subscribe, AMPS guarantees to deliver any changes to the SOW that match the filter and occur after the point of the query.
Figure 8.3: SOW And Subscribe Replace Sequence Diagram
SOW Query Response Batching
When processing a SOW query, AMPS has the ability to combine messages into batches for more efficient network usage. The maximum number of messages in a batch is determined by the BatchSize parameter on the SOW query command. AMPS defaults to a BatchSize value of 1, meaning AMPS sends one message per batch in the response. The BatchSize is the maximum number of records that will be returned within a single response payload. Each AMPS response for the query contains a BatchSize value in its header to indicate the number of messages in the batch. This number will be anywhere from 1 to BatchSize.
Current versions of the AMPS client libraries set a batch size of 10 when using the named convenience methods (for example, sowAndSubscribe) if no other batch size is specified.
Notice that the format of messages returned from AMPS may be different depending on the message type requested. However, the information contained in the messages is the same for all message types.
Tip
When issuing a sow_and_subscribe command, AMPS will return a group_begin and group_end segment of messages before beginning the live subscription sequence of the query. This is also true when a sow_and_subscribe command is issued against a non-SOW topic. When the topic is not in the State of the World, no messages will be delivered between the group_begin and group_end messages.
Using a BatchSize greater than 1 can yield greater performance, particularly when querying a large number of small records. In general, 60East recommends using a BatchSize that provides good network utilization without consuming excessive server memory. Most applications that use small messages set a batch size designed to create batches that fit well into the maximum transmission unit (MTU) for the network.
AMPS reports an error if an application requests a batch size larger than 10,000 records (this value is orders of magnitude larger than the typical BatchSize used by applications).
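The batching rules above can be illustrated with a small sketch. This models only the counting behavior described in this section (each batch holds between 1 and BatchSize records, and requests above 10,000 are rejected); the function name `make_batches` is hypothetical and the sketch says nothing about how AMPS actually assembles or transmits batches.

```python
MAX_BATCH_SIZE = 10_000  # limit stated in this section


def make_batches(records, batch_size=1):
    """Split records into batches of at most batch_size records each."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch size must be between 1 and {MAX_BATCH_SIZE}")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


records = list(range(25))
batches = make_batches(records, batch_size=10)
print([len(batch) for batch in batches])
```

With 25 records and a batch size of 10, the final batch is only partly filled, which is why the number of messages reported for a batch can be anywhere from 1 to the requested size.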
For applications that return a large number of messages that are larger than the MTU, 60East recommends testing performance with a variety of batch sizes. Because the client libraries parse the AMPS headers common to each message once per batch, a batch size larger than 1 can improve processing performance on the client side, particularly if the client message handling is efficient. Likewise, because the AMPS server only has to serialize the common headers once per batch, a batch size larger than 1 can improve performance at the server side (as well as reduce the overall bandwidth for a group of messages). At the same time, the server will hold a batch of messages until the batch can be transmitted together (or until the query is complete), so providing large values for the batch size can introduce latency in receiving results, and reduce performance if the total size of the batch is very large.
In general, the default client value is a good compromise for many application patterns if the messages are larger than will fit into the MTU of the network. For smaller messages, or if it is important to tune performance, 60East recommends testing with a variety of batch sizes.
Tip
Using an appropriate BatchSize parameter is critical to achieve the maximum query performance with a large number of messages when many messages will fit into the MTU for your network. For larger messages, tune the batch size based on performance testing with a variety of batch sizes.
Caution
AMPS treats queries as a single, atomic operation. All results from a query are sent to a client before the results of any subsequent commands. Use care when issuing queries that return a result set large enough to take several seconds or more to transmit over the network.
When planning for large queries, see the section called Slow Clients for information on how AMPS handles a situation where messages are produced faster than the client or network can consume them.
For more information on executing queries, please see the Developer Guide for the AMPS client of your choice.
Configuring SOW Query Result Sets
AMPS allows you to control the results returned by a SOW query by including the following options and header on the query:
Option / Header | Result |
---|---|
top_n (option) | Limits the results returned to the number of messages specified. When a |
skip_n (option) | Skips the number of messages specified before returning results. A command that provides this option must also provide a top_n option. |
OrderBy (header) | Orders the results returned as specified. Requires a comma-separated list of identifiers of the form: /field [ASC \| DESC]. For example, to sort in descending order by /orderDate and then in ascending order by /customerName: /orderDate DESC, /customerName ASC. If no sort order is specified for an identifier, AMPS defaults to ascending order. |
Table 8.1: SOW Query Options
For details on how to submit these options with a SOW query, see the documentation for the AMPS client library your application uses.
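The combined effect of OrderBy, skip_n, and top_n can be sketched as a pure Python model. This is an illustration of the semantics in Table 8.1, not AMPS code: the functions `parse_order_by` and `apply_query_options` are hypothetical, records are flat dictionaries, and only top-level identifiers such as /orderDate are handled.

```python
def parse_order_by(order_by):
    """Parse '/a DESC, /b' into [('a', True), ('b', False)] as (field, descending)."""
    spec = []
    for term in order_by.split(","):
        parts = term.split()
        field = parts[0].lstrip("/")
        # Ascending is the default when no direction is given.
        descending = len(parts) > 1 and parts[1].upper() == "DESC"
        spec.append((field, descending))
    return spec


def apply_query_options(records, order_by=None, top_n=None, skip_n=0):
    """Order the records, skip skip_n of them, then keep at most top_n."""
    if skip_n and top_n is None:
        raise ValueError("skip_n requires top_n")
    results = list(records)
    if order_by:
        # Sorting is stable, so apply the least significant key first.
        for field, descending in reversed(parse_order_by(order_by)):
            results.sort(key=lambda record: record[field], reverse=descending)
    results = results[skip_n:]
    if top_n is not None:
        results = results[:top_n]
    return results


orders = [
    {"orderDate": "2024-01-02", "customerName": "Bo"},
    {"orderDate": "2024-01-03", "customerName": "Cy"},
    {"orderDate": "2024-01-03", "customerName": "Al"},
    {"orderDate": "2024-01-01", "customerName": "Di"},
]
page = apply_query_options(
    orders, order_by="/orderDate DESC, /customerName ASC", top_n=2, skip_n=1
)
print([order["customerName"] for order in page])
```

The full ordering here is Al, Cy, Bo, Di, so skipping one record and taking two yields Cy and Bo.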
When replacing a subscription that uses top_n, skip_n, or OrderBy, any of these options specified on the original command must be provided on the replacement command. In other words, a sow_and_subscribe command that specifies top_n=10,skip_n=20 must provide both top_n and skip_n on a replacement command.
Paginated SOW and Subscribe
When top_n and skip_n are specified on a sow_and_subscribe command, AMPS creates a paginated subscription. (Both top_n and skip_n must be provided to create a paginated subscription.)
With a paginated subscription, AMPS maintains a list of the set of results for the SOW query, and delivers only the results that fall within the pagination window: the window starts at the first record after the skip_n records that are skipped, and contains at most top_n records. This allows applications that only need a subset of the results returned by a filter to work with only those results. This is commonly used for interactive applications, where a user interface shows a small number of records at a time.
When the subscription specifies an OrderBy, that header specifies the order in which records are sorted within the paginated subscription. If no OrderBy is specified, the results are sorted by the SowKey generated by AMPS (effectively, an arbitrary but stable order).
From a subscriber point of view, paginated subscriptions behave as though only the messages in the pagination window are present in AMPS. For example, when out-of-focus notifications are enabled and a message in the topic is deleted, subscribers receive an oof notification only if the deleted message was in the pagination window. Likewise, if a message that was previously in the pagination window falls outside of the window due to an insert or delete, the message that is now outside of the window will be out of focus, and will generate an oof notification.
For example, consider the following topic in the SOW, where the topic uses the /id field as a key.
With a top_n of 2, a skip_n of 1, and an OrderBy of /id, the results for the subscription will include the records with id of 2 and id of 5.
Now a new message is published with an id of 4, as shown below:
Because the new message falls within the pagination window, the message is published to the subscriber. Because the message with the id of 5 is no longer within the pagination window, the subscriber will receive an oof message for the message with an id of 5 if the subscriber has requested out-of-focus notifications.
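The pagination example above can be reproduced with a small model. The topic contents used here (ids 1, 2, 5, and 7) are an assumption chosen to be consistent with the results described in the text, and `pagination_window` is a hypothetical helper, not part of AMPS.

```python
def pagination_window(records, top_n, skip_n, key="id"):
    """Return the keys inside the pagination window when ordering by `key`."""
    ordered = sorted(records, key=lambda record: record[key])
    return [record[key] for record in ordered[skip_n:skip_n + top_n]]


# Assumed topic contents, consistent with the example in the text.
topic = [{"id": 1}, {"id": 2}, {"id": 5}, {"id": 7}]
before = pagination_window(topic, top_n=2, skip_n=1)

# A new message arrives with an id of 4.
topic.append({"id": 4})
after = pagination_window(topic, top_n=2, skip_n=1)

entered = [key for key in after if key not in before]  # delivered to the subscriber
left = [key for key in before if key not in after]     # reported as oof, if requested

print(before, after, entered, left)
```

Comparing the window before and after the publish shows which record enters the window and which record leaves it and becomes out of focus.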
While a paginated subscription is active, AMPS maintains a list of the messages that match the subscription in memory (but does not, as of version 5.3.2, maintain the entire sorted result set in memory). For efficiency, when more than one subscription uses the same topic, these subscriptions will use the same result set in memory. The memory used counts as part of the configured MessageMemoryLimit. Each connection that uses the result set is counted as consuming a portion of the memory retained. For example, if 5 connections use the same result set, each of those connections is counted as using 1/5 of the memory for the result set.
In addition, each paginated subscription requires that AMPS maintain state for the window for that subscription: this memory is not shared, and is counted for that client.
Aggregated SOW Queries
AMPS provides the ability to aggregate the results of a SOW query. The results of an aggregated SOW query are the same as the results of querying a View with the same definition.
To request an aggregated SOW query, provide the grouping and projection options with the sow query.
Option | Description |
---|---|
grouping=[keys] | For use with aggregated SOW queries. The format of this option is a comma-delimited list of XPath identifiers within brackets. For example, to aggregate entries based on their /description field: grouping=[/description] When this option is provided, a When the topic has |
projection=[fields] | For use with aggregated SOW queries. Specifies a comma-delimited set of fields to project, within brackets. Each entry has the format described in the AMPS User Guide. This option must contain an entry for every field in the aggregated message. If there is no entry for a field in this option, that field will not appear in the aggregated message, even if the field is in the underlying message. There is no default for this option. When this option is provided, a When the topic has |
Table 8.2: Aggregated SOW Query Options
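The effect of grouping and projection can be sketched with a simple model. This illustrates the idea of grouping by /description and projecting one aggregated field; the `aggregate` function, the /qty field, and the /total output field are hypothetical, and the sketch does not reproduce the projection syntax, which is described in the AMPS User Guide.

```python
def aggregate(records, group_field, sum_field, out_field):
    """Group records by group_field and project the sum of sum_field as out_field."""
    totals = {}
    for record in records:
        group = record[group_field]
        totals[group] = totals.get(group, 0) + record[sum_field]
    # Only projected fields appear in the aggregated messages.
    return [{group_field: group, out_field: total} for group, total in totals.items()]


records = [
    {"id": 1, "description": "widget", "qty": 2},
    {"id": 2, "description": "gadget", "qty": 1},
    {"id": 3, "description": "widget", "qty": 3},
]
aggregated = aggregate(records, "description", "qty", "total")
print(aggregated)
```

Notice that the id field does not appear in the output: as Table 8.2 states, a field with no entry in the projection is omitted from the aggregated message even though it is present in the underlying messages.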