14. Transactional Messaging and Bookmark Subscriptions¶
AMPS includes support for transactional messaging, which includes persistence, consistency across restarts, and message replay. AMPS message queues use the transaction log to hold the messages in the queue. Transactional messaging is also the basis for replication, a key component of the high-availability capability in AMPS (as described in 25. Replication and High Availability).
AMPS message queues use the transaction log as a persistent record of the messages that have entered the queue, the order of those messages, and which messages have been acknowledged and removed from the queue. All of these capabilities rely on the AMPS transaction log. The transaction log maintains a record of messages. You can choose which messages are included in the transaction log by specifying the message types and topics you want to record.
The AMPS transaction log differs from transaction logging in a conventional relational database system. Unlike transaction logs that are intended solely to maintain the consistency of data in the system, the AMPS transaction log is fully queryable through the AMPS client APIs. For applications that need access to historical information, or applications that need to be able to recover state in the event of a client restart, the transaction log allows you to do this, relying on AMPS as the definitive single version of the state of the application. There is no need for complex logic to handle reconciliation or state restoration in the client. AMPS handles the difficult parts of this process, and the transaction log guarantees consistency.
Topics covered by a transaction log are able to provide reliable messaging with strict consistency guarantees.
When a transaction log is enabled, topics covered by the transaction log provide atomic broadcast from that instance. This means that the instance enforces a repeatable ordering on the messages, and guarantees that all subscribers receive messages reliably, in a consistent order, and with no gaps or duplicates.
Enabling a transaction log in an instance also enables the following behaviors:
- When a transaction log is enabled, AMPS uses the client name provided by each client to manage the sequencing and integrity of the message stream, and to quickly detect and respond to reconnection. To allow this, each connection to AMPS must have a unique client name. If a duplicate client name is detected, one of the clients is assumed to be defunct and is disconnected.
- When a transaction log is enabled, persisted acknowledgements to a publish are conflated, as described in Acknowledgment Conflation and Publish Acknowledgments.
Recording and Replaying Messages With Transaction Logs¶
AMPS includes the ability to record messages in a transaction log, and replay those messages at a later time. This capability is key for high availability, since it gives subscribers the ability to resume a subscription at a point in time without missing messages. This capability is also the foundation of replication, since it gives AMPS the ability to preserve message streams to be synchronized to an instance that has gone offline.
The transaction log in AMPS contains a sequential, historical record of messages. Each message is identified by a bookmark, a unique identifier that AMPS uses to locate the message within the overall set of recorded messages. The transaction log can record messages for a topic, a set of topics, or for filtered content on one or more topics.
An application can request a subscription that replays messages from the transaction log. Subscriptions that replay from the transaction log are called bookmark subscriptions, since the subscription begins at a specific point in the transaction log identified by a specific bookmark. Bookmark subscriptions provide topic and content filtering in the same way that normal subscriptions do, and provide a set of unique capabilities (such as the ability to pause and resume the subscription) that are made possible because the subscription is provided from a persistent record of the message stream. Bookmark subscriptions are also key to high availability with AMPS. When a client is recovering from a restart or failure, this ability to replay allows a client to fill gaps in received messages and resume subscriptions without missing a message. This feature also allows new clients to receive an exact replay of a message stream. Replay from the transaction log is also useful for auditing, quality assurance, and backtesting.
The transaction log is used in AMPS replication to ensure that all servers in a replication group are continually synchronized should one of them experience an interruption in service. For example, say an AMPS instance, as a member of a replication group, goes down. When it comes back up, it can query another AMPS instance for all of the messages it did not receive, thereby catching up to a point of synchronization with the other instances. This feature, when coupled with AMPS replication, ensures that message subscriptions are always available and up-to-date.
The AMPS transaction log records messages that are received from a
publisher and events that affect those messages such as sow_delete
commands. AMPS does not record messages that are created through a view,
out-of-focus messages, or event status messages created by AMPS.
Understanding Message Persistence¶
To take advantage of transactional messaging, the publisher and the AMPS instance work together to ensure that messages are written to persistent storage. AMPS lets the publisher know when the message is persisted, so that the publisher knows that it no longer needs to track the message.
When a publisher publishes a message to AMPS, the publisher assigns each message a unique sequence number. Once the message has been written to persistent storage, AMPS uses the sequence number to acknowledge the message and let the publisher know that the message is persisted. Once AMPS has acknowledged the message, the publisher considers the message published. For safety, AMPS always writes a message to the local transaction log before acknowledging that the message is persisted. If the topic is configured for synchronous replication, all replication destinations have to persist the message before AMPS will acknowledge that the message is persisted.
For efficiency, AMPS may not acknowledge each individual message. Instead, AMPS acknowledges the most recent persisted message to indicate that all previous messages have also been persisted, as described in Acknowledgment Conflation and Publish Acknowledgments. Publishers that need transactional messaging do not wait for acknowledgment to publish more messages. Instead, publishers retain messages that haven’t been acknowledged, and republish messages that haven’t been acknowledged if failover occurs. The AMPS client libraries include this functionality for persistent messaging (see descriptions of the publish store in client library documentation). See the Guaranteed Publishing section of this guide for further details.
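The retain-and-republish behavior described above can be sketched in a few lines. This is a minimal illustration, not the publish store shipped with the AMPS client libraries: the `PublishStore` class and its method names are hypothetical, and sequence numbers are plain integers assigned by the publisher.

```python
from collections import OrderedDict


class PublishStore:
    """Hypothetical in-memory store of messages awaiting a persisted ack."""

    def __init__(self):
        self._next_seq = 1
        self._unacked = OrderedDict()  # sequence number -> message

    def store(self, message):
        """Assign the next sequence number and retain the message."""
        seq = self._next_seq
        self._next_seq += 1
        self._unacked[seq] = message
        return seq

    def persisted(self, acked_seq):
        """Handle a conflated ack: discard every message up to acked_seq."""
        for seq in [s for s in self._unacked if s <= acked_seq]:
            del self._unacked[seq]

    def unacknowledged(self):
        """Messages to republish after failover, in sequence order."""
        return list(self._unacked.items())


store = PublishStore()
for body in ("order-1", "order-2", "order-3", "order-4"):
    store.store(body)

store.persisted(3)  # one ack covers sequence numbers 1, 2, and 3
print(store.unacknowledged())  # [(4, 'order-4')]
```

A single acknowledgment for sequence number 3 releases messages 1 through 3, so only message 4 would need to be republished if the connection failed at this point.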
Configuring a Transaction Log¶
Before demonstrating the power of the transaction log, we will first show how to configure the transaction log in the AMPS configuration file.
<!-- All transaction log definitions are contained within the TransactionLog block.
     The following global settings apply to all Topic blocks defined within the
     TransactionLog: JournalDirectory, PreallocatedJournalFiles, and JournalSize. -->
<TransactionLog>
  <!-- The JournalDirectory is the filesystem location where journal files and journal
       index files will be stored. -->
  <JournalDirectory>./amps/journal/</JournalDirectory>
  <!-- The JournalArchiveDirectory is the filesystem location to which AMPS will
       archive journals. Notice that AMPS does not archive files by default. You
       configure an action to archive journal files. -->
  <JournalArchiveDirectory>/mnt/somedev0/amps/journal</JournalArchiveDirectory>
  <!-- PreallocatedJournalFiles defines the number of journal files AMPS will create as
       part of the server startup. Default: 2 Minimum: 1 -->
  <PreallocatedJournalFiles>1</PreallocatedJournalFiles>
  <!-- The JournalSize is the approximate size of the journal files
       that AMPS will create. -->
  <JournalSize>10MB</JournalSize>
  <!-- When a Topic is specified, then all messages which match exactly the specified
       topic or regular expression will be included in the transaction log. Otherwise,
       AMPS initializes the transaction logging, but does not record any messages to
       the transaction log. -->
  <Topic>
    <Name>orders</Name>
    <MessageType>nvfix</MessageType>
  </Topic>
</TransactionLog>
Replaying Messages with Bookmark Subscription¶
One of the most useful and powerful features in AMPS is bookmark subscription, which is enabled by the transaction log. With bookmark subscription, an application requests a subscription that starts at a specific point in the transaction log. AMPS begins the subscription at the specified point, and provides messages from the transaction log.
Each message in the transaction log has a bookmark. A bookmark is an opaque, unique identifier that is added by AMPS to each message recorded in the transaction log. For messages provided from a transaction log, the bookmark is included in the Bookmark header of the message. AMPS guarantees that bookmarks for the instance are monotonically increasing, which enables AMPS to rapidly find an individual bookmark within the transaction log.
A bookmark subscription simply requests that AMPS begin the subscription with the first message following the bookmark provided with the subscription. AMPS locates the bookmark in the transaction log, and begins the subscription at that point in time.
One way to think about a bookmark subscription is that AMPS publishes to the subscribing client only those messages that:
- have bookmarks after the provided bookmark,
- match the subscription’s Topic and Filter, and
- have been written to the transaction log.
AMPS provides these messages in the order in which they were recorded to the transaction log. Because a bookmark subscription requires a transaction log, when a client requests a bookmark subscription for a topic that is not being recorded in the transaction log, AMPS returns an error.
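The three conditions listed above can be modeled with a small simulation. This sketch does not use the AMPS client or server. The `LogEntry` type and the `replay` function are hypothetical, and an integer `position` stands in for the bookmark, which in AMPS is an opaque value that applications should not interpret.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    position: int   # stand-in for an opaque AMPS bookmark
    topic: str
    data: dict


def replay(log, after_position, topic, content_filter=lambda data: True):
    """Yield recorded entries after the given position that match the
    topic and filter, in the order they were recorded."""
    for entry in log:  # the log is already in recorded order
        if (entry.position > after_position
                and entry.topic == topic
                and content_filter(entry.data)):
            yield entry


log = [
    LogEntry(1, "orders", {"symbol": "IBM", "qty": 100}),
    LogEntry(2, "quotes", {"symbol": "IBM", "px": 150}),
    LogEntry(3, "orders", {"symbol": "MSFT", "qty": 50}),
    LogEntry(4, "orders", {"symbol": "IBM", "qty": 25}),
]

matches = list(replay(log, 1, "orders", lambda d: d["symbol"] == "IBM"))
positions = [e.position for e in matches]
print(positions)  # [4]
```

Entry 1 is excluded because it is not after the provided position, entry 2 is on a different topic, and entry 3 does not pass the content filter.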
If the subscription requests a completed acknowledgment, that acknowledgment will be delivered to the subscription once replay has completed. Messages delivered to the subscription after the acknowledgment is delivered are from new publishes.
AMPS allows an application to submit a comma-delimited list of bookmark values as the bookmark for a subscription request. In this case, AMPS begins replay at the oldest bookmark in the list. The client controls the bookmark provided on the subscription request. For a bookmark subscription, the AMPS server does not keep a persistent record of which bookmarks a specific client or subscription has processed. The AMPS client libraries provide facilities for easily tracking the messages which an application has processed so the application can resume at the appropriate point in the transaction log.
Requesting replay from the transaction log is how AMPS applications manage resumable subscriptions. The application keeps track of which messages have been processed, and requests replay from the appropriate point in the transaction log to resume the message stream. This record-keeping is built into the AMPS client libraries, and is most often handled transparently when a bookmark store is set for the client. Notice that this means that the AMPS server itself does not track the progress of individual subscriptions, nor does the application need to inform the server of how far the subscription has progressed. The application state needed to resume the subscription is entirely handled on the application side, with no involvement by the AMPS server. (For details on how specific client libraries manage the application state, see the Developer Guide for that client library.)
Bookmark subscriptions are provided from the transaction log rather than the live publish stream. This lets AMPS adapt the pace of replay to the pace at which the subscriber is consuming replayed messages without triggering slow client offlining.
There are four different ways that a client can request a bookmark replay from the transaction log. Each of these bookmark types meets a different need and enables a different recovery strategy that an application can use. The sections below describe the recovery types, the cases in which they can be used, and how the 60East clients implement them.
While there are similarities between a bookmark subscription used for replay and a SOW query, the transaction log and SOW are independent features that can be used separately. The SOW gives a snapshot of the current view of the latest data, while the transaction log can replay previous messages. Historical SOW queries provide a snapshot of the SOW at a defined point in the past, and are provided by the SOW database rather than the transaction log.
Recovery With an Epoch¶
The epoch bookmark, when requested on a subscription, will replay the transaction log back to the subscribing client from the very beginning. Once the transaction log has been replayed in its entirety, the subscriber will begin receiving messages on the live incoming stream of messages. A subscriber does this by requesting a 0 in the bookmark header field of their subscription. The AMPS clients provide a constant for epoch, typically represented as EPOCH.
This type of bookmark can be used in a case where the subscriber has begun after the start of an event, and needs to catch up on all of the messages that have been published to the topic.
To ensure that no messages from the subscription are lost during the replay, AMPS replays messages from the transaction log until the client reaches the last message in the transaction log. Once all of the existing messages in the transaction log have been sent to the client, AMPS will cut over to the live subscription stream and provide messages to the client as soon as they are persisted.
Bookmark Replay From NOW¶
The NOW bookmark, when requested on a subscription, declines to replay any messages from the transaction log, and instead begins streaming messages from the live stream, returning any messages that would be published to the transaction log that match the subscription’s Topic and Filter.
This type of bookmark is used when a client is concerned with messages
that will be published to the transaction log, but is unconcerned with
replaying the historical messages in the transaction log. This strategy
is often used for applications that want to ensure that they do not miss
messages, even if the application temporarily loses connectivity, but
are not concerned with older messages. For this case, the application
subscribes with NOW when the application starts, and then re-establishes
the subscription with the most recently-processed bookmark if
connectivity is lost. This resubscription behavior is typically handled
by the client reconnection logic (as in the 60East HAClient
implementations).
A NOW bookmark subscription is requested with a subscribe command that has “0|1|” as the bookmark field. The AMPS clients provide a constant for this value, typically represented as NOW.
Bookmark Replay With a Bookmark¶
Clients that store the bookmarks from published messages can use those bookmarks to recover from an interruption in service. By placing a subscribe query with the last bookmark recorded, a client will get a replay of all messages persisted to the transaction log after that bookmark. Once the replay has completed, the subscription will then cut over to the live stream of messages.
To perform a bookmark replay, the client places a bookmark subscription with the bookmark at which to start the subscription.
Developer Note: the MOST_RECENT value¶
The AMPS client libraries provide a special constant value that requests that the library look up the appropriate recovery point in the bookmark store and then provide that recovery point in the subscription request. This special value is typically represented as MOST_RECENT, RECENT, or recent. When the application requests a bookmark subscription with a bookmark of MOST_RECENT, the client library looks for the most recent bookmarks processed for that subscription, then provides the appropriate bookmark or list of bookmarks when resubscribing. This process helps to ensure that the subscription begins at the last processed message, and the application receives the next unprocessed message for the subscription. If there is no record of a subscription, the AMPS clients will start with EPOCH, so that the first time a subscription is entered, the application gets the full record of available messages.
It’s important to remember that the AMPS server has no knowledge of the MOST_RECENT value. MOST_RECENT itself is never sent to AMPS and never appears in the AMPS log. MOST_RECENT is simply a request to the AMPS client library to look up the exact bookmark value to provide to AMPS. The AMPS client libraries always translate a request for MOST_RECENT into either a specific value (typically a list of bookmarks) or EPOCH.
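To make the lookup concrete, here is a deliberately simplified sketch of how a client-side store could resolve MOST_RECENT into a value to send to the server. It is not the bookmark store implemented by the AMPS client libraries: the `SimpleBookmarkStore` class and its method names are hypothetical, it returns a single bookmark rather than a list, and the bookmark strings are placeholders. The only value taken from this chapter is the epoch wire value of 0.

```python
EPOCH = "0"  # wire value for the epoch bookmark, per this chapter


class SimpleBookmarkStore:
    """Hypothetical, simplified stand-in for a client-side bookmark store."""

    def __init__(self):
        self._arrived = {}    # sub_id -> bookmarks in arrival order
        self._processed = {}  # sub_id -> set of processed bookmarks

    def log(self, sub_id, bookmark):
        """Record that a message with this bookmark arrived."""
        self._arrived.setdefault(sub_id, []).append(bookmark)

    def discard(self, sub_id, bookmark):
        """Record that the application finished processing the message."""
        self._processed.setdefault(sub_id, set()).add(bookmark)

    def most_recent(self, sub_id):
        """Return the bookmark to resubscribe with.

        This is the last bookmark before the first unprocessed message,
        or EPOCH when nothing has been processed for the subscription.
        """
        recovery_point = EPOCH
        done = self._processed.get(sub_id, set())
        for bookmark in self._arrived.get(sub_id, []):
            if bookmark not in done:
                break
            recovery_point = bookmark
        return recovery_point


store = SimpleBookmarkStore()
for bookmark in ("b1", "b2", "b3"):
    store.log("orders-sub", bookmark)
store.discard("orders-sub", "b1")
store.discard("orders-sub", "b3")  # processed out of order

print(store.most_recent("orders-sub"))  # b1
print(store.most_recent("new-sub"))     # 0
```

Because b2 has not been processed, the sketch resumes after b1 even though b3 is done. The application may see b3 again, but it cannot miss b2. A subscription with no history falls back to the epoch value.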
Bookmark Replay From a Moment in Time¶
The final type of bookmark supported is the ASCII-formatted timestamp. When using a timestamp as the bookmark value, the transaction log replays all messages that occurred after the timestamp, and then cuts over to the live subscription once the replay stream has been consumed.
This bookmark has the format of YYYYmmddTHHMMSS[Z] where:
- YYYY is the four digit year.
- mm is the two digit month.
- dd is the two digit day.
- T is the character separator between the date and time.
- HH is the two digit hour.
- MM is the two digit minute.
- SS is the two digit second.
- Z is an optional timezone specifier. AMPS timestamps are always in UTC, regardless of whether the timezone is included. AMPS only accepts a literal value of Z for a timezone specifier.
For example, a timestamp for January 2nd, 2015, at 12:35:
20150102T123500Z
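As a small illustration, a timestamp bookmark in this format can be produced with the Python standard library. The helper name `timestamp_bookmark` is hypothetical. The function converts to UTC before formatting, since AMPS treats every timestamp as UTC.

```python
from datetime import datetime, timezone


def timestamp_bookmark(moment):
    """Format a datetime as a YYYYmmddTHHMMSS bookmark with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


bookmark = timestamp_bookmark(datetime(2015, 1, 2, 12, 35, tzinfo=timezone.utc))
print(bookmark)  # 20150102T123500Z
```

Rejecting naive datetimes avoids silently treating local time as UTC, which would shift the start of the replay by the local UTC offset.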
Content and Topic Filtering¶
As with all other subscriptions, bookmark subscriptions support content filtering.
Bookmark subscriptions provide only messages from topics that are recorded in the transaction log. In other words, when a bookmark subscription uses a topic regular expression, only messages from topics that are recorded in the transaction log are provided to the subscription. This ensures that a bookmark subscription provides a consistent, repeatable stream of messages. The topics provided to the subscription are the same during replay, when only messages recorded in the transaction log are available, and after replay completes, when every publish to AMPS is available. This also ensures that a bookmark subscription that replays messages for a specific timeframe gets the same messages as bookmark subscribers that had active subscriptions during that timeframe.
Content filtering is covered in greater detail in Chapter 4 AMPS Expressions.
Delivery Rate Control for Bookmark Subscriptions¶
AMPS allows subscribers to specify the maximum delivery rate for messages delivered from a bookmark subscription. A subscriber specifies the maximum rate at which AMPS should deliver messages to the subscription. AMPS then limits the rate at which replay occurs so that the overall rate does not exceed the specified maximum. Rate control is not available for subscriptions that use the live option.
To request rate control, a subscriber provides the rate option on the subscription. A rate can be specified in messages per second, in bytes delivered per second, or as a multiple of the original delivery rate. For example, the following subscription option limits delivery to 1000 messages per second:
rate=1000
To limit delivery to 500KB per second, a subscriber would provide this option:
rate=500KB
To limit replay to double the speed at which messages were originally published, a subscriber would provide this option:
rate=2X
To limit delivery to half the speed at which messages were originally published, a subscriber would provide this option:
rate=.5X
When using a rate that is a factor of the original replay speed, you may want AMPS to skip over long gaps. For example, you may want to do load testing by replaying several days’ worth of operations at a 5x multiplier. In that case, however, your load test does not need to be idle when there are gaps during which no messages are produced (for example, outside of trading hours or during holidays). For this situation, AMPS provides a rate_max_gap option that sets the maximum amount of time for a replay to wait to produce a message. For example, with an option string like:
rate=5X,rate_max_gap=10s
AMPS will attempt to produce messages at 5 times the original publish rate. In the event that there is a gap between messages of more than 50 seconds in the original publish stream (that is, 10 seconds in the replay), AMPS will wait for 10 seconds and then “skip ahead” to the next message in the replay.
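The arithmetic behind this example can be checked with a short function. This is a model of the behavior described above, not AMPS code, and the name `replay_wait` is hypothetical: the gap in the original stream is divided by the multiplier, then capped at the rate_max_gap value.

```python
def replay_wait(original_gap_seconds, multiplier, max_gap_seconds):
    """Seconds the replay waits before delivering the next message."""
    scaled = original_gap_seconds / multiplier
    return min(scaled, max_gap_seconds)


# With rate=5X,rate_max_gap=10s:
print(replay_wait(20, 5, 10))    # 4.0  (20s original gap replayed in 4s)
print(replay_wait(50, 5, 10))    # 10.0 (exactly at the cap)
print(replay_wait(3600, 5, 10))  # 10   (an hour-long gap is cut to 10s)
```

Any original gap longer than the multiplier times the cap (here 5 × 10 = 50 seconds) is shortened to the cap, which is why an overnight pause in publishing costs the load test only 10 seconds.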
Pausing and Resuming Bookmark Subscriptions¶
Beginning in AMPS 5.0, AMPS offers the ability to pause a bookmark subscription. When a subscriber requests that AMPS pause the subscription, AMPS stops providing messages from the bookmark subscription, but does not remove the subscription. The subscriber can then resume the subscription, and AMPS will again begin providing messages from the subscription. While the subscription is paused, AMPS maintains a record of the current position in the transaction log, and begins replay from that point when the subscription is resumed.
This feature can be useful for clients that need to temporarily stop processing messages while minimizing the buffer space consumed during the time that the client is not consuming messages. For example, a simulation that visualizes historical data might pause the bookmark subscription if the user pauses the visualization.
An application may create a subscription in the paused state by including pause as an option on the initial subscribe command. To pause an active subscription, a subscriber sends a subscribe command with the existing subscription ID and the pause option. To resume a subscription, a subscriber sends a subscribe command with the subscription ID (or a comma-separated list of subscription IDs) and the resume option. The AMPS clients provide convenience constants for the pause and resume options.
AMPS allows a given client to pause or resume multiple subscriptions at once. When multiple bookmark subscriptions are resumed at the same time, AMPS will attempt to combine replay for the subscriptions. When AMPS can combine replay, AMPS will guarantee that messages across subscriptions are delivered from the same replay, which can help to preserve order across subscriptions. AMPS can combine subscriptions when they are delivered to the same client connection, were paused at the same bookmark, deliver at the same rate, and are resumed with the same command. This feature can be useful for synchronizing message delivery across a number of subscriptions. When using pause and resume for this purpose, an application typically includes the pause option on a number of subscriptions when the subscriptions are created, and then resumes the subscriptions when the application is ready to begin the replay.
Pausing a subscription stops AMPS from sending messages to the client once the pause command is processed. However, any messages already on the network, or in a network buffer on the client or the server will be delivered to the client.
AMPS allows you to begin a subscription in the paused state by providing the pause option when creating the subscription.
AMPS removes a paused subscription if the subscriber disconnects. To restart a subscription across subscriber restarts, use the basic bookmark subscription features as described above.
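The conditions under which replays can be combined are easy to misread, so the following sketch models them. It is an illustration only: the `PausedSub` type, the `combinable_groups` function, and the subscription and bookmark values are hypothetical, and the grouping assumes that all of the subscriptions passed in are resumed by one command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PausedSub:
    sub_id: str
    connection: str  # client connection the subscription belongs to
    paused_at: str   # bookmark at which the subscription was paused
    rate: str        # delivery rate option, "" when none was given


def combinable_groups(subs_in_one_resume):
    """Group subscriptions resumed by one command that could share a replay."""
    groups = {}
    for sub in subs_in_one_resume:
        key = (sub.connection, sub.paused_at, sub.rate)
        groups.setdefault(key, []).append(sub.sub_id)
    return list(groups.values())


subs = [
    PausedSub("orders-sub", "conn-1", "bm-100", "2X"),
    PausedSub("fills-sub", "conn-1", "bm-100", "2X"),
    PausedSub("audit-sub", "conn-1", "bm-250", "2X"),
]

# One resume command carries a comma-separated list of subscription IDs.
resume_sub_ids = ",".join(sub.sub_id for sub in subs)
groups = combinable_groups(subs)
print(resume_sub_ids)  # orders-sub,fills-sub,audit-sub
print(groups)          # [['orders-sub', 'fills-sub'], ['audit-sub']]
```

Here audit-sub was paused at a different bookmark, so in this model it would be replayed separately even though it is resumed by the same command.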
Conflation and Bookmark Subscriptions¶
AMPS supports subscription conflation for bookmark subscriptions, as described in Conflated Subscriptions.
Conflation for bookmark subscriptions works the same way that conflation for regular subscriptions works. Messages from the replay are held by AMPS for the conflation interval. If during that interval the replay finds a message with the same conflation_key value, AMPS replaces the held message with the message from the replay. At the end of the conflation interval, AMPS provides the currently held message to the subscriber. The conflation interval refers to the replay. In other words, a conflation interval of 1s conflates messages for 1 second, regardless of whether the messages are provided from a replay or from current publishes. If the messages are provided from the transaction log, conflation occurs for 1 second of replay time, regardless of the rate at which the messages were originally published.
When using conflation, the bookmark provided on a message that has been provided after conflation is the bookmark for the first conflated message during the interval rather than the message that AMPS delivers at the end of the conflation interval.
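The relationship between the delivered data and the delivered bookmark can be shown with a small model of a single conflation interval. This is a sketch of the behavior described above rather than AMPS code: the `conflate` function, the `id` key field, and the bookmark strings are all placeholders.

```python
def conflate(messages, key_field):
    """Conflate one interval's worth of messages.

    Each input is (bookmark, data). For each key, the result carries the data
    of the last message seen and the bookmark of the first message conflated.
    """
    held = {}  # key -> [first_bookmark, latest_data]
    for bookmark, data in messages:
        key = data[key_field]
        if key in held:
            held[key][1] = data  # replace the held message
        else:
            held[key] = [bookmark, data]
    return [(first, data) for first, data in held.values()]


interval = [
    ("b1", {"id": "A", "px": 10}),
    ("b2", {"id": "B", "px": 7}),
    ("b3", {"id": "A", "px": 11}),
]
delivered = conflate(interval, "id")
print(delivered)
# [('b1', {'id': 'A', 'px': 11}), ('b2', {'id': 'B', 'px': 7})]
```

For key A, the subscriber receives the latest data (px 11) with the bookmark of the first conflated message (b1). Resuming from that bookmark replays the conflated messages again rather than skipping past them.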
Requesting Message Timestamps¶
Messages that are replayed from the transaction log will not have the timestamp field of the header populated by default. To request timestamps, provide the timestamp option when creating the subscription.
Selecting Message Durability Options¶
AMPS supports two distinct options for specifying message durability. By default, messages are provided to a bookmark subscription when they are persisted to the local transaction log.
Once replay from the transaction log is finished, AMPS sends messages to subscribers as the messages are processed. By default, AMPS waits until a message is persisted to the local transaction log before sending the message to subscribers. Because each message delivered is persisted, this approach ensures that the sequence of messages is consistent for this instance across client and server restarts, and that messages that are received by a subscriber will be available after a restart.
AMPS provides options that a subscriber can use to change the point at which AMPS delivers messages once replay from the transaction log has finished.
Using the ‘fully_durable’ Option for Bookmark Subscriptions¶
With the fully_durable option, once replay from the transaction log is finished, AMPS sends a message to the subscriber only when the message has been persisted in the local transaction log and all synchronous downstream replication destinations have acknowledged the message. This option is useful for applications where processing of a message should not begin until more than one AMPS instance has persisted the message.
This option will typically introduce more latency for incoming messages when those messages must be replicated. When this option is used and one or more of the synchronous downstream replication destinations that receives messages for this topic is offline, the instance will not deliver incoming messages until that destination comes back online or is downgraded to asynchronous replication.
Using the ‘live’ Option for Bookmark Subscriptions¶
In some cases, reducing latency may be more important than consistency.
To support these cases, AMPS provides a live option on bookmark subscriptions. For bookmark subscriptions that use the live option, once replay has finished, AMPS sends messages to subscribers before the message has been persisted. This can provide a small reduction in latency at the expense of increasing the risk of inconsistency upon failover. For example, if a publisher does not republish a message after failover, your application may receive a message that is not stored in the transaction log and that other applications have not received.
The live option increases the risk of inconsistent data between your program and AMPS in the event of a failover. 60East recommends using this option only if the risk is acceptable and if your application requires the small latency reduction this option provides.
Because the live option does not wait for messages to be persisted, subscriptions that use this option are subject to slow client offlining after replay from the transaction log is complete.
The rate, pause, and resume options are not supported with the live option.
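The three delivery points described in this section (the default, fully_durable, and live) can be summarized as a single decision. The function below is a model of the documented behavior after replay has finished, not AMPS code. The name `ready_to_deliver` and its parameters are hypothetical.

```python
def ready_to_deliver(option, persisted_locally, sync_destinations_acked):
    """Return True when a newly published message can go to the subscriber.

    option: "live", "fully_durable", or None for the default behavior.
    sync_destinations_acked: one boolean per synchronous replication
    destination that receives messages for the topic.
    """
    if option == "live":
        return True  # delivered before the message is persisted
    if option == "fully_durable":
        return persisted_locally and all(sync_destinations_acked)
    return persisted_locally  # default: local transaction log only


# A message persisted locally while one synchronous destination is offline:
print(ready_to_deliver(None, True, [True, False]))             # True
print(ready_to_deliver("fully_durable", True, [True, False]))  # False
print(ready_to_deliver("live", False, [False, False]))         # True
```

The second case shows why fully_durable can stall delivery: the message is held until the offline destination acknowledges it or is downgraded to asynchronous replication.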
Managing Journal Files¶
The journal files for the transaction log are designed so that AMPS can archive, compress, and remove these files while AMPS is running. AMPS actions provide integrated administration for journal files, as described in Chapter 23 Automating AMPS with Actions.
Archiving a file copies the file to an archival directory, typically located on higher-capacity but higher-latency storage. Compressing a file compresses the file in place. Archived and compressed journal files are still accessible to clients for replay and for AMPS to use in rebuilding any SOW files that are damaged or removed.
When defining a policy for archiving, compressing or removing files, keep in mind the amount of time for which clients will need to replay data. Once journal files have been deleted, the messages in those files are no longer available for clients to replay or for AMPS to use in recreating a SOW file. If journal files are removed, and a SOW file is retained, this means that the SOW may have data that is not in the transaction log.
To determine how best to manage your journal files, consider your application’s access pattern to the recorded messages. Most applications have a period of time (often a day or a week) where historical data is in heavy use, and a period of time (often a week, or a month) where data is infrequently used. One common strategy is to create the journal files on high-throughput storage. The files are archived to slower, higher-capacity storage after a short period of time, compressed, and then removed after a longer period of time. This strategy preserves space on high-throughput storage, while still allowing the journals to be used. For example, if your applications frequently replay data for the last day, occasionally replay data older than the last week, and never request data older than one month, a management strategy that meets these needs would be to archive files after one day, compress them after a week, and remove them after one month.
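The example policy above amounts to three age thresholds. In a real deployment these thresholds would be expressed in AMPS actions rather than in application code. The sketch below only models the decision, with hypothetical names, and treats one month as 30 days.

```python
from datetime import timedelta

ARCHIVE_AFTER = timedelta(days=1)
COMPRESS_AFTER = timedelta(days=7)
REMOVE_AFTER = timedelta(days=30)  # assumption: one month taken as 30 days


def journal_action(age):
    """Return the furthest lifecycle stage a journal file of this age has reached."""
    if age >= REMOVE_AFTER:
        return "remove"
    if age >= COMPRESS_AFTER:
        return "compress"
    if age >= ARCHIVE_AFTER:
        return "archive"
    return "keep"


for days in (0.25, 2, 10, 45):
    print(days, journal_action(timedelta(days=days)))
# 0.25 keep
# 2 archive
# 10 compress
# 45 remove
```

Checking the longest threshold first keeps the function correct for a file that has already passed every stage.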
If you remove journal files when AMPS is shut down, keep in mind that the removal of journal files must be sequential and cannot leave gaps in the remaining files. For example, say there are three journal files, 001, 002 and 003. If only 002 is removed, then the next AMPS restart could potentially overwrite the journal file 003, causing an unrecoverable problem.
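A check like the following could guard a manual cleanup script against leaving a gap. It is a sketch under two assumptions: journal files are identified by consecutive integer sequence numbers, as in the 001, 002, 003 example, and the function name `leaves_gap` is hypothetical. It does not inspect real journal files.

```python
def leaves_gap(existing, to_remove):
    """True when removing to_remove leaves a gap in the remaining journals."""
    remaining = sorted(set(existing) - set(to_remove))
    return any(b - a != 1 for a, b in zip(remaining, remaining[1:]))


print(leaves_gap([1, 2, 3], [2]))     # True: 001 and 003 remain
print(leaves_gap([1, 2, 3], [1]))     # False: 002 and 003 remain
print(leaves_gap([1, 2, 3], [1, 2]))  # False: only 003 remains
```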
When using AMPS actions to manage journal files, AMPS ensures that all replays from a journal file are complete, all queue messages in that journal file have been delivered (and acknowledged, if required), and all messages from a journal file have been successfully replicated before removing the file.
Reference to File Types¶
AMPS creates the following kinds of files as a part of creating and managing the transaction log. Notice that this includes both files that contain messages (journal files) and a set of files created by AMPS to improve efficiency when the instance is restarting and recovering the state of the transaction log.
Extension | File Type | Description |
---|---|---|
.journal | Journal file | These files contain the messages that comprise the transaction log. AMPS always writes new messages to an uncompressed journal file. |
.journal.gz | Compressed journal file | These files contain messages that comprise the transaction log. These files have been compressed by AMPS as a result of the amps-action-do-compress-journal action. Other than being compressed, they are treated identically to uncompressed journal files. |
.index.gz | Index file | These files are used during recovery to help AMPS quickly rebuild its references to the content of the transaction log without having to completely reprocess each file. Each index file contains index information for the corresponding journal file. These files do not contain messages. |
.clients.ack | Clients acknowledgement cache | Used during recovery to help AMPS quickly identify the last message persisted from each publisher without having to reprocess each journal file. These files do not contain messages. |
.queues.ack | Queue acknowledgement cache | For each queue, stores the point in the transaction log for which that queue has been completely processed (that is, all messages prior to that point in the transaction log have been acknowledged or expired). On recovery, AMPS can begin restoring the state of the queue from that point rather than reprocessing the entire transaction log. These files do not contain messages. |
Table 14.1: Files created by AMPS when a transaction log is configured