14. Transactional Messaging and Bookmark Subscriptions

AMPS includes support for transactional messaging, which includes persistence, consistency across restarts, and message replay. AMPS message queues use the transaction log to hold the messages in the queue. Transactional messaging is also the basis for replication, a key component of the high-availability capability in AMPS (as described in 25. Replication and High Availability).

For message queues, the transaction log is the persistent record of the messages that have entered the queue, the order of those messages, and which messages have been acknowledged and removed from the queue. All of these capabilities rely on the record of messages that the AMPS transaction log maintains. You choose which messages are included in the transaction log by specifying the message types and topics to record.

The AMPS transaction log differs from transaction logging in a conventional relational database system. Unlike transaction logs that are intended solely to maintain the consistency of data in the system, the AMPS transaction log is fully queryable through the AMPS client APIs. For applications that need access to historical information, or applications that need to be able to recover state in the event of a client restart, the transaction log allows you to do this, relying on AMPS as the definitive single version of the state of the application. There is no need for complex logic to handle reconciliation or state restoration in the client. AMPS handles the difficult parts of this process, and the transaction log guarantees consistency.

Topics covered by a transaction log are able to provide reliable messaging with strict consistency guarantees.

When a transaction log is enabled, topics covered by the transaction log provide atomic broadcast from that instance. This means that the instance enforces a repeatable ordering on the messages, and guarantees that all subscribers receive messages reliably, in a consistent order, and with no gaps or duplicates.

Enabling a transaction log in an instance also enables the following behaviors:

  • When a transaction log is enabled, AMPS uses the client name provided by each client to manage the sequencing and integrity of the message stream, and to quickly detect and respond to reconnection. To allow this, each connection to AMPS must have a unique client name. If a duplicate client name is detected, one of the clients is assumed to be defunct and is disconnected.
  • When a transaction log is enabled, persisted acknowledgments to a publish are conflated, as described in Acknowledgment Conflation and Publish Acknowledgments.

Recording and Replaying Messages With Transaction Logs

AMPS includes the ability to record messages in a transaction log, and replay those messages at a later time. This capability is key for high availability, since it gives subscribers the ability to resume a subscription at a point in time without missing messages. This capability is also the foundation of replication, since it gives AMPS the ability to preserve message streams to be synchronized to an instance that has gone offline.

The transaction log in AMPS contains a sequential, historical record of messages. Each message is identified by a bookmark, a unique identifier that AMPS uses to locate the message within the overall set of recorded messages. The transaction log can record messages for a topic, a set of topics, or for filtered content on one or more topics.

An application can request a subscription that replays messages from the transaction log. Subscriptions that replay from the transaction log are called bookmark subscriptions, since the subscription begins at a specific point in the transaction log identified by a specific bookmark. Bookmark subscriptions provide topic and content filtering in the same way that normal subscriptions do, and provide a set of unique capabilities (such as the ability to pause and resume the subscription) that are made possible because the subscription is provided from a persistent record of the message stream. Bookmark subscriptions are also key to high availability with AMPS. When a client is recovering from a restart or failure, this ability to replay allows a client to fill gaps in received messages and resume subscriptions without missing a message. This feature also allows new clients to receive an exact replay of a message stream. Replay from the transaction log is also useful for auditing, quality assurance, and backtesting.

The transaction log is used in AMPS replication to ensure that all servers in a replication group are continually synchronized should one of them experience an interruption in service. For example, say an AMPS instance, as a member of a replication group, goes down. When it comes back up, it can query another AMPS instance for all of the messages it did not receive, thereby catching up to a point of synchronization with the other instances. This feature, when coupled with AMPS replication, ensures that message subscriptions are always available and up-to-date.

The AMPS transaction log records messages that are received from a publisher and events that affect those messages such as sow_delete commands. AMPS does not record messages that are created through a view, out-of-focus messages, or event status messages created by AMPS.

Understanding Message Persistence

To take advantage of transactional messaging, the publisher and the AMPS instance work together to ensure that messages are written to persistent storage. AMPS lets the publisher know when the message is persisted, so that the publisher knows that it no longer needs to track the message.

When a publisher publishes a message to AMPS, the publisher assigns each message a unique sequence number. Once the message has been written to persistent storage, AMPS uses the sequence number to acknowledge the message and let the publisher know that the message is persisted. Once AMPS has acknowledged the message, the publisher considers the message published. For safety, AMPS always writes a message to the local transaction log before acknowledging that the message is persisted. If the topic is configured for synchronous replication, all replication destinations have to persist the message before AMPS will acknowledge that the message is persisted.

For efficiency, AMPS may not acknowledge each individual message. Instead, AMPS acknowledges the most recent persisted message to indicate that all previous messages have also been persisted, as described in Acknowledgment Conflation and Publish Acknowledgments. Publishers that need transactional messaging do not wait for acknowledgment to publish more messages. Instead, publishers retain messages that haven’t been acknowledged, and republish messages that haven’t been acknowledged if failover occurs. The AMPS client libraries include this functionality for persistent messaging (see descriptions of the publish store in client library documentation). See the Guaranteed Publishing section of this guide for further details.
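The retain-and-republish behavior described above can be sketched as a small in-memory model. This is an illustration only: the class and method names are hypothetical and are not the AMPS client library API, and a real publish store would also need to survive a publisher restart.

```python
from collections import OrderedDict


class PublishStoreSketch:
    """Hypothetical in-memory publish store; not the AMPS client library API."""

    def __init__(self):
        self._next_sequence = 1
        self._unacknowledged = OrderedDict()  # sequence number -> message

    def store(self, message):
        """Assign the next sequence number and retain the message until acknowledged."""
        sequence = self._next_sequence
        self._next_sequence += 1
        self._unacknowledged[sequence] = message
        return sequence

    def on_persisted_ack(self, sequence):
        """A conflated ack for `sequence` covers every message up to and including it."""
        for seq in [s for s in self._unacknowledged if s <= sequence]:
            del self._unacknowledged[seq]

    def messages_to_republish(self):
        """After failover, everything still unacknowledged is republished in order."""
        return list(self._unacknowledged.items())


store = PublishStoreSketch()
for payload in ("order-1", "order-2", "order-3", "order-4"):
    store.store(payload)

store.on_persisted_ack(3)  # a single ack covers sequence numbers 1 through 3
print(store.messages_to_republish())  # [(4, 'order-4')]
```

The publisher never blocks waiting for an acknowledgment in this model. It only keeps enough state to republish whatever the server has not yet confirmed as persisted.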

Configuring a Transaction Log

Before demonstrating the power of the transaction log, we will first show how to configure the transaction log in the AMPS configuration file.

<!-- All transaction log definitions are contained within the TransactionLog block.
    The following global settings apply to all Topic blocks defined within the
    TransactionLog: JournalDirectory, PreallocatedJournalFiles, and JournalSize. -->
<TransactionLog>

    <!-- The JournalDirectory is the filesystem location where journal files and journal
        index files will be stored. -->
    <JournalDirectory>./amps/journal/</JournalDirectory>

    <!-- The JournalArchiveDirectory is the filesystem location to which AMPS will
        archive journals. Notice that AMPS does not archive files by default. You
        configure an action to archive journal files. -->
    <JournalArchiveDirectory>/mnt/somedev0/amps/journal</JournalArchiveDirectory>

    <!-- PreallocatedJournalFiles defines the number of journal files AMPS will create as
        part of the server startup. Default: 2 Minimum: 1 -->
    <PreallocatedJournalFiles>1</PreallocatedJournalFiles>

    <!-- The JournalSize is the approximate size of the journal files
         that AMPS will create. -->
    <JournalSize>10MB</JournalSize>

    <!-- When a Topic is specified, then all messages which match exactly the specified
        topic or regular expression will be included in the transaction log. Otherwise,
        AMPS initializes the transaction logging, but does not record any messages to
        the transaction log. -->
    <Topic>
        <Name>orders</Name>
        <MessageType>nvfix</MessageType>
    </Topic>

    <!-- The FlushInterval is the interval at which messages will be flushed to the journal
        file during periods of slow activity. Default: 100ms Maximum: 100ms Minimum: 30us -->
    <FlushInterval>40ms</FlushInterval>
</TransactionLog>

Replaying Messages with Bookmark Subscription

One of the most useful and powerful features in AMPS is bookmark subscription, which is enabled by the transaction log. With bookmark subscription, an application requests a subscription that starts at a specific point in the transaction log. AMPS begins the subscription at the specified point, and provides messages from the transaction log.

Each message in the transaction log has a bookmark. A bookmark is an opaque, unique identifier that is added by AMPS to each message recorded in the transaction log. For messages provided from a transaction log, the field is included in the Bookmark header of the message. AMPS guarantees that bookmarks for the instance are monotonically increasing, which enables AMPS to rapidly find an individual bookmark within the transaction log.

A bookmark subscription simply requests that AMPS begin the subscription with the first message following the bookmark provided with the subscription. AMPS locates the bookmark in the transaction log, and begins the subscription at that point in time.

One way to think about a bookmark subscription is that AMPS publishes to the subscribing client only those messages that:

  1. have bookmarks after the provided bookmark,
  2. match the subscription’s Topic and Filter, and
  3. have been written to the transaction log

AMPS provides these messages in the order in which they were recorded to the transaction log. Because a bookmark subscription requires a transaction log, when a client requests a bookmark subscription for a topic that is not being recorded in the transaction log, AMPS returns an error.
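The three conditions above can be modeled as a filter over an ordered log. This is a conceptual sketch, not how AMPS is implemented: real bookmarks are opaque values, so the integer bookmarks here are a stand-in that only preserves ordering, and `LoggedMessage`, `replay`, and the full-match treatment of the topic pattern are assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class LoggedMessage:
    bookmark: int  # stand-in for an opaque AMPS bookmark; only ordering matters here
    topic: str
    data: dict


def replay(log, after_bookmark, topic_pattern, content_filter=lambda data: True):
    """Return logged messages after `after_bookmark` that match topic and filter.

    `log` is assumed to already be in the order the messages were recorded,
    so the result preserves that order.
    """
    topic_re = re.compile(topic_pattern)
    return [
        m for m in log
        if m.bookmark > after_bookmark       # 1. after the provided bookmark
        and topic_re.fullmatch(m.topic)      # 2. matches the topic ...
        and content_filter(m.data)           #    ... and the content filter
    ]                                        # 3. only logged messages are considered


log = [
    LoggedMessage(1, "orders", {"qty": 5}),
    LoggedMessage(2, "orders", {"qty": 50}),
    LoggedMessage(3, "trades", {"qty": 70}),
    LoggedMessage(4, "orders", {"qty": 90}),
]
selected = replay(log, 1, "orders", lambda data: data["qty"] > 10)
print([m.bookmark for m in selected])  # [2, 4]
```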

If the subscription requests a completed acknowledgment, that acknowledgment will be delivered to the subscription once replay has completed. Messages delivered to the subscription after the acknowledgment is delivered are from new publishes.

AMPS allows an application to submit a comma-delimited list of bookmark values as the bookmark for a subscription request. In this case, AMPS begins replay at the oldest bookmark in the list. The client controls the bookmark provided on the subscription request. For a bookmark subscription, the AMPS server does not keep a persistent record of which bookmarks a specific client or subscription has processed. The AMPS client libraries provide facilities for easily tracking the messages that an application has processed, so the application can resume at the appropriate point in the transaction log.

tip Bookmark subscriptions are provided from the transaction log rather than the live publish stream. This lets AMPS adapt the pace of replay to the pace at which the subscriber is consuming replayed messages without triggering slow client offlining.

There are four different ways that a client can request a bookmark replay from the transaction log. Each of these bookmark types meets a different need and enables a different recovery strategy that an application can use. The sections below describe the recovery types, the cases in which they can be used, and how the 60East clients implement them.

tip While there are similarities between a bookmark subscription used for replay and a SOW query, the transaction log and SOW are independent features that can be used separately. The SOW gives a snapshot of the current view of the latest data, while the journal is capable of playback of previous messages. Historical SOW queries provide a snapshot of the SOW at a defined point in the past, and are provided by the SOW database rather than the transaction log.

Recovery With an Epoch

The epoch bookmark, when requested on a subscription, will replay the transaction log back to the subscribing client from the very beginning. Once the transaction log has been replayed in its entirety, then the subscriber will begin receiving messages on the live incoming stream of messages. A subscriber does this by requesting a 0 in the bookmark header field of their subscription. The AMPS clients provide a constant for epoch, typically represented as EPOCH.

This type of bookmark can be used in a case where the subscriber has begun after the start of an event, and needs to catch up on all of the messages that have been published to the topic.

To ensure that no messages from the subscription are lost during the replay, AMPS replays messages from the transaction log until the client reaches the last message in the transaction log. Once all of the existing messages in the transaction log have been sent to the client, AMPS will cut over to the live subscription stream and provide messages to the client as soon as they are persisted.

Bookmark Replay From NOW

When a subscription requests the NOW bookmark, AMPS does not replay any messages from the transaction log. Instead, AMPS begins streaming from the live stream, delivering newly published messages that are recorded in the transaction log and that match the subscription’s Topic and Filter.

This type of bookmark is used when a client is concerned with messages that will be published to the transaction log, but is unconcerned with replaying the historical messages in the transaction log. This strategy is often used for applications that want to ensure that they do not miss messages, even if the application temporarily loses connectivity, but are not concerned with older messages. For this case, the application subscribes with NOW when the application starts, and then re-establishes the subscription with the most recently-processed bookmark if connectivity is lost. This resubscription behavior is typically handled by the client reconnection logic (as in the 60East HAClient implementations).

A client requests the NOW bookmark by sending a subscribe command with “0|1|” in the bookmark field. The AMPS clients provide a constant for this value, typically represented as NOW.

Bookmark Replay With a Bookmark

Clients that store the bookmarks from published messages can use those bookmarks to recover from an interruption in service. By placing a subscribe query with the last bookmark recorded, a client will get a replay of all messages persisted to the transaction log after that bookmark. Once the replay has completed, the subscription will then cut over to the live stream of messages.

To perform a bookmark replay, the client places a bookmark subscription with the bookmark at which to start the subscription.

Developer Note: the MOST_RECENT value

The AMPS client libraries provide a special constant value that requests that the library look up the appropriate recovery point in the bookmark store and then provide that recovery point in the subscription request. This special value is typically represented as MOST_RECENT. When the application requests a bookmark subscription with a bookmark of MOST_RECENT, the client library looks for the most recent bookmarks processed for that subscription, then provides the appropriate bookmark or list of bookmarks when resubscribing. This process helps to ensure that the subscription begins at the last processed message, and the application receives the next unprocessed message for the subscription. If there is no record of a subscription, the AMPS clients will start with EPOCH, so that the first time a subscription is entered, the application gets the full record of available messages.

It’s important to remember that the AMPS server has no knowledge of the MOST_RECENT value. MOST_RECENT itself is never sent to AMPS and never appears in the AMPS log. MOST_RECENT is simply a request to the AMPS client library to look up the exact bookmark value to provide to AMPS. The AMPS client libraries always translate a request for MOST_RECENT into either a specific value (typically a list of bookmarks) or EPOCH.
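The translation described above can be pictured with a short model of a client-side bookmark store. Everything here is hypothetical and only illustrates the lookup: the class, its methods, the sentinel object, and the `bm-A` / `bm-B` bookmark strings are placeholders, not values or calls from the AMPS client libraries. The only value taken from this section is the epoch bookmark, 0.

```python
EPOCH = "0"             # the epoch bookmark value described earlier in this section
MOST_RECENT = object()  # placeholder sentinel; it is resolved locally and never sent


class BookmarkStoreSketch:
    """Hypothetical client-side record of the recovery point for each subscription."""

    def __init__(self):
        self._recovery_points = {}  # subscription id -> list of bookmark strings

    def record(self, sub_id, bookmarks):
        """Remember the bookmark (or bookmarks) to resume from for a subscription."""
        self._recovery_points[sub_id] = list(bookmarks)

    def resolve(self, sub_id, requested):
        """Translate the requested bookmark into the value sent to the server."""
        if requested is not MOST_RECENT:
            return requested            # explicit bookmarks pass through unchanged
        bookmarks = self._recovery_points.get(sub_id)
        if not bookmarks:
            return EPOCH                # no record yet: start from the beginning
        return ",".join(bookmarks)      # a specific bookmark or comma-delimited list


bookmark_store = BookmarkStoreSketch()
first_request = bookmark_store.resolve("sub-1", MOST_RECENT)
bookmark_store.record("sub-1", ["bm-A", "bm-B"])
later_request = bookmark_store.resolve("sub-1", MOST_RECENT)
print(first_request, later_request)  # 0 bm-A,bm-B
```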

Bookmark Replay From a Moment in Time

The final type of bookmark supported is the ASCII-formatted timestamp. When using a timestamp as the bookmark value, the transaction log replays all messages that occurred after the timestamp, and then cuts over to the live subscription once the replay stream has been consumed.

This bookmark has the format of YYYYmmddTHHMMSS[Z] where:

  • YYYY is the four digit year.
  • mm is the two digit month.
  • dd is the two digit day.
  • T is the character separator between the date and time.
  • HH is the two digit hour.
  • MM is the two digit minute.
  • SS is the two digit second.
  • Z is an optional timezone specifier. AMPS timestamps are always in UTC, regardless of whether the timezone is included. AMPS only accepts a literal value of Z for a timezone specifier.

For example, a timestamp for January 2nd, 2015, at 12:35:

20150102T123500Z
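Because AMPS interprets the timestamp as UTC, an application that works in local time needs to convert before formatting. The following is a minimal sketch using only the Python standard library. The helper name `to_bookmark_timestamp` is made up for this example.

```python
from datetime import datetime, timedelta, timezone


def to_bookmark_timestamp(moment):
    """Format a timezone-aware datetime as YYYYmmddTHHMMSSZ, converting to UTC first."""
    if moment.tzinfo is None:
        # A naive datetime is ambiguous, so refuse it rather than guess the zone.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


utc_moment = datetime(2015, 1, 2, 12, 35, tzinfo=timezone.utc)
print(to_bookmark_timestamp(utc_moment))  # 20150102T123500Z

# The same instant expressed in a UTC-5 zone produces the same bookmark.
local_moment = datetime(2015, 1, 2, 7, 35, tzinfo=timezone(timedelta(hours=-5)))
print(to_bookmark_timestamp(local_moment))  # 20150102T123500Z
```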

Content and Topic Filtering

As with all other subscriptions, bookmark subscriptions support content filtering.

Bookmark subscriptions provide only messages from topics that are recorded in the transaction log. In other words, when a bookmark subscription uses a topic regular expression, only messages from topics that are recorded in the transaction log are provided to the subscription. This ensures that a bookmark subscription provides a consistent, repeatable stream of messages. The topics provided to the subscription are the same during replay, when only messages recorded in the transaction log are available, and after replay completes, when every publish to AMPS is available. This also ensures that a bookmark subscription that replays messages for a specific timeframe gets the same messages as bookmark subscribers that had active subscriptions during that timeframe.

Content filtering is covered in greater detail in Chapter 4 AMPS Expressions.

Delivery Rate Control for Bookmark Subscriptions

AMPS allows subscribers to specify the maximum rate at which messages are delivered from a bookmark subscription. AMPS then limits the rate at which replay occurs so that the overall rate does not exceed the specified maximum. Rate control is not available for subscriptions that use the live option.

To request rate control, a subscriber provides the rate option on the subscription. A rate can be specified in either messages per second, number of bytes delivered per second, or a multiple of the original delivery rate. For example, the following subscription option limits delivery to 1000 messages per second:

rate=1000

To limit delivery to 500KB per second, a subscriber would provide this option:

rate=500KB

To limit replay to double the speed at which messages were originally published, a subscriber would provide this option:

rate=2X

To limit delivery to half the speed at which messages were originally published, a subscriber would provide this option:

rate=.5X

When using a rate that is a factor of the original replay speed, you may want AMPS to skip over long gaps. For example, you may want to do load testing by replaying several days’ worth of operations at a 5x multiplier. In that case, however, your load test does not need to be idle when there are gaps during which no messages are produced (for example, outside of trading hours or during holidays). For this situation, AMPS provides a rate_max_gap option that sets the maximum amount of time for a replay to wait to produce a message. For example, with an option string like:

rate=5X,rate_max_gap=10s

AMPS will attempt to produce messages at 5 times the original publish rate. In the event that there is a gap between messages of more than 50 seconds in the original publish stream (that is, 10 seconds in the replay), AMPS will wait for 10 seconds and then “skip ahead” to the next message in the replay.
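The arithmetic in this example can be written out as a small function. This is a sketch of the calculation as described above, not of the server's scheduling logic. The function name and parameters are invented for illustration, and all durations are in seconds.

```python
def replay_wait_seconds(original_gap, multiplier, max_gap=None):
    """Time to wait before the next replayed message.

    original_gap: seconds between two messages in the original publish stream
    multiplier:   the factor from an option such as rate=5X
    max_gap:      the cap from rate_max_gap, in seconds (None means no cap)
    """
    scaled = original_gap / multiplier
    if max_gap is not None and scaled > max_gap:
        return max_gap  # skip ahead instead of waiting out the full scaled gap
    return scaled


# With rate=5X,rate_max_gap=10s:
print(replay_wait_seconds(5, 5, 10))     # 1.0  (short gap, replayed 5 times faster)
print(replay_wait_seconds(50, 5, 10))    # 10.0 (exactly at the cap)
print(replay_wait_seconds(3600, 5, 10))  # 10   (an hour-long gap is capped)
```

Without rate_max_gap, the hour-long gap in the last call would take 720 seconds to replay at 5X.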

Pausing and Resuming Bookmark Subscriptions

Beginning in AMPS 5.0, AMPS offers the ability to pause a bookmark subscription. When a subscriber requests that AMPS pause the subscription, AMPS stops providing messages from the bookmark subscription, but does not remove the subscription. The subscriber can then resume the subscription, and AMPS will again begin providing messages from the subscription. While the subscription is paused, AMPS maintains a record of the current position in the transaction log. When the subscription is resumed, AMPS begins replay from that point.

This feature can be useful for clients that need to temporarily stop processing messages while minimizing the buffer space consumed during the time that the client is not consuming messages. For example, a simulation that visualizes historical data might pause the bookmark subscription if the user pauses the visualization.

An application may create a subscription in the paused state by including pause as an option on the initial subscribe command. To pause an active subscription, a subscriber sends a subscribe command with the existing subscription ID and the pause option. To resume a subscription, a subscriber sends a subscribe command with the subscription ID (or a comma-separated list of subscription IDs) and the resume option. The AMPS clients provide convenience constants for the pause and resume options.

AMPS allows a given client to pause or resume multiple subscriptions at once.

When multiple bookmark subscriptions are resumed at the same time, AMPS will attempt to combine replay for the subscriptions. When AMPS can combine replay, AMPS will guarantee that messages across subscriptions are delivered from the same replay, which can help to preserve order across subscriptions. AMPS can combine subscriptions when they are delivered to the same client connection, were paused at the same bookmark, deliver at the same rate and are resumed with the same command. This feature can be useful for synchronizing message delivery across a number of subscriptions. When using pause and resume for this purpose, an application typically includes the pause option on a number of subscriptions when the subscriptions are created, and then resumes the subscriptions when the application is ready to begin the replay.

Pausing a subscription stops AMPS from sending messages to the client once the pause command is processed. However, any messages already on the network, or in a network buffer on the client or the server, will still be delivered to the client.

AMPS allows you to begin a subscription in the paused state by providing the pause option when creating the subscription.

AMPS removes a paused subscription if the subscriber disconnects. To restart a subscription across subscriber restarts, use the basic bookmark subscription features as described above.

Conflation and Bookmark Subscriptions

AMPS supports subscription conflation for bookmark subscriptions, as described in Conflated Subscriptions.

Conflation for bookmark subscriptions works the same way that conflation for regular subscriptions works. Messages from the replay are held by AMPS for the conflation interval. If during that interval the replay finds a message with the same conflation_key value, AMPS replaces the held message with the message from the replay. At the end of the conflation interval, AMPS provides the currently held message to the subscriber. The conflation interval refers to the replay. In other words, a conflation interval of 1s conflates messages for 1 second, regardless of whether the messages are provided from a replay or from current publishes. If the messages are provided from the transaction log, conflation occurs for 1 second of replay time, regardless of the rate at which the messages were originally published.

When using conflation, the bookmark on a conflated message is the bookmark of the first message conflated during the interval, not the bookmark of the message that AMPS delivers at the end of the conflation interval.
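The relationship between the delivered payload and the delivered bookmark can be illustrated with a model of a single conflation interval. This sketch makes several assumptions that are not stated in this guide: the function name, the placeholder bookmark strings, and the order in which held messages are released at the end of the interval are all chosen for the example.

```python
def conflate_interval(messages, key_of):
    """Conflate the messages seen during one conflation interval.

    messages: list of (bookmark, payload) tuples in replay order
    key_of:   function that extracts the conflation key from a payload
    Returns one (bookmark, payload) per key: the first bookmark seen for
    that key paired with the most recent payload for that key.
    """
    held = {}  # conflation key -> [first bookmark, latest payload]
    for bookmark, payload in messages:
        key = key_of(payload)
        if key in held:
            held[key][1] = payload           # replace the held message body
        else:
            held[key] = [bookmark, payload]  # first message for this key
    return [(bookmark, payload) for bookmark, payload in held.values()]


interval = [
    ("bm-1", {"id": "A", "px": 10}),
    ("bm-2", {"id": "B", "px": 20}),
    ("bm-3", {"id": "A", "px": 11}),
]
delivered = conflate_interval(interval, key_of=lambda payload: payload["id"])
print(delivered)
# [('bm-1', {'id': 'A', 'px': 11}), ('bm-2', {'id': 'B', 'px': 20})]
```

In this model the subscriber sees the latest price for key A, but with the bookmark of the first message for A in the interval.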

Requesting Message Timestamps

Messages that are replayed from the transaction log will not have the timestamp field of the header populated by default. To request timestamps, provide the timestamp option when creating the subscription.

Selecting Message Durability Options

AMPS supports two distinct options for specifying message durability. By default, messages are provided to a bookmark subscription when they are persisted to the local transaction log.

Once replay from the transaction log is finished, AMPS sends messages to subscribers as the messages are processed. By default, AMPS waits until a message is persisted to the local transaction log before sending the message to subscribers. Because each message delivered is persisted, this approach ensures that the sequence of messages is consistent for this instance across client and server restarts, and that messages that are received by a subscriber will be available after a restart.

AMPS provides options that a subscriber can use to change the point at which AMPS delivers messages once replay from the transaction log has finished.

Using the ‘fully_durable’ Option for Bookmark Subscriptions

With the fully_durable option, once replay from the transaction log is finished, AMPS sends a message to the subscriber only when the message has been persisted in the local transaction log and all synchronous downstream replication destinations have acknowledged the message. This option is useful for applications where processing of a message should not begin until more than one AMPS instance has persisted the message.

This option will typically introduce more latency for incoming messages when those messages must be replicated. When this option is used and one or more of the synchronous downstream replication destinations that receives messages for this topic is offline, the instance will not deliver incoming messages until that destination comes back online or is downgraded to asynchronous replication.

Using the ‘live’ Option for Bookmark Subscriptions

In some cases, reducing latency may be more important than consistency. To support these cases, AMPS provides a live option on bookmark subscriptions. For bookmark subscriptions that use the live option, once replay has finished, AMPS sends messages to subscribers before the message has been persisted. This can provide a small reduction in latency at the expense of increasing the risk of inconsistency upon failover. For example, if a publisher does not republish a message after failover, your application may receive a message that is not stored in the transaction log and that other applications have not received.

warning The live option increases the risk of inconsistent data between your program and AMPS in the event of a failover. 60East recommends using this option only if the risk is acceptable and if your application requires the small latency reduction this option provides.

Because the live option does not wait for messages to be persisted, subscriptions that use this option are subject to slow client offlining after replay from the transaction log is complete.

The rate, pause, and resume options are not supported with the live option.
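The default behavior and the two options described above differ only in the condition a new message must meet before AMPS delivers it after replay has finished. The following sketch states those conditions as a single predicate. It is a model of the rules in this section, with a hypothetical function name and parameters, and it does not describe server internals.

```python
def ready_for_delivery(option, persisted_locally, pending_sync_acks):
    """Decide whether a newly published message may be delivered after replay.

    option:            None (the default), "fully_durable", or "live"
    persisted_locally: True once the message is in the local transaction log
    pending_sync_acks: number of synchronous replication destinations that
                       have not yet acknowledged the message
    """
    if option == "live":
        return True  # delivered without waiting for persistence
    if option == "fully_durable":
        return persisted_locally and pending_sync_acks == 0
    if option is None:
        return persisted_locally  # default: wait for the local transaction log
    raise ValueError(f"unknown durability option: {option!r}")


# A message that is persisted locally while one synchronous destination is offline:
print(ready_for_delivery(None, True, 1))             # True
print(ready_for_delivery("fully_durable", True, 1))  # False
print(ready_for_delivery("live", False, 1))          # True
```

The second call shows why fully_durable subscribers stop receiving new messages while a synchronous destination is offline.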

Managing Journal Files

The journal files for the transaction log are designed so that AMPS can archive, compress, and remove these files while AMPS is running. AMPS actions provide integrated administration for journal files, as described in Chapter 23 Automating AMPS with Actions.

Archiving a file copies the file to an archival directory, typically located on higher-capacity but higher-latency storage. Compressing a file compresses the file in place. Archived and compressed journal files are still accessible to clients for replay and for AMPS to use in rebuilding any SOW files that are damaged or removed.

When defining a policy for archiving, compressing or removing files, keep in mind the amount of time for which clients will need to replay data. Once journal files have been deleted, the messages in those files are no longer available for clients to replay or for AMPS to use in recreating a SOW file. If journal files are removed, and a SOW file is retained, this means that the SOW may have data that is not in the transaction log.

caution

While AMPS is running, the amps-action-do-remove-journal action is the only way to safely remove a journal file. This action correctly updates the internal AMPS data structures that refer to the journal file.

Likewise, the amps-action-do-archive-journal action is the only way to safely move a journal file to the archive directory while AMPS is running, and the amps-action-do-compress-journal action is the only way to safely compress journals while AMPS is running.

To determine how best to manage your journal files, consider your application’s access pattern to the recorded messages. Most applications have a period of time (often a day or a week) where historical data is in heavy use, and a period of time (often a week, or a month) where data is infrequently used. One common strategy is to create the journal files on high-throughput storage. The files are archived to slower, higher-capacity storage after a short period of time, compressed, and then removed after a longer period of time. This strategy preserves space on high-throughput storage, while still allowing the journals to be used. For example, if your applications frequently replay data for the last day, occasionally replay data older than the last week, and never request data older than one month, a management strategy that meets these needs would be to archive files after one day, compress them after a week, and remove them after one month.
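The example policy can be written as a function of journal age, which is a convenient way to sanity-check the thresholds before configuring them. This sketch only models the decision. On a running instance the work itself must be done by the AMPS actions described in this section. The function name is hypothetical, and one month is approximated here as 30 days.

```python
from datetime import timedelta


def journal_action(age):
    """Return the step of the example policy that applies to a journal of this age."""
    if age >= timedelta(days=30):  # never requested after about one month
        return "remove"
    if age >= timedelta(days=7):   # only occasionally replayed after a week
        return "compress"
    if age >= timedelta(days=1):   # frequent replay covers only the last day
        return "archive"
    return "keep"                  # still on high-throughput storage


for days in (0, 1, 7, 30):
    print(days, journal_action(timedelta(days=days)))
# 0 keep
# 1 archive
# 7 compress
# 30 remove
```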

caution If you remove journal files when AMPS is shut down, keep in mind that the removal of journal files must be sequential and cannot leave gaps in the remaining files. For example, say there are three journal files, 001, 002 and 003. If only 002 is removed, then the next AMPS restart could potentially overwrite the journal file 003, causing an unrecoverable problem.
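A script that removes journal files while AMPS is shut down can check the no-gaps rule before deleting anything. The following is a minimal sketch of that check. It assumes the journal sequence numbers have already been parsed into integers, that the existing set is contiguous, and it tests only for gaps. The function name is invented for this example.

```python
def leaves_gap(existing, to_remove):
    """Return True if removing `to_remove` leaves a gap in the remaining sequence.

    existing:  journal sequence numbers currently on disk, for example [1, 2, 3]
    to_remove: journal sequence numbers proposed for removal
    """
    remaining = sorted(set(existing) - set(to_remove))
    # Any two neighbors in the remaining set must be consecutive numbers.
    return any(later - earlier != 1 for earlier, later in zip(remaining, remaining[1:]))


print(leaves_gap([1, 2, 3], [2]))     # True:  001 and 003 remain, 002 is missing
print(leaves_gap([1, 2, 3], [1]))     # False: 002 and 003 remain
print(leaves_gap([1, 2, 3], [1, 2]))  # False: only 003 remains
```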

When using AMPS actions to manage journal files, AMPS ensures that all replays from a journal file are complete, all queue messages in that journal file have been delivered (and acknowledged, if required), and all messages from a journal file have been successfully replicated before removing the file.

Reference to File Types

AMPS creates the following kinds of files as a part of creating and managing the transaction log. Notice that this includes both files that contain messages (journal files) and a set of files created by AMPS to improve efficiency when the instance is restarting and recovering the state of the transaction log.

| Extension | File Type | Description |
|---|---|---|
| .journal | Journal file | These files contain the messages that comprise the transaction log. AMPS always writes new messages to an uncompressed journal file. |
| .journal.gz | Compressed journal file | These files contain messages that comprise the transaction log. These files have been compressed by AMPS as a result of the amps-action-do-compress-journal action. Other than being compressed, they are treated identically to uncompressed journal files. |
| .index.gz | Index file | These files are used during recovery to help AMPS quickly rebuild its references to the content of the transaction log without having to completely reprocess each file. Each index file contains index information for the corresponding journal file. These files do not contain messages. |
| .clients.ack | Clients acknowledgement cache | Used during recovery to help AMPS quickly identify the last message persisted from each publisher without having to reprocess each journal file. These files do not contain messages. |
| .queues.ack | Queue acknowledgement cache | For each queue, stores the point in the transaction log for which that queue has been completely processed (that is, all messages prior to that point in the transaction log have been acknowledged or expired). On recovery, AMPS can begin restoring the state of the queue from that point rather than reprocessing the entire transaction log. These files do not contain messages. |

Table 14.1: Files created by AMPS when a transaction log is configured