5. Regular Expressions

Regular expression matching provides precision, power, and flexibility for matching patterns. AMPS supports regular expression matching on topics and within content filters. Regular expressions are implemented in AMPS using the Perl-Compatible Regular Expressions (PCRE) library. For a complete definition of the supported regular expression syntax, please refer to:

http://perldoc.perl.org/perlre.html

To use regular expressions for topic matching, provide a regular expression pattern where you would normally provide a topic name.

To use regular expressions in content filtering, compare strings to regular expressions using the LIKE operator. The syntax of the LIKE operator is:

string LIKE pattern

where string is any expression that evaluates to a string, and pattern is a literal regular expression pattern.

This chapter presents a brief overview of regular expressions in AMPS and is not exhaustive. For more information on regular expression matching, see the Perl regular expression documentation referenced above.

Examples

The following example shows a content filter that matches any message meeting all of these criteria:

  • Regular expression match of symbols of 2 or 3 characters starting with “IB”
  • Regular expression match of prices starting with “90.”
  • Numeric comparison of prices less than 91

The corresponding content filter is:

(/FIXML/Order/Instrmt/@Sym LIKE "^IB.?$") AND
(/FIXML/Order/@Px LIKE "^90\..*" AND /FIXML/Order/@Px < 91.0)
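
To see which values these patterns accept, the two patterns can be tried outside AMPS. The sketch below is an approximation only: it uses Python's `re` module in place of PCRE (the constructs used here behave the same way in both), the `order_matches` helper is a hypothetical name, and the sample orders are invented for illustration.

```python
import re

# Patterns copied from the filter above; Python's re stands in for PCRE here.
SYM_PATTERN = re.compile(r"^IB.?$")
PX_PATTERN = re.compile(r"^90\..*")


def order_matches(sym: str, px: str) -> bool:
    """Approximate the filter: Sym LIKE ... AND (Px LIKE ... AND Px < 91.0)."""
    return (
        SYM_PATTERN.search(sym) is not None
        and PX_PATTERN.search(px) is not None
        and float(px) < 91.0
    )


# Hypothetical sample orders (symbol, price as text), invented for illustration.
samples = [
    ("IBM", "90.25"),   # 3-character symbol, price begins with "90."
    ("IB", "90.00"),    # 2-character symbol
    ("IBMX", "90.25"),  # 4 characters, so the symbol pattern fails
    ("IBM", "900.5"),   # "90" is not followed by ".", so the price pattern fails
    ("IBM", "89.99"),   # price does not begin with "90."
]

for sym, px in samples:
    print(sym, px, order_matches(sym, px))
```

Only the first two sample orders satisfy all three conditions.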

The tables below (Table 5.1, Table 5.2, and Table 5.3) contain a brief summary of special characters and constructs available within regular expressions.

Here are more examples of using regular expressions within AMPS.

Use (?i) to enable case-insensitive regular expression matching. For example, the following filter is true regardless of whether /client/country contains “US” or “us”.

(/client/country LIKE "(?i)^us$")

To match messages where tag 55 has a TRADE suffix, use the following filter:

(/55 LIKE "TRADE$")

To match messages where tag 109 has a US prefix and a TRADE suffix, using case-insensitive matching, use the following filter:

(/109 LIKE "(?i)^US.*TRADE$")
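
The three filters above can be checked against sample values with a small sketch. This is not AMPS code: `like` is a hypothetical helper that approximates the LIKE operator with an unanchored search in Python's `re` module, and the sample values are invented.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Hypothetical stand-in for `value LIKE pattern` (assumes unanchored search)."""
    return re.search(pattern, value) is not None


# Case-insensitive, fully anchored match on a country code.
print(like("us", r"(?i)^us$"))   # True
print(like("US", r"(?i)^us$"))   # True
print(like("USA", r"(?i)^us$"))  # False

# TRADE suffix (case-sensitive).
print(like("BLOCKTRADE", r"TRADE$"))  # True
print(like("TRADED", r"TRADE$"))      # False

# US prefix and TRADE suffix, case-insensitive.
print(like("us-equity-trade", r"(?i)^US.*TRADE$"))  # True
```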

Characters  Meaning
^           Beginning of string
$           End of string
.           Any character except a newline
*           Match previous 0 or more times
+           Match previous 1 or more times
?           Match previous 0 or 1 times
|           The previous is an alternative to the following
()          Grouping of expression
[]          Set of characters
{}          Repetition modifier
\           Escape for special characters

Table 5.1: Regular Expression Meta-characters

Construct  Meaning
a*         Zero or more a's
a+         One or more a's
a?         Zero or one a's
a{m}       Exactly m a's
a{m,}      At least m a's
a{m,n}     At least m, but no more than n a's

Table 5.2: Regular Expression Repetition Constructs
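
The repetition constructs in Table 5.2 can be compared side by side with a short sketch. It uses Python's `re` module as a stand-in for PCRE and picks m = 2 and n = 3 purely for illustration; `full_matches` is a hypothetical helper that lists the candidate strings matched in their entirety by each construct.

```python
import re

# Candidate strings: zero to four repetitions of "a".
candidates = ["", "a", "aa", "aaa", "aaaa"]

# The constructs from Table 5.2, with m = 2 and n = 3 chosen for illustration.
constructs = ["a*", "a+", "a?", "a{2}", "a{2,}", "a{2,3}"]


def full_matches(construct: str) -> list:
    """Return the candidates that the construct matches in their entirety."""
    return [s for s in candidates if re.fullmatch(construct, s)]


for construct in constructs:
    print(f"{construct:8} {full_matches(construct)}")
```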

Modifier  Meaning
i         Case-insensitive search
m         Multi-line search
s         Any character (including newlines) can be matched by a . character
x         Unescaped white space is ignored in the pattern
A         Constrain the pattern to only match the beginning of a string
U         Make the quantifiers non-greedy by default (without this modifier, quantifiers are greedy and try to match as much as possible)

Table 5.3: Regular Expression Behavior Modifiers
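
The effect of the i, m, s, and x modifiers can be sketched by running the same pattern with and without each one. The example uses Python's `re` module, which accepts these four modifiers in the inline (?i) form shown earlier in this chapter; the A and U modifiers are left out because Python's `re` has no equivalent inline flags. The sample text and the `found` helper are invented for illustration.

```python
import re

# Two-line sample text, invented for illustration.
text = "first line\nSecond line"


def found(pattern: str) -> bool:
    """True when the pattern matches anywhere in the sample text."""
    return re.search(pattern, text) is not None


checks = {
    "i off": found(r"second"),             # case differs, no match
    "i on": found(r"(?i)second"),          # case ignored
    "m off": found(r"^Second"),            # ^ only matches at start of string
    "m on": found(r"(?m)^Second"),         # ^ also matches after a newline
    "s off": found(r"first.*Second"),      # . does not cross the newline
    "s on": found(r"(?s)first.*Second"),   # . matches the newline too
    "x on": found(r"(?x) first \s line"),  # pattern whitespace is ignored
}

for name, result in checks.items():
    print(f"{name:6} {result}")
```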

Raw Strings

AMPS also supports raw strings, which are string literals prefixed with an ‘r’ or ‘R’ character. Raw strings use different rules for how the parser interprets backslash escape sequences. When a string literal is provided as a raw string, the characters in the raw string are matched exactly, even when those characters are special characters in a regular expression.

In the examples below, we query for strings that contain the name of the programming language C++. In the raw string (Example 5.1), marked by the r prefix on the string literal in the second operand of the LIKE predicate, AMPS searches for the literal characters ++ without requiring them to be escaped. In the regular string (Example 5.2), each '+' character must be escaped, because '+' is also the “match previous 1 or more times” regular expression character.

/FIXML/Language LIKE r'C++'

Example 5.1: Raw String Example

/FIXML/Language LIKE 'C\+\+'

Example 5.2: Regular String Example
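
The relationship between the two examples can be sketched in Python. One caution: Python's own r'...' prefix has a different meaning from the AMPS raw string described here (it only changes backslash handling in the source text), so this sketch uses `re.escape` to approximate the "match these characters exactly" behavior. The `contains_literal` helper and the sample values are hypothetical.

```python
import re

# The literal text we want to find, with no regular expression meaning intended.
literal = "C++"

# re.escape adds the backslashes by hand-free means, giving the same pattern
# that Example 5.2 writes out explicitly.
escaped_pattern = re.escape(literal)
print(escaped_pattern)  # C\+\+


def contains_literal(value: str) -> bool:
    """True when the value contains the literal characters C++."""
    return re.search(escaped_pattern, value) is not None


print(contains_literal("Modern C++ Design"))  # True
print(contains_literal("CC"))                 # False: the + characters are literal
```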

Topic Regular Expressions

As mentioned previously, AMPS supports regular expression matching on topics as well as within content filters. Topic regular expressions use the same grammar described above for content filters. Regular expression matching for topics is enabled in an AMPS instance by default.

A subscription or query that uses a regular expression for the topic name receives all matching records from every AMPS topic whose name matches the regular expression. For example, suppose your AMPS configuration has three SOW topics, Topic_A, Topic_B and Topic_C, and you want to find the records in all of those topics where the Name field is equal to “Bob”. You could use a sow command with a topic of Topic_.* and a filter of /FIXML/@Name='Bob'. The command returns every message that matches the filter from every topic that matches the topic regular expression.

Tip: Results returned by a topic regular expression query follow “configuration order”: the topics are searched in the order in which they appear in your AMPS configuration file. Using the above query example with Topic_A, Topic_B and Topic_C, if the configuration file lists these topics in that exact order, results are returned first from Topic_A, then from Topic_B, and finally from Topic_C. As with other queries, AMPS does not make any guarantees about the ordering of results within any given topic.
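
The topic selection and configuration-order behavior described above can be modeled with a small in-memory sketch. Nothing here is AMPS client code: `configured_topics`, its records, the extra Orders topic, and the `sow_query` helper are all hypothetical, and the sketch assumes the topic pattern is applied as an unanchored search.

```python
import re

# Hypothetical SOW topics, listed in the order they appear in the configuration
# file. Python dicts preserve insertion order, which models "configuration order".
configured_topics = {
    "Topic_A": [{"Name": "Bob", "id": 1}, {"Name": "Alice", "id": 2}],
    "Topic_B": [{"Name": "Bob", "id": 3}],
    "Orders": [{"Name": "Bob", "id": 4}],  # name does not match Topic_.*
    "Topic_C": [{"Name": "Carol", "id": 5}, {"Name": "Bob", "id": 6}],
}


def sow_query(topic_pattern: str, name: str) -> list:
    """Return (topic, record id) pairs for records whose Name equals `name`,
    visiting only topics whose name matches `topic_pattern`."""
    topic_re = re.compile(topic_pattern)
    results = []
    for topic, records in configured_topics.items():
        if topic_re.search(topic) is None:
            continue
        # Order within a topic is incidental here; AMPS gives no guarantee.
        results.extend((topic, r["id"]) for r in records if r["Name"] == name)
    return results


print(sow_query("Topic_.*", "Bob"))
```

The Orders topic is skipped because its name does not match the pattern, and the remaining results arrive grouped by topic in the order the topics were declared.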