6. Regular Expressions
Regular expression matching provides precision, power, and flexibility for matching patterns. AMPS supports regular expression matching on topics and within content filters. Regular expressions are implemented in AMPS using the Perl-Compatible Regular Expressions (PCRE) library. For a complete definition of the supported regular expression syntax, please refer to:
http://perldoc.perl.org/perlre.html
To use regular expressions for topic matching, provide a regular expression pattern where you would normally provide a topic name.
To use regular expressions in content filtering, compare strings to regular expressions using the LIKE operator. The syntax of the LIKE operator is:
string LIKE pattern
where string is any expression that provides a string, and pattern is a literal regular expression pattern.
This chapter presents a brief, non-exhaustive overview of regular expressions in AMPS. For more information on regular expression matching, see the Perl regular expression reference linked above.
Examples
The example below is a content filter that matches any message meeting all of the following criteria:
- Regular expression match of symbols of 2 or 3 characters starting with “IB”
- Regular expression match of prices starting with “90.”
- Numeric comparison of prices less than 91
The corresponding content filter is:
(/FIXML/Order/Instrmt/@Sym LIKE "^IB.?$") AND
(/FIXML/Order/@Px LIKE "^90\..*" AND /FIXML/Order/@Px < 91.0)
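To see which values the two patterns in this filter accept, the following sketch evaluates them with Python's `re` module. This is only an approximation: AMPS uses PCRE rather than Python's engine, although the two agree for these simple patterns. The sample orders and the `matches_filter` helper are hypothetical. The sketch also assumes that LIKE succeeds when the pattern matches anywhere in the string, which is why `search` is used.

```python
import re

# Patterns copied from the content filter above.
SYM_PATTERN = re.compile(r"^IB.?$")   # "IB" followed by at most one character
PX_PATTERN = re.compile(r"^90\..*")   # text beginning with "90."

def matches_filter(sym: str, px: str) -> bool:
    """Approximate the three-part filter for a single order (hypothetical helper)."""
    return (
        SYM_PATTERN.search(sym) is not None
        and PX_PATTERN.search(px) is not None
        and float(px) < 91.0
    )

# Hypothetical sample orders: (symbol, price as text).
orders = [
    ("IBM", "90.25"),
    ("IB", "90.00"),
    ("IBMX", "90.25"),   # symbol too long
    ("IBM", "900.5"),    # price does not begin with "90."
    ("MSFT", "90.25"),   # symbol does not begin with "IB"
]

matching = [order for order in orders if matches_filter(*order)]
print(matching)
```

Only the first two sample orders satisfy all three conditions.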
The tables below (Table 6.1, Table 6.2, and Table 6.3) contain a brief summary of special characters and constructs available within regular expressions.
Here are more examples of using regular expressions within AMPS.
Use (?i) to enable case-insensitive regular expression searching. For example, the following filter is true whether /client/country contains “US” or “us”.
(/client/country LIKE "(?i)^us$")
To match messages where tag 55 has a TRADE suffix, use the following filter:
(/55 LIKE "TRADE$")
To match messages where tag 109 has a US prefix and a TRADE suffix, with case-insensitive matching, use the following filter:
(/109 LIKE "(?i)^US.*TRADE$")
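The three filters above can be explored outside AMPS with a short sketch. It uses Python's `re` module as a stand-in for PCRE, and the `like` helper and sample values are hypothetical. As an assumption, `like` treats a pattern as matching when it is found anywhere in the value, so anchoring comes only from `^` and `$`.

```python
import re

# Patterns from the three example filters in this section.
COUNTRY = re.compile(r"(?i)^us$")
TRADE_SUFFIX = re.compile(r"TRADE$")
US_TRADE = re.compile(r"(?i)^US.*TRADE$")

def like(value: str, pattern: re.Pattern) -> bool:
    """Hypothetical stand-in for the LIKE operator: unanchored regex search."""
    return pattern.search(value) is not None

# (?i) makes the whole pattern case-insensitive.
print(like("US", COUNTRY), like("us", COUNTRY), like("usa", COUNTRY))

# Without (?i), the suffix must be upper case and must end the value.
print(like("BLOCKTRADE", TRADE_SUFFIX), like("blocktrade", TRADE_SUFFIX))

# Prefix and suffix combined, ignoring case.
print(like("us-block-trade", US_TRADE), like("EU-BLOCK-TRADE", US_TRADE))
```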
Characters | Meaning |
---|---|
^ | Beginning of string |
$ | End of string |
. | Any character except a newline |
* | Match previous 0 or more times |
+ | Match previous 1 or more times |
? | Match previous 0 or 1 times |
\| | The previous is an alternative to the following |
() | Grouping of expression |
[] | Set of characters |
{} | Repetition modifier |
\ | Escape for special characters |
Table 6.1: Regular Expression Meta-characters
Construct | Meaning |
---|---|
a* | Zero or more a's |
a+ | One or more a's |
a? | Zero or one a's |
a{m} | Exactly m a's |
a{m,} | At least m a's |
a{m,n} | At least m, but no more than n a's |
Table 6.2: Regular Expression Repetition Constructs
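As an illustration of Table 6.2, the sketch below lists how many consecutive a's each construct accepts, using m = 2 and n = 4 as arbitrary example values. It relies on Python's `re.fullmatch`, which is an assumption made for demonstration purposes. AMPS itself uses PCRE, which treats these quantifiers the same way.

```python
import re

# Constructs from Table 6.2 with the example values m = 2 and n = 4.
constructs = ["a*", "a+", "a?", "a{2}", "a{2,}", "a{2,4}"]

# For each construct, record which run lengths (0 to 5) match in full.
accepted = {
    construct: [n for n in range(6) if re.fullmatch(construct, "a" * n)]
    for construct in constructs
}

for construct, lengths in accepted.items():
    print(f"{construct:8} accepts runs of length {lengths}")
```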
Modifier | Meaning |
---|---|
i | Case insensitive search |
m | Multi-line search |
s | Any character (including newlines) can be matched by a . character |
x | Unescaped white space is ignored in the pattern. |
A | Constrain the pattern to only match the beginning of a string. |
U | Make the quantifiers non-greedy by default. (Without this modifier, quantifiers are greedy and try to match as much as possible.) |
Table 6.3: Regular Expression Behavior Modifiers
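The effect of some of these modifiers can be seen in a small sketch. It assumes the modifiers are written inline at the start of the pattern, in the same way as the (?i) examples earlier in this chapter, and it uses Python's `re` module rather than PCRE. Only the s, m, and x modifiers are shown, because Python's engine has no inline equivalent of the PCRE-specific A and U modifiers. The sample text is hypothetical.

```python
import re

text = "first line\nsecond line"

# s: without it, "." stops at the newline; with it, "." can cross lines.
dot_default = re.search(r"first.*second", text) is not None
dot_all = re.search(r"(?s)first.*second", text) is not None

# m: without it, "^" matches only at the start of the whole string.
caret_default = re.search(r"^second", text) is not None
caret_multiline = re.search(r"(?m)^second", text) is not None

# x: unescaped white space in the pattern is ignored.
verbose = re.search(r"(?x) first \s line", text) is not None

print(dot_default, dot_all, caret_default, caret_multiline, verbose)
```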
Raw Strings
AMPS additionally provides support for raw strings, which are strings prefixed by an ‘r’ or ‘R’ character. Raw strings use different rules for how a backslash escape sequence is interpreted by the parser. When a string literal is provided as a raw string, the characters in the raw string are matched exactly, even when those characters are special characters for a regular expression.
In the example below, the raw string, noted by the r prefix of the string literal in the second operand of the LIKE predicate (Example 6.1), causes AMPS to search for the literal characters ++ in the results, without requiring those characters to be escaped as they are in Example 6.2. In this example we are querying for a string that contains the name of the programming language C++. In the regular string, we are required to escape the '+' character, since it is also the regular expression character for “match previous 1 or more times”. In the raw string we can use r'C++' to search for the string without escaping the special '+' character.
/FIXML/Language LIKE r'C++'
Example 6.1: Raw String Example
/FIXML/Language LIKE 'C\+\+'
Example 6.2: Regular String Example
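The escaping shown in Example 6.2 can be reproduced with Python's `re.escape`, as a rough analogy only. Note that Python's own `r'...'` prefix means something different from the AMPS raw string described here: in Python it affects only how backslashes in the source text are read, so this sketch escapes the metacharacters explicitly. The sample sentences are hypothetical.

```python
import re

literal = "C++"

# re.escape backslash-escapes regex metacharacters, giving the same
# pattern text as the regular string in Example 6.2.
escaped = re.escape(literal)
print(escaped)

# The escaped pattern finds the literal text "C++" ...
found = re.search(escaped, "Languages: C, C++, Rust") is not None

# ... and does not match text that contains only a plain "C".
found_plain_c = re.search(escaped, "Languages: C, Rust") is not None

print(found, found_plain_c)
```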
Topic Regular Expressions
As mentioned previously, AMPS supports regular expression filtering for topics, in addition to content filters. Regular expressions use the same grammar described in content filtering. Regular expression matching for topics is enabled in an AMPS instance by default.
Subscriptions or queries that use a regular expression for the topic name provide all matching records from AMPS topics where the name of the topic matches the regular expression used for the subscription or query. For example, suppose your AMPS configuration has three SOW topics, Topic_A, Topic_B and Topic_C, and you wish to search all of them for records where the Name field is equal to “Bob”. You could use a sow command with a topic of Topic_.* and a filter of /FIXML/@Name='Bob' to return the messages that match the filter from every topic that matches the topic regular expression.
Results returned when performing a topic regular expression query follow “configuration order”, meaning that the topics are searched in the order that they appear in your AMPS configuration file. Using the above query example with Topic_A, Topic_B and Topic_C, if the configuration file has these topics in that exact order, the results will be returned first from Topic_A, then from Topic_B, and finally from Topic_C. As with other queries, AMPS does not make any guarantees about the ordering of results within any given topic query.
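The topic selection and configuration ordering described above can be modelled with a small in-memory sketch. It does not use the AMPS client API: `configured_topics`, its records, and the `sow_query` function are hypothetical stand-ins, and the sketch assumes a topic is selected when the pattern is found anywhere in its name.

```python
import re

# Hypothetical SOW contents, keyed by topic name in configuration-file order.
configured_topics = {
    "Topic_A": [{"Name": "Bob", "id": 1}, {"Name": "Alice", "id": 2}],
    "Orders": [{"Name": "Bob", "id": 3}],
    "Topic_B": [{"Name": "Carol", "id": 4}],
    "Topic_C": [{"Name": "Bob", "id": 5}],
}

def sow_query(topic_pattern: str, name: str) -> list[tuple[str, int]]:
    """Return (topic, id) pairs for matching records, topic by topic."""
    topic_re = re.compile(topic_pattern)
    results = []
    # Python dicts preserve insertion order, which models configuration order.
    for topic, records in configured_topics.items():
        if topic_re.search(topic) is None:
            continue  # topic name does not match the regular expression
        results.extend(
            (topic, record["id"]) for record in records if record["Name"] == name
        )
    return results

results = sow_query("Topic_.*", "Bob")
print(results)
```

Records from Topic_A come before those from Topic_C because of the order in which the topics are listed. The Orders topic is skipped even though it holds a record for Bob, since its name does not match the pattern. Within a single topic the sketch happens to keep list order, but as the text notes, AMPS makes no such guarantee.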