4. AMPS Expressions¶
AMPS includes an expression language that combines elements of XPath and
SQL-92’s WHERE
clause. This expression language is used whenever the
AMPS server refers to the contents of a message, including:
- Content filtering
- Constructing fields for message enrichment
- Creating projected fields for views
AMPS uses a common syntax for each of these purposes, and provides a common set of operators and functions. AMPS also provides special directives for message enrichment, and aggregation functions for projecting views.
For example, when an expression is used as a content filter, any message
for which the expression returns true
matches the content filter.
When an expression is used to construct a field for message enrichment
or view projection, the expression is evaluated and the result that the
expression returns is used as the content of the field.
Expressions Overview¶
The quickest way to learn AMPS expressions is to think of each as a combination of identifiers that tell AMPS where to find data in a message, and operators that tell AMPS what to do with that data. Each AMPS expression produces a value. The way AMPS uses that value depends on where the expression is used. For example, in a content filter, AMPS uses the value of the expression to determine whether a message matches the filter. When constructing a field, AMPS uses the value of the expression as the contents of the field.
Consider a simple example of an expression used as a filter. Imagine AMPS receives the following JSON message:
{"name":"Gyro", "job":"kitten"}
Using an AMPS expression, you can easily construct a content filter that matches the message:
/name = 'Gyro'
There are three parts to this expression. The first part, /name
, is
an identifier that tells AMPS to look for the contents of the name
field at the top level of the JSON document. The second part of the
filter, =
, is the equality operator, which tells AMPS to compare
the values on either side of the operator and return true
if the
values match. The final part of the filter, 'Gyro'
, is a string
literal for the equality operator to use in the comparison. When an
expression is used in a content filter, a message matches the filter
when the expression returns true
. The expression returns true
for the sample message, so the sample message matches the filter.
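The evaluation described above can be mimicked in a few lines of Python. This is only an illustrative sketch, not how AMPS is implemented: `resolve` and `matches_equals` are hypothetical helper names, `None` stands in for NULL, and only object fields (no arrays or attributes) are handled.

```python
import json

def resolve(message: dict, identifier: str):
    """Follow a /-separated identifier through nested dicts (None stands in for NULL)."""
    value = message
    for step in identifier.strip("/").split("/"):
        if not isinstance(value, dict) or step not in value:
            return None
        value = value[step]
    return value

def matches_equals(message: dict, identifier: str, literal) -> bool:
    """Mimic a content filter of the form `<identifier> = <literal>`."""
    value = resolve(message, identifier)
    # A missing field is NULL, and a comparison with NULL is never true.
    return value is not None and value == literal

message = json.loads('{"name":"Gyro", "job":"kitten"}')
print(matches_equals(message, "/name", "Gyro"))  # True: the message matches
print(matches_equals(message, "/name", "Rex"))   # False: the message does not match
```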
The identifier syntax is a subset of XPath, as described in Identifiers. The comparison syntax is similar to SQL-92.
Notice that AMPS makes no rigid guarantees as to the number of times a given expression is evaluated or when that evaluation will take place. AMPS will evaluate the expression as needed.
Expression Syntax¶
AMPS expressions are designed to work exactly as expected if you are familiar with XPath path specifiers and SQL-92 predicates. This section describes in detail how AMPS evaluates the syntax, operators, and functions available in the AMPS expression language.
AMPS expressions combine the following elements:
- Identifiers specify a field in a message. When evaluating an expression, AMPS replaces identifiers with values from the message or set of messages being evaluated.
- Literal values are explicit values in an AMPS expression, such as 'IBM' or 42
- Operators and functions such as =, <, >, *, and UNIX_TIMESTAMP()
Every AMPS expression produces a value. The way that AMPS uses the value
depends on the context in which AMPS evaluates the expression. For
example, if the expression is used for a filter, the message is
considered to match the filter when the expression returns true
.
When an expression is used to project a field, the result of the
expression is used as the value of the projected field.
Identifiers¶
AMPS identifiers use a subset of XPath to specify values in a message. AMPS identifiers specify the value of an attribute or element in an XML message, and the value of a field in a JSON, FIX or NVFIX message. Given that the identifier syntax is only used to specify values, the subset of XPath used by AMPS does not include wildcards, relative paths, array manipulation, predicates or functions.
For example, when messages are in this XML format:
<Order update="full">
<ClientID>12345</ClientID>
<Symbol>IBM</Symbol>
<OrderQty>1000</OrderQty>
</Order>
The following identifier specifies the Symbol
element of an
Order
message:
/Order/Symbol
The following identifier specifies the update
attribute of an
Order
message:
/Order/@update
For FIX and NVFIX, you specify fields using /
and the tag name. AMPS
interprets FIX and NVFIX messages as though they were an XML fragment
with no root element. For example, to specify the value of FIX tag
55
(symbol), use the following identifier:
/55
Likewise, for JSON or other types that represent an object, you navigate
through the object structure using the /
to indicate each level of
nesting.
AMPS only guarantees support for field identifiers that are valid step names
in XPath. For example, AMPS does not guarantee that it can process or
filter on a field named Fits&Starts
.
AMPS also supports an optional bracketed field identifier syntax that extends the characters available for field names. For example, the following step name:
[/Not Xpath Name]
refers to a field name of Not Xpath Name
at the root level of the message.
This syntax allows spaces to be used in field names in AMPS expressions, even
though this is not a valid step name in XPath. Notice that not all message
types support field names with embedded spaces or other special characters.
For example, the Not Xpath Name
identifier is not a valid element name
in XML, nor would it be a valid field name in Google Protocol Buffers.
AMPS checks the syntax of identifiers when parsing an expression. AMPS
does not try to predict whether an identifier will match messages within
a particular topic. It is not an error to submit an identifier that can
never match due to the limitations of the message type. For example,
AMPS allows you to use an identifier like /OrderQty
in a filter submitted
for a FIX connection, even though FIX messages only use numeric tags, or an
identifier like /DataPackage/RunDate
in a filter submitted for a BFlat
connection, even though BFlat does not support nested elements.
The message type is responsible for constructing a set of identifiers
from a message. In most cases, the mapping is simple. However, see the
documentation for the message type for details, or if the mapping is
unclear. For example, a composite-local
message type adds the number
of the part to the beginning of each XPath within the part (so, a
top-level field of /name
in the first part of the message has an
identifier of /0/name
).
AMPS Data Types¶
Each value in AMPS is assigned a data type when the message type module
parses the value. AMPS operators and functions attempt to convert values
into compatible types, based on the type of operation. For example, the
*
operator (multiplication) will attempt to convert all values to
numeric values, while the CONCAT
function (string concatenation)
will attempt to convert all values to strings. In effect, a value in
AMPS can be transparently treated as any type to which it can be
meaningfully converted.
Internally, AMPS uses the data types in the table below. As mentioned above, the message type module is responsible for assigning the type of a value from an incoming message as part of the parsing process. For some types, such as JSON, XML, FIX and NVFIX, the parser infers the type of the value from the field. For other types, such as MessagePack, BFLAT, Google Protocol Buffers or BSON, the message itself contains information about the type of the field.
As mentioned above, the AMPS expression language does not limit the value to the type assigned by the message type module. Instead, a value in AMPS can be used in any context.
For example, given the following JSON document:
{"a":1,"b":"47"}
The values of /a
and /b
can be used as either string values or
numeric values. AMPS will automatically convert these values as necessary,
and AMPS considers the string or numeric representation to
be equally correct and valid.
The following table lists the data types in the AMPS expression language:
Type | Description | Untyped Message Examples |
---|---|---|
NULL | Unknown, untyped value (SQL-92 semantics) | [no field provided] NVFIX: JSON: XML: |
Boolean | True (1) or false (0) | JSON: {"e":true} |
Integer | Signed 64-bit integer or unsigned 64-bit integer for values > LONG_MAX | NVFIX: JSON: XML: |
Floating Point Number | 64-bit floating point number | NVFIX: JSON: XML: |
String | Arbitrary sequence of bytes of a specific length. An empty string is considered to be NULL. | NVFIX: JSON: XML: |
Numeric Types and Literals in AMPS Expressions¶
Numeric values in AMPS are always typed as either integers or floating point values. Integer values that are less than or equal to the LONG_MAX limit in AMPS are signed; larger integer values are unsigned. AMPS message types convert the original numeric types (or the original representation, for message types that do not have typed values) into the internal AMPS type system for the purposes of expression evaluation.
Within expressions, integer values are all numerals, with no decimal point, and can have a value in the same range as a 64-bit integer. For example:
42
149
-273
18446744073709551610
Within expressions, all numerals with a decimal point are floating-point numbers. AMPS interprets these numerals as double-precision floating point values. For example:
3.1415926535
98.6
-273.0
or, in scientific notation:
31.4e-1
6.022E23
2.998e8
AMPS automatically converts strings that contain numeric values to numbers when strings are used with an operator, function or comparison that expects a numeric value.
Type Promotion for Numeric Types¶
AMPS uses the following rules for type promotion when evaluating numeric expressions:
- If any of the values in the expression is NaN, the result is NaN.
- Otherwise, if any of the values in the expression is floating point, the result is floating point.
- Otherwise, all of the values in the expression are integers, and the result is an integer.
Notice that, for division in particular, the results returned are
affected by the type of the values. For example, the expression
1 / 5
evaluates to 0
since the result is interpreted as an
integer. In comparison, the expression 1.0 / 5
evaluates to 0.2
since the result is interpreted as a floating point value.
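The promotion rules and the division example can be sketched in Python as follows. The `divide` helper is hypothetical, and the choice to truncate integer results toward zero is an assumption made for this sketch; the text above only states that `1 / 5` evaluates to `0`. Division by zero is not handled.

```python
import math

def divide(a, b):
    """Divide two values following the promotion rules listed above."""
    # Rule 1: any NaN operand makes the result NaN.
    if any(isinstance(v, float) and math.isnan(v) for v in (a, b)):
        return math.nan
    # Rule 2: any floating point operand makes the result floating point.
    if isinstance(a, float) or isinstance(b, float):
        return a / b
    # Rule 3: all operands are integers, so the result is an integer.
    # Assumption: truncate toward zero rather than toward negative infinity.
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient

print(divide(1, 5))    # 0
print(divide(1.0, 5))  # 0.2
```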
When a function or operator that expects a numeric type is provided with
a string, AMPS will attempt to convert string values to numeric types as
necessary. When converting string values, AMPS recognizes the same numeric
formats in message data as are supported in the AMPS expression language
(see String Literals). If the
string is in an unrecognized format, AMPS converts the string to NaN.
String Literals in AMPS Expressions¶
When creating expressions for AMPS, string literals are indicated with single or double quotes. For example:
/FIXML/Order/Instrmt/@Sym = 'IBM'
AMPS supports the following escape sequences within string literals:
Escape Sequence | Definition |
---|---|
\a | Alert |
\b | Backspace |
\t | Horizontal tab |
\n | Newline |
\f | Form feed |
\r | Carriage return |
\xHH | Hexadecimal digit where H is (0..9,a..f,A..F) |
\OOO | Octal Digit (0..7) |
Additionally, any character which follows a backslash will be treated as a literal character.
AMPS string operations have no restrictions on character set, and
correctly handle embedded NULL
characters (\x00
) and characters
outside of the 7-bit ASCII range. AMPS string operations are not
Unicode-aware.
NULL, NaN and IS NULL¶
XPath expressions are considered to be NULL
when they evaluate to an
empty or nonexistent field reference. NULL
values follow SQL-92
semantics.
This means that comparisons with NULL
are never true
(in other words, even if /a
is NULL
, /a != NULL
is false
and /a == NULL
is also false).
In numeric expressions where the operands or results are not a valid
number, the XPath expression evaluates to NaN
(not a number). The rules
for applying the AND
and OR
operators against NULL
and NaN
values are outlined in the tables below:
Operand1 | Operator | Operand2 | Result |
---|---|---|---|
TRUE | (AND) | NULL | NULL |
FALSE | (AND) | NULL | FALSE |
NULL | (AND) | NULL | NULL |
NULL | (AND) | TRUE | NULL |
NULL | (AND) | FALSE | NULL |
Operand1 | Operator | Operand2 | Result |
---|---|---|---|
TRUE | (OR) | NULL | TRUE |
FALSE | (OR) | NULL | NULL |
NULL | (OR) | NULL | NULL |
NULL | (OR) | TRUE | NULL |
NULL | (OR) | FALSE | NULL |
Likewise, direct comparisons with NULL
are not ever true (so, if /b
is NULL,
/b == NULL
does not produce a true value, and neither does /b != NULL
).
AMPS, like SQL-92, provides an IS NULL
predicate for testing whether a value is
NULL
, and an IS NOT NULL
predicate for testing whether a value is not NULL
.
There is also an IS NAN predicate for checking whether a value is NaN (not a number).
Caution
To reliably check for existence of a NULL
value, you must
use the IS NULL
predicate such as the filter:
/optionalField IS NULL
To reliably check that a value is not NULL
, you must use
the IS NOT NULL
predicate or negate the value of an
IS NULL
test: /optionalField IS NOT NULL
and
NOT /optionalField IS NULL
are equivalent.
AMPS also provides a COALESCE()
function that accepts a set
of values and returns the first value that is not NULL. For example,
given the following filter expression:
COALESCE(/userCategory,
/employeeCategory,
/vendorCategory,
'restricted') != 'restricted'
AMPS will return the first value that is not NULL
, and compare that
value to the constant string 'restricted'
. Notice that, to make the
intent of the filter clear, this example provides a constant value for
AMPS to return from the COALESCE
if all of the field values are
NULL
.
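As a rough illustration of the behavior described above, the following Python sketch returns the first value that is not NULL. The `coalesce` helper is a hypothetical stand-in, not the AMPS implementation; `None` represents a missing field, and an empty string is treated as NULL, as described elsewhere in this chapter.

```python
def coalesce(*values):
    """Return the first value that is not NULL (None or an empty string here)."""
    for value in values:
        if value is not None and value != "":
            return value
    return None

def passes_filter(message: dict) -> bool:
    """Mimic the COALESCE filter shown above for a flat message."""
    category = coalesce(
        message.get("userCategory"),
        message.get("employeeCategory"),
        message.get("vendorCategory"),
        "restricted",  # constant fallback when every field is NULL
    )
    return category != "restricted"

print(passes_filter({"employeeCategory": "staff"}))  # True
print(passes_filter({}))                             # False
```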
The COALESCE
function, like other functions in AMPS, is not array-aware.
This means that when one of the XPath expressions provided to COALESCE
specifies an array in the original message, AMPS provides the first item in
the array to the COALESCE
function. See Working with Arrays for details.
Compound Types in AMPS¶
Many messaging applications are designed for high performance and use a simplified message structure. For applications that use compound types, AMPS includes the ability to parse and filter on the contents of nested data structures.
For performance, AMPS parses nested data structures into a set of values. As with single-valued (or scalar) values, the AMPS expression language refers to a parsed set of values that is common to all message types rather than the underlying data.
The AMPS message types treat compound data types as a set of paths with corresponding scalar values. A field that only contains other fields is represented as a step in the path to the primitive values that it contains.
AMPS parses compound types as follows:
- Any field that contains a scalar value is represented as an identifier/value pair.
- Any field that contains other fields is represented as a step in the path to that value.
- Multiple values with identical paths are represented as an array. For more information on arrays in the AMPS expression language, see Working with Arrays.
The following JSON document is a simple example.
{"outer": {"middle": { "inner": 5 } } }
With this document, AMPS produces the following parsed value:
Path | Value |
---|---|
/outer/middle/inner | 5 |
In the parsed representation, the outer
and middle
fields contain no data
of their own. They serve only as containers for the inner
field which contains data.
Notice that the intermediate paths do not have an explicit scalar value.
With a more complex document the parsed representation continues to follow the same principles, as shown in the following example.
{"outer" :
{
"array" : ["a1", "a2", "a3"],
"compound" : { "A" : "middle-A",
"B" : "middle-B",
"C" :
[ {"C1":"first-C1", "D1":"first-D1"},
{"C1":"second-C1","D1":"second-D1"} ]
}
}
}
The representation of the above message in the AMPS expression language would typically be as follows:
Path | Value | Notes |
---|---|---|
/outer/array | ['a1', 'a2', 'a3'] | Elements in the array can be referred to directly with subscript notation. For example, /outer/array[0] is 'a1'. |
/outer/compound/A | 'middle-A' | |
/outer/compound/B | 'middle-B' | |
/outer/compound/C/C1 | ['first-C1', 'second-C1'] | Elements in the array can be referred to directly with subscript notation. For example, /outer/compound/C/C1[0] is 'first-C1'. |
/outer/compound/C/D1 | ['first-D1', 'second-D1'] | Elements in the array can be referred to directly with subscript notation. For example, /outer/compound/C/D1[0] is 'first-D1'. |
As with the first example, fields that do not directly contain a value do not have an explicit scalar value. Values with the same identifier are represented as an array of values with that identifier.
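The parsing rules in this section can be approximated with a short recursive walk. The sketch below is illustrative only: `flatten` and `parsed_view` are hypothetical names, and the sketch simplifies by reporting a path with exactly one value as a scalar, even if that value came from a one-element array.

```python
from collections import defaultdict

def flatten(node, path="", out=None):
    """Collect every scalar under `node`, keyed by its /-separated path."""
    if out is None:
        out = defaultdict(list)
    if isinstance(node, dict):
        # A field that contains other fields becomes a step in the path.
        for key, child in node.items():
            flatten(child, f"{path}/{key}", out)
    elif isinstance(node, list):
        # Array elements share the path of the array itself.
        for child in node:
            flatten(child, path, out)
    else:
        out[path].append(node)
    return out

def parsed_view(document):
    """Single values stay scalar; multiple values with one path become a list."""
    return {p: v[0] if len(v) == 1 else v for p, v in flatten(document).items()}

document = {"outer": {
    "array": ["a1", "a2", "a3"],
    "compound": {"A": "middle-A", "B": "middle-B",
                 "C": [{"C1": "first-C1", "D1": "first-D1"},
                       {"C1": "second-C1", "D1": "second-D1"}]}}}

for path, value in parsed_view(document).items():
    print(path, value)
```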
Grouping and Order of Evaluation¶
AMPS expressions allow you to group parts of the expression using parentheses. Parts of an expression inside parentheses are evaluated together. 60East recommends using parentheses to group independent parts of an expression to ensure that the expression is evaluated in the expected order. For example, in this expression:
( /counter % 3 ) == 0
The clause /counter % 3
is evaluated first, and the result of that
evaluation is compared to 0
.
Within a group, elements are evaluated left to right in precedence order. For example, given the filter below:
(expression1 OR expression2 AND expression3) OR (expression4 AND
NOT expression5) ...
AMPS evaluates expression2
, then expression3
(since AND
has
higher precedence than OR
), and if they evaluate to false, then
expression1
will be evaluated.
AMPS does not guarantee that all parts of an expression will be evaluated if the result of an expression can be determined after only evaluating part of the expression. For example, given the expression:
A_FUNCTION(/a) OR B_FUNCTION(/b)
AMPS only guarantees that B_FUNCTION(/b)
will be evaluated if
A_FUNCTION(/a)
returns false
.
Logical Operators¶
The logical operators are NOT
, AND
, and OR
, in order of
precedence. These operators have the usual Boolean logic semantics.
/FIXML/Order/Instrmt/@Sym = 'IBM' OR /FIXML/Order/Instrmt/@Sym = 'MSFT'
As with other operators, you can use parentheses to group operators and affect the order of evaluation.
(/orderType = 'rush' AND /customerType IN ('silver', 'gold') )
OR /customerType = 'platinum'
Arithmetic Operators¶
AMPS supports the arithmetic operators +
, -
, *
, /
,
%
, and MOD
in expressions. The result of arithmetic operators
where one of the operands is NULL
is undefined and evaluates to
NULL
.
AMPS distinguishes between floating point and integral types. When an arithmetic operator uses two different types, AMPS will convert the integral type to a floating point value as described in Numeric Types and Literals.
Examples of filter expressions using arithmetic operators:
/6 * /14 < 1000
/Order/@Qty * /Order/@Prc >= 1000000
AMPS numeric types are signed, and the AMPS arithmetic operators
correctly handle negative numbers. The MOD
and %
operators
preserve the sign of the first argument to the operator. That is,
-5 % 3
produces a result of -2
, while 5 % -3
produces a
result of 2
.
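For readers who want to reproduce this sign behavior outside AMPS, note that not every language agrees. The Python sketch below uses `math.fmod`, which keeps the sign of the first argument, to mirror the results described above; the `amps_style_mod` name is hypothetical. Python's own `%` operator follows the sign of the second argument instead.

```python
import math

def amps_style_mod(a: int, b: int) -> int:
    """Remainder that preserves the sign of the first argument."""
    return int(math.fmod(a, b))

print(amps_style_mod(-5, 3))  # -2, as described above
print(amps_style_mod(5, -3))  # 2, as described above
print(-5 % 3)                 # 1: Python's % follows the sign of the divisor
```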
Caution
When using mathematical operators in conjunction with filters,
be careful about the placement of the operator. Some operators
are used in the XPath expression as well as for mathematical
operation (for example, the '/'
operator in division).
Therefore, it is important to separate mathematical operators
with white space to prevent interpretation as an XPath
expression.
Comparison Operators¶
The comparison operators can be loosely grouped into equality
comparisons and range comparisons. The basic equality comparison
operators, in precedence order, are ==
, =
, >
, >=
, <
,
<=
, !=
, and <>
. The ==
comparison and the =
comparison
are treated as the same operator and produce the same results.
If these binary operators are applied to two operands of different
types, AMPS attempts to convert strings to numbers. If conversion
succeeds, AMPS uses the numeric values. If conversion fails because the
string cannot be meaningfully converted to a number, strings are always
considered to be greater than numbers. The operators consider an empty
string to be NULL
.
The following table shows some examples of how AMPS compares different types.
Expression | Result |
---|---|
1 < 2 | TRUE |
10 < '2' | FALSE, '2' can be converted to a number |
'2.000' <> '2.0' | TRUE, no conversion to numbers since both are strings |
2 = 2.0 | TRUE, numeric comparison |
10 < 'Crank It Up' | TRUE, strings are greater than numbers |
10 < '' | FALSE, an empty string is considered to be NULL |
10 > '' | FALSE, an empty string is considered to be NULL |
'' = '' | FALSE, an empty string is considered to be NULL |
'' IS NULL | TRUE, an empty string is considered to be NULL |
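The conversion rules behind this table can be sketched for the `<` operator as follows. This is a simplified model rather than the AMPS implementation: `to_number` and `less_than` are hypothetical helpers, `None` stands in for NULL, and Python's string ordering is assumed to be an acceptable stand-in when both operands are strings.

```python
def to_number(value):
    """Return a numeric value if conversion is possible, otherwise None."""
    if isinstance(value, (int, float)):
        return value
    try:
        return float(value)
    except ValueError:
        return None

def less_than(left, right) -> bool:
    """Mimic `left < right` using the rules described above."""
    # NULL (including the empty string) never compares as true.
    if left is None or right is None or left == "" or right == "":
        return False
    # Two strings are compared as strings, with no numeric conversion.
    if isinstance(left, str) and isinstance(right, str):
        return left < right
    l, r = to_number(left), to_number(right)
    if l is not None and r is not None:
        return l < r
    # One operand is a non-numeric string, which is greater than any number.
    return r is None

print(less_than(1, 2))               # True
print(less_than(10, "2"))            # False
print(less_than(10, "Crank It Up"))  # True
print(less_than(10, ""))             # False
```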
There are also set and range comparison operators. The BETWEEN
operator
can be used to check whether a value falls within a range.
Tip
The range used in the BETWEEN
operator is inclusive of both
operands, meaning the expression /A BETWEEN 0 AND 100
is
equivalent to /A >= 0 AND /A <= 100
For example:
/FIXML/Order/OrdQty/@Qty BETWEEN 0 AND 10000
/FIXML/Order/@Px NOT BETWEEN 90.0 AND 90.5
(/price * /qty) BETWEEN 0 AND 100000
The IN
operator can be used to perform membership operations on sets
of values. The IN
operator returns true when the value on the left
of the IN
appears in the set of values in the IN
clause. For
example:
/Trade/OwnerID NOT IN ('JMB', 'BLH', 'CJB')
/21964 IN (/14*5, /6*/14, 1000, 2000)
/customer IN ('Bob', 'Phil', 'Brent')
The IN
operator returns true for the set of records that would be
returned by an equivalent set of =
comparisons joined by OR
. The
following two statements return the same set of records:
/pet IN ('puppy', 'kitten', 'goldfish')
(/pet = 'puppy') OR (/pet = 'kitten') OR (/pet = 'goldfish')
This equivalence means that NULL
values in either the field being
evaluated, or the set of values provided to the IN
clause, always return
false.
This also means that, for string values, the IN
operator performs exact,
case-sensitive matching.
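The equivalence between IN and a chain of `=` comparisons can be modeled as shown below. The `amps_in` helper is a hypothetical name used only for this sketch; `None` and the empty string stand in for NULL.

```python
def is_null(value) -> bool:
    """NULL is modeled as None or an empty string."""
    return value is None or value == ""

def amps_in(value, candidates) -> bool:
    """Mimic `value IN (candidates)` as a chain of = comparisons joined by OR."""
    if is_null(value):
        return False  # a NULL field never matches
    # A NULL candidate never matches; string matching is exact and case-sensitive.
    return any(not is_null(c) and value == c for c in candidates)

pets = ("puppy", "kitten", "goldfish")
print(amps_in("kitten", pets))  # True
print(amps_in("Kitten", pets))  # False: matching is case-sensitive
print(amps_in(None, pets))      # False: NULL never matches
```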
Tip
When evaluating against a set of values, the IN
operator
typically provides better performance than using a set of OR
operators. That is, a filter written as
/firstName IN ('Joe', 'Kathleen', 'Frank', 'Cindy', 'Mortimer')
will typically perform better than an equivalent filter written
as
/firstName = 'Joe' OR /firstName = 'Kathleen' OR /firstName =
'Frank' OR /firstName = 'Cindy' OR /firstName = 'Mortimer'
.
Regular Expression Matching¶
AMPS also provides a regular expression comparison operator, LIKE
,
to provide regular expression matching on string values. A pattern is
used for the right side of the LIKE
operator. A pattern must be
provided as a literal, quoted value. For more on regular expressions and
the LIKE
comparison operator, please see the section on
Using Regular Expressions in AMPS.
The string comparison operators described in the section called
String Comparison Functions are usually more
efficient than equivalent LIKE
expressions, particularly when used
to compare multiple literal patterns, or when the only purpose of the
regular expression is to perform case-insensitive matching. Use LIKE
operations when it is not practical to represent the filter condition
with the string comparison operators.
Function or Operator | Parameters | Description |
---|---|---|
LIKE | The string to be compared; the pattern to evaluate the string against | Case-sensitive. Returns true if the string to be compared matches the pattern. For example, the filter /state LIKE '(.)\1' uses a PCRE backreference to return true for any message where the /state field contains two identical consecutive characters. This operator is not Unicode-aware. |
Conditional Operators¶
AMPS contains support for a ternary conditional IF
operator which
allows for a Boolean condition to be evaluated to true
or false
,
and will return one of the two parameters. The general format of the
IF
statement is
IF (BOOLEAN_CONDITIONAL, VALUE_TRUE, VALUE_FALSE)
In this format, the BOOLEAN_CONDITIONAL
will be evaluated, and if
the result is true, the VALUE_TRUE
value will be returned; otherwise,
the VALUE_FALSE
value will be returned.
Function or Operator | Parameters | Description |
---|---|---|
IF | Conditional expression; value to return if conditional expression is true; value to return if conditional expression is false | Evaluate the conditional expression and return one of the two input values based on the results of the expression. The AMPS expression engine can conditionally evaluate the terms provided to the |
For example:
SUM( IF(( (/FIXML/Order/OrdQty/@Qty > 500) AND
(/FIXML/Order/Instrmt/@Sym ='MSFT')), 1, 0 ))
The above example returns a count of the total number of orders that have been placed where the symbol is MSFT and the order contains a quantity more than 500.
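The counting pattern in this example, summing 1 for each matching record and 0 otherwise, can be sketched in Python. The order records and the `if_` helper below are hypothetical and exist only to illustrate the idea; unlike the AMPS expression engine, Python evaluates both value arguments before calling `if_`.

```python
def if_(condition: bool, value_true, value_false):
    """Return one of two values based on a Boolean condition."""
    return value_true if condition else value_false

# Hypothetical orders, flattened to the two fields used by the expression.
orders = [
    {"sym": "MSFT", "qty": 600},
    {"sym": "MSFT", "qty": 100},
    {"sym": "IBM", "qty": 900},
    {"sym": "MSFT", "qty": 501},
]

# Equivalent in spirit to SUM(IF((qty > 500) AND (sym = 'MSFT'), 1, 0)).
count = sum(if_(o["qty"] > 500 and o["sym"] == "MSFT", 1, 0) for o in orders)
print(count)  # 2
```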
The IF
operator can also be used to evaluate results to determine if
results are NULL
or NaN
. This is useful for calculating aggregates
where some values may be NULL
or NaN
. The NULL
and NaN
values are
discussed in more detail in the section called
NULL, NaN, and IS NULL.
For example:
SUM(/FIXML/Order/Instrmt/@Qty * IF(
/FIXML/Order/Instrmt/@Price IS NOT NULL, 1, 0))
Working with Arrays¶
AMPS supports filters that operate on arrays in messages. There are two simple principles behind how AMPS treats arrays:
- Binary operators that yield true or false (for example, =, <, LIKE) are array aware, as is the IN operator. These operators work on arrays as a whole, and evaluate every element in the array.
- Arithmetic operators, functions, user-defined functions and other scalar operators are not array aware, and use the first element in the array.
With these simple principles, you can predict how AMPS will
evaluate an expression that uses an array. For any operator, an empty
array evaluates to NULL
.
Let’s look at some examples. For the purposes of this section, we will consider the following JSON document:
{
"data" : [1, 2, 3, "zebra", 5],
"other" : [14, 34, 23, 5]
}
While these arrays are presented using JSON format for simplicity, the same principles apply to arrays in other message formats.
Here are some examples of ways to use an array in an AMPS filter:
Determine whether any element in an array meets a criterion. To determine this, you provide the identifier for the array, and use a comparison operator.
Filter | Evaluates as |
---|---|
/data = 1 | TRUE, /data contains 1 |
/data = 'zebra' | TRUE, /data contains 'zebra' |
/data != 'zebra' | TRUE, /data contains an element that is not 'zebra' |
/data = 42 | FALSE, /data does not contain 42 |
/data LIKE 'z' | TRUE, a member of /data matches 'z' |
/other > 30 | TRUE, a member of /other is > 30 |
/other > 50 | FALSE, no member of /other is > 50 |
Determine whether a specific value is at a specific position. To determine this, use the subscript operator [] on the XPath identifier to specify the position, and use the equality operator to check the value at that position.
Filter | Evaluates as |
---|---|
/data[0] = 1 | TRUE, first element of /data is 1 |
/data[3] = "zebra" | TRUE, fourth element of /data is 'zebra' |
/data[1] != 1 | TRUE, second element of /data is not 1 |
/other[1] LIKE '4' | TRUE, second element of /other matches '4' |
Determine whether any value in one array is present in another array.
Filter | Evaluates as |
---|---|
/data = /other | TRUE, a value in /data equals a value in /other |
/data != /other | TRUE, a value in /data does not equal a value in /other |
Determine whether an array contains one of a set of values.
Filter | Evaluates as |
---|---|
3 IN (/data) | TRUE, 3 is a member of /data |
/data IN (1, 2, 3) | TRUE, a member of /data is in (1, 2, 3) |
/data IN ("zebra", "antelope", "lion") | TRUE, a member of /data is in ("zebra", "antelope", "lion") |
These patterns and principles hold regardless of the original representation of the array in a document.
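The "any element" behavior of the array-aware operators in this section can be modeled with Python's `any`. The helpers below are hypothetical and simplified: Python's `re` module stands in for PCRE, and the only type conversion modeled is the rule that a non-numeric string is greater than a number.

```python
import re

data = [1, 2, 3, "zebra", 5]
other = [14, 34, 23, 5]

def as_list(value):
    """Treat a scalar as a one-element array."""
    return value if isinstance(value, list) else [value]

def any_equal(left, right) -> bool:
    return any(l == r for l in as_list(left) for r in as_list(right))

def any_not_equal(left, right) -> bool:
    return any(l != r for l in as_list(left) for r in as_list(right))

def greater_than(value, bound) -> bool:
    if isinstance(value, str):
        try:
            value = float(value)
        except ValueError:
            return True  # non-numeric strings are greater than numbers
    return value > bound

def any_greater(values, bound) -> bool:
    return any(greater_than(v, bound) for v in as_list(values))

def any_like(values, pattern: str) -> bool:
    return any(re.search(pattern, str(v)) for v in as_list(values))

print(any_equal(data, "zebra"))  # True, like /data = 'zebra'
print(any_equal(data, 42))       # False, like /data = 42
print(any_greater(other, 30))    # True, like /other > 30
print(any_equal(data, other))    # True, like /data = /other (both contain 5)
```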
When creating an expression that uses a field in a compound value, keep in mind that AMPS represents compound values as described in Compound Types in AMPS.
Working with Timestamps¶
AMPS does not include a dedicated timestamp data type. Instead, AMPS represents
timestamps either as a double
or a string
.
When representing timestamps as a double
, AMPS uses standard UNIX timestamps.
When representing timestamps as a string
, AMPS formats strings in a format compliant
with ISO-8601. The format AMPS uses was chosen to balance parsing speed, precision,
readability, and bandwidth. The format uses:
- Basic format (no delimiters between parts of a date or time)
- Decimal fractional seconds (with . delimiting the fraction rather than ,)
- Explicit T to separate date and time
- Explicit time zone specifier allowed, but not required
A string
timestamp has the format of YYYYmmddTHHMMSS[Z]
where:
- YYYY is the four digit year
- mm is the two digit month
- dd is the two digit day
- T is the character separator between the date and time
- HH is the two digit hour
- MM is the two digit minute
- SS is the two digit second
- Z is an optional timezone specifier. AMPS timestamps are always in UTC, regardless of whether the timezone is included. AMPS only accepts a literal value of Z for a timezone specifier.
For example, a timestamp for January 2nd, 2015, at 12:35:
20150102T123500Z
Timestamps in string format are used for point-in-time bookmarks, as
explicit time specifiers in configuration files, in the timestamp
header optionally returned on AMPS messages, and so on.
The timestamp format AMPS uses was chosen to make string comparisons
for timestamps work as expected, including simple comparisons like
<
and >
, as well as more sophisticated comparisons like the
BETWEEN
operator.
AMPS provides functions to convert between string
representation
of a timestamp and the double
representation of a timestamp,
as described in Date and Time Functions.
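Client code sometimes needs the same conversion outside the server. The following Python sketch converts between the string format shown above and a UNIX timestamp using only the standard library. The `to_unix` and `to_string` helpers are hypothetical names, not AMPS functions, and the sketch does not handle fractional seconds.

```python
from datetime import datetime, timezone

FORMAT = "%Y%m%dT%H%M%S"  # basic ISO-8601 format, no delimiters

def to_unix(stamp: str) -> float:
    """Convert YYYYmmddTHHMMSS[Z] (always UTC) to a UNIX timestamp."""
    parsed = datetime.strptime(stamp.rstrip("Z"), FORMAT)
    return parsed.replace(tzinfo=timezone.utc).timestamp()

def to_string(seconds: float) -> str:
    """Convert a UNIX timestamp to YYYYmmddTHHMMSSZ in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(FORMAT) + "Z"

print(to_unix("20150102T123500Z"))  # 1420202100.0
print(to_string(1420202100))        # 20150102T123500Z

# Because the fields run from most to least significant, plain string
# comparison orders timestamps chronologically.
print("20150102T123500Z" < "20150103T000000Z")  # True
```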
Using Regular Expressions in AMPS¶
Regular expression matching provides precision, power, and flexibility for matching patterns. AMPS supports regular expression matching on topics and within content filters. Regular expressions are implemented in AMPS using the Perl-Compatible Regular Expressions (PCRE) library. For a complete definition of the supported regular expression syntax, please refer to:
http://perldoc.perl.org/perlre.html
To use regular expressions for topic matching, provide a regular expression pattern where you would normally provide a topic name.
To use regular expressions in content filtering, compare strings to
regular expressions using the LIKE
operator. The syntax of the
LIKE
operator is:
string LIKE pattern
In this context, a string is any expression that provides a string and pattern is a literal regular expression pattern.
This chapter presents a brief overview of regular expressions in AMPS. However, this chapter is not exhaustive. For more information on regular expression matching, see the PCRE site mentioned above.
Examples¶
Here is an example of a content filter for messages that will match any message meeting the following criteria:
- Regular expression match of symbols of 2 or 3 characters starting with “IB”
- Regular expression match of prices starting with “90”
- Numeric comparison of prices less than 91
The corresponding content filter would be:
(/FIXML/Order/Instrmt/@Sym LIKE "^IB.?$") AND
(/FIXML/Order/@Px LIKE "^90\..*" AND /FIXML/Order/@Px < 91.0)
The tables below ( Regular Expression Meta-characters, Regular Expression Repetition Constructs, and Regular Expression Behavior Modifiers ) contain a brief summary of special characters and constructs available within regular expressions.
Here are more examples of using regular expressions within AMPS:
Use (?i)
to enable case-insensitive regular expression searching. For example, the following
filter will be true regardless of whether /client/country
contains “US” or
“us”.
(/client/country LIKE "(?i)^us$")
To match messages where tag 55 has a TRADE
suffix, use the following
filter:
(/55 LIKE "TRADE$")
To match messages where tag 109 has a US prefix and a TRADE suffix, with case-insensitive matching, use the following filter:
(/109 LIKE "(?i)^US.*TRADE$")
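The patterns in these examples can be tried outside of AMPS. The sketch below uses Python's `re.search`, which (like the LIKE operator as described in this chapter) matches at any position unless the pattern is anchored. The `like` helper is a hypothetical stand-in for the operator, and Python's engine is assumed to agree with PCRE for these particular patterns.

```python
import re


def like(value, pattern):
    """Hypothetical stand-in for LIKE: an unanchored regular expression search."""
    return re.search(pattern, value) is not None


# (?i) makes the whole pattern case-insensitive.
print(like("us", "(?i)^us$"))
print(like("US", "(?i)^us$"))
print(like("USA", "(?i)^us$"))        # the $ anchor rejects the extra character

# A suffix match only needs the $ anchor.
print(like("BLOCKTRADE", "TRADE$"))
print(like("TRADER", "TRADE$"))

# Prefix and suffix together, ignoring case.
print(like("us-block-trade", "(?i)^US.*TRADE$"))
```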
Characters | Meaning |
---|---|
^ | Beginning of string |
$ | End of string |
. | Any character except a newline |
* | Match previous 0 or more times |
? | Match previous 0 or 1 times |
| | The previous is an alternative to the following |
() | Grouping of expression |
[] | Set of characters |
{} | Repetition modifier |
\ | Escape for special characters |
Regular Expression Meta-characters
Construct | Meaning |
---|---|
a* | Zero or more a's |
a? | Zero or one a's |
a{m} | Exactly m a's |
a{m,} | At least m a's |
a{m,n} | At least m, but no more than n a's |
Regular Expression Repetition Constructs
Modifier | Meaning |
---|---|
i | Case insensitive search |
m | Multi-line search |
s | Any character (including newlines) can be matched by a . character |
x | Unescaped white space is ignored in the pattern |
A | Constrain the pattern to only match the beginning of a string |
U | Make the quantifiers non-greedy by default (the quantifiers are greedy and try to match as much as possible by default) |
Regular Expression Behavior Modifiers
Raw Strings¶
AMPS additionally provides support for raw strings, which are strings prefixed by an ‘r’ or ‘R’ character. Raw strings use different rules for how a backslash escape sequence is interpreted by the parser. When a string literal is provided as a raw string, the characters in the raw string are matched exactly, even when those characters are special characters for a regular expression.
In the examples below, the raw string (noted by the r prefix on the string literal in the second operand of the LIKE predicate in Raw String Example) causes AMPS to search for the literal characters ++ without requiring those characters to be escaped, as they are in Regular String Example. Both examples query for a string that contains the name of the programming language C++. In the regular string, the '+' character must be escaped, since it is also the regular expression character for “match previous 1 or more times”. In the raw string, r'C++' searches for the string without escaping the special '+' character.
/FIXML/Language LIKE r'C++'
Raw String Example
/FIXML/Language LIKE 'C\+\+'
Regular String Example
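The relationship between the two forms can be illustrated in Python, with one important caveat: Python's own `r'...'` prefix only changes how backslashes are read and does not make a pattern literal. The closest Python analogue to the behavior described above is `re.escape`, which is what this sketch uses. The list of language names is made up for illustration.

```python
import re

# Hypothetical field values to search.
languages = ["C++", "C", "CCC", "C#"]

# Regular-string form: each '+' is escaped by hand, as in 'C\+\+'.
escaped_by_hand = re.compile(r"C\+\+")

# Analogue of the AMPS raw string r'C++': re.escape() builds a pattern
# that matches its argument character for character.
literal = re.compile(re.escape("C++"))

by_hand = [name for name in languages if escaped_by_hand.search(name)]
by_escape = [name for name in languages if literal.search(name)]

print(by_hand)
print(by_escape)
```

Both patterns select only the value that contains the literal text C++.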
Subscribing to a Set of Topics Using Regular Expressions¶
As mentioned previously, AMPS supports regular expression filtering for topics, in addition to content filters. Regular expressions use the same grammar described in content filtering. Regular expression matching for topics is enabled in an AMPS instance by default.
Subscriptions or queries that use a regular expression for the topic name provide all matching records from AMPS topics where the name of the topic matches the regular expression used for the subscription or query. For example, if your AMPS configuration has three SOW topics, Topic_A, Topic_B and Topic_C, and you want to find the records in all of these topics where the Name field is equal to “Bob”, you could use a sow command with a topic of ^Topic_.* and a filter of /FIXML/@Name='Bob'. This returns the messages that match the filter from all of the topics that match the topic regular expression.
Notice that, as with the LIKE expression, a regular expression will match at any position in the topic name. To anchor the match to the beginning of the string, use the ^ directive at the beginning of the regular expression. To anchor the match to the end of the string, use the $ directive at the end of the regular expression.
For example, to match a topic with "order" anywhere in the topic name, you could use the regular expression order.* (the ending .* matches zero or more characters, but lets AMPS know to interpret this as a regular expression). To match only topics that start with order, you would use the regular expression ^order. To match topics that end with order, you would use the regular expression order$.
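The effect of anchoring can be checked with a short Python sketch. The topic names here are invented, and `re.search` is used because it matches at any position, which corresponds to the unanchored behavior described above. This only illustrates which names each pattern selects and says nothing about how AMPS resolves topics internally.

```python
import re

# Hypothetical topic names; not taken from any real AMPS configuration.
topics = ["order", "orders_eu", "new_order", "preorder_queue", "trades"]


def matching_topics(pattern):
    """Return the topic names that the pattern matches at any position."""
    return [name for name in topics if re.search(pattern, name)]


print(matching_topics("order.*"))   # "order" anywhere in the name
print(matching_topics("^order"))    # names that start with "order"
print(matching_topics("order$"))    # names that end with "order"
```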
Tip
Results returned by a topic regular expression query follow “configuration order”, meaning that the topics are searched in the order in which they appear in your AMPS configuration file. Using the above query example with Topic_A, Topic_B and Topic_C, if the configuration file lists these topics in that exact order, results are returned first from Topic_A, then from Topic_B, and finally from Topic_C. As with other queries, AMPS does not make any guarantees about the ordering of results within any given topic query.
Performance Considerations¶
This section describes general performance considerations for the AMPS expression language and content filters. The considerations here are aspects of AMPS performance to be aware of in the general case. However, since the AMPS expression language operates on specific data, the structure and size of the messages that your application uses may have more effect on overall performance than the specific expressions used. For example, parsing and filtering a 20MB XML document is inherently more expensive than parsing and filtering a 400 byte BFlat document.
Use Short-Circuiting¶
When clauses in an expression are joined by OR, AMPS only evaluates the right side of the OR expression if the left side of the expression is false.
When constructing an expression, this means that there can be a performance advantage to placing relatively less expensive clauses on the left-hand side of the OR. For example, in the following clause:
/code = 'restricted' OR /notes LIKE 'restricted|limited'
The regular expression comparison is only evaluated if the comparison /code = 'restricted' is false. If the comparison is true, then the overall clause is true and there is no need to evaluate the regular expression.
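Python's `or` operator short-circuits in the same way, which makes it easy to observe the effect. In the sketch below, a counter records how often the more expensive regular expression clause actually runs. The messages, the `calls` counter, and the helper names are all hypothetical.

```python
import re

# Counts how many times the regular expression clause is evaluated.
calls = {"regex": 0}


def notes_like(message):
    """The more expensive clause: /notes LIKE 'restricted|limited'."""
    calls["regex"] += 1
    return re.search("restricted|limited", message.get("notes", "")) is not None


def matches(message):
    # The cheap equality test is on the left, so notes_like() runs
    # only when the code is not 'restricted'.
    return message.get("code") == "restricted" or notes_like(message)


messages = [
    {"code": "restricted", "notes": "n/a"},
    {"code": "open", "notes": "limited release"},
    {"code": "open", "notes": "none"},
]

results = [matches(message) for message in messages]
print(results)
print(calls["regex"])
```

The regular expression runs for two of the three messages, because the first message is decided by the equality test alone.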
Avoid Redundant Expressions¶
AMPS does not reorder or recombine complex expressions. Where feasible, your application can save work at the server by combining expressions. In particular, if an application is constructing a filter by reading options from various sources, performance can be improved by combining the queries.
For example, in a filter like the following:
/id = '12345' OR /id IN ('12345','23456','34567','45678')
OR /id IN ('12345','45678','90909')
The comparison against '12345' is evaluated three times whenever the value of /id does not match any of the values in the filter.
This filter is equivalent to:
/id IN ('12345','23456','34567','45678','90909')
This filter produces the same results, but evaluates the /id field against a given value only one time.
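An application that assembles a filter from several sources can do this consolidation before sending the filter to the server. The following is a minimal sketch, assuming the values are plain strings that contain no quote characters (no escaping is attempted). The function names `combine_values` and `build_in_filter` are hypothetical helpers, not part of any AMPS client library.

```python
def combine_values(*value_lists):
    """Merge several lists of candidate values, dropping duplicates
    while keeping the order in which values were first seen."""
    merged = dict.fromkeys(
        value for values in value_lists for value in values
    )
    return list(merged)


def build_in_filter(field, values):
    """Build a single IN clause. Assumes values need no escaping."""
    quoted = ",".join("'{}'".format(value) for value in values)
    return "{} IN ({})".format(field, quoted)


values = combine_values(
    ["12345"],
    ["12345", "23456", "34567", "45678"],
    ["12345", "45678", "90909"],
)
combined_filter = build_in_filter("/id", values)
print(combined_filter)
```

With the three value lists from the example above, this produces the single consolidated IN clause shown earlier.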
Use Specialized Operators for Simple Comparisons, Use LIKE when Necessary¶
The LIKE operator offers access to full Perl-Compatible Regular Expressions within the AMPS expression language. This flexibility allows for very precise filtering, and the PCRE engine performs well.
However, for comparisons for which AMPS provides a named function, the named function is highly optimized and will perform somewhat better than the general-purpose regular expression engine.
For example, given a choice between two equivalent expressions:
/state BEGINS WITH('North')
and
/state LIKE '^North'
The version that uses BEGINS WITH will typically perform slightly better than the version that uses the regular expression.
This doesn’t mean that regular expressions or the LIKE operator perform poorly. The LIKE operator can efficiently match patterns that would be difficult or impossible to match using the other operators. However, for very simple comparisons where AMPS provides a dedicated operator, that operator typically performs slightly better than a regular expression.
The following table shows some examples of regular expressions and the AMPS operator equivalent.
Regular Expression | AMPS Operator Equivalent |
---|---|
^something | BEGINS WITH('something') |
something$ | ENDS WITH('something') |
something | INSTR(/field, 'something') != 0 |
(?i)something | INSTR_I(/field, 'something') != 0 |
(?i)^something$ | STREQUAL_I(/field, 'something') != 0 |
^a$|^b$|^c$ | IN ('a','b','c') |
Regular expressions and operators
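The same kind of equivalence exists in most languages between simple regular expressions and dedicated string operations. As an analogy only, the Python sketch below pairs each pattern from the table with a plain string operation and checks that both give the same answer on a few made-up ASCII samples. It does not call or emulate the AMPS functions themselves.

```python
import re

# Hypothetical sample values (ASCII only, no newlines).
samples = ["something", "Something else", "do something",
           "SOMETHING", "nothing", "a", "b", "ab"]

# (regular expression, plain string operation) pairs modeled on the table.
equivalents = [
    ("^something", lambda s: s.startswith("something")),
    ("something$", lambda s: s.endswith("something")),
    ("something", lambda s: "something" in s),
    ("(?i)something", lambda s: "something" in s.lower()),
    ("(?i)^something$", lambda s: s.lower() == "something"),
    ("^a$|^b$|^c$", lambda s: s in ("a", "b", "c")),
]


def disagreements():
    """List every (pattern, sample) where the two forms differ."""
    differing = []
    for pattern, plain in equivalents:
        for sample in samples:
            if (re.search(pattern, sample) is not None) != plain(sample):
                differing.append((pattern, sample))
    return differing


print(disagreements())  # an empty list means every pair agreed
```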
Optimize for Partial Parsing¶
Most AMPS message types have the ability to partially parse messages. That is, rather than parsing the entire message, the message type can simply find the identifiers that will be used, and stop the parsing process as soon as those identifiers are found.
This optimization is most useful for larger messages. For example, suppose that the SOW key for a topic is based on the /id field of a message, that active content filters use both the /id field and the /code field, and that no other field is being indexed. Consider the message below:
{"id":24,"code":"A12347","notes":"entered on behalf of a sloth",
// ... 100K of other data ...
}
The AMPS parser can stop parsing after processing only the /id and the /code fields. In this case, halting the parsing after processing these two fields avoids the expense of parsing the remaining parts of the message.
Notice that this optimization will only improve performance in cases where AMPS doesn’t need to parse the entire message. For example, if there is a delta_subscribe active for the topic, or if the command being processed is a delta_publish, AMPS will parse the message completely to be able to calculate the deltas. Likewise, if any filter refers to a field that doesn’t appear in the message, AMPS will parse the message completely to be able to determine that the field does not appear in the message.
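The idea of stopping early can be sketched in Python. This is a conceptual illustration only and makes no claim about how the AMPS parsers are implemented. It walks the top-level members of a JSON object one at a time using the standard library's `json.JSONDecoder.raw_decode`, and stops as soon as every wanted field has been seen. The `extract_fields` function and the sample message are hypothetical.

```python
import json

_decoder = json.JSONDecoder()
_WHITESPACE = " \t\r\n"


def _skip_ws(text, pos):
    """Advance past any JSON whitespace starting at pos."""
    while pos < len(text) and text[pos] in _WHITESPACE:
        pos += 1
    return pos


def extract_fields(text, wanted):
    """Return (found, stop_position) for the wanted top-level fields,
    stopping as soon as all of them have been found."""
    remaining = set(wanted)
    found = {}
    pos = _skip_ws(text, 0)
    if pos >= len(text) or text[pos] != "{":
        raise ValueError("expected a JSON object")
    pos = _skip_ws(text, pos + 1)
    while remaining and pos < len(text) and text[pos] != "}":
        key, pos = _decoder.raw_decode(text, pos)      # member name
        pos = _skip_ws(text, pos)
        if pos >= len(text) or text[pos] != ":":
            raise ValueError("expected ':' after member name")
        pos = _skip_ws(text, pos + 1)
        value, pos = _decoder.raw_decode(text, pos)    # member value
        if key in remaining:
            found[key] = value
            remaining.discard(key)
        pos = _skip_ws(text, pos)
        if pos < len(text) and text[pos] == ",":
            pos = _skip_ws(text, pos + 1)
    return found, pos


message = ('{"id":24,"code":"A12347",'
           '"notes":"entered on behalf of a sloth",'
           '"blob":"' + "x" * 100000 + '"}')

found, stopped_at = extract_fields(message, {"id", "code"})
print(found, stopped_at, len(message))
```

When both fields appear near the front of the message, the scan stops after a few dozen characters. Asking for a field that is not present, for example `extract_fields(message, {"id", "absent"})`, forces the scan to continue to the closing brace, which parallels the missing-field case described above.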
SOW Queries and Indexing¶
Queries over topics in the State of the World (SOW) have additional performance considerations. AMPS maintains indexes over SOW topics to help locate messages in response to a query.
- Queries over a topic in the SOW can use SOW topic indexes. Where possible, filter with an exact string match and create a hash index on that field so the query can take advantage of it.
- When a query is submitted with an XPath identifier for which no index exists, AMPS will create and populate a memo index for that XPath identifier. This can add to the amount of time a query takes the first time a given XPath identifier is queried. You can specify that AMPS creates a memo index for a given identifier by using the Index configuration item in the Topic definition. Once an index is created, AMPS will continue to search for that XPath identifier in incoming messages for that topic to keep the index up to date.
Notice that SOW topic indexes are only used for sow commands and during the sow portion of a sow_and_subscribe (or sow_and_delta_subscribe) command. Once the subscription to current updates begins, the subscription does not use a SOW topic index because there is no need to locate a message. During a subscription, filters are run against the current message.
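The lifecycle described above (an index built on first use, then kept current as messages arrive) can be modeled with a small Python sketch. This is a toy model for intuition only: the `TopicStore` class, its methods, and its data layout are hypothetical and do not reflect AMPS internals or configuration.

```python
from collections import defaultdict


class TopicStore:
    """Toy stand-in for a SOW topic with lazily built field indexes."""

    def __init__(self):
        self.records = {}   # SOW key -> record (a dict of fields)
        self.indexes = {}   # field name -> {value -> set of SOW keys}

    def publish(self, key, record):
        """Store a record and update every index that already exists."""
        old = self.records.get(key)
        for field, index in self.indexes.items():
            if old is not None and field in old:
                index[old[field]].discard(key)
            if field in record:
                index[record[field]].add(key)
        self.records[key] = record

    def _build_index(self, field):
        """Scan all records once to populate an index for the field."""
        index = defaultdict(set)
        for key, record in self.records.items():
            if field in record:
                index[record[field]].add(key)
        self.indexes[field] = index
        return index

    def query_equal(self, field, value):
        """Return the keys of records whose field equals value."""
        index = self.indexes.get(field)
        if index is None:
            index = self._build_index(field)   # first query pays this cost
        return sorted(index.get(value, ()))


store = TopicStore()
store.publish(1, {"code": "A"})
store.publish(2, {"code": "B"})
first = store.query_equal("code", "A")    # builds the index
store.publish(3, {"code": "A"})           # index is kept up to date
second = store.query_equal("code", "A")
print(first, second)
```

The first query on a field scans every record to build the index, while later queries and publishes reuse and maintain it.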
See SOW Indexing for details.