4. AMPS Expressions¶
AMPS includes an expression language that combines elements of XPath and
SQL-92’s WHERE
clause. This expression language is used whenever the
AMPS server refers to the contents of a message, including:
- Content filtering
- Constructing fields for message enrichment
- Creating projected fields for views
AMPS uses a common syntax for each of these purposes, and provides a common set of operators and functions. AMPS also provides special directives for message enrichment, and aggregation functions for projecting views.
For example, when an expression is used as a content filter, any message
for which the expression returns true
matches the content filter.
When an expression is used to construct a field for message enrichment
or view projection, the expression is evaluated and the result that the
expression returns is used as the content of the field.
Expressions Overview¶
The quickest way to learn AMPS expressions is to think of each as a combination of identifiers that tell AMPS where to find data in a message, and operators that tell AMPS what to do with that data. Each AMPS expression produces a value. The way AMPS uses that value depends on where the expression is used. For example, in a content filter, AMPS uses the value of the expression to determine whether a message matches the filter. When constructing a field, AMPS uses the value of the expression as the contents of the field.
Consider a simple example of an expression used as a filter. Imagine AMPS receives the following JSON message:
{"name":"Gyro", "job":"kitten"}
Using an AMPS expression, you can easily construct a content filter that matches the message:
/name = 'Gyro'
There are three parts to this expression. The first part, /name
, is
an identifier that tells AMPS to look for the contents of the name
field at the top level of the JSON document. The second part of the
filter, =
, is the equality operator, which tells AMPS to compare
the values on either side of the operator and return true
if the
values match. The final part of the filter, 'Gyro'
, is a string
literal for the equality operator to use in the comparison. When an
expression is used in a content filter, a message matches the filter
when the expression returns true
. The expression returns true
for the sample message, so the sample messages matches the filter.
The identifier syntax is a subset of XPath, as described in Identifiers. The comparison syntax is similar to SQL-92.
Expression Syntax¶
AMPS expressions are designed to work exactly as expected if you are familiar with XPath path specifiers and SQL-92 predicates. This section describes in detail how AMPS evaluates the syntax, operators, and functions available in the AMPS expression language.
AMPS expressions combine the following elements:
- Identifiers specify a field in a message. When evaluating an expression, AMPS replaces identifiers with values from the message or set of messages being evaluated.
- Literal values are explicit values in an AMPS expression, such as
'IBM'
or42
- Operators and functions such as
=
,<
,>
,*
, andUNIX_TIMESTAMP()
Every AMPS expression produces a value. The way that AMPS uses the value
depends on the context in which AMPS evaluates the expression. For
example, if the expression is used for a filter, the message is
considered to match the filter when the expression returns true
.
When an expression is used to project a field, the result of the
expression is used as the value of the projected field.
Identifiers¶
AMPS identifiers use a subset of XPath to specify values in a message. AMPS identifiers specify the value of an attribute or element in an XML message, and the value of a field in a JSON, FIX or NVFIX message. Because the identifier syntax is only used to specify values, the subset of XPath used by AMPS does not include wildcards, relative paths, array manipulation, predicates, or functions.
For example, when messages are in this XML format:
<Order update="full">
<ClientID>12345</ClientID>
<Symbol>IBM</Symbol>
<OrderQty>1000</OrderQty>
</Order>
The following identifier specifies the Symbol
element of an
Order
message:
/Order/Symbol
The following identifier specifies the update
attribute of an
Order
message:
/Order/@update
For FIX and NVIX, you specify fields using /
and the tag name. AMPS
interprets FIX and NVFIX messages as though they were an XML fragment
with no root element. For example, to specify the value of FIX tag
55
(symbol), use the following identifier:
/55
Likewise, for JSON or other types that represent an object, you navigate
through the object structure using the /
to indicate each level of
nesting.
AMPS only guarantees support for field identifiers that are valid step names
in XPath. For example, AMPS does not guarantee that it can process or
filter on a field named Fits&Starts
.
AMPS also supports an optional bracketed field identifier syntax that extends the characters available for field names. For example, the following step name:
[/Not Xpath Name]
refers to a field name of Not Xpath Name
at the root level of the message.
This syntax allows spaces to be used in field names in AMPS expressions, even
though this is not a valid step name in XPath. Notice that not all message
types support field names with embedded spaces or other special characters.
The Not Xpath Name
identifier is not a valid element name in XML, nor
would it be a valid field name in Google Protocol Buffers.
AMPS checks the syntax of identifiers when parsing an expression. AMPS
does not try to predict whether an identifier will match messages within
a particular topic. It is not an error to submit an identifier that can
never match due to the limitations of the message type. For example,
AMPS allows you to use an identifier like /OrderQty
with a FIX
topic, even though FIX messages only use numeric tags, or an identifier
like /DataPackage/RunDate
with a BFlat topic, even though BFlat does
not support nested elements.
The message type is responsible for constructing a set of identifiers
from a message. In most cases, the mapping is simple. However, see the
documentation for the message type for details, or if the mapping is
unclear. For example, a composite-local
message type adds the number
of the part to the beginning of each XPath within the part (so, a
top-level field of /name
in the first part of the message has an
identifier of /0/name
).
AMPS Data Types¶
Each value in AMPS is assigned a data type when the message type module
parses the value. AMPS operators and functions attempt to convert values
into compatible types, based on the type of operation. For example, the
*
operator (multiplication) will attempt to convert all values to
numeric values, while the CONCAT
function (string concatenation)
will attempt to convert all values to strings. In effect, a value in
AMPS can be treated as any type to which it can be meaningfully converted.
Internally, AMPS uses the following data types. As mentioned above, the message type module is responsible for assigning the type of a value from an incoming message as part of the parsing process. For some types, such as JSON, XML, FIX and NVFIX, the parser infers the type of the value from the field. For other types, such as BFLAT, Google Protocol Buffers, or BSON, the message itself contains information about the type of the field. Notice, however, that the AMPS expression language does not limit the value to the type assigned by the message type module. Instead, a value in AMPS can be used in any context.
For example, given the following JSON document:
{"a":1,"b":"47"}
The values of /a
and /b
can be used as either string values or
numeric values. AMPS will automatically convert these values as necessary,
and AMPS considers the string or numeric representation to
be equally correct and valid.
Type | Description | Untyped Message Examples |
---|---|---|
NULL | Unknown, untyped value (SQL-92 semantics) | [no field provided] NFVIX: JSON: XML: |
Boolean | True or false | JSON: {"e":true} |
Integer | signed 64-bit integer | NVFIX: JSON: XML: |
Floating point number | 64-bit floating point number | NVFIX: JSON: XML: |
String | Arbitrary sequence of bytes of a specific length | NVFIX:
JSON:
XML:
|
Table 4.1: AMPS data types
Numeric Types and Literals in AMPS Expressions¶
Numeric values in AMPS are always typed as either integers or floating point values. All numeric types in AMPS are signed. AMPS message types convert the original numeric types (or original representation for message types that do not have typed values) into the internal AMPS type system for the purposes of expression evaluation.
Within expressions, integer values are all numerals, with no decimal point, and can have a value in the same range as a 64-bit integer. For example:
42
149
-273
Within expressions, all numerals with a decimal point are floating-point numbers. AMPS interprets these numerals as double-precision floating point values. For example:
3.1415926535
98.6
-273.0
or, in scientific notation:
31.4e-1
6.022E23
2.998e8
AMPS automatically converts strings that contain numeric values to numbers when strings are used in a numeric comparison.
Type Promotion for Numeric Types¶
AMPS uses the following rules for type promotion when evaluating numeric expressions:
- If any of the values in the expression is
NaN
, the result isNaN
. - Otherwise, if any of the values in the expression is floating point, the result is floating point.
- Otherwise, all of the values in the expression are integers, and the result is an integer.
Notice that, for division in particular, the results returned are
affected by the type of the values. For example, the expression
1 / 5
evaluates to 0
since the result is interpreted as an
integer. In comparison, the expression 1.0 / 5
evaluates to 0.2
since the result is interpreted as a floating point value.
When a function or operator that expects a numeric type is provided with
a string, AMPS will attempt to convert string values to numeric types as
necessary. When converting string values, AMPS recognizes same numeric
formats in message data as are supported in the AMPS expression language
(see String Literals. If the
string is in an unrecognized format, AMPS converts the string as
NaN
.
String Literals in AMPS Expressions¶
When creating expressions for AMPS, string literals are indicated with single or double quotes. For example:
/FIXML/Order/Instrmt/@Sym = 'IBM'
AMPS supports the following escape sequences within string literals:
Escape Sequence | Definition |
---|---|
\a | Alert |
\b | Backspace |
\t | Horizontal tab |
\n | Newline |
\f | Form feed |
\r | Carriage return |
\xHH | Hexadecimal digit where H is (0..9,a..f,A..F) |
\OOO | Octal Digit (0..7) |
Table 4.2: Escape Sequences
Additionally, any character which follows a backslash will be treated as a literal character.
AMPS string operations have no restrictions on character set, and
correctly handle embedded NULL
characters (\x00
) and characters
outside of the 7-bit ASCII range. AMPS string operations are not
unicode-aware.
NULL, NaN and IS NULL¶
XPath expressions are considered to be NULL
when they evaluate to an
empty or nonexistent field reference. In numeric expressions where the
operands or results are not a valid number, the XPath expression
evaluates to NaN
(not a number). The rules for applying the AND
and OR
operators against NULL
and NaN
values are outlined in
Table 4.3 and
Table 4.4.
Operand1 | Operand2 | Result |
---|---|---|
TRUE | NULL | NULL |
FALSE | NULL | FALSE |
NULL | NULL | NULL |
Table 4.3: Logical AND with NULL/NaN Values
Operand1 | Operand2 | Result |
---|---|---|
TRUE | NULL | TRUE |
FALSE | NULL | NULL |
NULL | NULL | NULL |
Table 4.4: Logical OR with NULL/NaN Values
The NOT
operator applied to a NULL
value is also NULL
, or
“Unknown.” The only way to check for the existence of a NULL
value
reliably is to use the IS NULL
predicate. There also exists an
IS NAN
predicate for checking that a value is NaN
(not a
number.)
To reliably check for existence of a NULL value, you must
use the IS NULL predicate such as the filter:
/Order/Comment IS NULL |
AMPS also provides a COALESCE()
function that accepts a set
of values and returns the first value that is not NULL. For example,
given the following filter expression:
COALESCE(/userCategory,
/employeeCategory,
/vendorCategory,
'restricted') != 'restricted'
AMPS will return the first value that is not NULL
, and compare that
value to the constant string 'restricted'
. Notice that, to make the
intent of the filter clear, this example provides a constant value for
AMPS to return from the COALESCE
if all of the field values are
NULL
.
The COALESCE
function, like other functions in AMPS, is not array-aware.
This means that when one of the XPath expressions provided to COALESCE
specifies an array in the original message, AMPS provides the first item in
the array to the COALESCE
function. See Working With Arrays for details.
Grouping and Order of Evaluation¶
AMPS expressions allow you to group parts of the expression using parentheses. Parts of an expression inside parentheses are evaluated together. 60East recommends using parentheses to group independent parts of an expression to ensure that expression is evaluated in the expected order. For example, in this expression:
( /counter % 3 ) == 0
The clause /counter % 3
is evaluated first, and the result of that
evaluation is compared to 0
.
Within a group, elements are evaluated left to right in precedence order. For example, given the filter below:
(expression1 OR expression2 AND expression3) OR (expression4 AND
NOT expression5) ...
AMPS evalutes expression2
, then expression3
(since AND
has
higher precedence than OR
), and if they evaluate to false, then
expression1
will be evaluated.
AMPS does not guarantee that all parts of an expression will be evaluated if the result of an expression can be determined after only evaluating part of the expression. For example, given the expression:
A_FUNCTION(/a) OR B_FUNCTION(/b)
AMPS only guarantees that B_FUNCTION(/b)
will be evaluated if
A_FUNCTION(/a)
returns false
.
Logical Operators¶
The logical operators are NOT
, AND
, and OR
, in order of
precedence. These operators have the usual Boolean logic semantics.
/FIXML/Order/Instrmt/@Sym = 'IBM' OR /FIXML/Order/Instrmt/@Sym = 'MSFT'
As with other operators, you can use parentheses to group operators and affect the order of evaluation
(/orderType = 'rush' AND /customerType IN ('silver', 'gold') )
OR /customerType = 'platinum'
Arithmetic Operators¶
AMPS supports the arithmetic operators +
, -
, *
, /
,
%
, and MOD
in expressions. The result of arithmetic operators
where one of the operands is NULL
is undefined and evaluates to
NULL
.
AMPS distinguishes between floating point and integral types. When an arithmetic operator uses two different types, AMPS will convert the integral type to a floating point value as described in Numeric Types and Literals.
Examples of filter expressions using arithmetic operators:
/6 * /14 < 1000
/Order/@Qty * /Order/@Prc >= 1000000
AMPS numeric types are signed, and the AMPS arithmetic operators
correctly handle negative numbers. The MOD
and %
operators
preserve the sign of the first argument to the operator. That is,
-5 % 3
produces a result of -2
, while 5 % -3
produces a
result of 2
.
When using mathematical operators in conjunction with filters,
be careful about the placement of the operator. Some operators
are used in the XPath expression as well as for mathematical
operation (for example, the '/' operator in division).
Therefore, it is important to separate mathematical operators
with white space, to prevent interpretation as an XPath
expression. |
Comparison Operators¶
The comparison operators can be loosely grouped into equality
comparisons and range comparisons. The basic equality comparison
operators, in precedence order, are ==
, =
, >
, >=
, <
,
<=
, !=
, and <>
. The ==
comparison and the =
comparison
are treated as the same operator and produce the same results.
If these binary operators are applied to two operands of different
types, AMPS attempts to convert strings to numbers. If conversion
succeeds, AMPS uses the numeric values. If conversion fails because the
string cannot be meaningfully converted to a number, strings are always
considered to be greater than numbers. The operators consider an empty
string to be NULL
.
The following table shows some examples of how AMPS compares different types.
Expression | Result |
---|---|
1 < 2 |
TRUE |
10 < '2' |
FALSE, ‘2’ can be converted to a number |
'2.000' <> '2.0' |
TRUE, no conversion to numbers since both are strings |
2 = 2.0 |
TRUE, numeric comparison |
10 < 'Crank It Up' |
TRUE, strings are greater than numbers |
10 < '' |
FALSE, an empty string is considered to be NULL |
10 > '' |
FALSE, an empty string is considered to be NULL |
'' = '' |
FALSE, an empty string is considered to be NULL |
'' IS NULL |
TRUE, an empty string is considered to be NULL |
Table 4.5: Comparison Operator Examples
There are also set and range comparison operators. The BETWEEN operator can be used to check the range values.
The range used in the BETWEEN operator is inclusive of both
operands, meaning the expression /A BETWEEN 0 AND 100 is
equivalent to /A >= 0 AND /A <= 100 |
For example:
/FIXML/Order/OrdQty/@Qty BETWEEN 0 AND 10000
/FIXML/Order/@Px NOT BETWEEN 90.0 AND 90.5
(/price * /qty) BETWEEN 0 AND 100000
The IN
operator can be used to perform membership operations on sets
of values. The IN
operator returns true when the value on the left
of the IN appears in the set of values in the IN
clause. For
example:
/Trade/OwnerID NOT IN ('JMB', 'BLH', 'CJB')
/21964 IN (/14*5, /6*/14, 1000, 2000)
/customer IN ('Bob', 'Phil', 'Brent')
The IN operator returns true for the set of records that would be
returned by an equivalent set of =
comparisons joined by OR
. The
following two statements return the same set of records:
/pet IN ('puppy', 'kitten', 'goldfish')
(/pet = 'puppy') OR (/pet = 'kitten') OR (/pet = 'goldfish)
This equivalence means that NULL
values in either the field being
evaluated, or the set of values provided to the IN clause, always return
false.
This also means that, for string values, the IN operator performs exact, case-sensitive matching.
When evaluating against a set of values, the IN operator
typically provides better performance than using a set of OR
operators. That is, a filter written as
/firstName IN ('Joe', 'Kathleen', 'Frank', 'Cindy', 'Mortimer')
will typically perform better than an equivalent filter written
as
/firstName = 'Joe' OR /firstName = 'Kathleen' OR /firstName =
'Frank' OR /firstName = 'Cindy' OR /firstName = 'Mortimer' . |
String Comparison Functions¶
AMPS includes several types of string comparison operators.
- Case-sensitive exact matches. The
IN
,=
,BEGINS WITH
,ENDS WITH
, andINSTR
operators do literal matching on the contents of a string. These operators are case-sensitive. - Case-insensitive exact matches. AMPS also provides two
case-insensitive operators:
INSTR_I
, a case-insensitive version ofINSTR
, and a case-insensitive equality operator,STREQUAL_I
. - Regular expression matches. AMPS also provides full regular
expression matching using the
LIKE
operator, described in regex matching and Chapter 5.
The =
operator tests whether a field exactly matches the literal
string provided.
/status = 'available'
/orderId = 'F327AC'
BEGINS WITH
and ENDS WITH
test whether a field begins or ends
with the literal string provided. The operators return TRUE
or
FALSE
:
/Department BEGINS WITH ('Engineering')
/path NOT BEGINS WITH ('/public/dropbox')
/filename ENDS WITH ('txt')
/price NOT ENDS WITH ('99')
AMPS allows you to use set comparisons with BEGINS WITH
and
ENDS WITH
. In this case, the filter matches if the string in the
field BEGINS WITH
or ENDS WITH
any of the strings in the set:
/Department BEGINS WITH ('Engineering', 'Research', 'Technical')
/filename ENDS WITH ('gif', 'png', 'jpg')
The INSTR
operator allows you to check to see if one string occurs
within another string. For this operator, you provide two string values.
If the second string occurs within the first string, INSTR
returns
the position at which the second string starts, or 0 if the second
string does not occur within the first string. Notice that the first
character of the string is 1 (not 0). For example, the expression below
tests whether the string critical
occurs within the /eventLevels
field.
INSTR(/eventLevels, "critical") != 0
AMPS also provides INSTR_I
and STREQUAL_I
functions for performing
case-insensitive comparisons:
STREQUAL_I(/couponCode, 'QED') == 1
INSTR_I(/symbolList, 'MSFT') != 0
Function or Operator | Parameters | Description |
---|---|---|
= |
The string to be compared The string to compare |
Case-sensitive Returns true if the string to be compared is identical to the string to compare. /state = 'Ohio'
|
BEGINS WITH |
The string to be compared A list of strings to compare |
Case-sensitive Returns true if the string to be compared begins with any of the strings in the list. /state
BEGINS WITH ('North', 'South')
|
ENDS WITH |
The string to be compared A list of strings to compare |
Case-sensitive Returns true if the string to be compared ends with any of the strings in the list. /state
ENDS WITH ('Dakota', 'Carolina')
|
INSTR |
The string to be compared The string to compare |
Case-sensitive Returns the position at which the second string starts, or 0 if the second string does not occur within the first string. INSTR(/state, 'i') != 0
|
INSTR_I |
The string to be compared The string to compare |
Case-insensitive Returns the position at which the second string starts, or 0 if the second string does not occur within the first string. This function is not unicode-aware. INSTR_I(/state, 'i') != 0
|
STREQUAL_I |
The string to be compared The string to compare |
Case-insensitive Returns true if, when both strings are transformed to the same case, the string to be compared is identical to the string to compare. This function is not unicode-aware. STREQUAL_I(/state, 'oHIO') != 0
|
Table 4.6: AMPS string comparison
Regular Expression Matching¶
AMPS also provides a regular expression comparison operator, LIKE
,
to provide regular expression matching on string values. A pattern is
used for the right side of the LIKE
operator. A pattern must be
provided as a literal, quoted value. For more on regular expressions and
the LIKE
comparison operator, please see
Chapter 4.
The string comparison operators described in the section called
String Comparison Functions are usually more
efficient than equivalent LIKE
expressions, particularly when used
to compare multiple literal patterns, or when the only purpose of the
regular expression is to perform case-insensitive matching. Use LIKE
operations when it is not practical to represent the filter condition
with the string comparison operators.
Function or Operator | Parameters | Description |
---|---|---|
LIKE |
The string to be compared The pattern to evaluate the string against |
Case-sensitive Returns true if the string to be compared matches the pattern. For example, the
following filter uses a
PCRE backreference to
return true for any
message where the
/state LIKE '(.)\1'
This operator is not unicode-aware. |
Table 4.7: AMPS regular expression comparison
Conditional Operators¶
AMPS contains support for a ternary conditional IF
operator which
allows for a Boolean condition to be evaluated to true
or false
,
and will return one of the two parameters. The general format of the
IF
statement is
IF (BOOLEAN_CONDITIONAL, VALUE_TRUE, VALUE_FALSE)
In this example, the BOOLEAN_CONDITIONAL
will be evaluated, and if
the result is true, the VALUE_TRUE
value will be returned otherwise
the VALUE_FALSE
will be returned.
For example:
SUM( IF(( (/FIXML/Order/OrdQty/@Qty > 500) AND
(/FIXML/Order/Instrmt/@Sym ='MSFT')), 1, 0 ))
The above example returns a count of the total number of orders that have been placed where the symbol is MSFT and the order contains a quantity more than 500.
The IF
can also be used to evaluate results to determine if results
are NULL
or NaN
. This is useful for calculating aggregates where
some values may be NULL
or NaN
. The NULL and NaN values are
discussed in more detail in the section called
“NULL, NaN, and IS NULL”.
For example:
SUM(/FIXML/Order/Instrmt/@Qty * IF(
/FIXML/Order/Instmt/@Price IS NOT NULL, 1, 0))
Working With Arrays¶
AMPS supports filters that operate on arrays in messages. There are two simple principles behind how AMPS treats arrays.
- Binary operators that yield
true
orfalse
(for example,=
,<
,LIKE
) are array aware, as is theIN
operator. These operators work on arrays as a whole, and evaluate every element in the array. - Arithmetic operators, functions, user-defined functions and other scalar operators, are not array aware, and use the first element in the array.
With these simple principles, you can predict how AMPS will
evaluate an expression that uses an array. For any operator, an empty
array evaluates to NULL
.
Let’s look at some examples. For the purposes of this section, we will consider the following JSON document:
{
"data" : [1, 2, 3, "zebra", 5],
"other" : [14, 34, 23, 5]
}
While these arrays are presented using JSON format for simplicity, the same principles apply to arrays in other message formats.
Here are some examples of ways to use an array in an AMPS filter:
Determining if any element in an array meets a criteria. To determine this, you provide the identifier for the array, and use a comparison operator.
Filter Evaluates as /data = 1
TRUE, /data
contains1
/data = 'zebra'
TRUE, /data
contains'zebra'
/data != 'zebra'
TRUE, /data
contains an element that is not'zebra'
/data = 42
FALSE, /data
does not contain42
/data LIKE 'z'
TRUE, a member of /data
matches'z'
/other > 30
TRUE, a member of /other
is> 30
/other > 50
FALSE, no member of /other
is> 50
Table 4.8: Array contains element
Determine whether a specific value is at a specific position. To determine this, use the subscript operator
[]
on the XPath identifier to specify the position, and use the equality operator to check the value at that position.Filter Evaluates as /data[0] = 1
TRUE, first element of /data
is1
/data[3] = "zebra"
TRUE, fourth element of /data
is'zebra'
/data[1] != 1
TRUE, second element of /data
is not1
/other[1] LIKE '4'
TRUE, second element of /other
matches'4'
Table 4.9: Element at specific position
Determine whether any value in one array is present in another array.
Filter Evaluates as /data = /other
TRUE, a value in /data
equals a value in/other
/data != /other
TRUE, a value in /data
does not equal a value in/other
Table 4.10: Identical elements
Determine whether an array contains one of a set of values.
Filter Evaluates as 3 IN (/data)
TRUE, 3
is a member of/data
/data IN (1, 2, 3)
TRUE, a member of /data
is in(1, 2, 3)
/data IN ("zebra", "antelope", "li on")
TRUE, a member of /data
is in("zebra", "antelope", "lion")
Table 4.11: Set of values in an array
Concatenating Strings¶
AMPS provides a function, CONCAT
that can be used for
constructing strings. The CONCAT
function takes any number of
parameters and returns a string constructed from those parameters. The
function can accept both XPath identifiers and literal values.
The CONCAT
function can be used in any AMPS expression that uses a
string. For example, you could CONCAT in a filter as follows:
CONCAT(/firstName, " ", /lastName) = 'George Orwell'
CONCAT
can be combined with other expressions, including conditional
expressions. A mailingAddressName
field in a view could be
constructed as follows:
<Field>CONCAT(/firstName, " ", /lastName,
IF(/suffix NOT NULL, CONCAT(", ", /suffix), "") )
AS /mailingAddressName</Field>
Managing String Case¶
AMPS provides the UPPER
and LOWER
functions to produce a string
in a specific case. This can be useful when constructing fields, or when
an expression needs case-insensitive comparisons against a group of
values using the IN
clause.
As described above in
String Comparison Functions, AMPS provides INSTR_I
and
STREQUAL_I
functions for performing case-insensitive
comparisons. In some cases, particularly when using strings with the IN
clause, it is more efficient to simply convert the string to a known
case.
The UPPER
and LOWER
functions are not unicode-aware; these
functions will not produce the correct data when used with multibyte
characters. For example, you might compare an incoming field of unknown
case to a set of known values as follows:
UPPER(/ticker) IN ('MSFT', 'IBM', 'RHAT', 'DIS')
Function | Parameters | Description |
---|---|---|
UPPER |
The string to transform | Returns the input string, transformed to upper case. This function is not unicode aware. |
LOWER |
The string to transform | Returns the input string, transformed to lower case. This function is not unicode aware. |
Table 4.12: AMPS string manipulation functions
Replacing Text in Strings¶
AMPS provides a pair of functions, REPLACE
and REGEXP_REPLACE
, that replace text within strings.
Function | Parameters | Description |
---|---|---|
REPLACE |
string to transform, string to match, replacement text | Returns the input string, with all occurrences of the string to match replaced with the replacement text |
REGEXP_REPLACE |
string to transform, pattern to match, replacement text | Returns the input string, with all occurrences of the pattern to match replaced with the replacement text |
Table 4.13: AMPS string replacement functions
Working With Substrings¶
AMPS provides a function, SUBSTR
, that can be used
for returning a subset of a string. There are two forms of this
function.
The first form takes the source string and the position at which to
begin the substring. You can use a negative number to count backward
from the end of the string. AMPS starts at the position specified, and
returns a string that starts at the specified position and goes to the
end of the string. If the provided position is before the beginning of
the string, AMPS starts at the beginning of the string. If the provided
position is past the end of the string, AMPS returns a zero-length
string, which evaluates to NULL
. If the provided position is before
the beginning of the string, AMPS returns the full string.
For example, the following expressions are all TRUE
:
SUBSTR("fandango", 4) == "dango"
SUBSTR("fandango", 1) == "fandango"
SUBSTR("fandango", -2) == "go"
SUBSTR("fandango", -99) == "fandango"
SUBSTR("fandango", 99) IS NULL
The second form of SUBSTR
takes the source string, the position at
which to begin the substring, and the length of the substring. Notice
that SUBSTR
considers the first character in the string to be
position 1
(rather than position 0
), as demonstrated below. AMPS
will not return a string larger than the source string. As with the
two-argument form, if the starting position is before the beginning of
the string, AMPS starts at the beginning of the string. If the starting
position is after the end of the source string, AMPS returns an empty
string which evaluates to NULL
.
For example, the following expressions are all TRUE
:
SUBSTR("fandango", 1, 3) == "fan"
SUBSTR("fandango", -4, 2) == "an"
SUBSTR("fandango", -8, 8) == "fandango"
SUBSTR("fandango", -23, 3) == "fan"
SUBSTR("fandango", 99, 8) IS NULL
Timestamp Function¶
AMPS includes a function that returns the current Unix timestamp. Notice that AMPS also includes functions for working with date and time in the Legacy Messaging Compatibility layer.
Function | Parameters | Description |
---|---|---|
UNIX_TIMESTAMP |
none | Returns the current timestamp as a double. Notice that a UNIX timestamp is seconds elapsed since 00:00 on January 1, 1970 in UTC and is independent of the timezone of the local system. |
Table 4.14: AMPS Timestamp functions
Geospatial Functions¶
AMPS includes a function for calculating the distance from a signed latitude and longitude.
Function | Parameters | Description |
---|---|---|
GEO_DISTANCE |
first_latitude, first_longitude, Seccond_latitude, second_longitude | Returns a double that contains the distance between the point identified by first_latitude, first_longitude and second_latitude, second_longitude in meters. For example, given a
home point and a message
containing GEO_DISTANCE(
/lat,
/long,
40.786337,
-119.206508)
AMPS uses the haversine formula when computing distances. |
Table 4.15: AMPS Geospatial Functions
Numeric Functions¶
AMPS includes the following functions for working with numbers.
Function | Parameters | Description |
---|---|---|
ABS |
number | Returns the absolute value of a number. For example, the
following filter will be
TRUE when the difference
between ABS(/a - /b) > 5
|
ROUND |
number, [number of decimal places] | Returns a number rounded to the specified number of decimal places. The number of decimal places is optional. When not provided, the number defaults to 0. The number of decimal places can be positive or negative. When the number is positive, the number specifies the number of digits to the right of the decimal place to round at. When the number is negative, the number specifies the number of digits to the left of the decimal place to round at. For example, you could
use the following
expression in a view to
limit the precision of
the ROUND(/price, 2) AS
/price
|
Table 4.16: AMPS Numeric Functions
CRC32 Function¶
AMPS includes a function that computes a CRC32 checksum over a string. This function is useful for creating a numeric identifier from a string representation. This is commonly-used to create a shortened representation of the string, or to provide input for a MOD calculation.
Function | Parameters | Description |
---|---|---|
CRC32 |
_string_ | Returns an integer calculated as a checksum of the provided string. |
Table 4.17: AMPS CRC function
Constructing Fields¶
For views, aggregated subscriptions, and SOW topic enrichment, AMPS allows you to construct new fields based on existing data.
When you construct a field, there are two components required:
- A source expression that produces a value. This expression can include XPath identifiers that extract values from a message, literal values, operators, and functions.
- A destination identifier that specifies the identifier where the message type will serialize the value produced by the source expression.
The source expressions and the destination identifier are separated by
the AS
keyword. The format for a field construction expression is as
follows:
<source expression>
AS
<destination identifier>
For example, to create a field in a view that calculates the total value
of an order by multiplying the /price
field times the /quantity
field, construct the field as shown below:
<Field>/price * /qty AS /total</Field>
This constructs a field using /price * /qty
to as the source
expression. Both /price
and /qty
are taken from the incoming
message. When the result of this expression is computed, the value will
be produced with the XPath identifier /total
as the destination.
That value will then be serialized to a message (with the exact format
and syntax determined by the message type).
Notice that the grammar for constructing fields does not specify precisely how the field is represented in the message. AMPS constructs the value and provides the XPath identifier to the message type. The message type itself is responsible for serializing the value into the correct representation and structure for that message type.
All of the AMPS operators and functions that are available for filters are available to use in source expressions, including any user-defined functions loaded into the instance.
Depending on the context for field construction, there are additional capabilities available when constructing fields, as described in the following sections.
Constructing Preprocessing Fields¶
Preprocessing field constructors operate on a single message and
construct fields based on that message. The results of the preprocessing
field constructor are merged into the incoming message. Any field in the
source message that is not changed or removed during preprocessing is
left unchanged, so it is not necessary to include all fields in the
message in the Preprocessing
block.
Because preprocessing fields apply to a specific message, preprocessing fields cannot specify the topic or message type in an XPath identifier. All identifiers in the source expression are evaluated as identifiers in the message being preprocessed. Preprocessing fields are evaluated during the preprocessing phase, so they cannot refer to the previous state of a message.
Using HINT to Control Field Construction¶
Preprocessing can be used to remove fields from a message. By default,
AMPS serializes any field that has an empty string or NULL
value
after preprocessing. Preprocessing fields can include a directive that
specifies that a field that contains a NULL
value should be removed
from the set of fields rather than serialized with a NULL
value. The
directive HINT OPTIONAL
applied to the XPath identifier specifies
that if the result of the source expression is NULL
, AMPS does not
provide the value for the message type to serialize. For example, the
following field constructor removes the /source
field from the
message if the value provided is not in a specific list of values:
<Field>IF(/source IN ('a','e','f'), /source, NULL)
AS /source HINT OPTIONAL</Field>
By default, AMPS considers the results of field construction (the
processed message) to be distinct from the current message. AMPS
rewrites the current message after preprocessing is completed. This
means that, by default, the results of fields constructed during
preprocessing are not available to other fields within preprocessing.
The HINT SET_CURRENT
option immediately inserts or updates values in
the current message, which makes the new value available to all
subsequent Field
declarations.
In the sample below, AMPS enriches the message by performing an expensive operation (implemented as a user-defined function) on two input fields, and immediately updates the current message with the output of that operation. AMPS then sets other fields in the processed message using the updated value in the current message.
<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
AS /processedData HINT SET_CURRENT</Field>
<Field>IF(/processedData > 1000000,
'A',
'B') AS /resultClass</Field>
Notice that using HINT SET_CURRENT
requires AMPS to process
Field
declarations in order, which may prevent future optimizations.
Hints can be combined as follows:
<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
AS /processedData HINT SET_CURRENT,OPTIONAL
</Field>
In this case, if the projected field would be NULL
, the field is
removed from the current message.
Constructing Enrichment Fields¶
Enrichment field constructors operate on a single message and construct
fields based on that message. Enrichment expressions operate on the
current message and change the current message. The results of the
enrichment directives are merged into the incoming message. Any field in
the source message that is not changed or removed during preprocessing
is left unchanged, so it is not necessary to include all fields in the
message in the Enrichment
directive.
Because enrichment fields apply to a specific message, enrichment fields cannot specify the topic or message type in an XPath identifier. All identifiers in the source expression are evaluated as indentifiers in the message being enriched.
Enrichment fields are constructed during the enrichment phase, so enrichment fields can refer to the previous state of a message. Within an enrichment expression, AMPS provides two special modifiers for XPath identifiers that specify whether an XPath identifier refers to the current incoming message or the previous state of the message. These modifiers apply only to the source expression, and cannot be used in the destination identifier. The modifiers are as follows:
Modifier | Description |
---|---|
OF CURRENT |
Specify that the XPath identifier refers to the incoming message. |
OF PREVIOUS |
Specify that the XPath identifier refers to the
previous state of the message in the SOW. If there
is no record in the SOW for this message, all
identifiers that specify OF PREVIOUS return
NULL . |
Table 4.18: XPath Identifier Modifiers for Enrichment
Using HINT to Control Field Construction¶
Enrichment can be used to remove fields from a message. By default, AMPS
serializes any field that has an empty string or NULL
value after
enrichment. Enrichment Field
elements can include a directive that
specifies that a field that contains a NULL
value should be removed
from the message rather than serialized with a NULL
value. The
directive HINT OPTIONAL
applied to the XPath identifier specifies
that if the result of the source expression is NULL
, AMPS does not
provide the value for the message type to serialize. For example, the
following field constructor removes the /source
field from the
message if the value provided is not in a specific list of values:
<Field>IF(/source IN ('a','e','f'), /source, NULL)
AS /source HINT OPTIONAL</Field>
By default, AMPS considers the results of field construction (the
enriched message) to be distinct from the current message. AMPS rewrites
the current message after enrichment is completed. This means that, by
default, the results of fields constructed during enrichment are not
available to other fields within enrichment. The HINT SET_CURRENT
option immediately inserts or updates values in the current message,
which makes the new value available to all subsequent Field
declarations.
In the sample below, AMPS enriches the message by performing an expensive operation (implemented as a user-defined function) on two input fields, and immediately updates the current message with the output of that operation. AMPS then sets other fields in the processed message using the updated value in the current message.
<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
AS /processedData HINT SET_CURRENT</Field>
<Field>IF(/processedData > 1000000,
'A',
'B') AS /resultClass</Field>
Notice that using HINT SET_CURRENT
requires AMPS to process
Field
declarations in order, which may prevent future optimizations.
Hints can be combined as follows:
<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
AS /processedData HINT SET_CURRENT,OPTIONAL
</Field>
In this case, if the projected field would be NULL
, the field is
removed from the current message.
Constructing View Fields¶
View field constructors operate over groups of messages, and construct a
single output message for each distinct group, as specified by the
Grouping
element in the View
configuration.
When constructing a field in a view, all identifiers used in the source
expression must be in one of the underlying topics for the view. When
the view uses a Join
, the identifiers must include the topic
identifier. If the topics in the Join are of different message types,
the identifiers must include both the message type and the topic
identifier.
For example, the following Field
definition multiplies the
/quantity
from the NVFIX topic orders
by the /price
from the
JSON topic items
, and projects the result into the /total
field
of the view.
<Field>[nvfix].[orders]./quantity * [json].[items]./price AS /total</Field>
Aggregate Functions¶
AMPS provides a set of aggregation functions that can be used in a
Field
constructor. These functions return a single value for each
distinct group of messages, as identified by distinct combinations of
values in the Grouping
clause.
Function | Description |
---|---|
AVG |
Average over an expression. Returns the mean value of the values specified by the expression. |
COUNT |
Count of values in an expression. Returns the number of values specified by the expression. |
COUNT_DISTINCT |
Count of number of distinct values in an expression, ignoring NULL .
Returns the number of distinct values in the expression. AMPS type conversion
rules apply when determining distinct values. |
MIN |
Minimum value. Returns the minimum out of the values specified by the expression. |
MAX |
Maximum value. Returns the maximum out of the values specified by the expression. |
STDDEV_POP |
Population standard deviation of an expression. Returns the calculated standard deviation. |
STDDEV_SAMP |
Sample standard deviation of an expression. Returns the calculated standard deviation. |
SUM |
Summation over an expression. Returns the total value of the values specified by the expression. |
Table 4.19: AMPS Aggregation Functions
Null values are not included in aggregate expressions with AMPS, nor in ANSI SQL. COUNT will count only non-null values; SUM will add only non-null values; AVG will average only non-null values; and MIN and MAX ignore NULL values, and so on.
MIN and MAX can operate on either numbers or strings, or a combination of the two. AMPS compares values using the principles described for comparison operators. For MIN and MAX, AMPS determines order based on these rules:
- Numbers sort in numeric order.
- String values sort in ASCII order.
- When comparing a number to a string, convert the string to a number, and use a numeric comparison. If that is not successful, the value of the string is higher than the value of the number.
For example, given a field that has the following values across a set of messages:
24, 020, 'cat', 75, 1.3, 200, '75', '42'
MIN will return 1.3
, MAX will return 'cat'
. Notice that
different message types may have different support for converting
strings to numeric values: AMPS relies on the parsing done by the
message type to determine the numeric value of a string.