4. AMPS Expressions

AMPS includes an expression language that combines elements of XPath and SQL-92’s WHERE clause. This expression language is used whenever the AMPS server refers to the contents of a message, including:

  • Content filtering
  • Constructing fields for message enrichment
  • Creating projected fields for views

AMPS uses a common syntax for each of these purposes, and provides a common set of operators and functions. AMPS also provides special directives for message enrichment, and aggregation functions for projecting views.

For example, when an expression is used as a content filter, any message for which the expression returns true matches the content filter. When an expression is used to construct a field for message enrichment or view projection, the expression is evaluated and the result that the expression returns is used as the content of the field.

Expressions Overview

The quickest way to learn AMPS expressions is to think of each as a combination of identifiers that tell AMPS where to find data in a message, and operators that tell AMPS what to do with that data. Each AMPS expression produces a value. The way AMPS uses that value depends on where the expression is used. For example, in a content filter, AMPS uses the value of the expression to determine whether a message matches the filter. When constructing a field, AMPS uses the value of the expression as the contents of the field.

Consider a simple example of an expression used as a filter. Imagine AMPS receives the following JSON message:

{"name":"Gyro", "job":"kitten"}

Using an AMPS expression, you can easily construct a content filter that matches the message:

/name = 'Gyro'

There are three parts to this expression. The first part, /name, is an identifier that tells AMPS to look for the contents of the name field at the top level of the JSON document. The second part of the filter, =, is the equality operator, which tells AMPS to compare the values on either side of the operator and return true if the values match. The final part of the filter, 'Gyro', is a string literal for the equality operator to use in the comparison. When an expression is used in a content filter, a message matches the filter when the expression returns true. The expression returns true for the sample message, so the sample messages matches the filter.

The identifier syntax is a subset of XPath, as described in Identifiers. The comparison syntax is similar to SQL-92.

Expression Syntax

AMPS expressions are designed to work exactly as expected if you are familiar with XPath path specifiers and SQL-92 predicates. This section describes in detail how AMPS evaluates the syntax, operators, and functions available in the AMPS expression language.

AMPS expressions combine the following elements:

  • Identifiers specify a field in a message. When evaluating an expression, AMPS replaces identifiers with values from the message or set of messages being evaluated.
  • Literal values are explicit values in an AMPS expression, such as 'IBM' or 42
  • Operators and functions such as =, <, >, *, and UNIX_TIMESTAMP()

Every AMPS expression produces a value. The way that AMPS uses the value depends on the context in which AMPS evaluates the expression. For example, if the expression is used for a filter, the message is considered to match the filter when the expression returns true. When an expression is used to project a field, the result of the expression is used as the value of the projected field.

Identifiers

AMPS identifiers use a subset of XPath to specify values in a message. AMPS identifiers specify the value of an attribute or element in an XML message, and the value of a field in a JSON, FIX or NVFIX message. Because the identifier syntax is only used to specify values, the subset of XPath used by AMPS does not include wildcards, relative paths, array manipulation, predicates, or functions.

For example, when messages are in this XML format:

<Order update="full">
    <ClientID>12345</ClientID>
    <Symbol>IBM</Symbol>
    <OrderQty>1000</OrderQty>
</Order>

The following identifier specifies the Symbol element of an Order message:

/Order/Symbol

The following identifier specifies the update attribute of an Order message:

/Order/@update

For FIX and NVIX, you specify fields using / and the tag name. AMPS interprets FIX and NVFIX messages as though they were an XML fragment with no root element. For example, to specify the value of FIX tag 55 (symbol), use the following identifier:

/55

Likewise, for JSON or other types that represent an object, you navigate through the object structure using the / to indicate each level of nesting.

AMPS only guarantees support for field identifiers that are valid step names in XPath. For example, AMPS does not guarantee that it can process or filter on a field named Fits&Starts.

AMPS also supports an optional bracketed field identifier syntax that extends the characters available for field names. For example, the following step name:

[/Not Xpath Name]

refers to a field name of Not Xpath Name at the root level of the message. This syntax allows spaces to be used in field names in AMPS expressions, even though this is not a valid step name in XPath. Notice that not all message types support field names with embedded spaces or other special characters. For example, the Not Xpath Name identifier is not a valid element name in XML, nor would it be a valid field name in Google Protocol Buffers.

AMPS checks the syntax of identifiers when parsing an expression. AMPS does not try to predict whether an identifier will match messages within a particular topic. It is not an error to submit an identifier that can never match due to the limitations of the message type. For example, AMPS allows you to use an identifier like /OrderQty in a filter submitted for a FIX connection, even though FIX messages only use numeric tags, or an identifier like /DataPackage/RunDate in a filter submitted for a BFlat connection, even though BFlat does not support nested elements.

The message type is responsible for constructing a set of identifiers from a message. In most cases, the mapping is simple. However, see the documentation for the message type for details, or if the mapping is unclear. For example, a composite-local message type adds the number of the part to the beginning of each XPath within the part (so, a top-level field of /name in the first part of the message has an identifier of /0/name).

AMPS Data Types

Each value in AMPS is assigned a data type when the message type module parses the value. AMPS operators and functions attempt to convert values into compatible types, based on the type of operation. For example, the * operator (multiplication) will attempt to convert all values to numeric values, while the CONCAT function (string concatenation) will attempt to convert all values to strings. In effect, a value in AMPS can be transparently treated as any type to which it can be meaningfully converted.

Internally, AMPS uses the following data types. As mentioned above, the message type module is responsible for assigning the type of a value from an incoming message as part of the parsing process. For some types, such as JSON, XML, FIX and NVFIX, the parser infers the type of the value from the field. For other types, such as MessagePack, BFLAT, Google Protocol Buffers, or BSON, the message itself contains information about the type of the field.

As mentioned above, the AMPS expression language does not limit the value to the type assigned by the message type module. Instead, a value in AMPS can be used in any context.

For example, given the following JSON document:

{"a":1,"b":"47"}

The values of /a and /b can be used as either string values or numeric values. AMPS will automatically convert these values as necessary, and AMPS considers the string or numeric representation to be equally correct and valid.

The following table lists the data types in the AMPS expression language:

Type Description Untyped Message Examples
NULL Unknown, untyped value (SQL-92 semantics)

[no field provided]

NFVIX: a=<SOH>

JSON: {"a":null}

XML: <a/>

Boolean True or false JSON: {"e":true}
Integer signed 64-bit integer or unsigned 64-bit integer for values > LONG_MAX

NVFIX: b=24

JSON: {"b":24}

XML: <b>24</b>

Floating point number 64-bit floating point number

NVFIX: c=24.0

JSON: {"c":24.0}

XML: <c>24.0</c>

String Arbitrary sequence of bytes of a specific length

NVFIX: d=Grilled cheese sandwich<SOH>

JSON: {"d":"Grilled cheese sandwich"}

XML: <d>Grilled cheese sandwich</d>

Table 4.1: AMPS data types

Numeric Types and Literals in AMPS Expressions

Numeric values in AMPS are always typed as either integers or floating point values. All numeric types that are less than or equal to the LONG_MAX limit in AMPS are signed, otherwise, the numeric type is unsigned. AMPS message types convert the original numeric types (or original representation for message types that do not have typed values) into the internal AMPS type system for the purposes of expression evaluation.

Within expressions, integer values are all numerals, with no decimal point, and can have a value in the same range as a 64-bit integer. For example:

42
149
-273
18446744073709551610

Within expressions, all numerals with a decimal point are floating-point numbers. AMPS interprets these numerals as double-precision floating point values. For example:

3.1415926535
98.6
-273.0

or, in scientific notation:

31.4e-1
6.022E23
2.998e8

AMPS automatically converts strings that contain numeric values to numbers when strings are used with an operator, function, or comparison that expects a numeric value.

Type Promotion for Numeric Types

AMPS uses the following rules for type promotion when evaluating numeric expressions:

  1. If any of the values in the expression is NaN, the result is NaN.
  2. Otherwise, if any of the values in the expression is floating point, the result is floating point.
  3. Otherwise, all of the values in the expression are integers, and the result is an integer.

Notice that, for division in particular, the results returned are affected by the type of the values. For example, the expression 1 / 5 evaluates to 0 since the result is interpreted as an integer. In comparison, the expression 1.0 / 5 evaluates to 0.2 since the result is interpreted as a floating point value.

When a function or operator that expects a numeric type is provided with a string, AMPS will attempt to convert string values to numeric types as necessary. When converting string values, AMPS recognizes same numeric formats in message data as are supported in the AMPS expression language (see String Literals. If the string is in an unrecognized format, AMPS converts the string as NaN.

String Literals in AMPS Expressions

When creating expressions for AMPS, string literals are indicated with single or double quotes. For example:

/FIXML/Order/Instrmt/@Sym = 'IBM'

AMPS supports the following escape sequences within string literals:

Escape Sequence Definition
\a Alert
\b Backspace
\t Horizontal tab
\n Newline
\f Form feed
\r Carriage return
\xHH Hexadecimal digit where H is (0..9,a..f,A..F)
\OOO Octal Digit (0..7)

Table 4.2: Escape Sequences

Additionally, any character which follows a backslash will be treated as a literal character.

AMPS string operations have no restrictions on character set, and correctly handle embedded NULL characters (\x00) and characters outside of the 7-bit ASCII range. AMPS string operations are not unicode-aware.

NULL, NaN and IS NULL

XPath expressions are considered to be NULL when they evaluate to an empty or nonexistent field reference. In numeric expressions where the operands or results are not a valid number, the XPath expression evaluates to NaN (not a number). The rules for applying the AND and OR operators against NULL and NaN values are outlined in Table 4.3 and Table 4.4.

Operand1 Operand2 Result
TRUE NULL NULL
FALSE NULL FALSE
NULL NULL NULL

Table 4.3: Logical AND with NULL/NaN Values

Operand1 Operand2 Result
TRUE NULL TRUE
FALSE NULL NULL
NULL NULL NULL

Table 4.4: Logical OR with NULL/NaN Values

The NOT operator applied to a NULL value is also NULL, or “Unknown.” The only way to check for the existence of a NULL value reliably is to use the IS NULL predicate. There also exists an IS NAN predicate for checking that a value is NaN (not a number.)

caution To reliably check for existence of a NULL value, you must use the IS NULL predicate such as the filter: /Order/Comment IS NULL

AMPS also provides a COALESCE() function that accepts a set of values and returns the first value that is not NULL. For example, given the following filter expression:

COALESCE(/userCategory,
         /employeeCategory,
         /vendorCategory,
         'restricted') != 'restricted'

AMPS will return the first value that is not NULL, and compare that value to the constant string 'restricted'. Notice that, to make the intent of the filter clear, this example provides a constant value for AMPS to return from the COALESCE if all of the field values are NULL.

The COALESCE function, like other functions in AMPS, is not array-aware. This means that when one of the XPath expressions provided to COALESCE specifies an array in the original message, AMPS provides the first item in the array to the COALESCE function. See Working With Arrays for details.

Grouping and Order of Evaluation

AMPS expressions allow you to group parts of the expression using parentheses. Parts of an expression inside parentheses are evaluated together. 60East recommends using parentheses to group independent parts of an expression to ensure that expression is evaluated in the expected order. For example, in this expression:

( /counter % 3 ) == 0

The clause /counter % 3 is evaluated first, and the result of that evaluation is compared to 0.

Within a group, elements are evaluated left to right in precedence order. For example, given the filter below:

(expression1 OR expression2 AND expression3) OR (expression4 AND
            NOT expression5) ...

AMPS evalutes expression2, then expression3 (since AND has higher precedence than OR), and if they evaluate to false, then expression1 will be evaluated.

AMPS does not guarantee that all parts of an expression will be evaluated if the result of an expression can be determined after only evaluating part of the expression. For example, given the expression:

A_FUNCTION(/a) OR B_FUNCTION(/b)

AMPS only guarantees that B_FUNCTION(/b) will be evaluated if A_FUNCTION(/a) returns false.

Logical Operators

The logical operators are NOT, AND, and OR, in order of precedence. These operators have the usual Boolean logic semantics.

/FIXML/Order/Instrmt/@Sym = 'IBM' OR /FIXML/Order/Instrmt/@Sym = 'MSFT'

As with other operators, you can use parentheses to group operators and affect the order of evaluation

(/orderType = 'rush' AND /customerType IN ('silver', 'gold') )
OR  /customerType = 'platinum'

Arithmetic Operators

AMPS supports the arithmetic operators +, -, *, /, %, and MOD in expressions. The result of arithmetic operators where one of the operands is NULL is undefined and evaluates to NULL.

AMPS distinguishes between floating point and integral types. When an arithmetic operator uses two different types, AMPS will convert the integral type to a floating point value as described in Numeric Types and Literals.

Examples of filter expressions using arithmetic operators:

/6 * /14 < 1000

/Order/@Qty * /Order/@Prc >= 1000000

AMPS numeric types are signed, and the AMPS arithmetic operators correctly handle negative numbers. The MOD and % operators preserve the sign of the first argument to the operator. That is, -5 % 3 produces a result of -2, while 5 % -3 produces a result of 2.

caution When using mathematical operators in conjunction with filters, be careful about the placement of the operator. Some operators are used in the XPath expression as well as for mathematical operation (for example, the '/' operator in division). Therefore, it is important to separate mathematical operators with white space, to prevent interpretation as an XPath expression.

Comparison Operators

The comparison operators can be loosely grouped into equality comparisons and range comparisons. The basic equality comparison operators, in precedence order, are ==, =, >, >=, <, <=, !=, and <>. The == comparison and the = comparison are treated as the same operator and produce the same results.

If these binary operators are applied to two operands of different types, AMPS attempts to convert strings to numbers. If conversion succeeds, AMPS uses the numeric values. If conversion fails because the string cannot be meaningfully converted to a number, strings are always considered to be greater than numbers. The operators consider an empty string to be NULL.

The following table shows some examples of how AMPS compares different types.

Expression Result
1 < 2 TRUE
10 < '2' FALSE, ‘2’ can be converted to a number
'2.000' <> '2.0' TRUE, no conversion to numbers since both are strings
2 = 2.0 TRUE, numeric comparison
10 < 'Crank It Up' TRUE, strings are greater than numbers
10 < '' FALSE, an empty string is considered to be NULL
10 > '' FALSE, an empty string is considered to be NULL
'' = '' FALSE, an empty string is considered to be NULL
'' IS NULL TRUE, an empty string is considered to be NULL

Table 4.5: Comparison Operator Examples

There are also set and range comparison operators. The BETWEEN operator can be used to check the range values.

tip The range used in the BETWEEN operator is inclusive of both operands, meaning the expression /A BETWEEN 0 AND 100 is equivalent to /A >= 0 AND /A <= 100

For example:

/FIXML/Order/OrdQty/@Qty BETWEEN 0 AND 10000

/FIXML/Order/@Px NOT BETWEEN 90.0 AND 90.5

(/price * /qty) BETWEEN 0 AND 100000

The IN operator can be used to perform membership operations on sets of values. The IN operator returns true when the value on the left of the IN appears in the set of values in the IN clause. For example:

/Trade/OwnerID NOT IN ('JMB', 'BLH', 'CJB')

/21964 IN (/14*5, /6*/14, 1000, 2000)

/customer IN ('Bob', 'Phil', 'Brent')

The IN operator returns true for the set of records that would be returned by an equivalent set of = comparisons joined by OR. The following two statements return the same set of records:

/pet IN ('puppy', 'kitten', 'goldfish')
(/pet = 'puppy') OR (/pet = 'kitten') OR (/pet = 'goldfish)

This equivalence means that NULL values in either the field being evaluated, or the set of values provided to the IN clause, always return false.

This also means that, for string values, the IN operator performs exact, case-sensitive matching.

tip When evaluating against a set of values, the IN operator typically provides better performance than using a set of OR operators. That is, a filter written as /firstName IN ('Joe', 'Kathleen', 'Frank', 'Cindy', 'Mortimer') will typically perform better than an equivalent filter written as /firstName = 'Joe' OR /firstName = 'Kathleen' OR /firstName = 'Frank' OR /firstName = 'Cindy' OR /firstName = 'Mortimer'.

Regular Expression Matching

AMPS also provides a regular expression comparison operator, LIKE, to provide regular expression matching on string values. A pattern is used for the right side of the LIKE operator. A pattern must be provided as a literal, quoted value. For more on regular expressions and the LIKE comparison operator, please see Chapter 4.

The string comparison operators described in the section called String Comparison Functions are usually more efficient than equivalent LIKE expressions, particularly when used to compare multiple literal patterns, or when the only purpose of the regular expression is to perform case-insensitive matching. Use LIKE operations when it is not practical to represent the filter condition with the string comparison operators.

Function or Operator Parameters Description
LIKE

The string to be compared

The pattern to evaluate the string against

Case-sensitive

Returns true if the string to be compared matches the pattern.

For example, the following filter uses a PCRE backreference to return true for any message where the /state field contains two identical characters in a row.

/state LIKE '(.)\1'

This operator is not unicode-aware.

Table 4.6: AMPS regular expression comparison

Conditional Operators

AMPS contains support for a ternary conditional IF operator which allows for a Boolean condition to be evaluated to true or false, and will return one of the two parameters. The general format of the IF statement is

IF (BOOLEAN_CONDITIONAL, VALUE_TRUE, VALUE_FALSE)

In this example, the BOOLEAN_CONDITIONAL will be evaluated, and if the result is true, the VALUE_TRUE value will be returned otherwise the VALUE_FALSE will be returned.

For example:

SUM( IF(( (/FIXML/Order/OrdQty/@Qty > 500) AND
          (/FIXML/Order/Instrmt/@Sym ='MSFT')), 1, 0 ))

The above example returns a count of the total number of orders that have been placed where the symbol is MSFT and the order contains a quantity more than 500.

The IF can also be used to evaluate results to determine if results are NULL or NaN. This is useful for calculating aggregates where some values may be NULL or NaN. The NULL and NaN values are discussed in more detail in the section called “NULL, NaN, and IS NULL”.

For example:

SUM(/FIXML/Order/Instrmt/@Qty * IF(
    /FIXML/Order/Instmt/@Price IS NOT NULL, 1, 0))

Working With Arrays

AMPS supports filters that operate on arrays in messages. There are two simple principles behind how AMPS treats arrays.

  • Binary operators that yield true or false (for example, =, <, LIKE) are array aware, as is the IN operator. These operators work on arrays as a whole, and evaluate every element in the array.
  • Arithmetic operators, functions, user-defined functions and other scalar operators, are not array aware, and use the first element in the array.

With these simple principles, you can predict how AMPS will evaluate an expression that uses an array. For any operator, an empty array evaluates to NULL.

Let’s look at some examples. For the purposes of this section, we will consider the following JSON document:

{
    "data"  : [1, 2, 3, "zebra", 5],
    "other" : [14, 34, 23, 5]
}

While these arrays are presented using JSON format for simplicity, the same principles apply to arrays in other message formats.

Here are some examples of ways to use an array in an AMPS filter:

  1. Determining if any element in an array meets a criteria. To determine this, you provide the identifier for the array, and use a comparison operator.

    Filter Evaluates as
    /data = 1 TRUE, /data contains 1
    /data = 'zebra' TRUE, /data contains 'zebra'
    /data != 'zebra' TRUE, /data contains an element that is not 'zebra'
    /data = 42 FALSE, /data does not contain 42
    /data LIKE 'z' TRUE, a member of /data matches 'z'
    /other > 30 TRUE, a member of /other is > 30
    /other > 50 FALSE, no member of /other is > 50

Table 4.7: Array contains element

  1. Determine whether a specific value is at a specific position. To determine this, use the subscript operator [] on the XPath identifier to specify the position, and use the equality operator to check the value at that position.

    Filter Evaluates as
    /data[0] = 1 TRUE, first element of /data is 1
    /data[3] = "zebra" TRUE, fourth element of /data is 'zebra'
    /data[1] != 1 TRUE, second element of /data is not 1
    /other[1] LIKE '4' TRUE, second element of /other matches '4'

Table 4.8: Element at specific position

  1. Determine whether any value in one array is present in another array.

    Filter Evaluates as
    /data = /other TRUE, a value in /data equals a value in /other
    /data != /other TRUE, a value in /data does not equal a value in /other

Table 4.9: Identical elements

  1. Determine whether an array contains one of a set of values.

    Filter Evaluates as
    3 IN (/data) TRUE, 3 is a member of /data
    /data IN (1, 2, 3) TRUE, a member of /data is in (1, 2, 3)
    /data IN ("zebra", "antelope", "li on") TRUE, a member of /data is in ("zebra", "antelope", "lion")

Table 4.10: Set of values in an array

Working With Timestamps

AMPS does not include a dedicated timestamp data type. Instead, AMPS represents timestamps either as a double or a string.

When representing timestamps as a double, AMPS uses standard UNIX timestamps.

When representing timestamps as a string, AMPS formats strings in a format compliant with ISO-8601. The format AMPS uses was chosen to balance parsing speed, precision, readability, and bandwidth. The format uses:

  • Basic format (no delimiters between parts of a date or time)
  • Decimal fractional seconds (with . delimiting the fraction rather than ,)
  • Explicit T to separate date and time
  • Explicit time zone specifier allowed, but not required

A string timestamp has the format of YYYYmmddTHHMMSS[Z] where:

  • YYYY is the four digit year.
  • mm is the two digit month.
  • dd is the two digit day.
  • T the character separator between the date and time.
  • HH the two digit hour.
  • MM the minutes of the time.
  • SS the two digit second.
  • Z is an optional timezone specifier. AMPS timestamps are always in UTC, regardless of whether the timezone is included. AMPS only accepts a literal value of Z for a timezone specifier.

For example, a timestamp for January 2nd, 2015, at 12:35:

20150102T123500Z

Timestamps in string format are used for point-in-time bookmarks, as explicit time specifiers in configuration files, in the timestamp header optionally returned on AMPS messages, and so on.

The timestamp format AMPS uses was chosen to make string comparisons for timestamps work as expected, including simple comparisons like < and >, as well as more sophisticated comparisons like the BETWEEN operator.

AMPS provides functions to convert between string representation of a timestamp and the double representation of a timestamp, as described in Date and Time Functions.

Performance Considerations

This section describes general performance considerations for the AMPS expression language and content filters. The considerations here are aspects of AMPS performance to be aware of in the general case. However, since the AMPS expression language operates on specific data, the structure and size of the messages that your application uses may have more effect on overall performance than the specific expressions used. For example, parsing and filtering a 20MB XML document is inherently more expensive than parsing and filtering a 400 byte BFlat document.

Use Short-Circuiting

When clauses in an expression are joined by OR, AMPS will only evaluate the right side of an OR expression if the left side of the expression is false.

When constructing an expression, this means that there can be a performance advantage to having relatively less-expensive clauses on the lefthand sides of the OR. For example, in the following clause:

/code = 'restricted' OR /notes LIKE 'restricted|limited'

the regular expression comparison is only evaluated if the comparison /code = 'restricted' is false. If the comparison is true, then the overall clause is true, and there is no need to evaluate the regular expression.

Avoid Redundant Expressions

AMPS does not reorder or recombine complex expressions. Where feasible, your application can save work at the server by combining expressions. In particular, if an application is constructing a filter by reading options from various sources, performance can be improved by combining the queries.

For example, in a filter like the following:

/id = '12345' OR /id IN ('12345','23456','34567','45678')
              OR /id IN ('12345','45678','90909')

the comparison against '12345' will be evaluated three times in cases where the value of /id does not match any of the values in the filter.

This filter is equivalent to:

/id IN ('12345','23456','34567','45678','90909')

which produces the same results, but only evaluates the /id field against a given value one time.

Use Specialized Operators For Simple Comparisons, Use LIKE When Necessary

The LIKE operator offers access to full Perl-Compatible Regular Expressions within the AMPS expression language. This flexibility allows for very precise filtering, and the PCRE engine performs well.

However, for comparisons for which AMPS provides a named function, the named function is highly-optimized and will perform somewhat better than the general-purpose regular expression engine.

For example, given a choice between two equivalent expressions:

/state BEGINS WITH('North')

and

/state LIKE '^North'

the version that uses BEGINS WITH will typically perform slightly better than the version that uses the regular expression.

This doesn’t mean that regular expressions or the LIKE operator perform poorly, and the LIKE operator can efficiently match patterns that would be difficult or impossible to match using the other operators. However, for very simple comparisons where AMPS provides a dedicated operator, that operator typically performs slightly better than a regular expression.

The following table shows some examples of regular expressions and the AMPS operator equivalent.

Regular Expression AMPS Operator Equivalent
^something BEGINS WITH('something')
something$ ENDS WITH ('something')
something INSTR(/field, 'something') != 0
(?i)something INSTR_I(/field, 'something') != 0
(?i)^something$ STREQUAL_I(/field, 'something') != 0
^a$|^b$|^c$ IN ('a','b','c')

Table 4.11: Regular expressions and operators

Optimize for Partial Parsing

Most AMPS message types have the ability to partially parse messages. That is, rather than parsing the entire message, the message type can simply find the identifiers that will be used, and stop the parsing process as soon as those identifiers are found.

This optimization is most useful for larger messages. For example, if the SOW key for a topic is based on the /id field of a message, there are content filters active that use the /id field and the /code field, and no other field is being indexed, then given a message like the following:

{"id":24,"code":"A12347","notes":"entered on behalf of a sloth",
     // ... 100K of other data ...
}

In this case, the AMPS parser can stop parsing the message after parsing just the /id field and the /code field, saving the cost of parsing the rest of the message.

Notice that this optimization will only improve performance in cases where AMPS doesn’t need to parse the entire message. For example, if there is a delta_subscribe active for the topic, or if the command being processed is a delta_publish, AMPS will parse the message completely to be able to calculate the deltas. Likewise, if any filter refers to a field that doesn’t appear in the message, AMPS will parse the message completely to be able to determine that the field does not appear in the message.

SOW Queries and Indexing

Queries over topics in the State-of-the-World have additional performance considerations. AMPS maintains indexes over State-of-the-World topics to help locate messages in response to a query.

  • Queries over a topic in the State-of-the-World can use indexes. Where possible, use an exact string match and create a hash index to take advantage of hash indexes.
  • When a query is submitted with an XPath identifier for which no index exists, AMPS will create and populate a memo index for that XPath identifier. This can add to the amount of time a query takes the first time a given XPath identifier is queried. You can specify that AMPS creates a memo index for a given identifier by using the Index configuration item in the Topic definition. Once an index is created, AMPS will continue to search for that XPath identifier in incoming messages for that topic to keep the index up to date.

Notice that indexes are only used for sow commands and during the sow portion of a sow_and_subscribe (or sow_and_delta_subscribe) command. Subscriptions do not use indexes, since there is no need to locate a message: during a subscription, filters are run against the current message.

See SOW Indexing for details.

Constructing Fields

For views, aggregated subscriptions, and SOW topic enrichment, AMPS allows you to construct new fields based on existing data.

When you construct a field, there are two components required:

  • A source expression that produces a value. This expression can include XPath identifiers that extract values from a message, literal values, operators, and functions.
  • A destination identifier that specifies the identifier where the message type will serialize the value produced by the source expression.

The source expressions and the destination identifier are separated by the AS keyword. The format for a field construction expression is as follows:

<source expression>
AS
<destination identifier>

For example, to create a field in a view that calculates the total value of an order by multiplying the /price field times the /quantity field, construct the field as shown below:

<Field>/price * /qty AS /total</Field>

This constructs a field using /price * /qty to as the source expression. Both /price and /qty are taken from the incoming message. When the result of this expression is computed, the value will be produced with the XPath identifier /total as the destination. That value will then be serialized to a message (with the exact format and syntax determined by the message type).

Notice that the grammar for constructing fields does not specify precisely how the field is represented in the message. AMPS constructs the value and provides the XPath identifier to the message type. The message type itself is responsible for serializing the value into the correct representation and structure for that message type.

All of the AMPS operators and functions that are available for filters are available to use in source expressions, including any user-defined functions loaded into the instance.

Depending on the context for field construction, there are additional capabilities available when constructing fields, as described in the following sections.

Constructing Preprocessing Fields

Preprocessing field constructors operate on a single message and construct fields based on that message. The results of the preprocessing field constructor are merged into the incoming message. Any field in the source message that is not changed or removed during preprocessing is left unchanged, so it is not necessary to include all fields in the message in the Preprocessing block.

Because preprocessing fields apply to a specific message, preprocessing fields cannot specify the topic or message type in an XPath identifier. All identifiers in the source expression are evaluated as identifiers in the message being preprocessed. Preprocessing fields are evaluated during the preprocessing phase, so they cannot refer to the previous state of a message.

Using HINT to Control Field Construction

Preprocessing can be used to remove fields from a message. By default, AMPS serializes any field that has an empty string or NULL value after preprocessing. Preprocessing fields can include a directive that specifies that a field that contains a NULL value should be removed from the set of fields rather than serialized with a NULL value. The directive HINT OPTIONAL applied to the XPath identifier specifies that if the result of the source expression is NULL, AMPS does not provide the value for the message type to serialize. For example, the following field constructor removes the /source field from the message if the value provided is not in a specific list of values:

<Field>IF(/source IN ('a','e','f'), /source, NULL)
       AS /source HINT OPTIONAL</Field>

By default, AMPS considers the results of field construction (the processed message) to be distinct from the current message. AMPS rewrites the current message after preprocessing is completed. This means that, by default, the results of fields constructed during preprocessing are not available to other fields within preprocessing. The HINT SET_CURRENT option immediately inserts or updates values in the current message, which makes the new value available to all subsequent Field declarations.

In the sample below, AMPS enriches the message by performing an expensive operation (implemented as a user-defined function) on two input fields, and immediately updates the current message with the output of that operation. AMPS then sets other fields in the processed message using the updated value in the current message.

<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
       AS /processedData HINT SET_CURRENT</Field>
<Field>IF(/processedData > 1000000,
           'A',
           'B') AS /resultClass</Field>

Notice that using HINT SET_CURRENT requires AMPS to process Field declarations in order, which may prevent future optimizations.

Hints can be combined as follows:

<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
       AS /processedData HINT SET_CURRENT,OPTIONAL
</Field>

In this case, if the projected field would be NULL, the field is removed from the current message.

Constructing Enrichment Fields

Enrichment field constructors operate on a single message and construct fields based on that message. Enrichment expressions operate on the current message and change the current message. The results of the enrichment directives are merged into the incoming message. Any field in the source message that is not changed or removed during preprocessing is left unchanged, so it is not necessary to include all fields in the message in the Enrichment directive.

Because enrichment fields apply to a specific message, enrichment fields cannot specify the topic or message type in an XPath identifier. All identifiers in the source expression are evaluated as indentifiers in the message being enriched.

Enrichment fields are constructed during the enrichment phase, so enrichment fields can refer to the previous state of a message. Within an enrichment expression, AMPS provides two special modifiers for XPath identifiers that specify whether an XPath identifier refers to the current incoming message or the previous state of the message. These modifiers apply only to the source expression, and cannot be used in the destination identifier. The modifiers are as follows:

Modifier Description
OF CURRENT Specify that the XPath identifier refers to the incoming message.
OF PREVIOUS Specify that the XPath identifier refers to the previous state of the message in the SOW. If there is no record in the SOW for this message, all identifiers that specify OF PREVIOUS return NULL.

Table 4.12: XPath Identifier Modifiers for Enrichment

Using HINT to Control Field Construction

Enrichment can be used to remove fields from a message. By default, AMPS serializes any field that has an empty string or NULL value after enrichment. Enrichment Field elements can include a directive that specifies that a field that contains a NULL value should be removed from the message rather than serialized with a NULL value. The directive HINT OPTIONAL applied to the XPath identifier specifies that if the result of the source expression is NULL, AMPS does not provide the value for the message type to serialize. For example, the following field constructor removes the /source field from the message if the value provided is not in a specific list of values:

<Field>IF(/source IN ('a','e','f'), /source, NULL)
       AS /source HINT OPTIONAL</Field>

By default, AMPS considers the results of field construction (the enriched message) to be distinct from the current message. AMPS rewrites the current message after enrichment is completed. This means that, by default, the results of fields constructed during enrichment are not available to other fields within enrichment. The HINT SET_CURRENT option immediately inserts or updates values in the current message, which makes the new value available to all subsequent Field declarations.

In the sample below, AMPS enriches the message by performing an expensive operation (implemented as a user-defined function) on two input fields, and immediately updates the current message with the output of that operation. AMPS then sets other fields in the processed message using the updated value in the current message.

<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
       AS /processedData HINT SET_CURRENT</Field>
<Field>IF(/processedData > 1000000,
           'A',
           'B') AS /resultClass</Field>

Notice that using HINT SET_CURRENT requires AMPS to process Field declarations in order, which may prevent future optimizations.

Hints can be combined as follows:

<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
       AS /processedData HINT SET_CURRENT,OPTIONAL
</Field>

In this case, if the projected field would be NULL, the field is removed from the current message.

Constructing View Fields

View field constructors operate over groups of messages, and construct a single output message for each distinct group, as specified by the Grouping element in the View configuration.

When constructing a field in a view, all identifiers used in the source expression must be in one of the underlying topics for the view. When the view uses a Join, the identifiers must include the topic identifier. If the topics in the Join are of different message types, the identifiers must include both the message type and the topic identifier.

For example, the following Field definition multiplies the /quantity from the NVFIX topic orders by the /price from the JSON topic items, and projects the result into the /total field of the view.

<Field>[nvfix].[orders]./quantity * [json].[items]./price AS /total</Field>

Aggregate Functions

AMPS provides a set of aggregation functions that can be used in a Field constructor for a view and in the projection option of an aggregated subscription. These functions return a single value for each distinct group of messages, as identified by distinct combinations of values in the Grouping clause.

Each of these functions takes a single value as an argument. That value is typically provided by a single message within a group. There are no special limitations on the value, and the value can be a literal value, an identifier directing AMPS to extract the value from the message, or a function.

For example, given a set of messages like the following:

{"id":1, "item":1,"qty":10, "oid":1, ...}
{"id":2, "item":2,"qty":10, "oid":1, ...}
{"id":3, "item":3,"qty":25, "oid":1, ...}

With a view definition that has a Projection clause and Grouping clause like the following:

<Projection>
   <Field>/oid</Field>
   <Field>SUM(/qty) AS /totalOrderQty</Field>
   <Field>SUM(IF((/qty % 10) == 0,1,0)) AS /evenOrderCount</Field>
</Projection>
<Grouping>
   <Field>/oid</Field>
</Grouping>

AMPS will produce the following record:

{"oid":1,"totalOrderQty":45,"evenOrderCount":2}

Notice that the first SUM() function simply extracts the value of the /qty from each message, while the second SUM() function uses the output of the IF statement for each message.

Because aggregate functions operate over groups of messages, these functions are only available when constructing fields for aggregate purposes, either in a view or an aggregated subscription. The functions described in this section are not available to filters, and are not available for constructing fields during SOW topic enrichment.

Function Description
AVG Average over an expression. Returns the mean value of the values specified by the expression.
COUNT Count of values in an expression. Returns the number of values specified by the expression.
COUNT_DISTINCT Count of number of distinct values in an expression, ignoring NULL. Returns the number of distinct values in the expression. AMPS type conversion rules apply when determining distinct values.
MIN Minimum value. Returns the minimum out of the values specified by the expression.
MAX Maximum value. Returns the maximum out of the values specified by the expression.
STDDEV_POP Population standard deviation of an expression. Returns the calculated standard deviation.
STDDEV_SAMP Sample standard deviation of an expression. Returns the calculated standard deviation.
SUM Summation over an expression. Returns the total value of the values specified by the expression.

Table 4.13: AMPS Aggregation Functions

Null values are not included in aggregate expressions with AMPS, nor in ANSI SQL. COUNT will count only non-null values; SUM will add only non-null values; AVG will average only non-null values; and MIN and MAX ignore NULL values, and so on.

MIN and MAX can operate on either numbers or strings, or a combination of the two. AMPS compares values using the principles described for comparison operators. For MIN and MAX, AMPS determines order based on these rules:

  • Numbers sort in numeric order.
  • String values sort in ASCII order.
  • When comparing a number to a string, convert the string to a number, and use a numeric comparison. If that is not successful, the value of the string is higher than the value of the number.

For example, given a field that has the following values across a set of messages:

24, 020, 'cat', 75, 1.3, 200, '75', '42'

MIN will return 1.3, MAX will return 'cat'. Notice that different message types may have different support for converting strings to numeric values: AMPS relies on the parsing done by the message type to determine the numeric value of a string.