6. Constructing Fields

For views, aggregated subscriptions, and SOW topic enrichment, AMPS allows you to construct new fields based on existing data.

When you construct a field, there are two components required:

  1. A source expression that produces a value. This expression can include XPath identifiers that extract values from a message, literal values, operators, and functions.
  2. A destination identifier: the XPath identifier at which the message type serializes the value produced by the source expression.

The source expression and the destination identifier are separated by the AS keyword. The format for a field construction expression is as follows:

<source expression>
AS
<destination identifier>

For example, to create a field in a view that calculates the total value of an order by multiplying the /price field by the /qty field, construct the field as shown below:

<Field>/price * /qty AS /total</Field>

This constructs a field using /price * /qty as the source expression, with both /price and /qty taken from the incoming message. AMPS computes the result of the expression and provides it to the message type with the XPath identifier /total as the destination. The message type then serializes the value into the message, using the exact format and syntax of that message type.

Notice that the grammar for constructing fields does not specify precisely how the field is represented in the message. AMPS constructs the value and provides the XPath identifier to the message type. The message type itself is responsible for serializing the value into the correct representation and structure for that message type.
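
The following Python sketch models that division of labor for the /total example. It is an illustration under stated assumptions, not AMPS code: the helper names xpath_get and json_serialize are hypothetical, the source expression is written as a Python lambda, and a simple nested-JSON writer stands in for the message type.

```python
import json

# Sketch only: models field construction as described above; not AMPS code.


def xpath_get(message, path):
    """Resolve a simple XPath identifier such as '/price' against a dict."""
    node = message
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            return None  # a missing field behaves like NULL
        node = node[part]
    return node


def json_serialize(values):
    """Hypothetical JSON 'message type': writes (xpath, value) pairs into nested objects."""
    out = {}
    for path, value in values:
        parts = path.strip("/").split("/")
        node = out
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return json.dumps(out, separators=(",", ":"))


# <Field>/price * /qty AS /total</Field>
source = lambda msg: xpath_get(msg, "/price") * xpath_get(msg, "/qty")
destination = "/total"

incoming = {"price": 2.5, "qty": 4}
print(json_serialize([(destination, source(incoming))]))  # {"total":10.0}
```

Only json_serialize knows about JSON syntax. A different message type could receive the same ("/total", 10.0) pair and serialize it in its own format.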

All of the AMPS operators and functions that are available for filters are available to use in source expressions, including any user-defined functions loaded into the instance.

Depending on the context for field construction, there are additional capabilities available when constructing fields, as described in the following sections.

Constructing Preprocessing Fields

Preprocessing field constructors operate on a single message and construct fields based on that message. The results of the preprocessing field constructor are merged into the incoming message. Any field in the source message that is not changed or removed during preprocessing is left unchanged, so it is not necessary to include all fields in the message in the Preprocessing block.

Since preprocessing fields apply to a specific message, preprocessing fields cannot specify the topic or message type in an XPath identifier. All identifiers in the source expression are evaluated as identifiers in the message being preprocessed. Preprocessing fields are evaluated during the preprocessing phase, so they cannot refer to the previous state of a message.
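
A minimal Python sketch of this merge, assuming flat messages represented as dicts and source expressions written as Python callables (the function name preprocess is hypothetical). Every source expression reads the incoming message, and only the constructed fields are added or replaced in the result.

```python
# Sketch only: models the preprocessing merge described above; not AMPS code.
def preprocess(incoming, constructors):
    """Apply (source, destination) pairs to one message and merge the results."""
    # Each source expression sees only the message being preprocessed.
    produced = {dest.lstrip("/"): source(incoming) for source, dest in constructors}
    result = dict(incoming)   # fields not constructed here are left unchanged
    result.update(produced)   # constructed fields are added or replaced
    return result


msg = {"id": 7, "price": 2, "qty": 3}
print(preprocess(msg, [(lambda m: m["price"] * m["qty"], "/total")]))
# {'id': 7, 'price': 2, 'qty': 3, 'total': 6}
```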

Using HINT to Control Field Construction

Preprocessing can be used to remove fields from a message. By default, AMPS serializes any field that has an empty string or NULL value after preprocessing. Preprocessing fields can include a directive that specifies that a field that contains a NULL value should be removed from the set of fields rather than serialized with a NULL value. The directive HINT OPTIONAL applied to the XPath identifier specifies that if the result of the source expression is NULL, AMPS does not provide the value for the message type to serialize. For example, the following field constructor removes the /source field from the message if the value provided is not in a specific list of values:

<Field>IF(/source IN ('a','e','f'), /source, NULL)
       AS /source HINT OPTIONAL</Field>
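
The Python sketch below contrasts the default behavior with HINT OPTIONAL for this example. It assumes flat messages held in dicts and uses None to stand in for NULL; construct and keep_known_source are hypothetical names, not part of AMPS.

```python
# Sketch only: models HINT OPTIONAL as described above; not AMPS code.
def construct(message, fields):
    """Apply (source, destination, optional) field constructors to a flat dict message."""
    result = dict(message)
    for source, dest, optional in fields:
        name = dest.lstrip("/")
        value = source(message)          # None stands in for NULL
        if value is None and optional:
            result.pop(name, None)       # HINT OPTIONAL: nothing to serialize
        else:
            result[name] = value         # default: serialized, even when NULL
    return result


# IF(/source IN ('a','e','f'), /source, NULL) AS /source [HINT OPTIONAL]
def keep_known_source(m):
    return m.get("source") if m.get("source") in ("a", "e", "f") else None


print(construct({"id": 1, "source": "z"}, [(keep_known_source, "/source", True)]))
# {'id': 1}
print(construct({"id": 1, "source": "z"}, [(keep_known_source, "/source", False)]))
# {'id': 1, 'source': None}
```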

By default, AMPS considers the results of field construction (the processed message) to be distinct from the current message. AMPS rewrites the current message after preprocessing is completed. This means that, by default, the results of fields constructed during preprocessing are not available to other fields within preprocessing. The HINT SET_CURRENT option immediately inserts or updates values in the current message, which makes the new value available to all subsequent Field declarations.

In the sample below, AMPS preprocesses the message by performing an expensive operation (implemented as a user-defined function) on two input fields and immediately updates the current message with the output of that operation. AMPS then sets other fields in the processed message using the updated value in the current message.

<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
       AS /processedData HINT SET_CURRENT</Field>
<Field>IF(/processedData > 1000000,
           'A',
           'B') AS /resultClass</Field>

Notice that using HINT SET_CURRENT requires AMPS to process Field declarations in order, which may prevent future optimizations.
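
The Python sketch below models the difference by keeping the processed message and the current message as separate dicts; only a field marked with HINT SET_CURRENT copies its value into the current message. It assumes flat dict messages and None for NULL. The names preprocess_in_order, expensive_udf_call, and result_class are hypothetical, the UDF is a stand-in addition, and treating a NULL comparison as false is an assumption of the sketch.

```python
# Sketch only: models HINT SET_CURRENT as described above; not AMPS code.
def preprocess_in_order(incoming, fields):
    """Apply (source, destination, set_current) constructors in declaration order."""
    current = dict(incoming)   # the message that source expressions read from
    result = dict(incoming)    # the processed message, merged in afterwards
    for source, dest, set_current in fields:
        name = dest.lstrip("/")
        value = source(current)
        result[name] = value
        if set_current:
            current[name] = value  # visible to every later Field declaration
    return result


def expensive_udf_call(m):
    # Hypothetical stand-in for EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2).
    return m["dataSet1"] + m["dataSet2"]


def result_class(m):
    # IF(/processedData > 1000000, 'A', 'B'); assumption: NULL compares as false.
    value = m.get("processedData")
    return "A" if value is not None and value > 1_000_000 else "B"


incoming = {"dataSet1": 600_000, "dataSet2": 700_000}
with_hint = preprocess_in_order(incoming, [(expensive_udf_call, "/processedData", True),
                                           (result_class, "/resultClass", False)])
without_hint = preprocess_in_order(incoming, [(expensive_udf_call, "/processedData", False),
                                              (result_class, "/resultClass", False)])
print(with_hint["resultClass"], without_hint["resultClass"])  # A B
```

Because each Field can depend on values written by earlier ones, the loop must run in declaration order, which is the ordering constraint the note above describes.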

Hints can be combined as follows:

<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
       AS /processedData HINT SET_CURRENT,OPTIONAL
</Field>

In this case, if the projected field would be NULL, the field is removed from the current message.
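
A Python sketch of how the two combined hints interact, assuming flat dict messages, None for NULL, and a hint list written as comma-separated names. The names parse_hints, apply_fields, and expensive_udf_call are hypothetical; the UDF stand-in simply returns NULL when an input is missing.

```python
# Sketch only: models HINT SET_CURRENT,OPTIONAL as described above; not AMPS code.
def parse_hints(text):
    """Split a hint list such as 'SET_CURRENT,OPTIONAL' into a set of hint names."""
    return {h.strip().upper() for h in text.split(",") if h.strip()}


def apply_fields(incoming, fields):
    """Apply (source, destination, hints) constructors; return (result, current)."""
    current = dict(incoming)
    result = dict(incoming)
    for source, dest, hint_text in fields:
        hints = parse_hints(hint_text)
        name = dest.lstrip("/")
        value = source(current)
        if value is None and "OPTIONAL" in hints:
            result.pop(name, None)        # nothing for the message type to serialize
            if "SET_CURRENT" in hints:
                current.pop(name, None)   # later fields see the field as absent
            continue
        result[name] = value
        if "SET_CURRENT" in hints:
            current[name] = value
    return result, current


def expensive_udf_call(m):
    # Hypothetical stand-in for EXPENSIVE_UDF_CALL: NULL when an input is missing.
    if m.get("dataSet1") is None or m.get("dataSet2") is None:
        return None
    return m["dataSet1"] + m["dataSet2"]


fields = [(expensive_udf_call, "/processedData", "SET_CURRENT,OPTIONAL")]
print(apply_fields({"dataSet2": 1, "processedData": 42}, fields))
# ({'dataSet2': 1}, {'dataSet2': 1})
```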

Constructing Enrichment Fields

Enrichment field constructors operate on a single message and construct fields based on that message. Enrichment expressions operate on the current message and change the current message. The results of the enrichment directives are merged into the incoming message. Any field in the source message that is not changed or removed during enrichment is left unchanged, so it is not necessary to include all fields in the message in the Enrichment directive.

Since enrichment fields apply to a specific message, enrichment fields cannot specify the topic or message type in an XPath identifier. All identifiers in the source expression are evaluated as identifiers in the message being enriched.

Enrichment fields are constructed during the enrichment phase, so enrichment fields can refer to the previous state of a message. Within an enrichment expression, AMPS provides two special modifiers for XPath identifiers that specify whether an XPath identifier refers to the current incoming message or the previous state of the message. These modifiers apply only to the source expression, and cannot be used in the destination identifier. The modifiers are as follows:

XPath identifier modifiers for enrichment

Modifier      Description
------------  ----------------------------------------------------------
OF CURRENT    Specifies that the XPath identifier refers to the incoming
              message.
OF PREVIOUS   Specifies that the XPath identifier refers to the previous
              state of the message in the SOW. If there is no record in
              the SOW for this message, all identifiers that specify
              OF PREVIOUS return NULL.
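
The Python sketch below models how the two modifiers resolve, assuming flat dict messages, None standing in for NULL, and a previous state of None when the SOW holds no record for the message. The field /qty OF CURRENT - /qty OF PREVIOUS AS /qtyChange is a hypothetical example, the function names are hypothetical, and propagating NULL through the subtraction is an assumption of this sketch.

```python
# Sketch only: models OF CURRENT / OF PREVIOUS as described above; not AMPS code.
def resolve(identifier, modifier, current, previous):
    """Look up a top-level XPath identifier qualified with an enrichment modifier."""
    name = identifier.lstrip("/")
    if modifier == "OF CURRENT":
        return current.get(name)            # the incoming message
    if modifier == "OF PREVIOUS":
        if previous is None:                # no record in the SOW for this message
            return None
        return previous.get(name)           # the state stored in the SOW
    raise ValueError(f"unknown modifier: {modifier}")


def enrich(current, previous):
    """Hypothetical field: /qty OF CURRENT - /qty OF PREVIOUS AS /qtyChange."""
    now = resolve("/qty", "OF CURRENT", current, previous)
    before = resolve("/qty", "OF PREVIOUS", current, previous)
    enriched = dict(current)                # merged into the incoming message
    enriched["qtyChange"] = None if now is None or before is None else now - before
    return enriched


print(enrich({"id": 1, "qty": 15}, {"id": 1, "qty": 10}))
# {'id': 1, 'qty': 15, 'qtyChange': 5}
print(enrich({"id": 1, "qty": 15}, None))
# {'id': 1, 'qty': 15, 'qtyChange': None}
```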

Using HINT to Control Field Construction

Enrichment can be used to remove fields from a message. By default, AMPS serializes any field that has an empty string or NULL value after enrichment. Enrichment Field elements can include a directive that specifies that a field that contains a NULL value should be removed from the message rather than serialized with a NULL value. The directive HINT OPTIONAL applied to the XPath identifier specifies that if the result of the source expression is NULL, AMPS does not provide the value for the message type to serialize. For example, the following field constructor removes the /source field from the message if the value provided is not in a specific list of values:

<Field>IF(/source IN ('a','e','f'), /source, NULL)
       AS /source HINT OPTIONAL</Field>

By default, AMPS considers the results of field construction (the enriched message) to be distinct from the current message. AMPS rewrites the current message after enrichment is completed. This means that, by default, the results of fields constructed during enrichment are not available to other fields within enrichment. The HINT SET_CURRENT option immediately inserts or updates values in the current message, which makes the new value available to all subsequent Field declarations.

In the sample below, AMPS enriches the message by performing an expensive operation (implemented as a user-defined function) on two input fields and immediately updates the current message with the output of that operation. AMPS then sets other fields in the enriched message using the updated value in the current message.

<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
       AS /processedData HINT SET_CURRENT</Field>
<Field>IF(/processedData > 1000000,
           'A',
           'B') AS /resultClass</Field>

Notice that using HINT SET_CURRENT requires AMPS to process Field declarations in order, which may prevent future optimizations.

Hints can be combined as follows:

<Field>EXPENSIVE_UDF_CALL(/dataSet1, /dataSet2)
       AS /processedData HINT SET_CURRENT,OPTIONAL
</Field>

In this case, if the projected field would be NULL, the field is removed from the current message.

Constructing View Fields

View field constructors operate over groups of messages, and construct a single output message for each distinct group, as specified by the Grouping element in the View configuration.

When constructing a field in a view, all identifiers used in the source expression must be in one of the underlying topics for the view. When the view uses a Join, the identifiers must include the topic identifier. If the topics in the Join are of different message types, the identifiers must include both the message type and the topic identifier.

For example, the following Field definition multiplies the /quantity from the NVFIX topic orders by the /price from the JSON topic items, and projects the result into the /total field of the view.

<Field>[nvfix].[orders]./quantity * [json].[items]./price AS /total</Field>
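
A minimal Python sketch of how qualified identifiers pick values out of one joined pair of messages. It assumes the Join has already matched the messages (the join condition is not part of the Field definition and is not modeled here), represents each message as a dict keyed by (message type, topic), and uses the hypothetical helper qualified. The "id" fields are illustrative data added to show why qualification matters.

```python
# Sketch only: models qualified identifiers in a joined view; not AMPS code.
def qualified(joined_row, message_type, topic, path):
    """Resolve an identifier written as [message_type].[topic]./path."""
    message = joined_row[(message_type, topic)]
    return message.get(path.lstrip("/"))


# One row already matched by the view's Join: one message from each topic.
# Both messages carry an "id" field, so an unqualified /id would be ambiguous.
joined_row = {
    ("nvfix", "orders"): {"id": "o-1", "quantity": 100},
    ("json", "items"): {"id": "i-9", "price": 2.5},
}

# [nvfix].[orders]./quantity * [json].[items]./price AS /total
total = (qualified(joined_row, "nvfix", "orders", "/quantity")
         * qualified(joined_row, "json", "items", "/price"))
print({"total": total})  # {'total': 250.0}
```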

Aggregate Functions

AMPS provides a set of aggregation functions that can be used in a Field constructor for a view and in the projection option of an aggregated subscription. These functions return a single value for each distinct group of messages, as identified by distinct combinations of values in the Grouping clause.

Each of these functions takes a single value as an argument, which is typically evaluated for each individual message within a group. There are no special limitations on the argument: it can be a literal value, an identifier directing AMPS to extract the value from the message, or the result of a function.

For example, given a set of messages like the following:

{"id":1, "item":1,"qty":10, "oid":1, ...}
{"id":2, "item":2,"qty":10, "oid":1, ...}
{"id":3, "item":3,"qty":25, "oid":1, ...}

With a view definition that has a Projection clause and Grouping clause like the following:

<Projection>
   <Field>/oid</Field>
   <Field>SUM(/qty) AS /totalOrderQty</Field>
   <Field>SUM(IF((/qty % 10) == 0,1,0)) AS /evenOrderCount</Field>
</Projection>
<Grouping>
   <Field>/oid</Field>
</Grouping>

AMPS will produce the following record:

{"oid":1,"totalOrderQty":45,"evenOrderCount":2}

Notice that the first SUM() function simply extracts the value of /qty from each message, while the second SUM() function uses the output of the IF function for each message.
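
The Python sketch below reproduces this record from the sample messages, to make the per-message evaluation of each SUM() argument concrete. It models the documented result only; the function name project is hypothetical, and the elided fields ("...") are left out of the sample data.

```python
from collections import defaultdict

# Sketch only: models the Projection/Grouping example above; not AMPS code.
messages = [
    {"id": 1, "item": 1, "qty": 10, "oid": 1},
    {"id": 2, "item": 2, "qty": 10, "oid": 1},
    {"id": 3, "item": 3, "qty": 25, "oid": 1},
]


def project(messages):
    """Group by /oid and compute the two SUM() projections for each group."""
    groups = defaultdict(list)
    for msg in messages:
        groups[msg["oid"]].append(msg)           # <Grouping><Field>/oid</Field>
    records = []
    for oid, group in groups.items():
        records.append({
            "oid": oid,
            # SUM(/qty): the argument is the field value itself
            "totalOrderQty": sum(m["qty"] for m in group),
            # SUM(IF((/qty % 10) == 0,1,0)): the argument is the IF result
            "evenOrderCount": sum(1 if m["qty"] % 10 == 0 else 0 for m in group),
        })
    return records


print(project(messages))
# [{'oid': 1, 'totalOrderQty': 45, 'evenOrderCount': 2}]
```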

Since aggregate functions operate over groups of messages, these functions are only available when constructing fields for aggregate purposes, either in a view or an aggregated subscription. The functions described in this section are not available to filters, and are not available for constructing fields during SOW topic enrichment.

The set of functions provided in AMPS has been chosen to be efficient to compute over high volumes of rapidly changing data.

AMPS aggregation functions

Function         Description
---------------  --------------------------------------------------------
AVG              Average over an expression. Returns the mean of the
                 values specified by the expression.
COUNT            Count of values in an expression. Returns the number of
                 values specified by the expression.
COUNT_DISTINCT   Count of the number of distinct values in an expression,
                 ignoring NULL. Returns the number of distinct values in
                 the expression. AMPS type conversion rules apply when
                 determining distinct values.
MIN              Minimum value. Returns the minimum of the values
                 specified by the expression.
MAX              Maximum value. Returns the maximum of the values
                 specified by the expression.
STDDEV_POP       Population standard deviation of an expression. Returns
                 the calculated standard deviation.
STDDEV_SAMP      Sample standard deviation of an expression. Returns the
                 calculated standard deviation.
SUM              Summation over an expression. Returns the total of the
                 values specified by the expression.

As in ANSI SQL, AMPS does not include NULL values in aggregate expressions: COUNT counts only non-NULL values, SUM adds only non-NULL values, AVG averages only non-NULL values, MIN and MAX ignore NULL values, and so on.
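
The Python sketch below shows this NULL-skipping behavior for several of the functions in the table, with None standing in for NULL. It is not the AMPS implementation: the agg_* names are hypothetical, COUNT_DISTINCT uses Python equality instead of AMPS type conversion rules, and returning None for an empty group (or for a sample standard deviation with fewer than two values) is an assumption.

```python
import statistics

# Sketch only: models NULL handling in aggregates as described above; not AMPS code.


def _present(values):
    """Drop NULLs (None) before aggregating."""
    return [v for v in values if v is not None]


def agg_count(values):
    return len(_present(values))


def agg_sum(values):
    present = _present(values)
    return sum(present) if present else None   # assumption: NULL when nothing to sum


def agg_avg(values):
    present = _present(values)
    return sum(present) / len(present) if present else None


def agg_count_distinct(values):
    # Simplification: Python equality, not AMPS type conversion rules.
    return len(set(_present(values)))


def agg_stddev_pop(values):
    present = _present(values)
    return statistics.pstdev(present) if present else None


def agg_stddev_samp(values):
    present = _present(values)
    # Assumption: NULL when there are fewer than two values.
    return statistics.stdev(present) if len(present) >= 2 else None


qty = [10, None, 20, 30, 10]
print(agg_count(qty), agg_sum(qty), agg_avg(qty), agg_count_distinct(qty))
# 4 70 17.5 3
```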

MIN and MAX can operate on numbers, strings, or a combination of the two. AMPS compares values using the principles described for comparison operators. For MIN and MAX, AMPS determines order based on these rules:

  • Numbers sort in numeric order.
  • String values sort in ASCII order.
  • When comparing a number to a string, AMPS converts the string to a number and uses a numeric comparison. If the conversion is not successful, the string is considered higher than the number.

For example, given a field that has the following values across a set of messages:

24, 020, 'cat', 75, 1.3, 200, '75', '42'

MIN returns 1.3 and MAX returns 'cat'. Notice that different message types may have different support for converting strings to numeric values: AMPS relies on the parsing done by the message type to determine the numeric value of a string.
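
A Python sketch of these ordering rules, applied to the example values. It assumes Python's float() as a stand-in for the message type's string-to-number parsing (which, as noted above, varies by message type); amps_compare, amps_min, and amps_max are hypothetical names, not AMPS functions.

```python
from functools import cmp_to_key

# Sketch only: models the MIN/MAX ordering rules described above; not AMPS code.


def _to_number(text):
    """Assumed stand-in for the message type's string-to-number parsing."""
    try:
        return float(text)
    except ValueError:
        return None


def amps_compare(a, b):
    """Return <0, 0, or >0 following the ordering rules listed above."""
    a_is_num = isinstance(a, (int, float))
    b_is_num = isinstance(b, (int, float))
    if a_is_num == b_is_num:
        # Two numbers compare numerically; two strings compare in ASCII order.
        return (a > b) - (a < b)
    if a_is_num:
        b_val = _to_number(b)
        if b_val is None:
            return -1                      # unconvertible string b ranks above number a
        return (a > b_val) - (a < b_val)
    a_val = _to_number(a)
    if a_val is None:
        return 1                           # unconvertible string a ranks above number b
    return (a_val > b) - (a_val < b)


def amps_min(values):
    present = [v for v in values if v is not None]   # NULLs are ignored
    return min(present, key=cmp_to_key(amps_compare)) if present else None


def amps_max(values):
    present = [v for v in values if v is not None]
    return max(present, key=cmp_to_key(amps_compare)) if present else None


# 020 from the example is written as 20 (Python rejects the literal 020);
# its exact numeric value does not change the result.
values = [24, 20, "cat", 75, 1.3, 200, "75", "42"]
print(amps_min(values), amps_max(values))  # 1.3 cat
```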