4. User-Defined Functions

What’s A UDF?

User-defined functions allow you to extend the capability of AMPS expressions. Once you create and register a function, you can use it in filters, enrichment, and view projections: any context where you can use a built-in AMPS non-aggregate function.

Functions receive values from a single message and return a single value. When used in a filter, the UDF is called once for each message evaluated for the filter. When used to project a field for a view or a JOIN, the UDF is called each time that AMPS needs to construct that field, that is, once for each output message. (In a JOIN, the UDF may receive fields from any number of underlying topics; however, AMPS constructs the joined set of fields for the output message before calling the UDF.)

You can use user-defined functions in any context where you can use built-in AMPS functions. In fact, AMPS itself provides many of its built-in functions through this interface.

When Should I Create a UDF?

Create a user-defined function when:

  • Your application requires functionality (filtering, transformation of values) that cannot be implemented using the AMPS built-in functions.
  • You can perform that functionality efficiently in server-side code.
  • You have control over your AMPS deployments and can ensure that the module that implements the UDF will be present in every instance where an application may require the UDF.

A user-defined function cannot be used to aggregate multiple messages into a single value. A UDF is called for a single message and produces a single result.

In current releases of AMPS, user-defined functions cannot produce arrays or compound values. You can return simple, scalar values from a user-defined function, or use the provided functions to return a string value.

Implementing a UDF

Notice that, unlike other AMPS module types, UDFs do not use a context. You implement a UDF as a function with C linkage and the following signature:

static void function_name(amps_expression_value result,
                          unsigned long argcount,
                          amps_expression_value_array args);

Your function processes the arguments provided in the args parameter (argcount holds the number of arguments) and returns its value by setting the result parameter.

AMPS sets no particular requirements on the name of the function. The function name can be any valid C function name.

Working with Arguments

There are two steps to working with an argument provided to a UDF. First, you extract the value from the provided argument array. Second, you convert the value to the type you need to use in your function.

The AMPS external API provides the amps_expression_value_get function to retrieve a specific value from an amps_expression_value_array. The function takes the array to extract the value from, and the position of the value to extract. For example, the following code returns the first argument from an amps_expression_value_array named args.

amps_expression_value myVal = amps_expression_value_get(args,0);

Once the value is extracted, the AMPS external API offers a set of functions for retrieving the underlying C value of an amps_expression_value. For example, to retrieve a string value, you provide a pointer to receive the string and a variable to hold its length, then call amps_expression_value_as_string:

char* value = NULL;
size_t valueLen = 0;
// value will point to the beginning of the string,
// valueLen will specify the length of the string.
amps_expression_value_as_string(myVal, &value, &valueLen);

The AMPS external API provides similar functions for all of the types recognized by the AMPS expression engine. See Working with Expression Values for more information on expression values.
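As a sketch of both steps, the hypothetical STR_LEN UDF below extracts its single argument, reads it as a string, and returns the string's length as a double. The typedefs and function bodies marked as stand-ins are assumptions made only so the example compiles without AMPS; in a real module the types and functions come from the AMPS external API headers, and their exact return types may differ from these mocks.

```c
#include <stddef.h>

/* ---- Hypothetical stand-ins for the AMPS external API (assumption) ----
   They exist only so this sketch compiles on its own. A real module
   includes the AMPS headers instead and deletes this section. */
enum { MOCK_NONE, MOCK_BOOL, MOCK_DOUBLE, MOCK_STRING };
typedef struct { int type; int b; double d; const char *s; size_t len; } mock_value;
typedef mock_value *amps_expression_value;
typedef struct { mock_value *items; size_t count; } mock_array;
typedef mock_array *amps_expression_value_array;

static amps_expression_value amps_expression_value_get(
    amps_expression_value_array array, size_t pos)
{
    return &array->items[pos];
}

static void amps_expression_value_as_string(
    amps_expression_value v, char **out, size_t *outLen)
{
    *out = (char *)v->s;   /* the mock only models string-typed values */
    *outLen = v->len;
}

static void amps_expression_value_set_bool(amps_expression_value v, int b)
{
    v->type = MOCK_BOOL;
    v->b = b;
}

static void amps_expression_value_set_double(amps_expression_value v, double d)
{
    v->type = MOCK_DOUBLE;
    v->d = d;
}
/* ---- end of stand-ins ---- */

/* STR_LEN(s): hypothetical UDF returning the length of its string argument. */
static void str_len(amps_expression_value result,
                    unsigned long argcount,
                    amps_expression_value_array args)
{
    amps_expression_value myVal;
    char *value = NULL;
    size_t valueLen = 0;

    if (argcount != 1) {
        /* Defensive guard (a design choice): never read past the array. */
        amps_expression_value_set_bool(result, 0);
        return;
    }

    /* Step 1: extract the first argument from the argument array. */
    myVal = amps_expression_value_get(args, 0);

    /* Step 2: convert it to a C string; rely on valueLen, not a terminator. */
    amps_expression_value_as_string(myVal, &value, &valueLen);

    amps_expression_value_set_double(result, (double)valueLen);
}
```

The argcount check is a defensive choice of this sketch rather than documented AMPS behavior; it keeps the function from indexing past the argument array if it is ever called with an unexpected number of arguments.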

Setting the Return Value

The return value from your UDF is the amps_expression_value provided as the first argument when AMPS calls your function. To return a value from your UDF, you use the AMPS external API to set the type and value of the amps_expression_value.

For example, the following line of code sets the return value to a boolean FALSE:

amps_expression_value_set_bool(result, 0);

The following line of code sets the return value to a double containing the value of the variable calculation:

amps_expression_value_set_double(result, calculation);

The value and type of the amps_expression_value provided to your function at the point of return is the value and type of the return from your UDF.

The AMPS external API provides a convenience function for setting an output value to one of the input values. The amps_expression_value_set_value function simply sets the value of the first argument to the value of the second argument. The function handles any type provided as an input value. For example, the following line of code sets the result to the first value of the input arguments to the UDF:

amps_expression_value_set_value(result, amps_expression_value_get(args,0));

These type-specific setter functions are provided for simple, scalar values. For strings, AMPS must manage the memory allocated for the string, as described in the next section.
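To illustrate amps_expression_value_set_value, the hypothetical LONGER_OF UDF below returns whichever of its two string arguments is longer, preferring the first on a tie. Because the result takes its value directly from one of the inputs, the function never builds a new string. As before, the stand-in definitions are mocks so the block compiles on its own, not the real AMPS API.

```c
#include <stddef.h>

/* ---- Hypothetical stand-ins for the AMPS external API (assumption) ---- */
enum { MOCK_NONE, MOCK_BOOL, MOCK_STRING };
typedef struct { int type; int b; const char *s; size_t len; } mock_value;
typedef mock_value *amps_expression_value;
typedef struct { mock_value *items; size_t count; } mock_array;
typedef mock_array *amps_expression_value_array;

static amps_expression_value amps_expression_value_get(
    amps_expression_value_array array, size_t pos)
{
    return &array->items[pos];
}

static void amps_expression_value_as_string(
    amps_expression_value v, char **out, size_t *outLen)
{
    *out = (char *)v->s;
    *outLen = v->len;
}

static void amps_expression_value_set_bool(amps_expression_value v, int b)
{
    v->type = MOCK_BOOL;
    v->b = b;
}

static void amps_expression_value_set_value(amps_expression_value dst,
                                            amps_expression_value src)
{
    *dst = *src;   /* the mock copies the type and value of the input */
}
/* ---- end of stand-ins ---- */

/* LONGER_OF(a, b): hypothetical UDF returning whichever string argument is
   longer (the first one on a tie) by handing back that input value. */
static void longer_of(amps_expression_value result,
                      unsigned long argcount,
                      amps_expression_value_array args)
{
    amps_expression_value a, b;
    char *aStr = NULL, *bStr = NULL;
    size_t aLen = 0, bLen = 0;

    if (argcount != 2) {
        amps_expression_value_set_bool(result, 0);   /* defensive guard */
        return;
    }
    a = amps_expression_value_get(args, 0);
    b = amps_expression_value_get(args, 1);
    amps_expression_value_as_string(a, &aStr, &aLen);
    amps_expression_value_as_string(b, &bStr, &bLen);

    /* No new string is built: the result simply takes an input's value. */
    amps_expression_value_set_value(result, bLen > aLen ? b : a);
}
```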

Constructing Strings

AMPS provides a special set of functions for constructing strings within a UDF. These functions enable AMPS to correctly manage the lifetime of the memory allocated for the string. Notice that, because the lifetime of the string is based on the lifetime of the results returned from the function, you must not use the allocator provided in the amps_module_init function to allocate memory for strings returned from a UDF.

There are two ways to create a string that can be returned from a UDF. Which method to use depends on the lifetime of the string you are returning.

  • String lifetime is guaranteed to exceed the evaluation of the full expression. If the string value that you are returning is guaranteed to be valid while AMPS uses the result of the UDF and is guaranteed to be properly freed afterwards if necessary, use amps_expression_value_set_cstr. For example, if your function translates a set of numeric codes to a fixed set of strings, you could use this function to return a pointer to a static string in your module. Likewise, if your function will simply return the value of one of the input parameters, you can use this function to set the output value to point to the string in the input parameter (since AMPS is already managing the lifetime of the input parameter).

    In this case, the return value structure does not take ownership of the string, and will not free the string when AMPS is done with the return value. For a static string, there is no need to free the string. For the contents of another parameter to the function, that parameter has ownership of the string and will free it when AMPS is finished with the function results.

    Tip: Strings allocated on the stack as local variables are freed when the function returns, so the memory that the return value references is in an indeterminate state when AMPS evaluates the results of the expression. For a string built on the stack, use amps_expression_value_allocate_cstr and copy the string into the memory it returns.

  • Return value manages the string lifetime. For other cases, use amps_expression_value_allocate_cstr to allocate memory that will be owned by the return value, and will be freed when AMPS has no more need of the return value. This function returns a pointer to the beginning of the allocated memory, and you use that pointer to copy the string into the newly allocated memory.
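The first case can be sketched with a hypothetical SIDE_NAME UDF that maps the codes "B" and "S" to static strings and, for any other code, points the result at the input string itself. The sketch assumes the argument arrives as a string. It passes &result because Table 4.1 lists amps_expression_value_set_cstr with an amps_expression_value pointer parameter; check that against your AMPS headers. The stand-ins are mocks so the block compiles without AMPS.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- Hypothetical stand-ins for the AMPS external API (assumption) ---- */
enum { MOCK_NONE, MOCK_BOOL, MOCK_STRING };
typedef struct { int type; int b; const char *s; size_t len; } mock_value;
typedef mock_value *amps_expression_value;
typedef struct { mock_value *items; size_t count; } mock_array;
typedef mock_array *amps_expression_value_array;

static amps_expression_value amps_expression_value_get(
    amps_expression_value_array array, size_t pos)
{
    return &array->items[pos];
}

static void amps_expression_value_as_string(
    amps_expression_value v, char **out, size_t *outLen)
{
    *out = (char *)v->s;
    *outLen = v->len;
}

static void amps_expression_value_set_bool(amps_expression_value v, int b)
{
    v->type = MOCK_BOOL;
    v->b = b;
}

static void amps_expression_value_set_cstr(amps_expression_value *v,
                                           const char *p, uint32_t len)
{
    (*v)->type = MOCK_STRING;   /* points at p; does not take ownership */
    (*v)->s = p;
    (*v)->len = len;
}
/* ---- end of stand-ins ---- */

static const char SIDE_BUY[] = "BUY";
static const char SIDE_SELL[] = "SELL";

/* SIDE_NAME(code): hypothetical UDF translating "B"/"S" to readable names. */
static void side_name(amps_expression_value result,
                      unsigned long argcount,
                      amps_expression_value_array args)
{
    char *code = NULL;
    size_t codeLen = 0;

    if (argcount != 1) {
        amps_expression_value_set_bool(result, 0);   /* defensive guard */
        return;
    }
    amps_expression_value_as_string(amps_expression_value_get(args, 0),
                                    &code, &codeLen);

    if (codeLen == 1 && code[0] == 'B') {
        /* Static storage outlives every evaluation; nothing to free. */
        amps_expression_value_set_cstr(&result, SIDE_BUY,
                                       (uint32_t)(sizeof SIDE_BUY - 1));
    } else if (codeLen == 1 && code[0] == 'S') {
        amps_expression_value_set_cstr(&result, SIDE_SELL,
                                       (uint32_t)(sizeof SIDE_SELL - 1));
    } else {
        /* Unknown code: reuse the input string, whose lifetime AMPS manages. */
        amps_expression_value_set_cstr(&result, code, (uint32_t)codeLen);
    }
}
```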

The following table lists the string construction functions.

Function / Description

char *amps_expression_value_allocate_cstr(
     amps_expression_value *v,
     uint32_t len);

    Sets the type of the provided amps_expression_value to string, and allocates space for a string of size len. Returns a pointer to the beginning of the allocated memory.

    With this function, the amps_expression_value owns the allocated memory and will free the memory when the value is destroyed.

void amps_expression_value_set_cstr(
     amps_expression_value *v,
     const char *p,
     uint32_t len);

    Sets the type of the amps_expression_value to string, and sets the value to point to a string of size len starting at p.

    With this function, the amps_expression_value does not own the memory that p points to, and does not free the memory when the value is destroyed.

Table 4.1: String Construction Functions
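For the second case, the hypothetical REVERSE UDF below cannot point at existing memory, because the reversed text does not exist until the function builds it. It therefore asks amps_expression_value_allocate_cstr for a buffer owned by the result and writes the characters into it; building the string in a local array and passing that to amps_expression_value_set_cstr would leave the result pointing at freed stack memory. The mock allocator uses malloc purely so the example runs on its own; the sketch relies only on the explicit length and writes no terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- Hypothetical stand-ins for the AMPS external API (assumption) ---- */
enum { MOCK_NONE, MOCK_BOOL, MOCK_STRING };
typedef struct { int type; int b; const char *s; size_t len; } mock_value;
typedef mock_value *amps_expression_value;
typedef struct { mock_value *items; size_t count; } mock_array;
typedef mock_array *amps_expression_value_array;

static amps_expression_value amps_expression_value_get(
    amps_expression_value_array array, size_t pos)
{
    return &array->items[pos];
}

static void amps_expression_value_as_string(
    amps_expression_value v, char **out, size_t *outLen)
{
    *out = (char *)v->s;
    *outLen = v->len;
}

static void amps_expression_value_set_bool(amps_expression_value v, int b)
{
    v->type = MOCK_BOOL;
    v->b = b;
}

static char *amps_expression_value_allocate_cstr(amps_expression_value *v,
                                                 uint32_t len)
{
    /* The mock uses malloc; in AMPS the return value owns and frees it. */
    char *buf = malloc(len ? len : 1);
    if (buf != NULL) {
        (*v)->type = MOCK_STRING;
        (*v)->s = buf;
        (*v)->len = len;
    }
    return buf;
}
/* ---- end of stand-ins ---- */

/* REVERSE(s): hypothetical UDF returning its string argument reversed. */
static void reverse_str(amps_expression_value result,
                        unsigned long argcount,
                        amps_expression_value_array args)
{
    char *in = NULL;
    size_t inLen = 0;
    char *out;
    size_t i;

    if (argcount != 1) {
        amps_expression_value_set_bool(result, 0);   /* defensive guard */
        return;
    }
    amps_expression_value_as_string(amps_expression_value_get(args, 0),
                                    &in, &inLen);

    /* A local array would be freed on return, so allocate through the
       result, which owns the buffer from here on. */
    out = amps_expression_value_allocate_cstr(&result, (uint32_t)inLen);
    if (out == NULL)
        return;   /* defensive; whether allocation can fail is not stated */
    for (i = 0; i < inLen; ++i)
        out[i] = in[inLen - 1 - i];   /* exactly inLen bytes, no terminator */
}
```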

Making UDF Functions Array-Aware

AMPS includes functions for working with array values passed into a UDF. By default, when a value provided to a UDF is an array, AMPS provides the first value in the array to the UDF. This is similar to the way that most existing functions in AMPS work.

AMPS also provides functions that you can use to make your function array-aware. A UDF must still return a single value, but it can use these functions to operate on array values. For example, you could implement an ARRAY_LEN() function to allow clients to write filters like ARRAY_LEN(/items) < 10.

The following table lists the functions provided for working with arrays:

Function / Description

int amps_expression_value_is_array(
     amps_expression_value v);

    Returns TRUE (non-zero) if the provided amps_expression_value contains an array, 0 otherwise.

amps_expression_value_array amps_expression_value_as_array(
     amps_expression_value v,
     size_t* outCount);

    Retrieves the array from the provided amps_expression_value and sets outCount to the number of items in the array.

amps_expression_value amps_expression_value_get(
     amps_expression_value_array array,
     size_t pos);

    Retrieves the value at position pos. The first item in the array is at position 0.

Table 4.2: Array Functions
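A minimal sketch of the ARRAY_LEN() function mentioned above follows, assuming the argument reaches the UDF as an array value that amps_expression_value_is_array can detect. It returns the count as a double because amps_expression_value_set_double is the numeric setter named in this chapter, and treating a scalar as a one-item array is a design choice of the sketch, not documented AMPS behavior. The stand-ins are mocks so the block compiles on its own.

```c
#include <stddef.h>

/* ---- Hypothetical stand-ins for the AMPS external API (assumption) ---- */
enum { MOCK_NONE, MOCK_BOOL, MOCK_DOUBLE, MOCK_ARRAY };
typedef struct mock_value mock_value;
typedef struct { mock_value *items; size_t count; } mock_array;
struct mock_value { int type; int b; double d; mock_array *arr; };
typedef mock_value *amps_expression_value;
typedef mock_array *amps_expression_value_array;

static amps_expression_value amps_expression_value_get(
    amps_expression_value_array array, size_t pos)
{
    return &array->items[pos];
}

static int amps_expression_value_is_array(amps_expression_value v)
{
    return v->type == MOCK_ARRAY;
}

static amps_expression_value_array amps_expression_value_as_array(
    amps_expression_value v, size_t *outCount)
{
    *outCount = v->arr->count;
    return v->arr;
}

static void amps_expression_value_set_bool(amps_expression_value v, int b)
{
    v->type = MOCK_BOOL;
    v->b = b;
}

static void amps_expression_value_set_double(amps_expression_value v, double d)
{
    v->type = MOCK_DOUBLE;
    v->d = d;
}
/* ---- end of stand-ins ---- */

/* ARRAY_LEN(x): number of items in x, returned as a double. */
static void array_len(amps_expression_value result,
                      unsigned long argcount,
                      amps_expression_value_array args)
{
    amps_expression_value v;
    size_t count = 0;

    if (argcount != 1) {
        amps_expression_value_set_bool(result, 0);   /* defensive guard */
        return;
    }
    v = amps_expression_value_get(args, 0);
    if (amps_expression_value_is_array(v)) {
        /* Only the count is needed, so the returned array is ignored. */
        (void)amps_expression_value_as_array(v, &count);
    } else {
        count = 1;   /* design choice (assumption): a scalar is one item */
    }
    amps_expression_value_set_double(result, (double)count);
}
```

How the real function should treat a missing field is left open here; the mock cannot represent one.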

Registering the UDF with AMPS

For AMPS to call your UDF, you must register the function with AMPS during module initialization. The AMPS utility API includes the following function for registering a UDF with AMPS:

Function / Description

amps_register_udf(amps_udf_t udf,
                  const char *name,
                  size_t paramcount)

    Registers a UDF with AMPS. The udf parameter is a pointer to the function to call for this UDF.

    name is the name of the function within AMPS expressions.

    paramcount is the number of parameters for this UDF.

For example, to register a UDF implemented by a function named do_stuff that takes one parameter, and to direct AMPS to call that function whenever DO_STUFF appears in a filter, you would call amps_register_udf from amps_module_init as follows:

amps_register_udf(do_stuff,
                  "DO_STUFF",
                  1);

Table 4.3: Registering a UDF function
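Pulling registration together, the sketch below registers two hypothetical UDFs from a helper named register_udfs. The helper name is an assumption; in a real module you would make these calls from amps_module_init, whose signature is documented elsewhere. The mock amps_udf_t and amps_register_udf only record their arguments so the example runs without AMPS, and the mock's void return type is a guess.

```c
#include <stddef.h>

/* ---- Hypothetical stand-ins for the AMPS external API (assumption) ---- */
typedef struct mock_value mock_value;   /* opaque in this sketch */
typedef mock_value *amps_expression_value;
typedef struct mock_array mock_array;
typedef mock_array *amps_expression_value_array;
typedef void (*amps_udf_t)(amps_expression_value, unsigned long,
                           amps_expression_value_array);

#define MOCK_MAX_UDFS 8
static struct { amps_udf_t fn; const char *name; size_t params; }
    mock_registry[MOCK_MAX_UDFS];
static size_t mock_registered = 0;

static void amps_register_udf(amps_udf_t udf, const char *name,
                              size_t paramcount)
{
    if (mock_registered < MOCK_MAX_UDFS) {   /* the mock only records calls */
        mock_registry[mock_registered].fn = udf;
        mock_registry[mock_registered].name = name;
        mock_registry[mock_registered].params = paramcount;
        ++mock_registered;
    }
}
/* ---- end of stand-ins ---- */

/* Placeholder UDF bodies; a real module would implement them. */
static void do_stuff(amps_expression_value result, unsigned long argcount,
                     amps_expression_value_array args)
{
    (void)result; (void)argcount; (void)args;
}

static void array_len(amps_expression_value result, unsigned long argcount,
                      amps_expression_value_array args)
{
    (void)result; (void)argcount; (void)args;
}

/* Hypothetical helper, called from amps_module_init, that registers every
   UDF this module provides during module initialization. */
static void register_udfs(void)
{
    amps_register_udf(do_stuff, "DO_STUFF", 1);
    amps_register_udf(array_len, "ARRAY_LEN", 1);
}
```

With DO_STUFF registered for one parameter, a filter that calls DO_STUFF with one argument (for example, DO_STUFF(/status), where /status is a hypothetical field) would invoke do_stuff for each message evaluated.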