How to write an internal UDF function

<< Declarative referential integrity versus triggers | Database technology articles | Using descriptors with UDFs >>

How to write an internal UDF function

By Claudio Valderrama - Copyright IBPhoenix Publications

When InterBase was created, it introduced concepts and features ahead of its time. As other database products evolved, they acquired similar features and diluted InterBase's stronghold on these claims to distinction. One important feature was the ability to support external functions (alias "user defined functions" or UDFs) that could be developed in any program language capable of handing the so-called C calling convention.

How a UDF works

A Firebird UDF follows the same concept as functions in Pascal: it is a routine that takes zero or more parameters and always returns a value.

UDF support did not change noticeably in InterBase from versions 4 to 6. In Firebird 1 it was enhanced by enabling DSQL to process a syntax, previously only accepted by QLI, to allow descriptors as UDF parameters. It was modified slightly again for Firebird version 2. Although the engine's code supports polymorphism (see the FUN_resolve function in fun.epp) the system tables prevent the user from defining two functions with the same name. There are also traces of an idea to allow Boolean functions but, currently, the engine allows only functions that return a non-Boolean value.

Writing external code and compiling it into a Windows dynamic library or a UNIX shared library has the advantage that the engine needs no modification at all to access and support a new function: the functions are registered in the database by a DDL statement and the engine stores the information in system tables. The engine loads the library the first time it finds a reference to one of the functions defined in it.

Drawbacks

Nothing in life has only advantages and external functions are no exception. They suffer from four drawbacks:

Allocating dynamic memory in them, which will have to be freed later by the engine, is tricky. The database engine exports a special function that host code should use for memory allocation (malloc()). Borland added the SQL keyword FREE_IT to tell the engine to deallocate that memory.
Since the engine has no information about the purpose, semantics and behavior of any external function, it is up to the UDF developer to ensure that its interface with the engine works properly. The data type of the return parameter is fixed at declaration time and the engine expects the value returned at runtime to conform with the declaration. Returning something else may be disastrous, ranging from strange results to crashing the engine. This is in contrast with a built-in function: with SUM(), for example, the engine can calculate the best type for the return value because it has all the knowledge about the function's behavior. Given a floating point column, SUM() will return a double precision number as the result; given an exact numeric column, the result will be an exact numeric.

Vulnerability from exposing libraries

The external function has no access to internal data structures in the engine so it has to work with the limited information that travels in the parameters. To give the UDF more access could be seen as an easy route for trashing the engine's memory. Before InterBase v. 5, a declaration could direct the engine to load an external function library from any place accessible to the engine. It was recognised as a security problem and InterBase v.6 restricted the engine's default search path for external function libraries to the /UDF directory beneath the server's root directory.

Internal user-defined functions

Another sort of user-defined function is available, less known because it involves interaction with the engine's code. They are the internal user functions -- let's call them IUFs.

Inside the JRD directory are two files where IUF's can be defined: builtin.cpp and functions.cpp. The structure of both files is pretty much the same, so it is a matter of convenience where the functions are defined. The name of the former—builtin.cpp seems to suggest that it is the best place. Historically though, it was designated for writing test and QA functions, so we'll use the functions.cpp file and leave builtin.cpp as a place where the core development team can use to implement functions for general use.

Like any user-defined function, a built-in function needs to be registered in the system tables to be available.

How the engine finds a function

The engine follows a predefined sequence to find a requested function. This is important, since UDF's can be written in at least three places, two of them hard-coded in the engine.

The call sequence, with indenting to reflect who calls whom, is:

 FUN_lookup_function (fun.epp)
    ISC_lookup_entrypoint (flu.cpp)
       FUNCTIONS_entrypoint (functions.cpp)
       FLU_lookup_module (flu.cpp)
       search_for_module(flu.cpp)
    BUILTIN_entrypoint (builtin.cpp)

The request analysis phase (par.cpp) will call FUN_lookup_function to check for the existence of the module and of the function within the module; if so, it will then load it. In rare cases, even the international support (intl.cpp) may call FUN_lookup_function as a last resort to find code that handles some character set.

In turn, FUN_lookup_function will read the system table RDB$FUNCTIONS, where the engine stores function declarations, looking for a name match in RDB$ENTRY_POINT. The first match returns with the selected function. The last function list to be scanned is the one statically defined inside builtin.cpp — the BUILTIN_entrypoint is the last resort.

In the list of functions that have been loaded by previous calls, FUNCTIONS_entrypoint is the first that does real finding work on being invoked. Given two functions with the same name and same module name, the first one loaded wins because, next time, the search is satisfied from that list of loaded functions. The engine doesn't bother to go to the system tables or internal static arrays in functions.cpp and builtin.cpp.

`functions.cpp`

Now, let's analyze the structure of functions.cpp:

License and main comments

Headers

The FN struct

The main struct that holds the information for each function is a general function signature named FN. A general function signature is used or an array couldn't hold different function prototypes.

First, the module name, which is invented to differentiate from external modules, is proved as a C string. No corresponding real module exists because the only thing that is loaded is the definition in this same file.

The function name follows, the equivalent of an entry point in a DLL or shared library. Again, it is an invented name, because the functions reside in this same file.

Finally, the function itself has to be passed, coercing it to type FPTR_INT.

Headers for functions

Functions that have been declared here are "test" (probably created in ancient times, considering it uses a signature that only QLI can register in system tables) and a function "ni"” that was created during Firebird 1 development to test that the IUF functionality still worked through DSQL.

Definition of the `isc_functions` array

The isc_functions[] array comes after the headers for the IUFs that we want to define, each element of the array corresponding to an FN structure. The "end mark" of the array is an element containing all three members initialized to null. This should always be the last element.

It illustrates an interesting point, often overlooked: a function can be declared as many times as is necessary, with a different logical name for each declaration. This is true also for external function libraries, as demonstrated by fbudf.

Utility function `FUNCTIONS_entrypoint`

This is engine code — do not touch!

Function body

Now the body for our IUF's. Finding the definitions for the "test" and "ni" functions we discover that "test" converts a number into its string representation and "ni" returns the first descriptor as a pointer, if it is not null; otherwise it returns the second parameter. Note that having a null pointer to a descriptor is only one of the three ways the engine signals the presence of an SQL NULL condition.

As a first experiment, we will modify the engine to do something simple: we will return the length of an argument. When we receive NULL, we cannot assert a length, so we return NULL, too. We'll call this function byte_len, since it doesn't attempt to be character-set-aware; it simply counts ASCII characters as bytes.

Header

First, we add a header that contains the declaration for the descriptors. Beneath the existing

 #include "../jrd/jrd.h" /* For MAXPATHLEN Bug #126614 */

we write:

 #include "../jrd/dsc_proto.h"

Next, we add the header for our function following the other function headers. So, beneath the existing

 static DSC* ni(DSC*, DSC*);

we add:

 static SLONG* byteLen(const dsc*);

Adding it to the array

Even though the function exists here, it remains invisible to any statement until it is declared to the database. We have to add our function to the isc_functions array of IUF's so the engine can find it and accept the declaration of the function.

We show here the complete array:

 static const FN isc_functions[] = {
       {"test_module", "test_function", (FPTR_INT) test},
       {"test_module", "ni", (FPTR_INT) ni},
       {"test_module", "ns", (FPTR_INT) ni},
       {"test_module", "nn", (FPTR_INT) ni},
       {"test_module", "byte_len", (FPTR_INT) byteLen},
       {0, 0, 0}
 };

In a typical external library containing UDF code, such as ib_udf and fbudf, the function name as declared in the library is the entry_point to be used in the declaration. However, for IUFs, we have one more name, in addition to the function's name (we called it byteLen): we have the name declared in the array above.

In DECLARE EXTERNAL FUNCTION <name>, any <name> can be given for use by the SQL statements that are going to call it. However, as with any UDF, the name in the array (in this case, byte_len) must be referenced in the function declaration by the ENTRY_POINT attribute of the DECLARE EXTERNAL FUNCTION statement. It is this name that is used to direct the engine to the correct function. Care should be taken to preserve the case in both the entry_point and the module_name, or the engine will be unable to find the function when we need the function — even though the engine accepted and stored the declaration.

Here the byte_len function is declared with the name 'kitten':

 SQL> declare external function kitten int by descriptor returns int free_it
 entry_point 'BYTE_LEN' module_name 'TEST_MODULE';
 SQL> select kitten(name) from animals;

Result:

 Statement failed, SQLCODE = -902
 Access to UDF library "TEST_MODULE.DLL" is denied by server administrator

Did we forget to commit the declaration? We do the commit and retry. Same result.

The real problem is, we changed the names of the entry_point and module_name to uppercase and the engine could not find them. It proceeded to go looking for an external library hosting the function.

Since the execution happened in Windows, the engine uppercased the name and appended the DLL extension. It couldn't find it, so it complained.

The "access denied" error typically occurs in Firebird 1.5 and above, because the engine tries to search directories that have not been configured via the UDFAccess parameter in firebird.conf. Prior to v.1.5, the error message would have been module not found, which is more appropriate, since nobody asked the engine to look for the module in a forbidden path. Room for improvement.

Our module name is test_module but it could be another name. Choose a name that is unlikely to clash with the name an external UDF library.

Writing the function body

Now, we move down beneath the other functions and write our function's body code below the "ni" function. This is our first attempt. We'll lift testing code from the fbudf external function library to detect the three ways the engine has to signal NULL to a UDF:

 /* byteLen: return the length in bytes of a given argument.
    For NULL, return NULL, too.
    v = input descriptor
    rc = return value, allocated dynamically. To be freed by the engine.
    The declaration through SQL is:
    declare external function sys_byte_len
    int by descriptor
    returns int free_it
    entry_point 'byte_len' module_name 'test_module';
 */
 static SLONG* byteLen(const dsc* v)

 {
    if (!v || !v->dsc_address || (v->dsc_flags & DSC_null))
           return 0;

    SLONG* rc = (SLONG*) malloc(sizeof(SLONG));
    *rc = DSC_string_length(v);
    return rc;
 }

Memory allocation for parameters

Remember, we are running in the context of the engine. Any fatal bug in the UDF is a run-time error in the engine. The engine typically shuts itself down in response to runtime errors.

By reference vs by value

By default, the UDF returns values by reference (pointer). If we return by value, we lose the ability to tell the engine we are returning NULL. Therefore, if we received NULL, we would have to return zero. That is inconsistent: the size of an unknown value is in turn unknown. But that means also that the space for the returned value has to be allocated dynamically: if we return the address of a local variable, this will be invalid outside the function.

We could return the address of a static variable inside the procedure, since this is a global variable whose name is only known to the function that declares it. But this works only for testing: if we happen to have several requests running, we may face disaster (at least wrong results), as several threads running the UDF would overwrite the same variable.

To allocate memory dynamically, a UDF has to use the ib_util_malloc function defined by a utility library provided by the engine. However, since we are writing code inside the engine, we can safely work directly with malloc, because we know that definition of ib_util malloc is:

 void* EXPORT ib_util_malloc(long size)
 {
       return malloc(size);
 }

Never use the C++ new operator unless Firebird provides an alternative means in the future.

The FREE_IT keyword

The purpose of the FREE_IT keyword that appears in the declaration (shown in the comment at the top of the function) is to state that the engine will free the memory once the UDF returns and the returned value is retrieved.

Input data type

Although the input type is declared as int by descriptor", the declared type has no effect here: the engine will send any data type that is provided to the UDF, without the need to declare the same UDF again and again with different parameter types. However, internally, the engine may use the declared type.

Changing the SQL syntax to avoid the dummy type won't help much, as passing an unknown data type to the engine would confuse it. All data types are declared in jrd/dsc_pub.h and the unknown type is assigned the value zero. Usually, it means the engine has no knowledge of the type. There are places where it means the engine received something it could not deal with or an error occurred that prevented the type from being determined. Assume we need to declare a dummy input type, even if we use descriptors.

Testing the function

Now, we'll try out the functionality of our creation:

 F:\fb2dev\fbbuild\firebird2\temp\debug\firebird\bin>isql
 Use CONNECT or CREATE DATABASE to specify a database
 SQL> create database 'pb';
 SQL> declare external function sys_byte_len int by descriptor
 CON> returns int free_it entry_point 'byte_len' module_name 'test_module';
 SQL> select sys_byte_len('hello') from rdb$database;

 SYS_BYTE_LEN
 ============
            5

This is correct. The string literal 'hello' has five ASCII characters.

 SQL> select sys_byte_len(rdb$function_name) from rdb$functions;

Nothing?

This is because isql by default uses two transactions: one that autocommits any DDL change and the other that processes DML statements. The DML transaction uses snapshot isolation by default, so it cannot yet see the changes made to the system table when the engine's internal processing of the "declare external function" statement was passed to the engine through isql's DDL transaction.

The solution to the problem is to commit:

 SQL> commit;
 SQL> select sys_byte_len(rdb$function_name) from rdb$functions;

Result:

 SYS_BYTE_LEN
 ============
           31

In case this particular result seems surprising, most system fields (fields in system tables) are defined as plain CHAR types. A few are defined as VARCHAR. You'll notice our function calls DSC_string_length, some code that is defined in jrd/dsc.cpp. If you inspect that code, you'll discover that:

For CHAR (dtype_text), the stated length of the field is returned. This is the maximum length.

For CSTRING (dtype_cstring, used optionally just for passing or getting string values to/from UDF's), the maximum length minus one byte is returned, since this is assumed to be a C style (zero-terminated) string.

For VARCHAR (stype_varying), the stated length of the field minus the size of a USHORT (two bytes) is returned. This is because the "address" field of the descriptor contains the length, then the string. This arrangement is known as the "vary" structure, defined as paramvary in the externally visible ibase.h and the original is defined in firebird.h. Again, the maximum length is used.

For other types, DSC_string_length calls, in turn, DSC_convert_to_text_length. We'll refer to that result as "tl". There are three cases:

when the descriptor has no scale, tl is returned
when the scale is less than zero, tl plus one is returned, for the decimal point (values are kept as integer quantities with a negative power of ten)
when the scale is positive, the value has a positive power of ten. We need as many positions as the scale indicates, so we add the scale to tl

Again, for all those data types, the maximum value is returned. For any integer we get 11, for example.

Putting the function to use

Now we have to put our function into production. We'll create a table with animal names. Since we aren't going to try to do anything useful with it, we won't bother to define an ID column, a primary key or other typical elements. We'll use just a VARCHAR of maximum length 50, with only ASCII contents:

 SQL> create table animals(name varchar(50));
 SQL> insert into animals values('cat');
 SQL> insert into animals values('frog');
 SQL> insert into animals values('horse');
 SQL> insert into animals values('platypus');
 SQL> insert into animals values('nandu');
 SQL> commit;
 SQL> select sys_byte_len(name) from animals;

 SYS_BYTE_LEN
 ============
           50
           50
           50
           50
           50

It is no surprise that the names all return the maximum length of the varchar from the sys_byte_len expression, knowing how our helpers DSC_string_length and its callee DSC_convert_to_text_length work. We have reused functionality in the engine, but it doesn't give us the dynamic length.

Now, another test:

 SQL> select sys_byte_len(NULL) from rdb$database;
 Statement failed, SQLCODE = -104
 Dynamic SQL Error
 -SQL error code = -104
 -Token unknown - line 1, char 21
 -NULL

Again, not surprising: the engine demands that NULL must be cast to some known type before accepting it in expressions. The only places where the literal NULL is accepted are in comparisons (where an equality test makes no sense and should be replaced by IS NULL) and when inserting or updating records.

So we please the engine:

 SQL> select sys_byte_len(cast(NULL as int)) from rdb$database;

 SYS_BYTE_LEN
 ============
 <null>

This is what we want: NULL as the input argument, NULL as the result.

Making the function useful

At this point, our little function is not very satisfactory. We wrote minimal code but sacrificed usability. We are going to rewrite it.

The comment at the top of the function is still the same. Then:

 static SLONG* byteLen(const dsc* v)
 {
    if (!v || !v->dsc_address || (v->dsc_flags & DSC_null))
      return 0;
    SLONG& rc = *(SLONG*) malloc(sizeof(SLONG));
    switch (v->dsc_dtype)
    {
      case dtype_text:
      {
          const UCHAR* const ini = v->dsc_address;
          const UCHAR* end = ini + v->dsc_length;
          while (ini < end && *--end == ' '); // empty loop body
          rc = end - ini + 1;
          break;
      }
      case dtype_cstring:
      {
          rc = 0;
          for (const UCHAR* p = v->dsc_address; *p; ++p, ++rc);
                                           /* empty loop body */
          break;
      }
      case dtype_varying:
      {
          rc = reinterpret_cast<const vary*>(v->dsc_address)->vary_length;
          break;
      }
      default:
      {
          rc = DSC_string_length(v);
          break;


       }
    }
    return &rc;
 }

The changes

Several changes were done:

First, the result from malloc was assigned to a C++ reference. This makes code easier to read and less prone to programming mistakes. With a pointer, it takes only one place where we forget to use *rc instead of rc and the code changes meaning, leading to a wrong or catastrophic outcome. The reference keeps the address of the variable or address it was assigned initially, so when we return &rc we are returning really the result of the malloc function.

Secondly, we give special custom treatments to CHAR, CSTRING and VARCHAR, respectively:

For CHAR (dtype_text) we ignore the trailing blanks.
For CSTRING (dtype_cstring) we apply the equivalent of strlen() to the string. We can't apply strlen() directly because it works for CHAR not unsigned CHAR.
For VARCHAR (dtype_varying) we recognize that the "address" data member of the descriptor is really the address of the aforementioned "vary" structure (paramvary for application code), force the conversion and get the length from that structure.

For the non-string cases, we still call DSC_string_length. This means that we still get the maximum length in bytes applicable to the string representation of the other types.

Looking at the table _DSC_convert_to_text_length in dsc.cpp we can see that integers will return length 11 and BIGINT(NUMERIC with PRECISION 18) will return 21. If we wanted to calculate the exact length of any number, we need more custom code to convert the number to string, then apply strlen() to the string. In dialect 3, the length of DATE, TIME and TIMESTAMP is always the same.

Testing the revised code

Now let's put our code to the test.

Strings

 SQL> select sys_byte_len('hello') from rdb$database;

 SYS_BYTE_LEN
 ============
            5


 SQL> select sys_byte_len(rdb$function_name) from rdb$functions;

 SYS_BYTE_LEN
 ============
           12


 SQL> select sys_byte_len(name), name from animals;

 SYS_BYTE_LEN NAME
 ============ ===========
            3 cat
            4 frog
            5 horse
            8 platypus
            5 nandu


 SQL> select sys_byte_len(cast(NULL as int)) from rdb$database;

 SYS_BYTE_LEN
 ============
       <null>


 SQL> select sys_byte_len(cast(rdb$function_name as varchar(31))) from rdb$functions;

 SYS_BYTE_LEN
 ============
           31

The result is an interesting illustration of how the engine works: applying sys_byte_len to the column RDB$FUNCTION_NAME gives 5 because our code knows how to strip blanks from CHAR types. However, casting a CHAR(31) forces the creation of a VARCHAR(31) preserving the trailing blanks as part of the field itself. Since for VARCHAR we rely on the vary struct, we get this result:

 SQL> select sys_byte_len(cast(rdb$function_name as varchar(31)) || '') from rdb$functions;

 SYS_BYTE_LEN
 ============
           12

As soon as we concatenate the previous expression with the empty string, we get 12 again. This gives us insight into the engine's internal behavior: be it SQL-compliant or not, the engine converts both values to VARCHAR, then concatenates them and converts the concatenated string to CHAR type again. Since our code knows how to strip trailing blanks from CHAR, we got 12.

Now, concatenating anything with NULL, as well as adding something to NULL, should be NULL and we get the expected result:

 SQL> select sys_byte_len(rdb$function_name || cast(NULL as varchar(1))) from rdb$functions;

 SYS_BYTE_LEN
 ============
       <null>


 SQL> select sys_byte_len('') from rdb$database;

 SYS_BYTE_LEN
 ============
            1


 SQL> select sys_byte_len( || ) from rdb$database;

 SYS_BYTE_LEN
 ============
            1

These two results are not strange. By default, literals are considered CHAR. Since CHAR doesn't have a length indicator and the minimum field length that can be declared is CHAR(1), we get 1.

 SQL> select sys_byte_len(cast('' as varchar(1))) from rdb$database;

 SYS_BYTE_LEN
 ============
            0


 SQL> select sys_byte_len(cast( ||  as varchar(1))) from rdb$database;

 SYS_BYTE_LEN
 ============
            0

We tried the same, but forced the conversion to VARCHAR. As expected, we got zero: the engine recognized that a delimited string without any space between the quotes is a VARCHAR(0).

Note that in all examples, single quotes have been used. This is what SQL dialect 3 expects for a literal string. If you copy the examples and confuse the pairs of single quotes with a double-quote, you'll get exceptions when you try to run them.

Numerals

Finally, we get our result for numerals:

 SQL> select sys_byte_len(9) from rdb$database;

 SYS_BYTE_LEN
 ============
           11


 SQL> select sys_byte_len(cast(9 as varchar(30))) from rdb$database; 

 SYS_BYTE_LEN
 ============
            1

As explained, the literal 9 is considered an INTEGER. If we want to get the length of a number as a string without modifying our function, we have to be explicit and cast the number to a VARCHAR, as in the second statement.

Blobs

Now let's see what happens with blobs. We know a blob field is really a field that keeps the ID of the blob as a QUAD, a structure equivalent in size to INT64 or BIGINT for SQL.

 SQL> create table binary(a int[5, 5], b blob);
 SQL> commit;
 SQL> insert into binary(a, b) values(NULL, NULL);
 SQL> commit;
 SQL> select * from binary;

                 A                 B
 ================= =================
            <null>            <null>


 SQL> select sys_byte_len(a), sys_byte_len(b) from binary;

 SYS_BYTE_LEN SYS_BYTE_LEN
 ============ ============
       <null>       <null>

Until now, we got the expected results: giving sys_byte_len a NULL should return a NULL. Then let's try to insert something in those fields. We can't insert into an array field directly from SQL but we can insert a literal string into a blob from SQL.

 SQL> insert into binary(a, b) values(NULL, 'hello world!');
 SQL> commit;
 SQL> select sys_byte_len(a), sys_byte_len(b) from binary;

 SYS_BYTE_LEN SYS_BYTE_LEN
 ============ ============
       <null>       <null>
       <null>       9

As before, we get the maximum size of the field type when represented as a string. In the case of blobs, only the blob_ID is displayed, with the format

 FFFF:FFFF

Hence, we get 9 as the result.

To know the real size of the blob (in bytes), we would need to open it using the public definition in ibase.h regarding the blob_callback structure, read the information from the data member "blob_total_length" and close the blob.

Since we are inside the engine, we could try to optimize by using the internal functions BLB_open and BLB_close instead and then use only the internal "blb" structure to read the data member "blb_length". If we went with the internal functions then, for safety, we would need to emulate what the callback functions do for external external UDFs.

For example, blob_put_segment:

 THREAD_ENTER();
 thread_db* tdbb = JRD_get_thread_data();
 BLB_put_segment(tdbb, blob, buffer, length);
 THREAD_EXIT();

Code such as this is necessary because it isn't possible to use the blob_callback structure at this point. When we work with descriptors, the engine sends the blb structure, not the blob_callback's wrapper functions. For more information, study blb.cpp (low level blob handling) and fun.epp (external function handling).

Registering our function

Our last step is to register the function. Instead of giving the user a script, we register the IUDF automatically in each new database as it is created. Initialization of fresh databases is the task for ini.epp. This file actually has code for automatic minor ODS updates, but that functionality was disabled in Firebird v1.5 because it led to problems.

The orthodox way to modify ini.epp is:

1. Create an array with the rows to be inserted in a system table. See for example the arrays named triggers and trigger_messages near the top of this file. The array may come from some header file; for example "generators" is taken from trig.h, "relfields" (source for RDB$RELATIONS and RDB$RELATION_FIELDS) comes from relations.h and "types" (input for RDB$TYPES) comes from types.h and intlnames.h.

2. Find the function INI_format. Just before the DFW_perform_system_work call, invoke a new function to insert the needed fields. The signature logic would be like this:

 /* store trigger messages to go with triggers */

    handle1 = NULL;
    for (const trigger_msg* message = trigger_messages;
       message->trigmsg_name; ++message)
    {
          store_message(tdbb, message, &handle1);
    }
    CMP_release(tdbb, handle1);

3. The declaration of the function should be:

 static void store_message(thread_db*, const trigger_msg*, jrd_req**);

The body of store_message gives you an idea what has to be done. Typically, the function calls

 SET_TDBB(tdbb);

at the beginning and then goes to insert, one row at a time. The loop is controlled by the caller. This is why a pointer to the handle is passed, so the handle returns initialized to the caller that will release it.

Given the small amount of information we need to save, we'll put the code in place, after the loop and CMP_release calls for trigger messages and before the DFW_perform_system_work call. Since we need to insert two parameters, the code will be repetitive:

 // CVC: Starting demonstration code to register IUDF automatically.
 handle1 = NULL;
 STORE(REQUEST_HANDLE handle1) F IN RDB$FUNCTIONS
    PAD("SYS_BYTE_LEN", F.RDB$FUNCTION_NAME);
    F.RDB$FUNCTION_NAME.NULL            = FALSE;
    //F.RDB$FUNCTION_TYPE                 <null>
    //F.RDB$QUERY_NAME                    <null>
    //FRDB$DESCRIPTION                    <null>
    /* gds__vtov("test_module",
       F.RDB$MODULE_NAME, sizeof(RDB$MODULE_NAME) - 1); */
    strcpy(F.RDB$MODULE_NAME, "test_module");
    F.RDB$MODULE_NAME.NULL               = FALSE;
    PAD("byte_len", F.RDB$ENTRYPOINT);
    F.RDB$ENTRYPOINT.NULL                = FALSE;
    F.RDB$RETURN_ARGUMENT                = 0;
    F.RDB$SYSTEM_FLAG                    = RDB_system;
 END_STORE;
 CMP_release(tdbb, handle1);

 handle1 = NULL;

 STORE(REQUEST_HANDLE handle1) FA IN RDB$FUNCTION_ARGUMENTS
   PAD("SYS_BYTE_LEN", FA.RDB$FUNCTION_NAME);
    FA.RDB$FUNCTION_NAME.NULL         = FALSE;
    FA.RDB$ARGUMENT_POSITION          = 0;
    FA.RDB$MECHANISM                  = - (int) FUN_reference;
                                     // FREE_IT
    FA.RDB$FIELD_TYPE                 = (int) blr_long;
    FA.RDB$FIELD_SCALE                = 0;
    FA.RDB$FIELD_LENGTH               = 4;
    //FA.RDB$FIELD_SUB_TYPE             <null>
    //FA.RDB$CHARACTER_SET_ID           <null>
    FA.RDB$FIELD_PRECISION            = 0;
    //FA.RDB$CHARACTER_LENGTH           <null>
 END_STORE;

 STORE(REQUEST_HANDLE handle1) FA IN RDB$FUNCTION_ARGUMENTS
    PAD("SYS_BYTE_LEN", FA.RDB$FUNCTION_NAME);
    FA.RDB$FUNCTION_NAME.NULL         = FALSE;
    FA.RDB$ARGUMENT_POSITION          = 1;
    FA.RDB$MECHANISM                  = (int) FUN_descriptor;
    FA.RDB$FIELD_TYPE                 = (int) blr_long;
    FA.RDB$FIELD_SCALE                = 0;
    FA.RDB$FIELD_LENGTH               = 4;
    //FA.RDB$FIELD_SUB_TYPE             <null>
    //FA.RDB$CHARACTER_SET_ID           <null>
    FA.RDB$FIELD_PRECISION            = 0;
    //FA.RDB$CHARACTER_LENGTH           <null>
 END_STORE;

 CMP_release(tdbb, handle1);
 // CVC: Finishing demonstration code to register IUDF automatically.

We got those values easily, by registering the UDF with standard SQL, then reading the resulting values in the system tables and changing them to mnemonic constants.

For example, given that the mechanism for the first parameter is -1, we replace it by FUN_reference from val.h but with negative value that means the engine should free the parameter (properly allocated by the UDF with ib_util_malloc as explained previously). The field type in both cases is 8, so we replaced it with blr_long from blr.h since, in the engine, long means INT32.

Note: system tables use BLR data types that are different from the data type in the descriptor's dsc_dtype data member: there it is 9 for dtype_long.

Now, creating any new database will register the UDF registered without any manual intervention. Because it is compiled inside the engine at the same time as the rest of the engine's code, there is no risk that the IUF will not be found when its is called, as may happen with entry points in external libraries, for various reasons.

The system flag

One thing we did that the SQL code cannot do is to register the UDF with the system flag activated (the constant RDB_system is 1). The system flag can, of course, be set from NULL or zero to 1 by directly manipulating the system tables. However, note that, once activated, the system flag cannot be deactivated by setting it back to 0 or NULL — a system trigger forbids it.

This paper was written by Claudio Valderrama and is copyright IBPhoenix Inc. You may republish it verbatim, including this notation. You may update, correct, or expand the material, provided that you include a notation that the original work was produced by Claudio Valderrama and IBPhoenix.