Creating UDFs in Delphi

<< Using descriptors with UDFs | Database technology articles | Implementing a Soundex function >>

What is a UDF?
How to create a DLL in Delphi^Ž
Declare the functions to the database
Test the functions
Issues to be aware of when creating
UDFs
1. Multi-threading issues
2. Memory management issues
References

Creating UDFs in Delphii^Ž

By Brett Bandy and Wayne Shaddock

This paper will provide step by step instructions for creating a UDF in Delphi^Ž. It will lead you through the process of creating a DLL in Delphi^Ž and declaring your functions to a database so that they can be used.

What is a UDF?

An InterBase^Ž UDF is a function that resides in a shared library. Windows calls its shared libraries DLLs (Dynamic Link Libaries). Thus, to create a UDF to use with InterBase^Ž you must create a DLL. Further, InterBase^Ž is written in C so when it deals with UDFs it requires them to behave in a certain fashion. All functions pass their parameters on the stack. However, not every language pushes those parameters on the stack in the same fashion. InterBase^Ž requires all UDFs to adhere to the C calling convention. Fortunately, Delphi^Ž can create DLLs that use the C calling convention.

How to create a DLL in Delphi^Ž

Delphi^Ž provides a wizard to get you started in creating your DLL. You use this wizard by bringing up the New Item dialog (menu item: File / New). You click on the DLL icon and it generates a new DLL project for you. Delphi^Ž differentiates a DLL project from a normal executable project via the keyword 'Library'. You will notice the Library keyword at the top of your project's file. I will list the steps required to create the DLL first, then explain each of them in greater detail.

Create the DLL project
Save the project with desired project name
Create a new unit for your functions
Export the functions
Build the library

I have already talked about creating a DLL project via the Delphi^Ž wizard, so I'll move on to the next step. Before you do anything else save your project. I strongly recommend that you save often when working in Delphi^Ž. So, save your project and give it a name that represents what the project does. In the case of a DLL project, the DLL by default will be named the same as the name of the project. So... name wisely.

Next, you should create a unit in which to develop your DLLs functions. It is also possible to create all your functions in the project file, but it is adviseable to create units in which to write your functions. Using units allows you to seperate your functions into logical groups. For example, FreeUDFLib is comprised of several units. There is a unit for string functions, as well as a unit for blob functions, just to name a few. It makes it easier to find and work with code that is logically separated into smaller groups (in this case Delphi^Ž units). It is also easier to reuse code if it is broken down into smaller units. So... create your unit to house your functions.

To create your unit you again use the New Item dialog (File / New). Select the unit icon and it will create a new unit and add it to your project. You should then save your project, which will require you to give the new unit a file name. As a side note, I suggest that when naming projects and units that you give them descriptive names. You don't need to stick to the 8.3 file name convention.

At this point you are set to start coding your functions. The details of programming and coding are outside the scope of this paper. I will assume that you are successful in your function creation and will move on.

At this point you have a project, saved I hope, with a set of functions. You next need to tell Delphi^Ž that you want to export the set of functions that you have just created. Exporting a function allows other applications to link with your DLL and use your exported functions. If a function is not exported then it cannot be used by applications that link with your DLL. To export functions in Delphi^Ž you must use the keyword 'exports' along with a list of functions that you are exporting. The following library file shows the use of the 'exports' keyword to export three functions from the library Delphi^Ž UDF. The functions sysDate, sysTime, and returnFive are available for applications to use when they link with this DLL, called Delphi^Ž UDF.dll.

 library Delphi^Ž UDF; 

 uses 
 SysUtils, 
 Classes, 
 brettFuncs in 'brettFuncs.pas'; 

 exports 
 sysDate, 
 sysTime, 
 returnFive; 

 begin 
 end.

So far you have created your project, coded the functions, and listed all the functions that are to be exported from the DLL. The only thing left is to build the DLL. To build a project in Delphi^Ž you just select the menu item to build your project (Project / Build projectname). Barring any coding errors your DLL should exist in the directory you saved your project in.

You are now done with the Delphi^Ž part of this exercise. You have created a DLL that exports your functions. Congratulations!

Declare the functions to the database

Move the DLL to directory where it can be loaded. Now that you have a working DLL you need to declare the functions to a database so that you can use them with InterBase^Ž. Before you create your function declaration script you must make sure that InterBase^Ž can find the DLL. InterBase^Ž uses the Win32 API function LoadLibrary to load DLLs and execute functions. LoadLibrary has a set order for finding DLLs on a system. Here is the search order that LoadLibrary performs when trying to load a library.

Search in the directory from which the calling application was loaded. In the case of InterBase^Ž we are talking about the ibserver.exe process. This process is started from the InterBase^Ž /bin directory (C:Program Files/InterBase Corp/InterBase/bin by default).
Search in the current directory.
Search in the Windows system directory. For Win95 this is the windowssystem directory. For NT this is the winntsystem32 directory and the winntsystem (16-bit system directory).
Search in the windows directory. For Win95 this is the windows directory. For NT this is the winnt directory.
Search all directories specified in the PATH environment variable.

The UDF definitions are stored in an InterBase^Ž database. This means that you have to declare the UDFs to every database that you want to use them in. Here is the syntax for declaring a function to a database.

 DECLARE EXTERNAL FUNCTION name [ datatype | CSTRING ( int) 
 [, datatype | CSTRING ( int) ]] 
 RETURNS { datatype [BY VALUE] | CSTRING ( int)} [FREE_IT] 
 ENTRY_POINT ' entryname' 
 MODULE_NAME ' modulename';

I have provided an example function definition:

 declare external function getSysDate 
 returns CSTRING(10) FREE_IT 
 entry_point "sysDate" 
 module_name "Delphi^ŽUDF.dll";

This example does not have any input parameters, but does have one return parameter. The return parameter is defined as a CSTRING() which is a null-terminated string. Consult the InterBase^Ž Language Guide for more details on declaring functions to a database.

Test the functions

Now that the functions have been built and declared to the database they should be ready to use. You can test your functions by using isql and a SELECT statement involving the functions. So... an example query that would test the getSysDate function, which is defined above, would be:

 select getSysDate() from rdb$database;

This query uses the rdb$database table so that only 1 row will be returned. This query will execute the UDF getSysDate and display the string that it returns.

Always test in an environment that mimics the production environment This is only a first pass at testing your UDFs. Further testing should be done to ensure that your UDF will work in a concurrent multi-user environment. As always, I recommend setting up your test environment to copy the environment where your database will be put into production.

Issues to be aware of when creating UDFs

I will next discuss a few of the issues that arise when developing UDFs for InterBase^Ž. The issues that will be discussed should be fully understood BEFORE UDFs are deployed to a production environment. A lot of these issues only show up in multi-user environments, which is why I suggest that your test environment mimic the production environment of the database.

Multi-threading issues

The most common set of problems occur because InterBase^Ž v.4.21 and later employ the Superserver architecture. This architecture is a single-process, multi-threaded server. Because of this UDFs must be thread-safe. I will discuss a few of the major multi-threading issues.

Calling other functions

For a UDF to be thread-safe its code must be thread-safe as well as all functions that it uses. The UDF becomes not thread-safe if any function it calls is not thread-safe. For example, if I had a thread-safe UDF, but then added a call to a non-thread-safe library (non-multithreaded C runtime library), then my function is then not thread-safe and may cause the ibserver.exe process to abnormally terminate with an access violation. So, every function that you call directly and every function that is called indirectly must be thread-safe as well.

Synchronization

When sharing resources in a multi-threaded environment, those resources must have a means to control access to them. In a multi-threaded environment multiple instances of your UDF may be executed concurrently. If your UDF uses some sort of shared resource, anything from a global variable to a memory-mapped file, you must setup some sort of protection so that only one instance of your UDF can access the shared resource at any given time. If you don't control access then that resource may get corrupted. Every platform InterBase^Ž runs on has a set of synchronization techniques (mutexes, semaphores, ...) that can be used to control access to shared resources. The discussion of synchronization techniques is beyond the scope of this paper. Just let it be known that there are several ways to synchronize access to shared resources and it is up to the developer to decide which one is the correct choice for the situation.

When you start dealing with more than one shared resource then you open your UDF up for deadlock situations. A deadlock situation is basically where one function, lets say funcA, has a locked resource, but needs another locked resource. Additionally, there is a second function, lets say funcB, that holds the locked resource needed by funcA. FuncB itself needs the locked resource held by funcA. Each function has a resource, and each function needs a resource held by the other function. Neither function will be able to complete its task, because it will never get the second resource that it requires. This is a deadlock situation. The only way to resolve a deadlock situation is to terminate one of the functions so that the other may finish its task. This is not a desired course of action.

When you are writing UDFs that use more than one shared resource you need to be wary of deadlock situations. There is a general rule to follow when dealing with multiple shared resources. Code all your functions to acquire the shared resources in the same order. So, every function would have the same order for acquiring the set of shared resources required. What this does is prevent the sort of deadlock situation described above. Only one function can have the first shared resource on the list, and no other shared resource can be held until the first one is acquired. There are many other issues involved in deadlock avoidance, but setting up a resource acquisition strategy such as this will reduce the chances of a deadlock occurring.

Non-synchronized variables may cause corruption When using variables in your application it is wise to understand the scope of where each variable is visible and who can access those variables. As a rule: global variables are a no-no. Global variables are available to all threads in the process. This means that every instance of your UDF that is executed has access to the global variable. You must ensure that the global variable is protected so that concurrent executions of your UDF don't corrupt the variable.

Likewise, when using static local variables you must ensure that the variable is protected. Static local variables are variables local to a function, but don't go "out of scope" when the function ends. They exist for the life of the application. Unlike normal local variables, there is only one copy of the variable, not one for each execution of the function. You must ensure that these static local variables are accessed simultaneously, which in most cases will corrupt the variable.

Thread-local storage

Thread-local storage (TLS) is allocated memory that is associated with a particular thread. TLS allows you to create a variable that is local to a specific thread, a memory location which can be viewed only by the thread that it is created for. Windows defines two types of TLS; dynamic and static. Right to the point; dynamic TLS will work, but static TLS will NOT.

Dynamic TLS is named as such because you dynamically allocate the local storage at runtime. Windows defines a set of functions to manipulate dynamic TLS. I list them here:

 TlsAlloc 
 TlsFree 
 TlsSetValue 
 TlsGetValue

All access to dynamic TLS is through these functions. Developers just need to call these functions to manipulate the TLS for each thread. The functions work in the context of the current thread. For example, if you execute the TlsSetValue function it will set the value for the storage allocated for the current thread. The developer doesn't have to worry about how to access the current thread's storage or how to synchronized access to the storage area. Windows controls the low level access and synchronization to the TLS. For those of you that are familiar with FreeUDFLib, this is the mechanism that Greg Deatz uses.

I won't go into great detail into why static TLS will NOT work, but will delve into a brief explanation. See Advanced Windows for more details on static TLS. The short of it is that storage for all static TLS variables is allocated for each thread when it is created. However, and here lies the problem, when DLLs are loaded they can contain static TLS variables. The dynamically loaded DLL (your UDF) will be mapped into the process and the static TLS variables will be mapped into all existing threads. The key point here is that it will be mapped only into the existing threads. InterBase^Ž dynamically creates new threads when necessary. So, all threads that are created after the UDF is loaded will not have this static TLS variable mapped. For example, say the first connection executes the UDF function. The DLL is dynamically loaded by ibserver.exe and the DLL's static TLS variables are mapped for each of the existing threads. Now, the server gets bombarded by attachments. There are so many attachments that the server creates a number of new worker threads to handle the attachments. One of these attachments executes the UDF that is already loaded. The UDF relies on the assumption that the static TLS variable exists and can be accessed. Depending on what the static TLS variable is used for the function can return bad data (corruption) to InterBase^Ž or worse: try to write to the location and either corrupt InterBase^Ž's process space or try to write outside of InterBase^Ž's process space. Either one will frequently result in an Access Violation and the server will crash.

Memory management issues

Memory management issues deal more with memory leaks than server crash problems. However, as we all know InterBase^Ž doesn't do very well when the system is low on memory. In fact, InterBase^Ž will often crash when the operating system says no more memory. So, it is advantageous to understand the memory management issues when developing UDFs for InterBase^Ž.

On Windows platforms there isn't a standard memory allocation function. Each compiler implements its own memory management functions. There also isn't any standard for how to implement memory management. Because of this you cannot mix and match your memory management calls. If you allocate memory with one compiler's allocation function, you must free the memory with that same compiler's deallocation function. You can't, for instance, allocate memory with MicroSoft Visual C++'s allocation function and free it with a Borland compiler's deallocation function.

InterBase^Ž v.5.x is compiled with MSVC++ (don't ask), and we make use of the memory management functions provided by MSVC++. This means that all InterBase^Ž memory allocation and deallocations must be with the MSVC++ memory management functions. This causes a problem for certain types of UDFs that are written with a Borland compiler (BC++, Delphi^Ž, or C++Builder^Ž). UDFs that dynamically allocate memory for return parameters and specify the FREE_IT keyword are subject to memory leaks. The FREE_IT keyword specifies that the memory associated with the return parameter should be freed by InterBase^Ž when it is done with it. This causes a problem for InterBase^Ž when the memory is dynamically allocated using a Borland compiler. InterBase^Ž cannot free the memory, because it wasn't allocated by MSVC++'s memory allocation function. What it boils down to is the fact that InterBase^Ž and the UDF are using different pools to draw memory from. InterBase^Ž, using MSVC++ memory management functions, doesn't have access to the pool of memory being used by the UDF. So, everytime the UDF is called it will allocate a chunk of memory that is never freed, but all references to it are released. This causes a memory leak and bloats the ibserver.exe process. This memory will not be freed until the ibserver.exe process is stopped and restarted.