Java UDF functional specification


Copyright www.IBPhoenix.com

Introduction

The term native UDF is used to distinguish UDFs which use the standard C linkage conventions. C, C++, and Delphi compilers can create UDF libraries (DLLs or Unix shared libraries) which use the C calling convention.

Native UDFs are deployed as shared UDF libraries (.dll or .so files) and loaded dynamically as needed. Java UDFs are not deployed as native shared libraries. Because native UDFs are compiled to native code, native InterBase type representations are passed directly to and from them.

For example, a language such as Delphi can represent the C data type ISC_QUAD* directly using the Delphi syntax

 type 

   ISC_QUAD = record

     isc_quad_high : Integer ;
     isc_quad_low : Cardinal ; 

   end;

   PISC_QUAD = ^ISC_QUAD;

Such a pointer may be passed on the stack to the Delphi UDF in exactly the same way as it would be passed to a C UDF. So executing native UDFs for any compiled language supporting C data type representations and C calling conventions requires no additional support within the InterBase engine.

Java, on the other hand, is not compiled to native code. With no pointers or structures, it has no language support for C data type representations, and it does not support native C calling conventions. So executing Java UDFs requires communication with a Java Virtual Machine (JVM), and Java objects will need to be created within the InterBase engine in a portable way, independently of the object representation used by any particular JVM implementation.

The Java Native Interface (JNI) provides a mechanism for creating and manipulating portable Java objects as internal C structures and pointers. So for example, the representation for an 8-byte C date structure (ISC_QUAD*) can be converted by the engine to a java.sql.Date object before being passed to a Java UDF method. The java.sql.Date object representation would be created within the engine in a portable way via the JNI, but the actual underlying C representation for the object would be peculiar to whatever JVM implementation happens to be embedded. Similarly, the representation for an InterBase C string (char*) can be converted by the engine to a Java Unicode java.lang.String object before being passed to a Java UDF method.

Note: Beyond the lack of language support for C data type representation, the Java Language Specification does not even define how objects are laid out in memory. Java does not expose the structure of objects. So while a string in C has a known format as a sequence of character bytes in memory, this is not the case for a java.lang.String object. The JNI API provides a means to create and access Java objects in C without exposing the internal memory layout (structures) of Java objects, which can vary from one VM port to another.

So there are three key requirements for supporting Java UDFs from within InterBase:

Provide support for describing Java UDF structures insofar as Java UDF descriptions differ from native UDF descriptions.
Provide support for converting InterBase native data types to and from JNI representations within the engine before passing such representations to, or returning them from, a Java UDF.
Provide a means to embed and configure a Java Virtual Machine (JVM) for interpreting Java UDFs during SQL execution.

The functional implications of these requirements are detailed in section User interface/usability, but may be summarized briefly as follows:

A DDL syntax to distinguish Java UDFs from native UDFs. Java UDF invocations (DML) will be syntactically identical to native UDF invocations, but must be distinguishable semantically. Furthermore, Java UDF descriptions are different from native UDF descriptions. For example, unlike native UDFs, Java UDFs are not known by a MODULE_NAME and ENTRY_POINT. Instead, Java UDF DDL must describe a Java UDF's class and method name. In addition, Java UDF DDL will use Java UDF datatype names, and will not employ descriptors such as FREE_IT and BY VALUE, which are only meaningful in the context of native UDFs and pointer-based memory management.
A system table for describing Java UDFs, and an associated metadata extract.
A set of Java UDF datatypes and associated Java classes.
An extended mechanism for InterBase error handling in support of recoverable Java exceptions.
JVM configuration options, including the specification of a Java class path.
An installation which deploys a Java Runtime Environment (JRE).

This specification assumes an understanding of native UDFs, and does not attempt to describe the meaning of UDFs in general, except where the behavior of Java UDFs differs from that of native UDFs.

Also note that Sun changed the name of Java 1.2 to Java 2, and JDK 1.2 to Java 2 SDK. These terms are now used synonymously.

Description

Support for Java UDFs (User Defined Functions) will allow for an external library of Java classes and methods to be utilized anywhere that a SQL function may be used. This provides for runtime SQL execution to perform data manipulation tasks by communicating directly with a Java Virtual Machine (JVM) local to the InterBase server. Through the use of the Java Native Interface (JNI) we can embed and use a Java VM in a standardized way that will work with any VM implementation supporting the JNI. All future evolutions of the JNI will maintain complete binary compatibility.

The recursive evaluation (execution) of SQL containing a Java UDF invocation will perform all necessary conversions of the UDF arguments and UDF return values from the InterBase native datatype structures to the corresponding data and character representations used by Java. For example, InterBase DATEs, TIMEs, and TIMESTAMPs will be converted to Java objects of class java.sql.Date, java.sql.Time, and java.sql.Timestamp respectively. Blob UDFs will be supported through the use of a customized public Java Blob library which invokes callbacks to the InterBase engine for performing gets and puts on Blob segments.
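As a minimal sketch of what this conversion means for the UDF author, the methods below receive ordinary java.sql objects and never see an ISC_QUAD. The class name DateUdfs and both method names are hypothetical and are not part of this specification.

```java
import java.sql.Date;
import java.sql.Timestamp;

/** Hypothetical UDF class; the names here are illustrative only. */
final class DateUdfs {
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    /** Whole days from 'from' to 'to'; rounding absorbs a daylight-saving hour. */
    public static int daysBetween(Date from, Date to) {
        double days = (to.getTime() - from.getTime()) / (double) MILLIS_PER_DAY;
        return (int) Math.round(days);
    }

    /** Returns a new timestamp shifted by the given number of hours. */
    public static Timestamp addHours(Timestamp ts, int hours) {
        return new Timestamp(ts.getTime() + hours * 60L * 60 * 1000);
    }
}
```

Under the proposal, the engine would build the java.sql.Date and java.sql.Timestamp arguments through the JNI before calling such methods, and convert the returned values back to InterBase types.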

User interface/usability

When defining a Java UDF, there are two declarations to consider. The first declaration is for the actual Java Method in some external Java class library. The second declaration is for the UDF itself as declared to a database. Although the two declarations must correspond by the typing of their arguments and return value, they are nonetheless distinct declarations, and they will be referred to as the Java Method Declaration and the Java UDF Declaration.

For native UDF declarations, UDF type names, rather than C type names, are used to denote the types of UDF arguments and return values. For example,

 DECLARE EXTERNAL FUNCTION foo
   BLOB,              // C type is a pointer to a blob structure (Blob*)
   CSTRING(n),        // C type is char*; InterBase type is CHAR or VARCHAR
   NUMERIC(n)         // C type is ISC_QUAD*, int, or short; InterBase type is INT64, INTEGER, or SMALLINT
   RETURNS TIMESTAMP  // C type is ISC_QUAD* from the ibase.h file
   ENTRY_POINT "C-function-name"
   MODULE_NAME "udflib.dll";

Similarly for Java UDFs, an extended set of UDF type names is used to denote the types of UDF arguments and UDF return values, rather than the actual Java type names used in the Java Method Declaration. For example,

 DECLARE EXTERNAL JAVA FUNCTION foo
   BLOB,              // Java type is class com.borland.interbase.Blob
   JSTRING(n),        // Java type is java.lang.String; InterBase type is CHAR or VARCHAR
   NUMERIC(n)         // Java type is java.math.BigInteger
   RETURNS TIMESTAMP  // Java type is java.sql.Timestamp
   CLASS "class-name"
   METHOD "method-name";

Although Java type names are not used in the Java UDF declaration, each UDF datatype corresponds strictly with a Java type or class.

Note: JSTRING is a new UDF type name to be introduced in support of Java UDFs. The example above is meant to be introductory and illustrative only; the meaning of the Java UDF declaration syntax will be described later.

When a Java UDF is invoked by InterBase, arguments are provided whose engine native types must be converted to the corresponding Java types. Because of the necessary datatype conversions from InterBase native structures to Java representations, Java UDF invocations must be distinguishable from native UDF invocations. Given a Java UDF declaration, it must also be possible for the user to infer the Java types of the corresponding method arguments and method return value. Therefore the SQL syntax for declaring a UDF must provide a means to indicate that the UDF is a Java UDF, as well as a means to indicate the Java UDF types of the UDF arguments and UDF return value.

Java UDF datatypes

The correspondence between the datatyping of Java UDFs and their corresponding Java Methods is as follows:

Java UDF Declared Type	Java Method Declared Type	Description
`JSTRING`	`java.lang.String`	The UDF type `JSTRING` indicates that the Java UDF expects an object of class `java.lang.String` to be passed or returned. This is analogous to `CSTRING` for native UDFs, except that native UDFs perform no implicit character conversions, and no character encoding is enforced on the passed C strings (C strings are passed byte-for-byte). InterBase `CHAR` or `VARCHAR` fields of any character set may be passed to Java UDFs by an implicit conversion to a Java Unicode String. Any native InterBase character set which is convertible to and from intermediary Unicode FSS by the InterBase engine will be supported. Design note: The conversion from the intermediary Unicode FSS to a 2-byte Unicode representation is handled by the Java UDF implementation. The user will not be aware of the intermediary Unicode FSS representation.
`TIMESTAMP`, `TIME`, or `DATE`	`java.sql.Timestamp`, `java.sql.Time`, or `java.sql.Date`	Dates and times may be passed to and from Java UDFs by converting InterBase dates and times to and from Java dates and times as described by the JDBC `java.sql` interfaces. Design note: This conversion code currently exists in InterClient, and may be reused.
`BLOB`	`com.borland.interbase.Blob`	The UDF type `BLOB` indicates that the Java UDF method expects an object of class `com.borland.interbase.Blob` to be passed or returned. A `Blob` class is introduced to encapsulate an InterBase blob and provide the necessary methods for getting and putting segments to the blob.
`NUMERIC(p,s)` or `DECIMAL(p,s)`	`java.math.BigInteger`	Exact numerics could be passed to and from Java UDFs by converting the InterBase `INT64`, `INTEGER`, and `SMALLINT` representations to and from Java `long`, `int`, and `short` values respectively, depending on precision `p`. But, as with native UDFs, this would not provide the scale, and the user would therefore be burdened with having to know the scale of the field ahead of time and make any required scale adjustments within the UDF. Using `java.math.BigInteger` is a better choice as this provides the exact value of the numeric pre-adjusted for the scale. Note: As with native UDFs, precisions greater than 18 are not supported, but could be accommodated by `java.math.BigInteger` in some future release of InterBase. Note also that JDBC itself maps `NUMERIC` and `DECIMAL` values to the related class `java.math.BigDecimal`.
`DOUBLE PRECISION`	`double`	The Java UDF method expects a Java `double` to be passed or returned.
`INTEGER`	`int`	The Java UDF method expects a Java `int` to be passed or returned.
`SMALLINT`	`short`	The Java UDF method expects a Java `short` to be passed or returned.

So, for example, a Java UDF may be declared to the database using a type name of BLOB, but the actual Java Method invoked by the UDF must be declared using class com.borland.interbase.Blob.
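The scale burden mentioned for exact numerics can be made concrete with a little arithmetic. A NUMERIC(9,2) value of 123.45 is held as the integer 12345, so a UDF that received only a raw long would have to know the scale 2 and apply it itself. The sketch below uses java.math.BigDecimal purely to show that arithmetic; the class and method names are hypothetical, and this is not the type mapping proposed in the table above.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical helper showing how a known scale is applied to a stored integer. */
final class ScaleDemo {
    /** Interprets 'unscaled' as a decimal value with 'scale' digits after the point. */
    public static BigDecimal applyScale(long unscaled, int scale) {
        return new BigDecimal(BigInteger.valueOf(unscaled), scale);
    }
}
```

For example, ScaleDemo.applyScale(12345, 2) yields the decimal value 123.45.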

Java UDF declaration syntax and semantics

The Java UDF declaration syntax (DDL) must be supported by DSQL, ISQL (which, as a design note, happens to be built on top of DSQL), and GPRE. The Java UDF invocation syntax (DML) will be identical to the native UDF invocation syntax. Which Java method is actually executed as a result of a Java UDF invocation depends on three settings: class name, method name, and classpath.

The proposed syntax for declaring Java UDFs follows. Please see the section Syntax conventions for a description of the extended BNF notation used below. The LALR(1) syntax for Java UDFs is deferred as a design consideration.

 DECLARE EXTERNAL JAVA FUNCTION udf-name
 [ java-udf-datatype .,..]
 [ RETURNS { java-udf-datatype
           | PARAMETER argument-position } ]
 CLASS "class-name"
 METHOD "method-name";

 java-udf-datatype ::=
             JSTRING (maximum-character-length)
           | NUMERIC(p,s) | NUMERIC(p) | DECIMAL(p,s) | DECIMAL(p)
           | DATE | TIME | TIMESTAMP
           | BLOB
           | DOUBLE PRECISION
           | INTEGER
           | SMALLINT

The semantics of java-udf-datatype have already been described under Java UDF datatypes above. For native UDFs, type CSTRING requires a maximum-byte-length qualifier. For Java UDFs, type JSTRING requires a maximum-character-length qualifier. A brief semantics of the other syntactic components follows:

udf-name is the string token representing the invocable UDF name. This is the name that is used when invoking the UDF in an SQL expression, and is not necessarily the same as the method-name.
RETURNS PARAMETER argument-position is used to indicate that the return value is stored in the parameter identified by argument-position.
CLASS "class-name" is used to indicate the class name containing the Java method for the defined function.
METHOD "method-name" is used to indicate the static Java method name for the defined function.

Note: The number of parameters to a native UDF is limited to 10. There is no such limit to the number of parameters to a Java UDF.

Note: For simplicity, java-udf-datatype is used for both input-parameter datatypes and return-parameter datatypes. However, certain limitations are imposed on the syntactic rules. In particular, for both native UDFs and Java UDFs, a BLOB may not be used as a return-parameter datatype; instead, a RETURNS PARAMETER n clause must be used.

Note: Memory management of returned values from Java UDFs does not need to be explicitly controlled by the user, as it is with native UDFs via BY VALUE and FREE_IT. Internally, Java objects created within the engine will have their references destroyed after use. So if a user's Java UDF code maintains no references to the returned data, that data is eligible for garbage collection by the VM.

Note: The setting of a classpath will be a JVM configuration, and not a Java UDF setting.
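To illustrate how the two declarations line up, here is a sketch of a Java Method Declaration that would correspond to a Java UDF Declaration written in the proposed syntax. The UDF name pad_left, the class StringUdfs, and the method padLeft are hypothetical examples, not names defined by this specification.

```java
/**
 * Hypothetical class for a declaration such as:
 *
 *   DECLARE EXTERNAL JAVA FUNCTION pad_left
 *     JSTRING(32), INTEGER
 *     RETURNS JSTRING(32)
 *     CLASS "StringUdfs"
 *     METHOD "padLeft";
 */
final class StringUdfs {
    /** JSTRING maps to java.lang.String and INTEGER to int; the method is static. */
    public static String padLeft(String text, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = text.length(); i < width; i++) {
            sb.append(' ');
        }
        return sb.append(text).toString();
    }
}
```

The method returns a new String and keeps no reference to it, so the returned object becomes eligible for garbage collection once the engine drops its own reference.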

Public class signature for `com.borland.interbase.Blob`

The user interface, or class signature, provided in support of type BLOB is as follows:

 package com.borland.interbase;
 /**
  * This class represents a Blob as passed to a Java UDF.
  * A Blob UDF cannot open or close a Blob, 
  * but instead invokes Blob methods to perform Blob access.
  * A UDF that returns a Blob does not actually define a return value.
  * Instead, the return-Blob must be passed as the last 
  * input parameter to the UDF.
  **/

 public class Blob

 {
   /**
    * Read a Blob segment into a buffer, and return the number 
    * of bytes read.
    **/

   public int getSegment (byte[] buffer);

   /**
    * Write a Blob segment of bytesToPut bytes from a buffer.
    **/

   public void putSegment (byte[] buffer, int bytesToPut);

   /**
    * Returns the total number of segments in the Blob.
    **/

   public long numberOfSegments ();

   /**
    * The size, in bytes, of the largest single segment in the Blob.
    **/

   public int maxSegmentLength ();

   /**
    * Returns the actual total size, in bytes, of the Blob.
    **/

   public long size ();

 }
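A Blob UDF written against this signature might look like the sketch below, which copies an input Blob to the return Blob one segment at a time. Because the real com.borland.interbase.Blob is backed by engine callbacks, the sketch includes a small in-memory stand-in class so that it runs on its own. The stand-in, the class BlobUdfs, and the assumption that getSegment returns 0 once every segment has been read are all illustrative; the signature above does not define end-of-Blob behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for com.borland.interbase.Blob, only so the sketch is runnable. */
class Blob {
    private final List<byte[]> segments = new ArrayList<>();
    private int next = 0; // index of the next segment to read

    /** Assumption: returns 0 once every segment has been read. */
    public int getSegment(byte[] buffer) {
        if (next >= segments.size()) {
            return 0;
        }
        byte[] seg = segments.get(next++);
        System.arraycopy(seg, 0, buffer, 0, seg.length);
        return seg.length;
    }

    public void putSegment(byte[] buffer, int bytesToPut) {
        byte[] seg = new byte[bytesToPut];
        System.arraycopy(buffer, 0, seg, 0, bytesToPut);
        segments.add(seg);
    }

    public long numberOfSegments() {
        return segments.size();
    }

    public int maxSegmentLength() {
        int max = 0;
        for (byte[] seg : segments) {
            max = Math.max(max, seg.length);
        }
        return max;
    }

    public long size() {
        long total = 0;
        for (byte[] seg : segments) {
            total += seg.length;
        }
        return total;
    }
}

/** Hypothetical Blob UDF class. */
final class BlobUdfs {
    /** Copies 'in' to 'out' segment by segment; the return Blob is the last parameter. */
    public static void copyBlob(Blob in, Blob out) {
        // Size the buffer from the largest segment so that any segment fits.
        byte[] buffer = new byte[Math.max(1, in.maxSegmentLength())];
        int n;
        while ((n = in.getSegment(buffer)) > 0) {
            out.putSegment(buffer, n);
        }
    }
}
```

Note that the method returns void: as the class comment states, the return Blob is passed as the last input parameter, which corresponds to a RETURNS PARAMETER clause in the UDF declaration.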

JVM configuration

A JVM may be shared by the InterBase server and all its connections (users). The JVM is thread-safe and therefore may be shared by concurrent query threads. The JVM must be configured when it is initialized, so it may only be configured once after the InterBase server is started, and the configuration must be at the server level. If the JVM is to be reconfigured, the InterBase server must be shut down and restarted.
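One practical consequence of sharing a single JVM, sketched here as an assumption rather than a stated requirement, is that any static state in a UDF class is visible to every query thread at once, so it would need to be updated in a thread-safe way. The class CounterUdfs and its method are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical UDF class that keeps static state across invocations. */
final class CounterUdfs {
    // Shared by every query thread that uses the server-wide JVM.
    private static final AtomicLong CALLS = new AtomicLong();

    /** Returns how many times this UDF has been invoked since the class was loaded. */
    public static int nextCallNumber() {
        return (int) CALLS.incrementAndGet();
    }
}
```

A plain static long incremented with ++ could lose updates under concurrent queries, which is why the sketch uses an atomic counter.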

Let's consider the functional requirements for a configurable JVM.

Functional requirements for a server-wide configurable JVM

First off, we'll need to have a way to configure the server to enable Java UDF support. This could be an option in the ibconfig file such as:

 LOAD_JAVA_VIRTUAL_MACHINE TRUE

or it could be a system environment variable of the same name. The default for LOAD_JAVA_VIRTUAL_MACHINE must be FALSE since InterBase installations in general will not deploy Java UDFs.

When the JVM is initialized, the classpath for all user-defined Java classes must be supplied. The classpath indicates the location of all Java class libraries for the Java UDFs and must be local to the InterBase server. By default, the classpath for all user-defined Java functions could be

 <interbase-dir>/java_udfs

where <interbase-dir> is the InterBase installation directory. The default classpath of <interbase-dir>/java_udfs could be modified manually by setting a startup configuration parameter such as JAVA_UDF_CLASSPATH. Like LOAD_JAVA_VIRTUAL_MACHINE, JAVA_UDF_CLASSPATH could also be a server-side system environment variable set before the InterBase server starts.

All directories and jar files in the classpath setting are separated by semicolons according to the standard Java conventions for setting classpath on Windows. Here is an example setting:

 JAVA_UDF_CLASSPATH c:\interbase\java_udfs;d:\fredsUdfs\mathUdfs.jar
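As a small sketch of the separator convention, the helper below splits a Windows-style classpath value into its directory and jar entries. The class name ClasspathSetting is hypothetical, and the specification does not say how the server itself would parse the setting.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for a semicolon-separated classpath value. */
final class ClasspathSetting {
    /** Splits the value on semicolons, trimming whitespace and skipping empty entries. */
    public static List<String> entries(String value) {
        List<String> result = new ArrayList<>();
        for (String part : value.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

Applied to a value with one directory and one jar file, this yields two entries in their original order.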

For native UDFs, a library module (.dll file) is specified along with the UDF declaration. However, specifying a Java archive (.jar file) along with a Java UDF declaration is not possible because all Java class libraries must be known in advance of initializing the JVM. Therefore, Java archive files (.jar files) and Java class library directory locations containing Java UDFs should be appended to the JAVA_UDF_CLASSPATH variable in the ibconfig file. See section Requirements and constraints for more details.

For Java UDFs which utilize native libraries via JNI, the directory location of the native libraries (.dll files) must also be provided when the JVM is initialized. By default, it is assumed that Java UDFs are written in pure Java. However, if Java UDFs are utilized which call into native libraries, the directories containing these libraries must be specified in the JAVA_UDF_NATIVE_LIBRARY_PATH variable of the ibconfig startup file. All directories in the path setting are separated by semicolons according to the standard Java conventions for setting path on Windows.

 JAVA_UDF_NATIVE_LIBRARY_PATH d:\fredsUdfNativeLibs

There is a secondary option of when to create the JVM. The JVM could be created when the InterBase server starts up (accepted), or alternatively, it could be created upon invocation of the first Java UDF (rejected). Which choice is taken would affect the design under JDK 1.1 because of threading issues. So an ancillary design issue is addressed here.

Design note: In JDK 1.1, the main thread which created the JVM must be maintained for the life of the embedding application, and only this main thread may destroy the JVM (thereby releasing JVM resources). Therefore, in JDK 1.1, a transient query thread cannot be used to create the JVM, as would be tempting to do if the JVM is created on the first invocation of a Java UDF during SQL execution. If the JVM has not yet been created, then the first transient query thread to invoke a Java UDF must yield to the dedicated main thread to create the JVM. This main thread must also destroy the JVM at server shutdown time. If the JVM is already started, any transient query thread may "attach" itself to the JVM before invoking it, and "detach" itself from the JVM before being returned to the internal pool of InterBase query threads. These design requirements have changed in the JDK 1.2 version of the JNI, in which any thread may destroy the JVM.

The idea of loading the JVM upon Java UDF invocation, rather than at server startup, has been rejected. Here's a quote from Mark Duquette which best explains why:

If we opt for a single JVM implementation, then I would vote for loading the JVM on server startup. It is usually acceptable for an application to take a little time to load as opposed to an element being accessed (e.g. performing a SELECT after a large delete took forever because of garbage collection).

Functional requirements for multiple connection-wide JVMs (rejected alternative)

This alternative is academic, being that it is actually not possible given the current JVM implementations, and would probably not be a desirable alternative even if it were possible, but it is included here for completeness.

As an alternative to a single server-wide JVM, separate JVMs could be created for each connection which requests a JVM. This gives control over the configuration of the JVM to the user connection and does not require a server restart for a new JVM configuration to accommodate some new connection.

The JNI provides a mechanism for creating multiple JVMs to facilitate thread isolation in multi-threaded programming environments. One simple way to allocate JVMs is to create a dedicated JVM for each connection which needs Java UDF support. In this case, a JVM may be created and configured when a connection which requests Java UDF support is established to a database. A connection requesting a JVM may specify a server-side classpath, as well as a server-side native library path if necessary. Other ways of distributing multiple JVMs are possible, such as one JVM per query (way too costly), but one JVM per requesting connection is probably the most logical if one opted for multiple JVMs.

Multiple connection-wide JVMs could be configured in the same way as a single server-wide JVM is configured via the ibconfig file. However, this does not allow for differing JVM configurations between connections. One way to allow for connection-level JVM configurations is through the use of Database Parameter Block (DPB) options. The DPB parameters would be analogous to the ibconfig parameters in the server-wide JVM scenario:

 isc_dpb_load_java_virtual_machine
 isc_dpb_java_udf_classpath
 isc_dpb_java_udf_native_library_path

SQL support would also need to be surfaced by extending the syntax of the CONNECT statement. For example:

 CONNECT "employee.gdb" 
 LOAD_JAVA_VIRTUAL_MACHINE JAVA_UDF_CLASSPATH "d:\java_udfs";

Alternatively, we could eliminate the need for isc_dpb_load_java_virtual_machine and LOAD_JAVA_VIRTUAL_MACHINE by first having connection requests check the system tables for any Java UDF entry in the database. Then, create a JVM if and only if a Java UDF is found in the system tables. The overhead in checking for Java UDFs in the system tables may be minimized by using an in-memory DBB flag to indicate existence of Java UDFs in the database. This flag would be set only at the first attachment of a client to the database, and would be used by subsequent attachments.

Because each JVM maintains its own object memory, using multiple JVMs would present some difficulties if static class variables were modified by a UDF. Because of this and the amount of resources that would be required by multiple JVMs, a single server-wide JVM is undoubtedly our best option under the super-server model. In fact, I asked a JavaSoft JNI engineer the following question to get an idea of the intended usage of multiple JVMs:

Question: Regarding JNI_CreateJavaVM(). Since the JVM is multi-threaded, under what circumstances would an application ever want more than one instance of the JVM?

Answer: For isolation (say different System.out's). The API for multiple VMs was added but never implemented and is not clear whether it will ever be.

Here's a further comment giving another reason for rejecting this design alternative:

IMHO, a single JVM which can be shared by concurrent query threads is the best option. Not only would this be less resource intensive, but also would allow a single point of contact for our InterBase server. We would not have to add complexity of managing multiple JVM's invoked by each query thread. Also, this enhances the speed of query execution by avoiding a JVM start for each query.

A system table for Java UDFs

Rather than introduce a new system table for Java UDFs (e.g. RDB$JAVA_FUNCTIONS), the existing system table for native UDFs (RDB$FUNCTIONS) will be extended to accommodate Java UDFs. Reusing the RDB$FUNCTIONS system table will have the least effect on existing middleware and application products, and will help to ensure that function names are unique. Here's the proposed new schema for RDB$FUNCTIONS.

Column Name	Datatype	Length	Description
`RDB$FUNCTION_NAME`	`CHAR`	`31`	Unique name for a native function or Java UDF.
`RDB$FUNCTION_TYPE`	`SMALLINT`		Prior to V7 this was reserved for future use. For V7 this field is used to indicate whether the function is a native UDF or a Java UDF. 0 indicates native, 1 indicates Java.
`RDB$QUERY_NAME`	`CHAR`	`31`	Alternative name for the function that can be used in ISQL.
`RDB$DESCRIPTION`	`BLOB`	`80`	Subtype Text: Contains a user-written description of the function being defined.
`RDB$MODULE_NAME`	`VARCHAR`	`253`	For native UDFs, this names the function library where the executable function is stored. For Java UDFs, this field is `NULL`. This field is nullable.
`RDB$ENTRYPOINT`	`CHAR`	`31`	For native UDFs, this is the entry point within the function library for the function being defined. For Java UDFs, this field is `NULL`. This field is nullable.
`RDB$RETURN_ARGUMENT`	`SMALLINT`		Position of the argument returned to the calling program; this position is specified in relation to other arguments.
`RDB$SYSTEM_FLAG`	`SMALLINT`		Indicates whether the function is user-defined (value of 0) or system-defined (value of 1).
`RDB$CLASS_NAME`	`VARCHAR`	`253`	For Java UDFs, this is the Java class name containing the Java method for the defined function. For native UDFs, this is `NULL`.
`RDB$METHOD_NAME`	`VARCHAR`	`253`	For Java UDFs, this is the Java method name for the defined function. For native UDFs, this is `NULL`.

Table RDB$FUNCTION_ARGUMENTS should not need to be modified as this describes the InterBase types of UDF arguments.

Exception handling

Testing has shown that the Java VM will crash with a segmentation violation upon UDF invocation if the Java UDF Declaration and the Java Method Declaration signatures do not match. Such abnormal terminations within the JVM will be trapped by the engine, an appropriate error message will be logged, and the server will exit gracefully. Unlike native UDFs, Java UDF exceptions include both abnormal terminations of the Java VM and normal Java exceptions thrown from within the UDF Java method itself, so the engine will trap both. However, the server should not exit for a Java exception as it does for an abnormal termination such as a segmentation violation. Rather, the server should log a message for the Java exception (by way of the status vector) and abort the associated query, but not exit.
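The "normal Java exception" case can be sketched with a UDF method that rejects bad input by throwing. The class SafeMathUdfs and its method are hypothetical; under the behavior described above, such an exception would abort the associated query and be reported through the status vector, while the server keeps running.

```java
/** Hypothetical UDF class that signals bad input with an ordinary Java exception. */
final class SafeMathUdfs {
    /** Integer division that throws ArithmeticException for a zero divisor. */
    public static int divide(int dividend, int divisor) {
        if (divisor == 0) {
            throw new ArithmeticException("divide: divisor must not be zero");
        }
        return dividend / divisor;
    }
}
```

This differs from a signature mismatch between the two declarations, which the text above describes as crashing the VM rather than raising a catchable exception.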

Design note: The implementation could leverage the work done for UDF exception handling in which the server does not terminate. This is not currently in force for 6.0 since it is unsafe to continue running the server after a segmentation violation.

Deploying the Java runtime

In order for end users to use Java UDFs, they'll need to have a Java runtime environment installed on their server. The Java 2 SDK software can serve as a runtime environment. However, we shouldn't assume all users have the Java 2 SDK software installed, and the Java 2 SDK software license doesn't allow us to redistribute SDK software files.

To solve this problem, Sun provides the Java 2 runtime environment as a free, redistributable runtime environment, available for Win32 and Solaris systems. By distributing the JRE with InterBase, we can ensure that customers will have the correct version of the Java platform for running our software.

The Java Runtime Environment (JRE) is the minimum standard Java platform for running applications written in the Java programming language. It contains the Java virtual machine, Java core classes, and supporting files. The JRE does not contain any of the development tools (such as appletviewer or javac) or classes that pertain only to a development environment.

The Win32 version comes with a built-in installation program suitable for end-users. Solaris versions require the developer to provide installation support. This means the InterBase install could invoke the Sun JRE installation exe for Win32 if desired, but must install the JRE files manually on Solaris.

The Java 2 runtime environment for Win32 is available both with and without international support. The non-international version is much smaller, but is suitable only for English-speaking users.

We also must make sure that our installation procedure never overwrites an existing JRE installation, unless the existing runtime environment is an older version.

The Win32 installation program records program information in the Windows Registry. This registry information includes the software version, which we will need to compare with the Java 2 runtime environment version compatible with our InterBase software.

One approach is to install the Java 2 runtime environment files manually into our own InterBase directory or any other directory specified by the installer. If we choose this approach, we must redistribute the JRE in its entirety except for some optional files which we may choose not to redistribute. The files that are optional are listed in the JRE README. They are mostly for functionality such as internationalization and localization which we may or may not need. The Java 2 Runtime Environment software can only be redistributed if all required files are included. Arbitrary subsetting of the Java 2 runtime environment is not allowed. See the JRE LICENSE file for specifics. We will also have to include license provisions in our InterBase license file.

The Java 2 runtime environment includes bin and lib subdirectories which both must reside in the same parent JRE directory. The bin directory contains about two dozen dlls and exes. The lib directory contains various jars and associated files. These files are too numerous to enumerate here.

In the case of the Win32 Java 2 runtime environment, the native C runtime library, msvcrt.dll, should be copied to the Windows system directory. The location of this directory varies on different operating systems, but is usually

winnt\system32 on Windows NT
windows98\system on Windows 98
windows\system on Windows 95

Although InterBase already distributes this file, it is stated here for the record that this file should be included in redistributions of the Win32 version of the Java 2 runtime environment.

Metadata extract utility for Java UDFs

For each Java UDF declared to the database, the metadata extract will emit

 DECLARE EXTERNAL JAVA FUNCTION udf-name
 [ java-udf-datatype .,..]
 [ RETURNS { java-udf-datatype
           | PARAMETER argument-position } ]
 CLASS "class-name"
 METHOD "method-name"; 

 java-udf-datatype ::=
             JSTRING (maximum-character-length)
           | NUMERIC(p,s) | NUMERIC(p) | DECIMAL(p,s) | DECIMAL(p)
           | DATE | TIME | TIMESTAMP
           | BLOB
           | DOUBLE PRECISION
           | INTEGER
           | SMALLINT
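
As a rough sketch of how the extract utility might emit one such declaration, the C function below formats the statement from an in-memory description. The struct java_udf_decl, its fields, and the function name are hypothetical, and reading the system tables is left out. Only the RETURNS datatype form is handled, not RETURNS PARAMETER.

```c
#include <stdio.h>

/* Hypothetical in-memory description of one Java UDF; the real utility
 * would populate this from the system tables. */
typedef struct {
    const char *udf_name;
    const char *arg_types;    /* comma-separated list, or NULL if none */
    const char *return_type;  /* e.g. "JSTRING (80)", or NULL if none */
    const char *class_name;
    const char *method_name;
} java_udf_decl;

/* Format the declaration into buf. Returns the number of characters the
 * full statement needs (as snprintf does), or a negative value on error. */
int extract_java_udf(const java_udf_decl *d, char *buf, size_t size)
{
    return snprintf(buf, size,
        "DECLARE EXTERNAL JAVA FUNCTION %s\n"
        "%s%s"
        "%s%s%s"
        "CLASS \"%s\"\n"
        "METHOD \"%s\";\n",
        d->udf_name,
        d->arg_types ? d->arg_types : "", d->arg_types ? "\n" : "",
        d->return_type ? "RETURNS " : "",
        d->return_type ? d->return_type : "",
        d->return_type ? "\n" : "",
        d->class_name, d->method_name);
}
```

If the return value is greater than or equal to size, the statement was truncated and the caller should retry with a larger buffer.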

Standard Java UDF library

A custom Java UDF library comparable to our FreeUDF library, or to Gregory Deatz' or MER System's native UDF libraries, is not really necessary, because most of these functions already exist in the core Java class libraries. One of the advantages of supporting Java UDFs is that the core Java class libraries provide a wealth of built-in methods ready for use. This also means it is unnecessary to port the standard InterBase native UDF library. However, a standard set of Blob UDFs, especially for converting String data to and from Blobs, would be a useful add-on Java UDF library.

Linking with unknown Java virtual machines

The ability to embed a JVM within a native application such as InterBase requires us to link with a Java virtual machine implementation. How we link depends on whether we intend to deploy with only one particular virtual machine implementation or with a variety of implementations from different vendors. Because the JNI does not specify the name of the native library that implements a Java virtual machine, we should be prepared to work with implementations that are shipped under different names. For example, on Win32, Sun's virtual machine is shipped as javai.dll in JDK release 1.1 and as jvm.dll in the Java 2 SDK, and Microsoft's virtual machine goes by yet another name.

The solution is to use programmatic run-time dynamic linking to load the particular virtual machine library specified in the ibconfig variable JAVA_VIRTUAL_MACHINE_LIBRARY. This variable could hold just the name of the library, such as jvm, or the absolute path to the library, such as C:\jdk1.2\jre\bin\classic\jvm.dll. The absolute path is preferable, since relying on LoadLibrary() or dlopen() to search for jvm.dll makes InterBase susceptible to configuration changes, such as additions to the PATH environment variable.

 JAVA_VIRTUAL_MACHINE_LIBRARY "c:\jdk1.2\jre\bin\classic\jvm.dll"
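
A minimal sketch of reading this setting is shown below. It assumes the value is always written between double quotes, as in the line above. The function names parse_jvm_library and is_absolute_path are hypothetical and are not part of any existing InterBase configuration code. The path check covers only drive-letter and Unix-style paths.

```c
#include <string.h>

/* Hypothetical parser for one ibconfig line of the form
 *     JAVA_VIRTUAL_MACHINE_LIBRARY "value"
 * Copies the text between the double quotes into out and returns 1;
 * returns 0 if the line names another variable, is malformed, or the
 * value does not fit in out. */
int parse_jvm_library(const char *line, char *out, size_t out_size)
{
    static const char key[] = "JAVA_VIRTUAL_MACHINE_LIBRARY";
    size_t key_len = sizeof key - 1;
    const char *start;
    const char *end;
    size_t len;

    line += strspn(line, " \t");            /* skip leading blanks */
    if (strncmp(line, key, key_len) != 0)
        return 0;
    line += key_len;
    if (*line != ' ' && *line != '\t')      /* the key must end here */
        return 0;
    line += strspn(line, " \t");
    if (*line != '"')
        return 0;
    start = line + 1;
    end = strchr(start, '"');               /* closing quote */
    if (end == NULL)
        return 0;
    len = (size_t)(end - start);
    if (len == 0 || len >= out_size)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Returns 1 for a path starting with a drive letter, colon and slash
 * (Win32) or with '/' (Unix), otherwise 0. */
int is_absolute_path(const char *p)
{
    if (p[0] == '/')
        return 1;
    return ((p[0] >= 'A' && p[0] <= 'Z') || (p[0] >= 'a' && p[0] <= 'z'))
        && p[1] == ':' && (p[2] == '\\' || p[2] == '/');
}
```

The server could log a warning at startup when is_absolute_path returns 0, since the library search would then depend on the environment.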

Design Note: With run-time dynamic linking, the InterBase engine code never references JNI entry points as link-time symbols, so we do not need to link the engine with jvm.lib for entry point information. Instead, the engine looks up by name the few functions that a JVM library exports, such as JNI_CreateJavaVM. The remaining invocation functions, including AttachCurrentThread, are not exported by name; they are called through the JavaVM function table that JNI_CreateJavaVM returns. For example, the following Win32 code finds the function entry point for JNI_CreateJavaVM given a virtual machine library:

 // Return a function pointer to the JNI function
 // "JNI_CreateJavaVM" in a variable JVM library.

       #include <windows.h>

       void *findCreateJavaVM (const char *jvmLibrary)
       {
         HINSTANCE hVM = LoadLibrary (jvmLibrary);
         if (hVM == NULL) return NULL;
         return (void *) GetProcAddress (hVM, "JNI_CreateJavaVM");
       }

The Solaris version is:

 // Return a function pointer to the JNI function
 // "JNI_CreateJavaVM" in a variable JVM library.

       #include <dlfcn.h>

       void *findCreateJavaVM (const char *jvmLibrary)
       {
         void *libVM = dlopen (jvmLibrary, RTLD_LAZY);
         if (libVM == NULL) return NULL;
         return dlsym (libVM, "JNI_CreateJavaVM");
       }

Requirements and constraints

The Java analog to the native UDF module name (.dll file) is the Java classpath. The classpath must be known when the JVM is initialized, so it is specified at server startup time rather than at UDF declaration time. This has the disadvantage that the classpath is fixed for the life of the server, but the advantage that if Java libraries are renamed, the classpath can be reconfigured without redeclaring the UDF to the database. With native UDFs, by contrast, the .dll module name is hardwired in the system tables for the UDF, so if a .dll module name changes, the native UDF must be redeclared to the database.
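
To illustrate why the classpath is tied to JVM initialization: under the Java 2 invocation API, the classpath is handed to the JVM as a "-Djava.class.path=..." option string when the JVM is created. The sketch below only builds that string. The function name build_classpath_option is hypothetical, and where the server obtains the classpath value is an assumption left open here.

```c
#include <stdio.h>

/* Build the classpath option string for JVM creation. In the engine this
 * string would be stored in a JavaVMOption before JNI_CreateJavaVM is
 * called. Returns 1 on success, 0 if buf is too small. */
int build_classpath_option(const char *classpath, char *buf, size_t size)
{
    int n = snprintf(buf, size, "-Djava.class.path=%s", classpath);
    return n >= 0 && (size_t)n < size;
}
```

Because the option is consumed once at JVM creation, a changed classpath takes effect only after the server is restarted, which matches the constraint described above.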

As with native UDFs, invoking a Java UDF will release the engine thread lock by performing a thread-exit before transferring control to the Java runtime. When the Java UDF returns, a thread-enter will be performed to regain the thread lock on the engine.

The JVM port must provide native Java thread support for the deployed platform. Therefore initial support for Java UDFs will be for Win32 only. Here's a quote from Sun's JNI FAQ (https://java.sun.com/products/jdk/faq/jnifaq.html):

"The Solaris Java VM shipped with JDK 1.1 is not suitable for embedding into certain native applications. Because it depends on a user-level thread package to implement Java threads, the VM overrides a number of system calls in order to support non-blocking I/O. This may have undesirable affects on the hosting native application. In addition, the Invocation API function AttachCurrentThread is not supported on Solaris.

We plan to fix these problems in the near future by releasing a Java VM directly supported by Solaris native threads."

This non-native Java thread implementation was known as green threads.

More recent information is available in the new Java 2 JNI FAQ (https://java.sun.com/products/jdk/faq/jni-j2sdk-faq.html#nativethreads):

"8. (Solaris) Has support for native threads gotten any better?
Yes. As of JDK/JRE 1.1.3 you could download a Solaris Native Threads Pack which was fully supported. In the Java 2 SDK, native threads is integrated into the release. If you use the invocation API on Solaris to embed the JVM into your application, we recommend the use of the native threads VM (see also Q11 on linker issues on Solaris).

Lest you ask, we have always used only native threads on the Win32 platforms."

This will need to be tested directly on Solaris for confirmation. Note that we must embed the native threads VM, since green threads and native threads don't mix. InterBase already links with -lthread -lc, which is required for access to Solaris native threads.

Because of significant differences in the JNI API between Java 1 and Java 2, only Java 2 and above will be supported.

The JVM port must support the Java 2 JNI interfaces.
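
One way the engine could enforce this requirement is to check the version number that the loaded JVM reports. The JNI GetVersion function returns the major version in the high 16 bits and the minor version in the low 16 bits, and the Java 2 level is JNI_VERSION_1_2 (0x00010002). The sketch below repeats that constant locally so that it compiles without the JDK headers. The function name jvm_supports_java2_jni is hypothetical.

```c
/* Value of JNI_VERSION_1_2 from the Java 2 jni.h, repeated here so the
 * sketch compiles without the JDK headers. */
#define SKETCH_JNI_VERSION_1_2 0x00010002L

/* Returns 1 if the version reported by the JVM is at least the Java 2
 * level (1.2), otherwise 0. */
int jvm_supports_java2_jni(long version)
{
    long major = (version >> 16) & 0xFFFF;
    long minor = version & 0xFFFF;
    long need_major = (SKETCH_JNI_VERSION_1_2 >> 16) & 0xFFFF;
    long need_minor = SKETCH_JNI_VERSION_1_2 & 0xFFFF;

    if (major != need_major)
        return major > need_major;
    return minor >= need_minor;
}
```

A server that loads a JVM library named in its configuration could refuse to enable Java UDFs when this check fails.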

The BLR and DYN generation for Java UDFs is deferred as a design consideration.

Thread-safety of Java UDFs is up to the author of the Java UDF class library. However, this is a relatively easy task in Java.

Performance of Java UDFs will be inferior to that of native UDFs because of the necessary internal conversions from native InterBase datatypes to Java objects.

Migration issues

Although RDB$ENTRYPOINT and RDB$MODULE_NAME are nullable fields, the modified RDB$FUNCTIONS system table could affect applications that do not allow for a null RDB$ENTRYPOINT or RDB$MODULE_NAME.

Open issues

Is there any way for InterBase to force garbage collection in the JVM?
Server-wide UDFs - this is a desirable feature for both native and Java UDFs.

Syntax conventions

The syntax diagram conventions mostly follow BNF, with a few variations to enhance readability. The general rules for specifying syntax in this extended BNF are described below. Please be aware that BNF is a high-level specification syntax, not a low-level LALR(1) grammar as used by parser generators such as YACC. The LALR(1) syntax for Java UDFs is deferred as a design consideration. These rules are taken from the book "SQL Instant Reference" by Martin Gruber, Sybex Publishing.

The symbol ::= means "is defined as". It is used to further clarify parts of a statement's syntax diagram.
Keywords appear in all uppercase letters. These reserved words are literals that are actually written as part of the statement.
Place holders for specific values, such as domain-name in the CREATE DOMAIN domain-name statement, appear in italic type. These place holders identify the type of value that should be used in a real statement; they are not literals to be written as part of the statement. This is not a standard BNF convention.
Optional portions of a statement appear in square brackets ([ and ]).
A vertical bar (|) indicates that whatever precedes it may optionally be replaced by whatever follows it.
Braces ({ and }) indicate that everything within them is to be regarded as a whole for the purpose of grouping.
Ellipses (...) indicate that the preceding portion of the statement may be repeated any number of times.
Ellipses with an interposed comma (.,..) indicate that the preceding portion may be repeated any number of times, with the individual occurrences separated by commas. The final occurrence should not be followed by a comma. Note: This is not a standard BNF convention; it is used for clarity and simplicity.
Parentheses () used in syntax diagrams are literals. They indicate that parentheses are to be used in forming the statement. They do not specify a way of grouping the diagram as braces or square brackets do.