Note: This paper was prepared in November 1992 for presenation at the ISMM conference on knowledge and information management (CIKM) in Baltimore, MD. It references a the July 1992 SQL3 base document. It has not been updated since that time. Object SQL: Language Extensions for Object Data Management Leonard J. Gallagher Information Systems Engineering Division National Institute of Standards and Technology Gaithersburg, MD 20899 Abstract Database Language SQL is enjoying success as an effective International Standard for the definition and management of relationally structured data. The first SQL standard, in 1986, provided basic language constructs for defining and manipulating tables of data; a revision in 1989 added language extensions for referential integrity and generalized integrity constraints; and the most recent revision in 1992 provides new facilities for schema manipulation and data administration, as well as substantial enhancements for data definition and data manipulation. A companion standard for Remote Database Access (RDA), also completed in 1992, provides the basic services and protocols for SQL interoperability in a client/server environment. National and international SQL standardization committees are now focusing on development of future extensions for meeting the stated requirements of managing complex objects in engineering and multimedia environments. These extensions include object identifiers, abstract data types, inheritance hierarchies, and all of the other features normally associated with object data management. This paper presents the object management features contained in the draft specification for the next SQL standard, discusses problems and alternatives, and addresses planned future standardization activities. 1 Introduction Data management requirements for many scientific and engineering applications often exceed the capabilities of existing database management systems. Such applications often require a logically integrated database of diverse data (e.g. documents, graphics, alphanumeric records, complex objects, images, voice, video) stored in geographically separated data banks under the management and control of heterogeneous data management systems. An over-riding requirement is that these various data managers be able to communicate with each other and provide shared access to data and data operations and methods under appropriate security, integrity, and access control mechanisms. Database Language SQL [8] and its distributed processing counterpart, Remote Database Access (RDA) [10], are important International Standards that are able to address a significant portion of the above data management requirements. SQL is particularly appropriate for the definition and management of data that is structured into repeated occurrences having common data structure definitions. SQL provides a high-level query and update language for set-at-a-time retrieval and update operations, as well as required database management functions for schema and view definition, integrity constraints, schema manipulation, and access control. SQL provides a data manipulation language that is mathematically sound and based on a first-order predicate calculus. SQL is self-describing in the sense that all schema information is queryable through a set of catalog tables. Features of the most recent SQL standard are discussed in [4], [5], and [12]. Early in 1991, technical committees for SQL standardization, operating under the procedures of the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO), committed to enhancing SQL into a computationally complete language for the definition and management of persistent, complex objects. This includes the specification of abstract data types, object identifiers, methods, inheritance, polymorphism, encapsulation, and all of the other facilities normally associated with object data management. Preliminary specifications for these facilities are contained in the most recent SQL3 Working Draft [9]. This article focuses on the new object-oriented facilities proposed for inclusion in SQL3. In particular, Section 2 describes the preliminary status of abstract data type (ADT) facilities and object identifiers. Section 3 discusses newly added features for compound SQL statements, exception handling, and flow of control. Section 4 addresses parameterized data types, including generator types such as LIST, SET, and ARRAY. Section 5 discusses stored procedures, Section 6 describes other supporting features, and Section 7 draws some conclusions. 2 Object Definition Earlier versions of the SQL3 Working Draft specified object-oriented facilities such as User-Defined data types, Assertions, and Triggers to support the object notions of abstraction and encapsulation. They also included the generalization and specialization of tables in a table hierarchy that supported multiple inheritance. In each case, these object concepts were only partially supported without fully enforcing them as a discipline. There was no notion of object identifiers and no notion of class hierarchies for user-defined data types. The newest SQL3 Working Draft [9] now addresses the requirement for true "objects" and "object identifiers" in SQL and also specifies supporting features such as encapsulation, subtypes, inheritance, and polymorphism. The following subsections briefly describe the current status of these features. Of course, as with any draft, the specifications are subject to revision and improvement as they evolve over the next two or three years before final standardization. 2.1 Abstract data type (ADT) The abstract data type facility provides the capability to define and manage persistent data type definitions, including structures and operations on those structures. A new ADT can be "constructed" from any existing data type, including previously defined abstract data types, known to the current SQL environment. An abbreviated version of the current syntax is as follows: CREATE TYPE <ADT name> [ <OID options> ] [ <subtype clause> ] [ <member list> ] The <OID options> are discussed in Section 2.2 and the <subtype clause> is discussed in Section 2.5 below. The <member list> specifies the attributes, operations, and other methods applicable to the ADT definition. Attributes contain an <encapsulation level>; otherwise, an attribute definition has the same syntax as a regular relational column definition. An <encapsulation level> is specified as either PUBLIC, PRIVATE, or PROTECTED. Public components form the interface of the ADT and are visible to all authorized users of the ADT. Private components are totally encapsulated, and are visible only within the definition of the ADT that contains them. Protected components are partially encapsulated, being visible both within their own ADT and within the definitions of all subtypes of that ADT. Operations on an ADT may include EQUALS and LESS THAN. The EQUALS clause identifies a function that specifies conditions under which two instances of the defined data type are considered to be equal. The LESS THAN clause identifies a function that specifies a comparison ordering over data type instances. The definitions of equality and ordering, taken together, specify the semantics to be used in the SQL comparison predicate when applied to ADTs. If the LESS THAN definition is not specified, then the ordering tests of the comparison predicate must return an unknown value. Other methods associated with ADTs include FUNCTION definitions. FUNCTIONs operate on one or more ADT instances and return either Boolean, if the result is to be used as a truth value in a Boolean predicate, or a single value of a defined data type, if the result is to be used as a value specification. Functions may be SQL functions, completely defined in an SQL schema definition, or they may be external function calls to functions defined in standard programming languages (see 2.4). Special "constructor" and "destructor" functions are defined to make or remove instances of an abstract data type. At the present time constructor and destructor functions are invoked implicitly through Insert or Delete operations on a table (see 2.3). Special methods, identified as CAST functions, specify how an ADT may be mapped to other existing data types. For example, an IMAGE ADT may be mapped to a BIT STRING representation. With the ability to include CAST specifications in any ADT definition, a data type definer can define mappings to specific external representations. In this way the internal representation may be kept PRIVATE and not directly accessible, thereby allowing efficient implementation. Abstract data type definitions and SQL or external function declarations are treated just like any other SQL objects as far as access control and other usage privileges are concerned. Access control is independent of encapsulation since encapsulation defines the structure of what is possible to see and access control determines who can see that structure. All names are qualified by the schema name of the containing schema and by the catalog name of the containing catalog. Each such object is "owned" by the authorization identifier of the schema in which it is defined and all privileges to use the object, to "see" its representation, or to modify its definition must be explicitly granted by the object owner. Privileges on existing ADTs may be GRANTed and REVOKEd and new ADTs or function declarations may be ADDed or DROPed from the schema as part of the SQL schema manipulation language. 2.2 Object identifiers Object identity is that aspect of an object that never changes and that distinguishes the object from all other objects. Ideally, an object's identity is independent of its name, structure, or location. Object identity is therefore a unique identification of an object that is independent of the state of that object, and which persists over time. The identity of an object persists even after the object no longer exists (e.g. like a timestamp), so that it may never be confused with the identity of any other object. Other objects can use the identity of an object as a unique way of referencing it. The <OID options> clause in an ADT definition allows several alternatives for object identifier (OID) specification: WITH OID VISIBLE, WITH OID NOT VISIBLE, or WITHOUT OID If WITH OID is specified, then an OID value is generated when the object is created to give that object an immutable identity. The OID can be referenced by an "object reference" in constraints, queries, and other ADT definitions. If NOT VISIBLE is specified, then the OID value may not be passed as a parameter to functions or stored in a host language variable. If WITHOUT OID is specified, then the ADT does not have an object identifier; instead, each instance represents itself just like values of primitive data types do. There is a continuing debate in the SQL standardization committees as to whether SQL should support all three of the above options, or if every new ADT definition should be assumed to carry a unique object identifier. The outcome of this debate will not affect the functionality of the new language, but it may influence its appearance and style. PUBLIC components of an ADT are accessible to authorized users through a special "attribute reference" operator (i.e. <ADT reference>.<attribute name>). If the ADT has WITH OID specified, then the <ADT reference> will be to an object identifier of a specific ADT; if the ADT has WITHOUT OID specified, then the <ADT reference> will be to the ADT value itself. The attribute reference identifies a specific component of the ADT instance and permits the user to read or modify its value. 2.3 Object management Object ADTs are subject to special "constructor" and "destructor" functions that either create a new instance of the ADT and make it part of the database or remove an instance of an ADT from the database. Since SQL is a "table-based" language, SQL designers have to address issues concerning whether or not SQL object instances may exist outside of table occurrences. If SQL objects are allowed to exist outside of tables, then new syntax to manipulate them and new structures to hold collections of them must become part of the language. Although these issues are still subject to debate and modification, the current status is to require that all object manipulation be achieved through table operations. SQL3 currently allows specification of a "tabular" shell over an ADT class. In this way, constructor and destructor functions are automatically invoked when rows are inserted or deleted from the table, and the table itself is the collection of all object occurrences. SQL query and update statements may then be applied to the table without the need for any special language enhancements. If a requirement surfaces later to allow objects to exist and be managed independently of tables, then it can be handled as an upward compatible language enhancement. In order to accommodate this implicit invocation of constructor and destructor functions, minor enhancements are needed to the syntax and semantics of the CREATE TABLE and INSERT statements (see [3]). The statement "CREATE TABLE T OF <ADT name>" creates a tabular envelope around the abstract data type specified by <ADT name> and all attributes of the abstract data type become columns of the new table T. Object instances may then be managed by the usual SQL Select, Update, and Delete statements. The statement "INSERT INTO T <insert spec> ALIAS <name>", using the new ALIAS option with an Insert statement, returns the object identifier of any new object created by the Insert statement to the <name> variable. Syntax rules prohibit specification of an alias when the underlying ADT is specified WITHOUT OID. 2.4 Methods and functions An abstract data type includes not only a collection of attributes but also the methods that define the behavior of the ADT. Some of the methods associated with an ADT might be realized by means of data that is stored in the database, while others might be realized as executable code (functions). An implementation of an ADT is the stored data together with the data structures and code that implement the behavior of the ADT. As seen in Section 2.1, methods may be encapsulated within the ADT definition. Specific methods for determining equality, and ordering when appropriate, are then usable in regular SQL comparison predicates. As we have seen, other methods can be defined as special operators on ADTs or as predicates that return truth values. A single value returned from a function call can be used any place in the SQL language that a single value is allowed. A truth value returned from a function call can be used as one of the terms in a boolean predicate. Functions may be either SQL functions or external functions. SQL functions are defined completely in SQL, whereas external functions have only their interface definition specified in SQL, with the content of the function written in some programming language (e.g. Ada, C, or eventually C++). An SQL function may be "defined", or an external function may be "declared", as an independent schema element, as part of an ADT definition, or as part of a module definition. The syntax of an SQL function is: [CONSTRUCTOR | ACTOR | DESTRUCTOR] FUNCTION <function name> <parameter declaration list> RETURNS <data type> <SQL statement> ; END FUNCTION Only constructor or destructor functions may create or destroy new ADT instances; they have already been discussed above. An actor function is any other function that reads or updates components of an ADT instance or accesses any other parameter declared in the <parameter declaration list>. A parameter in the parameter list consists of a parameter name and an SQL data type. The RETURNS clause specifies the SQL data type of the result returned. Since all data types in an SQL function are SQL types that must accommodate Null values, it is not necessary to worry about "indicator" parameters to convey Nulls. The <SQL statement> may be any SQL statement, including compound statements and control statements (see Section 3). To accommodate SQL functions, SQL3 specifies the following new SQL statements: - A NEW statement that allows creation of a new OBJECT ADT instance; it is only allowed in a CONSTRUCTOR function. - A DESTROY statement that destroys the existence of an OBJECT ADT instance; it is only allowed in a DESTRUCTOR function. - An ASSIGNMENT statement that allows the result of an SQL value expression to be assigned to a free standing local variable, a column, or an attribute of an ADT. - A CALL statement that allows invocation of an SQL procedure. - A RETURN statement that allows the result of an SQL value expression to be returned as the RETURNS value of the SQL function. The syntax of an external function declaration is: DECLARE EXTERNAL <external function name> <formal parameter list> RETURNS <result data type> [ CAST AS <cast data type> ] LANGUAGE <language name> The <formal parameter list> is a list of SQL data types. If a data type in the parameter list is supported in the programming language identified by the LANGUAGE clause, then the corresponding programming language routine has two parameters for that data type; the second parameter is the "indicator" parameter to convey Null values. If a data type in the parameter list is an ADT not supported in the programming language identified by the LANGUAGE clause, then the corresponding programming language routine has two parameters for the base type of each attribute in the ADT definition, recursively. Again, the second parameter in each case is an "indicator" parameter. The actual mapping from the <formal parameter list> in the external function declaration to the parameter list of the programming language routine can become quite complex, so techniques for better management of cross-language calls are under active consideration. The CAST AS clause is a convenience to allow "encapsulated" casting from a programming language data type to an SQL data type. For example, an SQL DATETIME data type that appears in the formal parameter list is automatically cast to its character string literal representation before it is passed to a programming language routine, e.g. FORTRAN. The CAST AS clause could also automatically cause the character string RESULT to be re-cast into an SQL DATETIME value. Since every SQL data type has a defined CAST operation to and from character string representations, it is possible to pass any SQL data type to any programming language that supports character strings. More sophisticated data type correspondence, especially to object programming languages such as C++, will likely be the subject of near-term considerations. 2.5 Subtypes and inheritance Specification of "UNDER <ADT name>" in the <subtype clause> of an ADT definition (see 2.1) permits a new ADT to be defined as a subtype of an existing ADT. A type can have more than one subtype and more than one supertype. Thus a subtype is a specialized type of one or more supertypes and a supertype is a generalized type of one or more subtypes. A supertype shall not have itself as a proper subtype and a subtype family shall have exactly one maximal supertype. Inheritance is an abstraction mechanism that adds to the power of data abstraction by allowing classes of objects to be related hierarchically. Inheritance allows classes to share definitions with other classes, thereby supporting newer, more specialized, data definitions without losing the existing properties and operations of the superclass. Through inheritance, new types can be built under older, less specialized types rather than having to rewrite properties from scratch. Inheritance makes it possible to build a hierarchy of related ADTs, i.e. a "type hierarchy", that share the same interface, and possibly the same representation and implementations. As we move up in the inheritance hierarchy, types become more generalized; as we move down types become more specialized. These generalization/specialization capabilities allow more accurate and succinct modeling of applications. The SQL implementation of a type hierarchy (see [3] and [11]) requires that an instance of a subtype is also an instance of all of its supertypes. Every instance is associated with a "most specific type" that corresponds to the lowest subtype assigned to the instance. At any given time, an instance must have exactly one most specific type. Note that the most specific type of an instance need not correspond to a leaf type in the type hierarchy. For example, a type hierarchy might consist of a maximal supertype PERSON that has STUDENT and EMPLOYEE as two subtypes, and STUDENT may have two subtypes GRAD and UNDERGRAD. An instance in this hierarchy might be created with a most specific type of STUDENT, e.g. a special, non-degree student, even though STUDENT is not a leaf in the type hierarchy. As we saw above, every attribute definition in an ADT has an encapsulation level specified as either PUBLIC, PRIVATE, or PROTECTED. Public and protected components are visible to the definitions of all subtypes of that ADT, but private components are not. A subtype can define constructor, actor, and destructor functions just like any other ADT. All operations of the supertype are invocable from the subtype, so there is a high potential for name conflicts when the subtype defines more specialized operations. Name resolution rules, described in the following section, ameliorate this problem. Since an instance must have at most one "most specific type" associated with it, a given instance cannot have two sibling types simultaneously as its most specific type. For example, in the above student type hierarchy, an instance of PERSON may not be an instance of both STUDENT and EMPLOYEE simultaneously. However, real world examples require that we have some method for modeling a person as both STUDENT and EMPLOYEE, for this can certainly be the case. To handle these situations, SQL provides "multiple inheritance", i.e. a subtype can have more than one direct supertype. With multiple inheritance, we can define a new type STUDENT-EMP which is a subtype of both STUDENT and EMPLOYEE. A person who is both a student and an employee can be modeled as an instance of the STUDENT-EMP type. In this way an instance will satisfy the requirement to always have a "most specific type". Multiple inheritance could lead to ambiguous inheritance of components from its supertypes, so SQL provides some disambiguity rules as follows: - If an attribute in more than one supertype is inherited from a common supertype higher in the hierarchy, then only the one from the common supertype, is inherited. - If an attribute with the same name in each of the supertypes are not inherited from a common supertype higher in the hierarchy, then the type definition is invalid unless the type definer renames the inherited components to remove the name clash. These rules, and other related issues, are subject to improvement and evolution as the SQL ADT facility stabilizes over the next two or three years. 2.6 Polymorphic functions Polymorphism is the ability to invoke an operation on any of several different objects and have that object determine what to do at run- time. A polymorphic function is one that can be applied in the same way to a variety of data objects. Support for polymorphism involves technical decisions concerning early or late binding among objects and the procedures that invoke their methods. To help address some of these technical decisions, a number of techniques have evolved (see [1]), such as: Overloading. The ability to assign the same name to more than one function or procedure -- name resolution is then determined by a set of rules, thereby allowing a processor to distinguish among different functions of the same name by examining the "type" of the input data. Coercion. The ability to omit semantically needed type conversions -- we have already seen that SQL uses this technique in some of its parameter passing to external functions. Inclusion. The ability to manipulate objects of a subtype "as if" they were objects of a supertype -- possibly with a function of the same name that calls different routines. Generalizing. The ability to specify that a parameter should "take on" the type of some supertype during processing of a specific function call. The resolution rules to support polymorphism are derived from one basic concept, i.e. that for any particular function invocation, there must exist a single "best match" from the candidate functions that are "in scope". When a function call is executed, the unique "most specific type" of the various input parameters is used to help define the "best match" according to the following rules, many of which are derived from those used by C++: - Begin with the set of all functions that are "in scope" for a particular function call, i.e. those that are defined or declared with the calling function name in the statement, procedure, module, or schema associated with the function call. - For each argument, determine the set of functions that is a "best match" for that argument, then take the intersection of these sets. Unless this intersection has exactly one member, the call is illegal. That is, the function selected must be a "strictly better match" for at least one argument than every other possible function, but not necessarily the same argument for each function. - To decide which functions are the best match for each argument, agree: an "exact match" is better than one based on type coercion (i.e. CASTing), an implicit conversion to the "closest" supertype is better than SQL or user-defined type coercion, and an implicit SQL- defined CAST is better than an implicit user-defined CAST. 3 Control Structures Support for SQL functions is discussed in Section 2.4 above. We saw there that it was necessary to introduce several "control" statements into the SQL language, e.g. ASSIGNMENT, CALL, and RETURN. The obvious next step was to consider if more control statements and other "programming language" facilities should be added to SQL. In particular, SQL standardization committees decided that it was appropriate to specify SQL syntax and semantics for the following facilities (see [6]): - sequences of SQL statements in a procedure instead of the single SQL statement allowed in the current SQL standard, - flow of control statements, such as looping, branching, etc., - exception handling, so that when an exception is raised, the SQL function or procedure can resolve the issue internally, or propagate the exception to the next outermost exception handler, rather than always returning control to the main calling routine. These 3GL programming language facilities are valuable because they allow procedural encapsulation and they allow complete behavior to be specified within an ADT definition without the need to escape to a procedure written in some other language. Complex behavior can be made available to the host application program via a single call. This offers benefits in both cost and control. In SQL'92, all procedures are single SQL statements so multiple calls must be made to address complex problems. All temporary state and flow of control belong to the host language application, thereby adding complexity to the application program that could be encapsulated in the SQL procedure. The facilities in the following paragraphs are included in the current SQL3 Working Draft [9]. 3.1 Compound statement A compound statement is a statement that allows a collection of SQL statements to be grouped together into a "block". A compound statement may declare its own local variables and specify exception handling for an exception that occurs during execution of any statement in the group. Its syntax is as follows: [ <beginning label>: ] [ <variable declaration list> ] BEGIN [ <SQL statement list> ] [ <exception handler> ] END [ <ending label> ] <exception handler> ::= EXCEPTION [ {WHEN <condition> THEN <SQL statement list>}... ] <condition> ::= <exception name list> | OTHER An <exception name> is unique within a <module> and may be declared with an exception declaration (see 3.2). 3.2 Exception handling An exception declaration establishes a one-to-one correspondence between an SQLSTATE error condition and a user-defined exception name. It's syntax is as follows: DECLARE <exception name> EXCEPTION FOR SQLSTATE <SQLSTATE literal> The exception handling mechanism under consideration for SQL3 is based very strongly on the mechanism defined in Ada. Each compound statement is assumed to have an exception handler; if one is not explicitly defined, then a default handler is provided by the system. When the execution of a statement results in an active exception condition, then the containing exception handler is immediately given control. Ultimately, the exception handler terminates with one of the following behaviors: - The compound statement terminates with the active exception condition still active, or - The compound statement terminates with a new active exception condition, or - The compound statement terminates successfully, as though no exception occurred, and there is no outstanding active exception condition. The exception handler may execute SIGNAL or RESIGNAL statements to identify a new exception name or to pass on the existing exception name. If an exception condition occurs in the exception handler itself, then the compound statement is terminated and that exception condition becomes the "active" exception condition. 3.3 Flow of control The following program flow of control statements are currently specified in the draft SQL3 document: - A CASE statement to allow selection of an execution path based on alternative choices. A <value expression> is executed and, depending on the result, control is transferred to the appropriate block of statements. - An IF statement with THEN, ELSE, and ELSEIF alternatives to allow selection of an execution path based on the truth value of one or more conditions. - A LOOP statement, with a WHILE clause, to allow repeated execution of a block of SQL statements based on the continued true result of the <search condition> in the WHILE clause. A LOOP statement is also allowed to have a statement label. - A LEAVE statement to provide a graceful exit from a block or loop statement. 4 Parameterized Types The ability to define abstract data types does not by itself provide the capability to define "parameterized" types. A parameterized type is really a "type family" with a new data type generated for each value of an input parameter. For example, an ADT definition for VECTOR(N) can be thought of as a family of data types, one for each positive integer value of N. We may think of a "parameter" as any value of a data type known to the SQL environment, e.g. an integer value in the example above. We may also think of a "parameter" as a reference to an existing data type, rather than as a value of that type. For example, we may wish to specify that VECTOR(N) is really a vector of integers, or reals, or decimals with fixed precision and scale. This can be achieved by passing a data type name to the ADT definition instead of just a data type value. The syntax for specifying a parameterized type in SQL3 is very similar to that for specifying a regular ADT, namely: CREATE TYPE TEMPLATE <template name> ( { <template parameter declaration> }... ) <abstract data type body> <template parameter declaration> ::= <parameter name> { <data type> | TYPE } The keyword TEMPLATE indicates that the specification is for a parameterized ADT rather than a regular ADT. The keyword TYPE indicates that a parameter is a data type name rather than a data type value. The <abstract data type body> is analogous to the body of a regular ADT definition. A parameterized type is referenced by specifying the type template name and an actual parameter list. Each actual parameter must be a value, or a data type, that can be determined at syntax evaluation time, i.e. usually a literal or a data type name. If the actual parameter is a data type name, then the formal template parameter must specify TYPE (see [2]). You are allowed to define more than one type template with the same name, just as you may define more than one <SQL function> with the same name. The rules for matching a parameterized type reference to a parameterized type definition are the same as the rules for matching polymorphic functions (see 2.6). 4.1 Generator types A generator type is a special parameterized type supported in the SQL language itself, rather than defined by a user. Examples of generator types are LIST, SET, and ARRAY. At the present time SQL3 only defines a limited number of base data types, including: fixed and variable length character strings, fixed and variable length bit strings, fixed and floating point numerics, dates, times, timestamps, intervals, Boolean, and enumerations. The attributes of an ADT must therefore be defined as one of these base data types or as a previously defined ADT. There is a need to extend the pre-defined, base data types to include "generator" data types such as those specified in the emerging ISO common language-independent datatype (CLID) specification [7]. The CLID specification includes, among others, the following generator types: ARRAY {[<lower>..<upper>]}... OF <base type> LIST OF <base type> SET OF <base type> CHOICE ({<identifier>:<base type>}...) RECORD ({<identifier>:<base type>}...) ARRAY is a parameterized data type consisting of a finite number of dimensions, whose input parameters are the upper and lower limits of each dimension, and a specification of the base data type of each item in the array. Base operations on an array are very simple, consisting only of the ability to select or replace an item from the array and test for array equality. There is no LESS THAN comparison for sequential ordering. LIST creates a new data type whose values are ordered sequences of values from the <base type>, including the empty sequence. Base operations on a list are the very simple Head and Tail operations that determine an ordering. Other operations to deal with "position" within a list could be added in subtype specializations. SET creates a new data type whose values are taken from the power set (i.e. the set of all subsets) of the <base type>, with operations appropriate to a mathematical set algebra (i.e. Union, Intersection, and Complement). In order to ensure uniqueness of representation, the <base type> is required to be discrete, meaning it cannot have a distance function defined that yields any limit points that are elements of the <base type>. CHOICE creates a new data type each of whose values is a single value from one of a set of alternative data types, logically labelled by the name of the alternative selected. Each alternative data type is labelled by a "field name" that is unique within a given Choice-type, but there is no requirement that the <base type>s themselves be unique within a Choice-type definition. Operations on instances of a Choice-type include: checking for equality, creating new Choice instances, testing the field name of a given instance, and casting from a Choice instance to an instance of the base data type of the named field. RECORD creates a new data type whose values are aggregations of named values from a collection of named components. Each component consists of a "field name" and an associated <base type>. Each aggregation has value for each named field. Operations on Record-type allow checks for equality, building an aggregate from field values, and selecting the value of a given field. 4.2 Distinct types Sometimes it is desirable to be able to distinguish between table or ADT attributes that have the same underlying ADT definition. For example, table T1 might have a column named Cartesian_Coordinate that is defined to have the data type POINT and table T2 might have a column named Polar_Coordinate that is also defined to have the data type POINT. The POINT data type may have a DISTANCE function defined to calculate the distance between any two points, but clearly the calculation DISTANCE ( T1.Cartesian_Coordinate, T2.Polar_Coordinate) may not be a meaningful calculation. The SQL3 draft provides a facility for the user to declare that two otherwise equivalent ADT declarations are to be treated as "distinct" data types. The keyword DISTINCT used in an ADT declaration indicates that the resulting type is to be treated as "distinct" from any other declaration of the same ADT. 5 Stored Procedures In the existing SQL'92 standard, a module is a persistent object created by the Module Language. It is a named package of procedures that can be called from an application program, where each procedure consists of exactly one SQL statement. However, there is no requirement that an implementation be able to execute Module Language (the alternative is Embedded SQL) and the resulting persistent module is not stored as part of the SQL schema, is not reflected in the information schema tables, and cannot be passed across an RDA connection to a remote site. In the emerging SQL3 specification, standardization committees have recognized the requirement for some "standard" capability to define persistent modules that "live" in the SQL schema and whose procedures may be called from any SQL statement in the same processing environment. In SQL3 the CREATE MODULE statement has the same status as any other schema definition statement. The result of execution is a module that is managed by SQL rather than by the proprietary facilities of the host operating environment. Module definitions are reflected in the Information Schema just like any other schema object and they are subject to ownership and access control declarations. A module definition consists of a LANGUAGE clause, to identify the programming language to which it is bound, an AUTHORIZATION clause to identify a Schema, and a collection of procedures. Each procedure consists of an SQL parameter list and a single SQL statement, which may be a compound statement (see 3.1). An SQL CALL statement can access any of the procedures, either locally or remotely, and pass parameters to it. The primary benefit of persistent, stored procedures is that implementations are able to optimize groups of statements rather than just individual statements. Entire packages of SQL procedures can be sent across a wire to a remote SQL conforming site, be optimized at that site, and then be executed with a single call when needed. The ISO RDA standardization committee is now considering RDA enhancements that would support database stored procedures that persist beyond the end of an RDA dialogue. 6 Supporting Features The features discussed above are not the only SQL enhancements specified in the SQL3 Working Draft. Some of the following features offer desirable functional extensions while at the same time support the object management goals of abstraction and encapsulation. These features have "preliminary" syntax and semantics specified in SQL3, and generic support for some of them may already be included in various commercial SQL products. However, all SQL3 specifications are subject to substantial evolution or reconsideration before adoption in any future SQL standard, so extreme care must be used before specifying any of them as requirements in an SQL procurement. Dynamic assertions. Support for integrity constraints that are triggered by specific database actions, such as after update or before insertion. Assertions are "dynamic" in that they may reference before and after images of the database or may depend upon temporary data values that only exist at the time the invoking statement is executed. Dynamic triggers. Support for triggering a sequence of database actions based on a specific database action, such as after delete, thereby supporting the object notion of encapsulation. Assertions and Triggers make it possible for object self-management to be fully specified in a database schema, with increased opportunity for performance optimization by the underlying database management system. Recursive expressions. Support for SQL expressions of indefinite, recursive depth, such as those arising out of "bill-of-materials" part's hierarchies. Existential and universal quantifiers. Support for more mathematically based SQL expressions involving the existential and universal quantifiers prevalent in 3-valued predicate logic. SIMILAR predicate. A facility for pattern matching in bit strings and character strings that allows construction of regular expressions equivalent to regular expressions in the POSIX standard, ISO/IEC 9945. Multiple null states. A facility that allows user definitions for an arbitrary number of application specific Null values, such as "Unknown", "Missing", "Not Applicable", "Pending", etc. Each such Null value would have a different representation in the database so that they can be distinguished by query expressions during retrieval or update. Roles and data security. An enhanced facility for database security management that builds upon the existing Grant and Revoke definitions. It extends the security model to include named "roles" in addition to schema objects, actions, and users. With roles defined as a nested collection of authorized actions on schema objects, security administration becomes more efficient and manageable. Savepoints and subtransactions. A subtransaction is a portion of a transaction that is marked for potential rollback without affecting the other parts of the transaction. By setting and releasing savepoints, an application programmer is able to recover more easily from failed subtransactions, thereby leading to more efficient code. Database export and import. Database export provides utilities for unloading a database definition and the data contents of a database into an external form, representable on various media, for the purpose of later automated re-generation. Database import provides utilities for loading a database definition and contents from an external source. Standardized specifications would make a database exported from any conforming SQL implementation importable into any other conforming SQL implementation. Cursor sensitivity. The ability to specify a SENSITIVE option on a cursor definition, so that the cursor will always "see" concurrent modifications, in the same transaction, made to the underlying tables of the cursor definition. This provides a measure of predictability, but possibly with performance implications. Asynchronous DML. Support for being able to name SQL statements so that other work can be done while they are executing. The names provide a mechanism for querying the status of outstanding statements. SQL multimedia class library (SQL/MM). With the existence of abstract data types (ADTs) and generator types in SQL3, there is a new opportunity to consider standardization of the SQL interface to packages of high-level data objects for use in various application areas. It makes sense to standardize packages for science and engineering, full- text and document processing, or methods for the management of multimedia objects such as image, sound, animation, music, and video. A new standardization project has been proposed by ISO/IEC JTC1/SC21/WG3 to develop a "companion standard" for SQL3 that would specify a class library of multimedia and other useful ADT packages. This SQL/MM standard would provide an SQL language binding for multimedia objects defined by other standardization bodies (e.g. SC18 for documents, SC24 for images, and SC29 for photographs and motion pictures). SQL external repository interface (SQL/ERI). Not all data repositories are managed by SQL conforming processors. In fact, most data in the real world is probably accessible only through non-SQL data managers. Such data may be managed by non-SQL object database systems, library information retrieval systems, geographic information systems, or special purpose processors. There is a real user requirement to gain access to this data in a standardized, full-function manner. It makes sense to specify a "client" and a "server" interface to external repositories so that non-SQL systems can act as servers to SQL requests for data. Non-SQL systems could use SQL3 ADT facilities to describe themselves to the external world, and then claim conformance to the new standard as an SQL/ERI Server. They would need only respond to very simple SQL statements over a single ADT class at one time. SQL processors could access this simple server interface, as SQL/ERI Clients, and then provide full-function SQL views over that data to SQL applications. In this way an application program could use the full power and flexibility of the SQL language - with joins, recursive subqueries, transaction management, access control, and searched updates and deletes - without putting an undo processing burden on specialized data repositories. Proposals to pursue an SQL/ERI standard are under discussion in ISO/IEC JTC1/SC21/WG3. 7 Conclusion Database Language SQL provides standardized facilities for defining, managing, and protecting data. With conforming implementations available on all sizes and makes of computing equipment, the SQL standard is leading the way toward unprecedented portability of database applications. The emerging SQL3 specification includes object management capabilities over abstract, user-defined data types, thereby making SQL3 a complete language for creating, managing, and querying persistent objects. The new RDA standard promises to complete the link among SQL products from different vendors, leading to true open systems interconnection and interoperability among conforming SQL systems. Emerging specifications for future revisions of SQL and RDA promise enhanced facilities to support intelligent objects and knowledge-based applications in a distributed processing environment. References [1] Bauer, Jonathan, et al. "Polymorphic functions", document ISO/IEC JTC1/SC21/ WG3 DBL KAW-7 (X3H2-91-135), June 1991. [2] Bauer, Jonathan, et al. "Parameterized types", document ISO/IEC JTC1/SC21/WG3 DBL KAW-41 (X3H2-91-274), October 1991. [3] Beech, David and Hasan Rizvi. "Tables and subtables as sets of objects", document ISO/IEC JTC1/SC21/WG3 DBL KAW-8 (X3H2-91- 142rev), June 1991. [4] Cannan, S.J. and G.A.M. Otten. SQL - The Standard Handbook, McGraw- Hill Book Co, Berkshire SL6 2QL England, October 1992. [5] Date, C.J. with Hugh Darwen. A Guide to the SQL Standard, Addison- Wesley Publishing, Reading, MA 01867 USA, October 1992. [6] Eisenberg, Andrew and B. Johnston. "Additional control statements for SQL", document ISO DBL KAW-14 (X3H2-91-179), June 1991. [7] ISO/IEC CD 11404. Common Language-Independent Datatypes (CLID), Committee Draft, document ISO/IEC JTC1/SC22/WG11 N319R, September 1992. [8] ISO/IEC 9075. Database Language SQL, International Standard ISO/IEC 9075:1992, American National Standard X3.135-1992, American National Standards Institute, New York, NY 10036, November 1992. [9] ISO/IEC SQL Revision. ISO-ANSI Working Draft Database Language SQL (SQL3), Jim Melton - Editor, document ISO/IEC JTC1/SC21 N6931, American National Standards Institute, New York, NY 10036, Tel: +1- 212-642-4900, July 1992. [10] ISO/IEC 9579. Open Systems Interconnection - Remote Database Access (RDA), Part 1: Generic Model and Part 2: SQL Specialization, U.S. public review text, document ANSI BSR X3.217-199x, Global Engineering Documents, Irvine, CA 92714, November 1991. [11] Kulkarni, Krishna, et al. "Inheritance for ADT's", document ISO/IEC JTC1/SC21/WG3 DBL KAW-6 (X3H2-91-133), June 1991. [12] Melton, Jim and Alan Simon. Understanding the New SQL: A Complete Guide, Morgan Kauffman Publishers, San Mateo, CA 94403, October 1992.