The OpenMCL interface translator

Overview

OpenMCL uses an interface translation system based on the FFIGEN system (described here and here) to make the constant, type, structure, and function definitions in a set of .h files available to Lisp code.

The basic idea of the FFIGEN scheme is to use the C compiler's frontend and parser to translate .h files into semantically equivalent .ffi files, which use an S-expression-based syntax to represent the definitions contained in those headers. Lisp code can then concentrate on the .ffi representation, without having to concern itself with the semantics of header file inclusion or the arcana of C parsing.

The original FFIGEN system used a modified version of the LCC C compiler to produce .ffi files. Since many LinuxPPC header files contain GCC-specific constructs, OpenMCL's translation system uses a modified version of GCC (called, somewhat confusingly, ffigen). A LinuxPPC binary is available at ftp://clozure.com/pub/ffigen.tar.gz and source differences are at ftp://clozure.com/pub/ffigen-src.tar.gz

A shell script (distributed with the source and binary packages) called h-to-ffi.sh reads a specified .h file (and optional preprocessor arguments) and writes a (hopefully) equivalent .ffi file to standard output, calling the installed C preprocessor and the ffigen program with appropriate arguments.

Another shell script (distributed with OpenMCL as "ccl:headers;C;populate.sh") calls h-to-ffi.sh on a large number of the header files in /usr/include and creates a parallel directory tree in "ccl:headers;C;usr;include;", populating that directory with .ffi files.
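The tree-walking step described above might look roughly like the following. This is a minimal sketch, not the contents of populate.sh: the function name populate_tree and its arguments are hypothetical, and it assumes only what is stated above about h-to-ffi.sh (it takes a .h file and writes the .ffi text to standard output). Any preprocessor options that a real header set needs are left out.

```shell
#!/bin/sh
# Sketch only: populate_tree and its arguments are hypothetical names,
# not part of OpenMCL's populate.sh.
#
# populate_tree SRC_ROOT DEST_ROOT [TRANSLATOR]
#   For each .h file under SRC_ROOT, run TRANSLATOR on it and write the
#   output to the same relative path under DEST_ROOT, with a .ffi suffix.
#   TRANSLATOR defaults to h-to-ffi.sh, which is assumed to be on PATH.
#   Returns non-zero if any header failed to translate.
populate_tree() (
    src_root=$1
    translator=${3:-h-to-ffi.sh}

    # Resolve DEST_ROOT to an absolute path before changing directory.
    mkdir -p "$2" || exit 1
    dest_root=$(cd "$2" && pwd) || exit 1
    cd "$src_root" || exit 1

    failed=0
    list=$(mktemp) || exit 1
    # Paths are relative to SRC_ROOT, e.g. ./sys/types.h
    find . -type f -name '*.h' > "$list"

    while IFS= read -r header; do
        rel=${header#./}
        out="$dest_root/${rel%.h}.ffi"
        mkdir -p "$(dirname "$out")"
        # stdin is redirected so the translator cannot consume the file list
        if ! "$translator" "$header" < /dev/null > "$out"; then
            echo "failed to translate: $rel" >&2
            rm -f "$out"      # do not leave a partial .ffi file behind
            failed=1
        fi
    done < "$list"

    rm -f "$list"
    exit "$failed"
)
```

Under these assumptions, running populate_tree /usr/include usr/include from the "ccl:headers;C;" directory would produce the parallel tree. Reporting each failed header and continuing fits the edit-and-retry cycle that the rebuild instructions describe.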

A Lisp function defined in "ccl:library;parse-ffi.lisp" translates a specified list of .ffi files into a set of corresponding .lisp files (in "ccl:headers;usr;include;") and, in the process, generates new versions of the GDBM databases ("ccl:headers;constants.gdbm", "ccl:headers;functions.gdbm", "ccl:headers;records.gdbm", and "ccl:headers;types.gdbm"). The .lisp files produced in this step aren't used directly by OpenMCL, but may be interesting as reference material: the information in the .gdbm files is an encoded version of the union of the information in the .lisp files.

Most of the entities in the .gdbm files are named (this is true of all types, constants, and functions and of most record types.) These names (and gensyms used to uniquely identify anonymous records) are mapped to upper case and the resulting strings are used as database keys. (The case of external function names is preserved, and this information is stored - along with parameter type information - in the "value" associated with that key.)

This means that if two distinct foreign entities - the hypothetical functions Open and oPeN, for instance - differ only in case, only one of them (chosen arbitrarily) will be accessible in the database under the key OPEN. Such conflicts presumably occur, but it's not known how often. At this point, the convenience of being able to ignore case issues from Lisp seems to outweigh that risk in practice.
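One rough way to see how often such conflicts would occur in a given set of names is to fold the names to upper case and look for keys that appear more than once. The helper below is a sketch under that assumption: colliding_keys is a hypothetical name, it expects one foreign name per line on standard input, and how those names are extracted from the headers is left open.

```shell
# Sketch only: colliding_keys is a hypothetical helper, not part of OpenMCL.
# Reads one foreign name per line on stdin, upper-cases each name in the
# way the database keys are described above, and prints every key that two
# or more distinct names map to.
colliding_keys() {
    sort -u |                          # drop exact duplicate names first
        tr '[:lower:]' '[:upper:]' |   # fold each name to upper case
        sort |
        uniq -d                        # keep only keys seen more than once
}
```

For example, feeding it the three names Open, oPeN, and close would report the single key OPEN.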

The GDBM databases are used by the #$ and #_ reader macros and are used in the expansion of RREF, RLET, and related macros. GDBM is licensed under different terms (the GPL) than OpenMCL (which is licensed under the LGPL) and this may have implications for those parties wishing to distribute OpenMCL-based applications. The code in OpenMCL that uses GDBM is isolated in the files "ccl:lib;db-io.lisp" and "ccl:binppc;db-io.pfsl" and OpenMCL's use of that code is limited to read-time and macroexpand-time. The intent is that GDBM-related code is easily isolated and removed from an OpenMCL application; another approach would be to replace GDBM - which is very good at what it does - with something that offered different licensing terms.

Details

There's probably no such thing as a "standard" set of Linux header files. (Perhaps it's more accurate to say that there are a large number of standards.) Different releases of different distributions may install different versions of different sets of header files in different locations, and users install different sets of different optional packages. (For the benefit of those not paying careful attention, the operative word seems to be "different".)

The populate.sh shell script generates .ffi files from a set of header files installed on a Debian 2.2 LinuxPPC system with a number of optional and local packages installed. Most of the foreign code used internally in OpenMCL is from a small, fairly stable Linux subset; there may be significant differences between the header file information used to generate the distributed GDBM database files and the APIs used in a more recent distribution. (This may be especially true of some high-level user-interface libraries.)

Rebuilding the GDBM databases, step by step

  1. You may wish to make backup copies of the GDBM databases in "ccl:headers;*.gdbm".
  2. Ensure that the FFIGEN program is installed. See the "README" file in the source or binary archive for specific installation instructions.
  3. Edit the "populate.sh" shell script. When you're confident that the files and preprocessor options match your environment, cd to the "ccl:headers;C;" directory and invoke ./populate.sh. Repeat this step until you're able to cleanly translate all files referenced in the shell script.
  4. Run OpenMCL:
    ? (require "PARSE-FFI")
    PARSE-FFI
    ? (parse-standard-ffi-files (directory "ccl:headers;C;**;*.ffi"))
    ;;; lots of output ... after a while, shiny new .gdbm files should
    ;;; appear in "ccl:headers;"
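The backup suggested in step 1 can be done from the shell before starting. The following is a minimal sketch: backup_gdbm is a hypothetical helper, and both directory arguments are assumptions, since the file system directory that "ccl:headers;" corresponds to depends on where OpenMCL is installed.

```shell
# Sketch only: backup_gdbm and both directory arguments are hypothetical.
#
# backup_gdbm HEADERS_DIR BACKUP_DIR
#   Copies every *.gdbm file in HEADERS_DIR into BACKUP_DIR, preserving
#   timestamps. Returns non-zero if no database file was found.
backup_gdbm() (
    headers_dir=$1
    backup_dir=$2

    mkdir -p "$backup_dir" || exit 1

    found=0
    for db in "$headers_dir"/*.gdbm; do
        [ -f "$db" ] || continue        # the glob matched nothing
        cp -p "$db" "$backup_dir/" || exit 1
        found=1
    done

    [ "$found" -eq 1 ]
)
```

If the rebuild goes wrong, copying the saved files back over the new ones restores the previous databases.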
    	

Last modified: Mon Sep 17 07:46:05 PDT 2001