Converting DTD Grammars

Provided Stylesheets

Using XSLT, a DTDx document can easily be transformed to another grammar format. A DTD grammar can be flattened or converted to another format like XML Schema. To make this easier, NekoDTD includes the following stylesheets in the package:

Performing a Transformation

To convert a DTD file to another format, follow these steps:

  1. Parse the DTD file with NekoDTD and serialize the XML representation to a file.
  2. Process the output of the first step with the stylesheet of your choice using an XSLT processor, such as Xalan.
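
The second step can also be driven from Java through the standard JAXP transformation API rather than from the command line. The following is only a minimal sketch and is not code shipped with NekoDTD: the class name ApplyStylesheet, the transform helper, and the tiny inline input and stylesheet are hypothetical stand-ins. In particular, the stand-in input does not use the real DTDx vocabulary; in practice the input would be the file serialized in step 1 and the stylesheet would be one of those provided in the package.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ApplyStylesheet {

    // Hypothetical stand-in for the serialized grammar (NOT real DTDx markup).
    public static final String SAMPLE_XML =
        "<grammar><element name='root'/><element name='foo'/></grammar>";

    // Hypothetical stand-in stylesheet: prints each element name followed by ';'.
    public static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//element'>"
      + "<xsl:value-of select='@name'/><xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet to an XML document, both passed as strings. */
    public static String transform(String xml, String xslt) {
        try {
            // Uses whichever XSLT processor JAXP finds (Xalan if it is on the classpath).
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSLT)); // prints root;foo;
    }
}
```

To work with files instead of strings, the StreamSource and StreamResult objects can be constructed from java.io.File instances.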

Convenience Batch Files

To alleviate the burden of performing these steps, the NekoDTD package includes a number of batch files that run them on Windows. [Sorry, no shell scripts are available at this time, but it would be very easy to port the .bat files to .sh.] The following batch files are included:

These batch files assume that you have downloaded Xerces2 and Xalan and placed the appropriate JAR files in the lib/ directory. Note: NekoDTD does not provide these JAR files.

Converting Sample DTD to XML Schema

The data/dtd/ directory contains a sample DTD grammar called test.dtd; an XML document called test.xml that references the DTD for validation in the DOCTYPE line; and an XML document called test-schema.xml that references the XML Schema grammar test.xsd via the xsi:noNamespaceSchemaLocation attribute. For convenience, test.xsd, a copy of the DTD grammar already converted to XML Schema, is also provided.

The sample DTD grammar looks like this:

<!ELEMENT root (foo|(bar,baz)+)*>
<!ATTLIST root version CDATA #FIXED '1.0'>
<!ELEMENT foo EMPTY>
<!ELEMENT bar (#PCDATA)>
<!ELEMENT baz (#PCDATA|mumble)*>
<!ELEMENT mumble ANY>
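
To get a feel for what this grammar accepts, it can be tried against a few small documents. The sketch below uses the validating parser from the standard JAXP API; it is an illustration only. The class name SampleDtdCheck and the isValid helper are hypothetical, and the DTD is embedded as an internal subset instead of being read from data/dtd/test.dtd.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SampleDtdCheck {

    // The sample grammar, embedded as an internal DTD subset.
    public static final String DOCTYPE =
        "<!DOCTYPE root [\n"
      + "<!ELEMENT root (foo|(bar,baz)+)*>\n"
      + "<!ATTLIST root version CDATA #FIXED '1.0'>\n"
      + "<!ELEMENT foo EMPTY>\n"
      + "<!ELEMENT bar (#PCDATA)>\n"
      + "<!ELEMENT baz (#PCDATA|mumble)*>\n"
      + "<!ELEMENT mumble ANY>\n"
      + "]>\n";

    /** Returns true if the given root element is valid against the sample DTD. */
    public static boolean isValid(String rootElement) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are only reported, not thrown, unless a handler rethrows them.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + rootElement)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed or not valid
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // An empty root is allowed because the outer group is starred.
        System.out.println(isValid("<root/>"));
        // foo, then one (bar,baz) pair; baz mixes text and mumble.
        System.out.println(isValid("<root><foo/><bar>x</bar><baz>y<mumble/></baz></root>"));
        // bar without a following baz breaks the (bar,baz) sequence.
        System.out.println(isValid("<root><bar>x</bar></root>"));
        // The version attribute is fixed to 1.0.
        System.out.println(isValid("<root version='2.0'/>"));
    }
}
```

The first two documents should be accepted and the last two rejected, which follows directly from the content model and the #FIXED attribute declaration.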

Running the dtd2xsd batch file on the sample DTD grammar as shown:

> dtd2xsd data/dtd/test.dtd

produces the following equivalent XML Schema grammar:

<?xml version="1.0" encoding="UTF-8" ?> 
<!-- Generated from data/dtd/test.dtd -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <!-- <!ELEMENT root (foo|(bar,baz)+)*> -->
 <xsd:element name="root">
  <xsd:complexType>
   <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element ref="foo" />
    <xsd:sequence minOccurs="1" maxOccurs="unbounded">
     <xsd:element ref="bar" />
     <xsd:element ref="baz" />
    </xsd:sequence>
   </xsd:choice>
   <!-- <!ATTLIST root version CDATA #FIXED "1.0"> -->
   <xsd:attribute name="version" fixed="1.0">
    <xsd:simpleType>
     <xsd:restriction base="xsd:string" />
    </xsd:simpleType>
   </xsd:attribute>
  </xsd:complexType>
 </xsd:element>
 ...
</xsd:schema>

This is a convenient way to convert existing DTDs when transitioning to XML Schema.
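
A converted grammar can be checked from Java with the standard javax.xml.validation API. The following is a rough sketch under several assumptions: the class name SampleXsdCheck and the isValid helper are hypothetical, and only the declaration of root is taken from the listing above. The declarations of foo, bar, baz, and mumble are hand-written guesses added so that the schema is complete; the output that dtd2xsd actually generates for those elements is elided above and may well differ.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SampleXsdCheck {

    public static final String SCHEMA =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        // Declaration of root, as in the generated listing.
      + "<xsd:element name='root'><xsd:complexType>"
      + "<xsd:choice minOccurs='0' maxOccurs='unbounded'>"
      + "<xsd:element ref='foo'/>"
      + "<xsd:sequence minOccurs='1' maxOccurs='unbounded'>"
      + "<xsd:element ref='bar'/><xsd:element ref='baz'/>"
      + "</xsd:sequence>"
      + "</xsd:choice>"
      + "<xsd:attribute name='version' fixed='1.0'>"
      + "<xsd:simpleType><xsd:restriction base='xsd:string'/></xsd:simpleType>"
      + "</xsd:attribute>"
      + "</xsd:complexType></xsd:element>"
        // ASSUMPTION: hand-written declarations, not the real dtd2xsd output.
      + "<xsd:element name='foo'><xsd:complexType/></xsd:element>"
      + "<xsd:element name='bar' type='xsd:string'/>"
      + "<xsd:element name='baz'><xsd:complexType mixed='true'>"
      + "<xsd:choice minOccurs='0' maxOccurs='unbounded'>"
      + "<xsd:element ref='mumble'/>"
      + "</xsd:choice>"
      + "</xsd:complexType></xsd:element>"
      + "<xsd:element name='mumble'/>" // no type given, so xsd:anyType
      + "</xsd:schema>";

    /** Returns true if the document is valid against the schema above. */
    public static boolean isValid(String xml) {
        Schema schema;
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema itself is broken", e);
        }
        try {
            // With no error handler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<root><foo/><bar>x</bar><baz>y<mumble/></baz></root>"));
        System.out.println(isValid("<root><bar>x</bar></root>")); // bar without baz
    }
}
```

To validate a file such as data/dtd/test-schema.xml against the provided test.xsd, the two StreamSource objects can be constructed from java.io.File instances instead of strings.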

The un-flattened DTDx document provides enough information to analyze the DTD declarations and produce more meaningful XML Schema content types. However, no stylesheet or code is currently provided with NekoDTD to perform this type of processing. [If you would like to write such a stylesheet or a tool to perform this conversion from the DTDx file, please contact me.]