org.cyberneko.html
Class HTMLConfiguration

java.lang.Object
  |
  +--org.apache.xerces.util.ParserConfigurationSettings
        |
        +--org.cyberneko.html.HTMLConfiguration
All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponentManager, org.apache.xerces.xni.parser.XMLParserConfiguration

public class HTMLConfiguration
extends org.apache.xerces.util.ParserConfigurationSettings
implements org.apache.xerces.xni.parser.XMLParserConfiguration

An XNI-based parser configuration for parsing HTML documents. It can be used directly to parse HTML, or in conjunction with any XNI-based tool, such as the Xerces2 implementation.

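The features recognized by this configuration include:

AUGMENTATIONS - include infoset augmentations
REPORT_ERRORS - report errors
SIMPLE_ERROR_FORMAT - use the simple error report format
BALANCE_TAGS - balance tags

The properties recognized by this configuration include:

NAMES_ELEMS - modify HTML element names: { "upper", "lower", "default" }
NAMES_ATTRS - modify HTML attribute names: { "upper", "lower", "default" }
FILTERS - pipeline filters
ERROR_REPORTER - error reporter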

For complete usage information, refer to the NekoHTML documentation.
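To make "recognizes" concrete, here is a minimal sketch of a settings object that accepts only registered feature identifiers and rejects everything else, which is the role ParserConfigurationSettings plays for this class. FeatureRegistry is a hypothetical name, IllegalArgumentException stands in for XMLConfigurationException, and the two cyberneko.org URIs are assumed to be the values of BALANCE_TAGS and REPORT_ERRORS; check the constants in your release before relying on them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model: accepts only feature IDs that were registered first. */
class FeatureRegistry {
    private final Set<String> recognized = new HashSet<>();
    private final Map<String, Boolean> states = new HashMap<>();

    /** Registers feature IDs, playing the role of addRecognizedFeatures. */
    void addRecognizedFeatures(String... featureIds) {
        for (String id : featureIds) {
            recognized.add(id);
        }
    }

    /** Stores a state; IllegalArgumentException stands in for XMLConfigurationException. */
    void setFeature(String featureId, boolean state) {
        checkFeature(featureId);
        states.put(featureId, state);
    }

    /** Returns the stored state; recognized but unset features read as false in this model. */
    boolean getFeature(String featureId) {
        checkFeature(featureId);
        return Boolean.TRUE.equals(states.get(featureId));
    }

    private void checkFeature(String featureId) {
        if (!recognized.contains(featureId)) {
            throw new IllegalArgumentException("feature not recognized: " + featureId);
        }
    }

    public static void main(String[] args) {
        FeatureRegistry config = new FeatureRegistry();
        // Assumed NekoHTML identifiers for BALANCE_TAGS and REPORT_ERRORS.
        config.addRecognizedFeatures(
            "http://cyberneko.org/html/features/balance-tags",
            "http://cyberneko.org/html/features/report-errors");
        config.setFeature("http://cyberneko.org/html/features/report-errors", true);
        System.out.println(config.getFeature("http://cyberneko.org/html/features/report-errors"));
        try {
            config.setFeature("http://example.com/unknown-feature", true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Properties follow the same pattern, with an Object value in place of the boolean state.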

Version:
$Id$
Author:
Andy Clark
See Also:
HTMLScanner, HTMLTagBalancer, HTMLErrorReporter

Inner Class Summary
protected  class HTMLConfiguration.ErrorReporter
          Defines an error reporter for reporting HTML errors.
 
Field Summary
protected static java.lang.String AUGMENTATIONS
          Include infoset augmentations.
protected static java.lang.String BALANCE_TAGS
          Balance tags.
protected static java.lang.String ERROR_DOMAIN
          Error domain.
protected static java.lang.String ERROR_REPORTER
          Error reporter.
protected  org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
          Document handler.
protected  HTMLScanner fDocumentScanner
          Document scanner.
protected  org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler
          DTD content model handler.
protected  org.apache.xerces.xni.XMLDTDHandler fDTDHandler
          DTD handler.
protected  org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver
          Entity resolver.
protected  org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler
          Error handler.
protected  HTMLErrorReporter fErrorReporter
          Error reporter.
protected  java.util.Vector fHTMLComponents
          HTML components registered with this configuration (see addComponent).
protected static java.lang.String FILTERS
          Pipeline filters.
protected  java.util.Locale fLocale
          Locale.
protected  HTMLTagBalancer fTagBalancer
          HTML tag balancer.
protected static java.lang.String NAMES_ATTRS
          Modify HTML attribute names: { "upper", "lower", "default" }.
protected static java.lang.String NAMES_ELEMS
          Modify HTML element names: { "upper", "lower", "default" }.
protected static java.lang.String REPORT_ERRORS
          Report errors.
protected static java.lang.String SIMPLE_ERROR_FORMAT
          Simple report format.
protected static boolean XERCES_2_0_0
          Parser version is Xerces 2.0.0.
protected static boolean XERCES_2_0_1
          Parser version is Xerces 2.0.1.
 
Fields inherited from class org.apache.xerces.util.ParserConfigurationSettings
fFeatures, fParentSettings, fProperties, fRecognizedFeatures, fRecognizedProperties
 
Constructor Summary
HTMLConfiguration()
          Default constructor.
 
Method Summary
protected  void addComponent(HTMLComponent component)
          Adds a component.
 org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
          Returns the document handler.
 org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler()
          Returns the DTD content model handler.
 org.apache.xerces.xni.XMLDTDHandler getDTDHandler()
          Returns the DTD handler.
 org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver()
          Returns the entity resolver.
 org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler()
          Returns the error handler.
 java.util.Locale getLocale()
          Returns the locale.
 void parse(org.apache.xerces.xni.parser.XMLInputSource source)
          Parses a document.
 void pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
          Pushes an input source onto the current entity stack.
protected  void reset()
          Resets the parser configuration.
 void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
          Sets the document handler.
 void setDTDContentModelHandler(org.apache.xerces.xni.XMLDTDContentModelHandler handler)
          Sets the DTD content model handler.
 void setDTDHandler(org.apache.xerces.xni.XMLDTDHandler handler)
          Sets the DTD handler.
 void setEntityResolver(org.apache.xerces.xni.parser.XMLEntityResolver resolver)
          Sets the entity resolver.
 void setErrorHandler(org.apache.xerces.xni.parser.XMLErrorHandler handler)
          Sets the error handler.
 void setFeature(java.lang.String featureId, boolean state)
          Sets a feature.
 void setLocale(java.util.Locale locale)
          Sets the locale.
 void setProperty(java.lang.String propertyId, java.lang.Object value)
          Sets a property.
 
Methods inherited from class org.apache.xerces.util.ParserConfigurationSettings
addRecognizedFeatures, addRecognizedProperties, checkFeature, checkProperty, getFeature, getProperty
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.xerces.xni.parser.XMLParserConfiguration
addRecognizedFeatures, addRecognizedProperties, getFeature, getProperty
 

Field Detail

AUGMENTATIONS

protected static final java.lang.String AUGMENTATIONS
Include infoset augmentations.
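The sketch below illustrates the kind of extra information an infoset augmentation can carry: here, the source line and column of each start tag, attached only when the feature is enabled. StartTagEvent and AugmentingScanner are hypothetical; the toy scanner treats any '<' followed by a letter as a start tag, and the exact augmentation data NekoHTML attaches is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical start-tag event; line and column are -1 when no augmentation is attached. */
class StartTagEvent {
    final String name;
    final int line;
    final int column;

    StartTagEvent(String name, int line, int column) {
        this.name = name;
        this.line = line;
        this.column = column;
    }
}

/** Toy scanner: reports each '<' followed by a letter as a start tag. */
class AugmentingScanner {
    private final boolean augmentations;

    AugmentingScanner(boolean augmentations) {
        this.augmentations = augmentations;
    }

    List<StartTagEvent> scan(String html) {
        List<StartTagEvent> events = new ArrayList<>();
        int line = 1;
        int column = 1;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<' && i + 1 < html.length() && Character.isLetter(html.charAt(i + 1))) {
                int end = i + 1;
                while (end < html.length() && Character.isLetterOrDigit(html.charAt(end))) {
                    end++;
                }
                String name = html.substring(i + 1, end);
                // Location data is attached only when the feature is on.
                events.add(augmentations
                        ? new StartTagEvent(name, line, column)
                        : new StartTagEvent(name, -1, -1));
            }
            // Track the 1-based position of the next character.
            if (c == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return events;
    }
}
```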

REPORT_ERRORS

protected static final java.lang.String REPORT_ERRORS
Report errors.

SIMPLE_ERROR_FORMAT

protected static final java.lang.String SIMPLE_ERROR_FORMAT
Simple report format.

BALANCE_TAGS

protected static final java.lang.String BALANCE_TAGS
Balance tags.
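Tag balancing is commonly done with a stack of open elements. The sketch below shows that general technique, not HTMLTagBalancer's actual rules (which this page does not describe): an end tag closes every element opened after its match, a stray end tag is dropped, and anything still open is closed at the end of the document. ToyTagBalancer and its token format are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified stack-based balancing; not HTMLTagBalancer's actual rules. */
class ToyTagBalancer {
    /** Tokens are "<name>", "</name>", or text; the result is a balanced token list. */
    static List<String> balance(List<String> tokens) {
        Deque<String> open = new ArrayDeque<>();
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.startsWith("</")) {
                String name = token.substring(2, token.length() - 1);
                if (!open.contains(name)) {
                    continue; // stray end tag with no matching start: drop it
                }
                // Close elements opened after the match (innermost first), then the match.
                String top;
                do {
                    top = open.pop();
                    out.add("</" + top + ">");
                } while (!top.equals(name));
            } else if (token.startsWith("<")) {
                open.push(token.substring(1, token.length() - 1));
                out.add(token);
            } else {
                out.add(token); // character content passes through unchanged
            }
        }
        // Close anything still open at the end of the document.
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
        return out;
    }
}
```

With this model, the tokens <b>, <i>, x, </b>, </u>, y come out as <b><i>x</i></b>y: the </b> first closes the still-open <i>, and </u> is discarded because no <u> is open.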

NAMES_ELEMS

protected static final java.lang.String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.

NAMES_ATTRS

protected static final java.lang.String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
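A minimal sketch of applying the three documented values to a scanned element or attribute name. NameCase is a hypothetical helper, and treating "default" as "leave the name as scanned" is an assumption, since this page does not define it.

```java
import java.util.Locale;

/** Hypothetical helper applying a names/elems or names/attrs setting to one name. */
class NameCase {
    static String modify(String name, String mode) {
        switch (mode) {
            case "upper":
                return name.toUpperCase(Locale.ENGLISH);
            case "lower":
                return name.toLowerCase(Locale.ENGLISH);
            case "default":
                return name; // assumption: "default" keeps the name exactly as scanned
            default:
                throw new IllegalArgumentException("unsupported value: " + mode);
        }
    }
}
```

Passing Locale.ENGLISH to the case conversion keeps the result independent of the JVM's default locale; under a Turkish default locale, for instance, a plain toUpperCase() would turn "title" into "TİTLE".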

FILTERS

protected static final java.lang.String FILTERS
Pipeline filters.
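A pipeline filter sits between the parser components and the application's document handler: it receives each event and forwards it, possibly modified, to the next stage. The sketch below models that chaining with hypothetical types (DocHandler, DocFilter, UpperCaseText, Recorder, Pipeline) rather than the XNI interfaces. In NekoHTML the FILTERS property value is understood to be an array of XMLDocumentFilter objects applied in array order; treat that as an assumption for this release.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, pared-down stand-in for an XNI document handler. */
interface DocHandler {
    void startElement(String name);
    void characters(String text);
}

/** A filter forwards every event to the next stage unless it overrides the method. */
abstract class DocFilter implements DocHandler {
    protected DocHandler next;

    void setNext(DocHandler next) {
        this.next = next;
    }

    public void startElement(String name) {
        next.startElement(name);
    }

    public void characters(String text) {
        next.characters(text);
    }
}

/** Example filter: upper-cases character content and passes everything else through. */
class UpperCaseText extends DocFilter {
    @Override
    public void characters(String text) {
        next.characters(text.toUpperCase(Locale.ROOT));
    }
}

/** Terminal handler that records what reaches the end of the pipeline. */
class Recorder implements DocHandler {
    final List<String> log = new ArrayList<>();

    public void startElement(String name) {
        log.add("start:" + name);
    }

    public void characters(String text) {
        log.add("text:" + text);
    }
}

class Pipeline {
    /** Chains filters in array order in front of the sink and returns the first stage. */
    static DocHandler build(DocFilter[] filters, DocHandler sink) {
        DocHandler downstream = sink;
        for (int i = filters.length - 1; i >= 0; i--) {
            filters[i].setNext(downstream);
            downstream = filters[i];
        }
        return downstream;
    }
}
```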

ERROR_REPORTER

protected static final java.lang.String ERROR_REPORTER
Error reporter.

ERROR_DOMAIN

protected static final java.lang.String ERROR_DOMAIN
Error domain.
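The sketch below shows one way REPORT_ERRORS, SIMPLE_ERROR_FORMAT, and an error domain might interact inside a reporter. ToyErrorReporter and both message layouts are hypothetical; this page does not specify NekoHTML's actual error keys or report format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reporter; the domain string plays the role of ERROR_DOMAIN. */
class ToyErrorReporter {
    private final String domain;
    private final boolean reportErrors;  // models REPORT_ERRORS
    private final boolean simpleFormat;  // models SIMPLE_ERROR_FORMAT
    final List<String> reported = new ArrayList<>();

    ToyErrorReporter(String domain, boolean reportErrors, boolean simpleFormat) {
        this.domain = domain;
        this.reportErrors = reportErrors;
        this.simpleFormat = simpleFormat;
    }

    /** Records a message for the key, unless reporting is switched off. */
    void reportError(String key, String message, int line) {
        if (!reportErrors) {
            return; // reporting disabled: nothing is recorded in this model
        }
        if (simpleFormat) {
            reported.add("[" + key + "] " + message);
        } else {
            reported.add(domain + "#" + key + " (line " + line + "): " + message);
        }
    }
}
```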

fDocumentHandler

protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
Document handler.

fDTDHandler

protected org.apache.xerces.xni.XMLDTDHandler fDTDHandler
DTD handler.

fDTDContentModelHandler

protected org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler
DTD content model handler.

fErrorHandler

protected org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler
Error handler.

fEntityResolver

protected org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver
Entity resolver.

fLocale

protected java.util.Locale fLocale
Locale.

fHTMLComponents

protected java.util.Vector fHTMLComponents
HTML components registered with this configuration (see addComponent).

fDocumentScanner

protected HTMLScanner fDocumentScanner
Document scanner.

fTagBalancer

protected HTMLTagBalancer fTagBalancer
HTML tag balancer.

fErrorReporter

protected HTMLErrorReporter fErrorReporter
Error reporter.

XERCES_2_0_0

protected static boolean XERCES_2_0_0
Parser version is Xerces 2.0.0.

XERCES_2_0_1

protected static boolean XERCES_2_0_1
Parser version is Xerces 2.0.1.

Constructor Detail

HTMLConfiguration

public HTMLConfiguration()
Default constructor.

Method Detail

pushInputSource

public void pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
Pushes an input source onto the current entity stack. This lets the scanner transparently scan new content (e.g. the output written by an embedded script). When the pushed entity ends, the scanner resumes where it left off at the time the input source was pushed.

Note: This functionality is experimental at this time and is subject to change in future releases of NekoHTML.

Parameters:
inputSource - The new input source to start scanning.
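The entity stack can be pictured as a stack of readers: the most recently pushed source is read first, and when it runs dry the scanner falls back to the one beneath it. StackedInput is a hypothetical class written against java.io.Reader, not NekoHTML's scanner.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical reader over a stack of input sources, newest on top. */
class StackedInput {
    private final Deque<Reader> stack = new ArrayDeque<>();

    StackedInput(Reader initial) {
        stack.push(initial);
    }

    /** The pushed source is read next; the interrupted one resumes when it is exhausted. */
    void push(Reader source) {
        stack.push(source);
    }

    /** Returns the next character, or -1 once every source on the stack is exhausted. */
    int read() throws IOException {
        while (!stack.isEmpty()) {
            int c = stack.peek().read();
            if (c != -1) {
                return c;
            }
            stack.pop().close(); // this source ended: fall back to the one beneath it
        }
        return -1;
    }
}
```

Reading two characters from "abcd", pushing "XY", and then reading to the end yields "abXYcd".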

setFeature

public void setFeature(java.lang.String featureId,
                       boolean state)
                throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a feature. Throws XMLConfigurationException if the feature identifier is not recognized.
Specified by:
setFeature in interface org.apache.xerces.xni.parser.XMLParserConfiguration
Overrides:
setFeature in class org.apache.xerces.util.ParserConfigurationSettings
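Because this configuration delegates to separate components (HTMLScanner, HTMLTagBalancer), a feature change has to reach them as well as the configuration itself. The sketch below assumes that setFeature records the value and then forwards it to every component registered through addComponent; ToyComponent and ForwardingConfiguration are hypothetical, and recognition checks are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical component that can receive feature changes. */
interface ToyComponent {
    void setFeature(String featureId, boolean state);
}

/** Hypothetical configuration that keeps its components in sync with its own settings. */
class ForwardingConfiguration {
    private final Map<String, Boolean> features = new HashMap<>();
    private final List<ToyComponent> components = new ArrayList<>();

    /** Registers a component, in the spirit of addComponent. */
    void addComponent(ToyComponent component) {
        components.add(component);
    }

    /** Records the new state, then forwards it to every registered component. */
    void setFeature(String featureId, boolean state) {
        features.put(featureId, state);
        for (ToyComponent component : components) {
            component.setFeature(featureId, state);
        }
    }

    boolean getFeature(String featureId) {
        return Boolean.TRUE.equals(features.get(featureId));
    }
}
```

In this sketch a component added after a feature was set never sees that earlier value; a fuller version would replay the current settings when a component is registered.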

setProperty

public void setProperty(java.lang.String propertyId,
                        java.lang.Object value)
                 throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a property. Throws XMLConfigurationException if the property identifier is not recognized.
Specified by:
setProperty in interface org.apache.xerces.xni.parser.XMLParserConfiguration
Overrides:
setProperty in class org.apache.xerces.util.ParserConfigurationSettings

setDocumentHandler

public void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
Sets the document handler.
Specified by:
setDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

getDocumentHandler

public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
Returns the document handler.
Specified by:
getDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

setDTDHandler

public void setDTDHandler(org.apache.xerces.xni.XMLDTDHandler handler)
Sets the DTD handler.
Specified by:
setDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

getDTDHandler

public org.apache.xerces.xni.XMLDTDHandler getDTDHandler()
Returns the DTD handler.
Specified by:
getDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

setDTDContentModelHandler

public void setDTDContentModelHandler(org.apache.xerces.xni.XMLDTDContentModelHandler handler)
Sets the DTD content model handler.
Specified by:
setDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

getDTDContentModelHandler

public org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler()
Returns the DTD content model handler.
Specified by:
getDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

setErrorHandler

public void setErrorHandler(org.apache.xerces.xni.parser.XMLErrorHandler handler)
Sets the error handler.
Specified by:
setErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

getErrorHandler

public org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler()
Returns the error handler.
Specified by:
getErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

setEntityResolver

public void setEntityResolver(org.apache.xerces.xni.parser.XMLEntityResolver resolver)
Sets the entity resolver.
Specified by:
setEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration

getEntityResolver

public org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver()
Returns the entity resolver.
Specified by:
getEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration

setLocale

public void setLocale(java.util.Locale locale)
Sets the locale.
Specified by:
setLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration
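In XNI configurations the locale selects the language used for messages such as error reports. The sketch below models that with a hypothetical message catalog keyed by language, falling back to English and then to the bare key when no translation exists; LocalizedMessages and its methods are illustrative, not part of NekoHTML.

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical message catalog keyed by language, with English as the fallback. */
class LocalizedMessages {
    private final Map<String, Map<String, String>> patterns = new HashMap<>();
    private Locale locale = Locale.ENGLISH;

    void setLocale(Locale locale) {
        this.locale = locale;
    }

    Locale getLocale() {
        return locale;
    }

    void define(Locale language, String key, String pattern) {
        patterns.computeIfAbsent(language.getLanguage(), k -> new HashMap<>()).put(key, pattern);
    }

    /** Formats the message for the current locale, falling back to English, then to the key. */
    String format(String key, Object... args) {
        String pattern = lookup(locale, key);
        if (pattern == null) {
            pattern = lookup(Locale.ENGLISH, key);
        }
        if (pattern == null) {
            return key;
        }
        return new MessageFormat(pattern, locale).format(args);
    }

    private String lookup(Locale language, String key) {
        Map<String, String> table = patterns.get(language.getLanguage());
        return table == null ? null : table.get(key);
    }
}
```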

getLocale

public java.util.Locale getLocale()
Returns the locale.
Specified by:
getLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration

parse

public void parse(org.apache.xerces.xni.parser.XMLInputSource source)
           throws org.apache.xerces.xni.XNIException,
                  java.io.IOException
Parses a document.
Specified by:
parse in interface org.apache.xerces.xni.parser.XMLParserConfiguration

addComponent

protected void addComponent(HTMLComponent component)
Adds a component.

reset

protected void reset()
              throws org.apache.xerces.xni.parser.XMLConfigurationException
Resets the parser configuration.
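A configuration is typically reused across documents, so per-document state held by its components has to be cleared before each parse. The sketch below assumes parse() calls reset(), which in turn resets every registered component; ReusableConfiguration, Resettable, and TagCounter are hypothetical, and the tag counting stands in for real scanning.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical component contract: discard per-document state. */
interface Resettable {
    void reset();
}

/** Hypothetical component that accumulates state while a document is scanned. */
class TagCounter implements Resettable {
    int count;

    public void reset() {
        count = 0;
    }

    void onStartTag() {
        count++;
    }
}

/** Hypothetical configuration: every parse begins by resetting all components. */
class ReusableConfiguration {
    private final List<Resettable> components = new ArrayList<>();
    private final TagCounter counter = new TagCounter();

    ReusableConfiguration() {
        addComponent(counter);
    }

    void addComponent(Resettable component) {
        components.add(component);
    }

    protected void reset() {
        for (Resettable component : components) {
            component.reset();
        }
    }

    /** Clears leftover state, then "scans" by counting '<' followed by a letter. */
    int parse(String html) {
        reset();
        for (int i = 0; i + 1 < html.length(); i++) {
            if (html.charAt(i) == '<' && Character.isLetter(html.charAt(i + 1))) {
                counter.onStartTag();
            }
        }
        return counter.count;
    }
}
```

Parsing "<p><b>x</b></p>" returns 2, and a second parse of "<i>y</i>" returns 1 rather than 3 because reset() cleared the counter first.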


(C) Copyright 2002, Andy Clark. All rights reserved.