The application can set a variety of NekoHTML settings to more precisely control the behavior of the parser. These settings can be set directly on the HTMLConfiguration class or on the supplied parser classes by calling the setFeature and setProperty methods. For example:

    // settings on HTMLConfiguration
    org.apache.xerces.xni.parser.XMLParserConfiguration config =
        new org.cyberneko.html.HTMLConfiguration();
    config.setFeature("http://cyberneko.org/html/features/augmentations", true);
    config.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");

    // settings on DOMParser
    org.cyberneko.html.parser.DOMParser parser =
        new org.cyberneko.html.parser.DOMParser();
    parser.setFeature("http://cyberneko.org/html/features/augmentations", true);
    parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");

Property Id / Description | Values | Default |
---|---|---|
http://cyberneko.org/html/properties/filters<br>Allows applications to append custom document processing components to the end of the default NekoHTML parser pipeline. The value of this property must be an array of type org.apache.xerces.xni.parser.XMLDocumentFilter, and no element of this array is allowed to be null. The document filters are appended to the parser pipeline in array order. Please refer to the filters documentation for more information. | org.apache.xerces.xni.parser.XMLDocumentFilter[] | null |
http://cyberneko.org/html/properties/default-encoding<br>Sets the default encoding the NekoHTML scanner should use when parsing documents. This setting matters when the source document has no http-equiv directive, because the parser has no support for auto-detecting the encoding. | IANA encoding names | |
http://cyberneko.org/html/properties/names/elems<br>Specifies how the NekoHTML components should modify recognized element names. Names can be converted to upper-case, converted to lower-case, or left as-is. The value of "match" specifies that element names are to be left as-is but the end tag name will be modified to match the start tag name. This is required to ensure that the parser generates a well-formed XML document. | "upper" "lower" "match" | "upper" |
http://cyberneko.org/html/properties/names/attrs<br>Specifies how the NekoHTML components should modify attribute names of recognized elements. Names can be converted to upper-case, converted to lower-case, or left as-is. | "upper" "lower" | "lower" |
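
The effect of the three element-name values described in the table can be illustrated with a small standalone sketch. This is not NekoHTML source code: the class NameModeSketch and its methods are hypothetical names introduced here, and the logic only mimics the behavior as the table describes it, using the plain Java standard library.

```java
import java.util.Locale;

/** Illustrative only: mimics the documented name modes; hypothetical, not NekoHTML code. */
final class NameModeSketch {

    /** Name reported for a start tag under the given mode ("upper", "lower" or "match"). */
    static String startTagName(String mode, String rawName) {
        switch (mode) {
            case "upper":
                return rawName.toUpperCase(Locale.ENGLISH);
            case "lower":
                return rawName.toLowerCase(Locale.ENGLISH);
            case "match":
                return rawName; // start tag name is left as-is
            default:
                // Assumption of this sketch: reject anything not listed in the table.
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    /** Name reported for an end tag; under "match" it takes the start tag's spelling. */
    static String endTagName(String mode, String rawStartName, String rawEndName) {
        if (mode.equals("match")) {
            return rawStartName;
        }
        return startTagName(mode, rawEndName);
    }

    public static void main(String[] args) {
        // Source markup being modeled: <Div>text</DIV>
        for (String mode : new String[] {"upper", "lower", "match"}) {
            System.out.println(mode + ": <" + startTagName(mode, "Div") + ">text</"
                    + endTagName(mode, "Div", "DIV") + ">");
        }
    }
}
```

For the mismatched input `<Div>text</DIV>`, all three modes in this sketch produce a start and end tag with identical spelling, which is the well-formedness requirement the "match" description refers to.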