rabbit.html
Class HTMLParser

java.lang.Object
  |
  +--rabbit.html.HTMLParser

public class HTMLParser
extends java.lang.Object

This class parses a block of HTML code into separate tokens, using a recursive descent approach.
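As a rough illustration of the recursive descent approach, the sketch below gives each grammar rule its own method, in the spirit of the page(), tag(), arglist() and value() methods documented further down. It is not HTMLParser's implementation: the class MiniHtmlParser, its String input, and its TAG:/ARG:/TEXT: token labels are hypothetical, and comments and other special cases are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HTMLParser's code: one method per grammar rule.
final class MiniHtmlParser {
    private final String src;
    private int index = 0;                       // current parse position
    final List<String> tokens = new ArrayList<>();

    MiniHtmlParser(String src) { this.src = src; }

    // page := (text | tag)*
    void page() {
        while (index < src.length()) {
            if (src.charAt(index) == '<') {
                tag();
            } else {
                int start = index;
                while (index < src.length() && src.charAt(index) != '<') index++;
                tokens.add("TEXT:" + src.substring(start, index));
            }
        }
    }

    // tag := '<' name arglist '>'
    private void tag() {
        index++;                                 // consume '<'
        tokens.add("TAG:" + word());
        arglist();                               // stops at '>' or end of input
        if (index < src.length()) index++;       // consume '>'
    }

    // arglist := (name ['=' value])*
    private void arglist() {
        skipSpace();
        while (index < src.length() && src.charAt(index) != '>') {
            String name = word();
            if (name.isEmpty()) {                // stray '=' or '<': skip it
                index++;
                skipSpace();
                continue;
            }
            skipSpace();
            String val = null;
            if (index < src.length() && src.charAt(index) == '=') {
                index++;                         // consume '='
                skipSpace();
                val = value();
            }
            tokens.add("ARG:" + name + (val == null ? "" : "=" + val));
            skipSpace();
        }
    }

    // value := quoted-string | word
    private String value() {
        if (index < src.length() && (src.charAt(index) == '"' || src.charAt(index) == '\'')) {
            char quote = src.charAt(index++);
            int start = index;
            while (index < src.length() && src.charAt(index) != quote) index++;
            String v = src.substring(start, index);
            if (index < src.length()) index++;   // consume closing quote
            return v;
        }
        return word();
    }

    // A run of characters up to whitespace, '=', '<' or '>'.
    private String word() {
        int start = index;
        while (index < src.length()) {
            char c = src.charAt(index);
            if (Character.isWhitespace(c) || c == '=' || c == '<' || c == '>') break;
            index++;
        }
        return src.substring(start, index);
    }

    private void skipSpace() {
        while (index < src.length() && Character.isWhitespace(src.charAt(index))) index++;
    }
}
```

For example, running page() on the input <a href="x.html" id=top>link</a> leaves the tokens TAG:a, ARG:href=x.html, ARG:id=top, TEXT:link and TAG:/a.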


Field Summary
protected  HTMLBlock block
          The HTMLBlock holding the parse result.
static int COMMENT
          An HTML comment, "<!-- some text -->".
static int DOUBLEQUOTE
          The double-quote character (").
static int DQSTRING
          A double-quoted string, such as "string".
static int END
          This indicates the end of a block.
static int EQUALS
          Equals '='
protected  int index
          The current position of the parse within the data.
protected  int lastTagStart
          The last tag started here.
protected  int length
          The size of the data to parse.
static int LT
          Less Than '<'
static int MT
          Greater Than '>'
protected  int nextToken
          The type of the next token.
protected  byte[] pagepart
          The actual data to parse.
static int SINGELQUOTE
          The single-quote character (').
static int SQSTRING
          A single-quoted string, such as 'string'.
static int START
          This indicates the start of a block.
static int STRING
          Indicates that a String value was found.
protected  int stringLength
          The current start of the string.
protected  java.lang.String stringValue
          The current value as a String.
protected  boolean tagmode
          True if we are inside a tag, false otherwise.
protected  int tagStart
          The current tag started here.
static int UNKNOWN
          Unknown token.
 
Constructor Summary
HTMLParser()
          Create a new HTMLParser.
HTMLParser(byte[] page)
          Create a new HTMLParser for the given page.
 
Method Summary
protected  void arglist(Tag tag)
          Scan an argument list from the block.
protected  java.lang.String getTokenString(int token)
          Get a String describing the token.
protected  boolean isComment()
          Is this tag a comment?
static void main(java.lang.String[] args)
          Simple self test function.
protected  int match(int token)
          Match the given token against the next token, then scan the new next token.
protected  void page()
          Scan a page from the block.
 HTMLBlock parse()
          Get an HTMLBlock from the given pagepart.
protected  int scanComment()
          Scan a comment from the block, that is the string up to and including "-->".
protected  int scanQuotedString()
          Scan a quoted string from the block.
protected  int scanString()
          Scan a String from the block.
 void setText(byte[] page)
          Set the data block to parse.
 void setText(byte[] page, int length)
          Set the data block to parse.
 void setText(java.lang.String page)
          Set the data to parse.
protected  Tag tag(int ltagStart)
          Scan a tag from the block.
protected  java.lang.String value()
          Scan a value from the block.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

pagepart

protected byte[] pagepart
The actual data to parse.


length

protected int length
The size of the data to parse.


nextToken

protected int nextToken
The type of the next token.


index

protected int index
The current position of the parse within the data.


tagStart

protected int tagStart
The current tag started here.


stringValue

protected java.lang.String stringValue
The current value as a String.


stringLength

protected int stringLength
The current start of the string.


tagmode

protected boolean tagmode
True if we are inside a tag, false otherwise.


lastTagStart

protected int lastTagStart
The last tag started here.


block

protected HTMLBlock block
The HTMLBlock holding the parse result.


START

public static final int START
This indicates the start of a block.

See Also:
Constant Field Values

STRING

public static final int STRING
Indicates that a String value was found.

See Also:
Constant Field Values

SQSTRING

public static final int SQSTRING
A single-quoted string, such as 'string'.

See Also:
Constant Field Values

DQSTRING

public static final int DQSTRING
A double-quoted string, such as "string".

See Also:
Constant Field Values

SINGELQUOTE

public static final int SINGELQUOTE
The single-quote character (').

See Also:
Constant Field Values

DOUBLEQUOTE

public static final int DOUBLEQUOTE
The double-quote character (").

See Also:
Constant Field Values

LT

public static final int LT
Less Than '<'

See Also:
Constant Field Values

MT

public static final int MT
Greater Than '>'

See Also:
Constant Field Values

EQUALS

public static final int EQUALS
Equals '='

See Also:
Constant Field Values

COMMENT

public static final int COMMENT
An HTML comment, "<!-- some text -->".

See Also:
Constant Field Values

END

public static final int END
This indicates the end of a block.

See Also:
Constant Field Values

UNKNOWN

public static final int UNKNOWN
Unknown token.

See Also:
Constant Field Values

Constructor Detail

HTMLParser

public HTMLParser()
Create a new HTMLParser.


HTMLParser

public HTMLParser(byte[] page)
Create a new HTMLParser for the given page.

Parameters:
page - the block to parse.

Method Detail

setText

public void setText(byte[] page)
Set the data block to parse.

Parameters:
page - the block to parse.

setText

public void setText(byte[] page,
                    int length)
Set the data block to parse.

Parameters:
page - the block to parse.
length - the length of the data.

setText

public void setText(java.lang.String page)
Set the data to parse.

Parameters:
page - the block to parse.

getTokenString

protected java.lang.String getTokenString(int token)
Get a String describing the token.

Parameters:
token - the token type (like STRING).
Returns:
a String describing the token (like "STRING")
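A minimal sketch of such a token-to-name mapping, using a switch over the token constants. The numeric values are placeholders (the real ones are listed on the Constant Field Values page and are not reproduced here), the class TokenNames is hypothetical, and the fallback string for unrecognized values is an assumption.

```java
// Hypothetical sketch. The numeric values are placeholders, not HTMLParser's.
final class TokenNames {
    static final int START = 0, STRING = 1, SQSTRING = 2, DQSTRING = 3,
            SINGELQUOTE = 4, DOUBLEQUOTE = 5, LT = 6, MT = 7, EQUALS = 8,
            COMMENT = 9, END = 10, UNKNOWN = 11;

    // Return the name of the constant for a token type, e.g. STRING -> "STRING".
    static String getTokenString(int token) {
        switch (token) {
            case START:       return "START";
            case STRING:      return "STRING";
            case SQSTRING:    return "SQSTRING";
            case DQSTRING:    return "DQSTRING";
            case SINGELQUOTE: return "SINGELQUOTE";
            case DOUBLEQUOTE: return "DOUBLEQUOTE";
            case LT:          return "LT";
            case MT:          return "MT";
            case EQUALS:      return "EQUALS";
            case COMMENT:     return "COMMENT";
            case END:         return "END";
            case UNKNOWN:     return "UNKNOWN";
            default:          return "UNKNOWN(" + token + ")"; // assumed fallback
        }
    }
}
```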

scanString

protected int scanString()
                  throws HTMLParseException
Scan a String from the block.

Returns:
STRING
Throws:
HTMLParseException - if an error occurs.
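A minimal sketch of scanning an unquoted string, assuming the string ends at whitespace or at one of '<', '>' and '='. The class StringScanner, its fields (modelled on pagepart, length, index and stringValue), the ISO-8859-1 decoding and the placeholder STRING value are all assumptions, not taken from HTMLParser.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the delimiter set and decoding are assumptions.
final class StringScanner {
    static final int STRING = 1;   // placeholder value, not the real constant

    final byte[] pagepart;         // data to parse
    final int length;              // number of valid bytes in pagepart
    int index;                     // current parse position
    String stringValue;            // last scanned string

    StringScanner(byte[] pagepart, int length, int index) {
        this.pagepart = pagepart;
        this.length = length;
        this.index = index;
    }

    int scanString() {
        int start = index;
        while (index < length) {
            byte b = pagepart[index];
            if (b == '<' || b == '>' || b == '=' || Character.isWhitespace((char) b)) break;
            index++;
        }
        // ISO-8859-1 maps each byte to exactly one char, so nothing is lost.
        stringValue = new String(pagepart, start, index - start, StandardCharsets.ISO_8859_1);
        return STRING;
    }
}
```

Scanning "href=x" from index 0 stops at the '=', leaving stringValue as "href" and index at 4.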

scanQuotedString

protected int scanQuotedString()
                        throws HTMLParseException
Scan a quoted string from the block. The first character is treated as the quotation character.

Returns:
SQSTRING, DQSTRING or UNKNOWN (for strange quotes).
Throws:
HTMLParseException - if an error occurs.
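A minimal sketch of the documented behaviour: the byte at the current index is taken as the quote character, and everything up to the matching quote becomes the string value. The class QuotedScanner and the token values are hypothetical, and an unterminated string is assumed to run to the end of the data (the real parser may throw HTMLParseException instead).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; token values are placeholders.
final class QuotedScanner {
    static final int SQSTRING = 2, DQSTRING = 3, UNKNOWN = 11;

    final byte[] data;
    final int length;
    int index;                 // current parse position
    String stringValue;        // text between the quotes

    QuotedScanner(byte[] data, int index) {
        this.data = data;
        this.length = data.length;
        this.index = index;
    }

    int scanQuotedString() {
        byte quote = data[index++];          // first character is the quote
        int start = index;
        while (index < length && data[index] != quote) index++;
        stringValue = new String(data, start, index - start, StandardCharsets.ISO_8859_1);
        if (index < length) index++;         // step past the closing quote
        if (quote == '\'') return SQSTRING;
        if (quote == '"') return DQSTRING;
        return UNKNOWN;                      // some other character was used as a quote
    }
}
```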

isComment

protected boolean isComment()
Is this tag a comment?

Returns:
true if the block (at the current index) starts with "!--", false otherwise.
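A minimal sketch of the check, assuming the index points just past the opening '<' of the tag. CommentCheck is a hypothetical helper, not part of HTMLParser.

```java
// Hypothetical sketch: do the bytes at index spell "!--"?
final class CommentCheck {
    static boolean isComment(byte[] data, int length, int index) {
        return index + 2 < length          // all three bytes must be present
            && data[index] == '!'
            && data[index + 1] == '-'
            && data[index + 2] == '-';
    }
}
```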

scanComment

protected int scanComment()
                   throws HTMLParseException
Scan a comment from the block, that is the string up to and including "-->".

Returns:
COMMENT or END.
Throws:
HTMLParseException
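A minimal sketch of this scan, assuming the index starts somewhere inside the comment. On finding "-->" it steps past the terminator and returns COMMENT; if the data runs out first it returns END, which is one plausible reading of the documented "COMMENT or END" result. CommentScanner and the token values are hypothetical.

```java
// Hypothetical sketch; token values are placeholders.
final class CommentScanner {
    static final int COMMENT = 9, END = 10;

    final byte[] data;
    final int length;
    int index;                 // current parse position

    CommentScanner(byte[] data, int index) {
        this.data = data;
        this.length = data.length;
        this.index = index;
    }

    int scanComment() {
        while (index + 2 < length) {
            if (data[index] == '-' && data[index + 1] == '-' && data[index + 2] == '>') {
                index += 3;    // the comment includes the "-->"
                return COMMENT;
            }
            index++;
        }
        index = length;        // unterminated comment: consume the rest
        return END;
    }
}
```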

match

protected int match(int token)
             throws HTMLParseException
Match the given token against the next token, then scan the new next token.

Parameters:
token - the token to match.
Returns:
the next token.
Throws:
HTMLParseException
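The match() contract is the usual recursive-descent step: verify the lookahead, then advance. A minimal sketch, assuming the tokens have already been scanned into an int array and that a mismatch raises IllegalStateException (HTMLParser itself declares HTMLParseException). The class TokenMatcher and the use of -1 for end of input are hypothetical.

```java
// Hypothetical sketch of match(): check the lookahead, then advance.
final class TokenMatcher {
    private final int[] tokens;   // pre-scanned token types (assumption)
    private int pos = 0;
    int nextToken;                // current lookahead

    TokenMatcher(int[] tokens) {
        this.tokens = tokens;
        this.nextToken = tokens.length > 0 ? tokens[0] : -1;
    }

    int match(int token) {
        if (token != nextToken)
            throw new IllegalStateException("expected " + token + " but found " + nextToken);
        pos++;
        nextToken = pos < tokens.length ? tokens[pos] : -1;  // -1 stands in for END
        return nextToken;
    }
}
```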

value

protected java.lang.String value()
                          throws HTMLParseException
Scan a value from the block.

Returns:
the value or null.
Throws:
HTMLParseException

arglist

protected void arglist(Tag tag)
                throws HTMLParseException
Scan an argument list from the block.

Parameters:
tag - the Tag that holds the arguments.
Throws:
HTMLParseException

tag

protected Tag tag(int ltagStart)
           throws HTMLParseException
Scan a tag from the block.

Parameters:
ltagStart - the index of the last tag started.
Returns:
the Tag scanned.
Throws:
HTMLParseException

page

protected void page()
             throws HTMLParseException
Scan a page from the block.

Throws:
HTMLParseException

parse

public HTMLBlock parse()
                throws HTMLParseException
Get an HTMLBlock from the given pagepart.

Throws:
HTMLParseException

main

public static void main(java.lang.String[] args)
A simple self-test function.