org.xmlpull.v1
Interface XmlPullParser


public interface XmlPullParser

XML Pull Parser is an interface that defines the parsing functionality provided by the XMLPULL V1 API (visit http://www.xmlpull.org to learn more about the API and its implementations).

The kind of parser you get depends on which features are set (see FEATURE_PROCESS_DOCDECL and FEATURE_VALIDATION).

There are only two key methods, next() and nextToken(), which provide access to high-level parsing events and to lower-level tokens, respectively.

The parser is always in some event state, and the type of the current event can be determined by calling the getEventType() method. Initially the parser is in the START_DOCUMENT state.

The next() method returns an int that identifies the parsing event. It can return the following events (and changes the parser state to the returned event):

START_TAG
an XML start tag was read
TEXT
element content was read and is available via getText()
END_TAG
an XML end tag was read
END_DOCUMENT
no more events are available
A minimal working example of using the API looks like this:
 import java.io.IOException;
 import java.io.StringReader;

 import org.xmlpull.v1.XmlPullParser;
 import org.xmlpull.v1.XmlPullParserException;
 import org.xmlpull.v1.XmlPullParserFactory;

 public class SimpleXmlPullApp
 {

     public static void main (String args[])
         throws XmlPullParserException, IOException
     {
         XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
         factory.setNamespaceAware(true);
         XmlPullParser xpp = factory.newPullParser();

         xpp.setInput ( new StringReader ( "<foo>Hello World!</foo>" ) );
         int eventType = xpp.getEventType();
         // The loop stops at END_DOCUMENT, so that event itself is never printed.
         while (eventType != XmlPullParser.END_DOCUMENT) {
             if (eventType == XmlPullParser.START_DOCUMENT) {
                 System.out.println("Start document");
             } else if (eventType == XmlPullParser.START_TAG) {
                 System.out.println("Start tag "+xpp.getName());
             } else if (eventType == XmlPullParser.END_TAG) {
                 System.out.println("End tag "+xpp.getName());
             } else if (eventType == XmlPullParser.TEXT) {
                 System.out.println("Text "+xpp.getText());
             }
             // Advance to the next high-level event.
             eventType = xpp.next();
         }
     }
 }
 

When run, it produces the following output:

 Start document
 Start tag foo
 Text Hello World!
 End tag foo
 

For more details on using the API, please read the Quick Introduction available at http://www.xmlpull.org

Author:
Stefan Haustein, Aleksander Slominski
See Also:
XmlPullParserFactory, defineEntityReplacementText(java.lang.String, java.lang.String), next(), nextToken(), FEATURE_PROCESS_DOCDECL, FEATURE_VALIDATION, START_DOCUMENT, START_TAG, TEXT, END_TAG, END_DOCUMENT

Field Summary
static byte CDSECT
          TOKEN: a CDATA section was just read (this token is available only from nextToken()).
static int COMMENT
          TOKEN: XML comment was just read and getText() will return value inside comment (this token is available only from nextToken()).
static int DOCDECL
          TOKEN: XML DOCTYPE declaration was just read and getText() will return text that is inside DOCDECL (this token is available only from nextToken()).
static int END_DOCUMENT
          EVENT TYPE and TOKEN: logical end of the XML document (available from next() and nextToken()).
static int END_TAG
          EVENT TYPE and TOKEN: end tag was just read (available from next() and nextToken()).
static byte ENTITY_REF
          TOKEN: Entity reference was just read (this token is available only from nextToken()).
static java.lang.String FEATURE_PROCESS_DOCDECL
          FEATURE: Processing of DOCDECL is by default set to false and if DOCDECL is encountered it is reported by nextToken() and ignored by next().
static java.lang.String FEATURE_PROCESS_NAMESPACES
          FEATURE: Processing of namespaces is by default set to false.
static java.lang.String FEATURE_REPORT_NAMESPACE_ATTRIBUTES
          FEATURE: Also report namespace attributes; they can be distinguished by looking for prefix == "xmlns", or prefix == null and name == "xmlns". It is off by default and only meaningful when the FEATURE_PROCESS_NAMESPACES feature is on.
static java.lang.String FEATURE_VALIDATION
          FEATURE: Report all validation errors as defined by the XML 1.0 specification (implies that FEATURE_PROCESS_DOCDECL is true and that both internal and external DOCDECL will be processed).
static byte IGNORABLE_WHITESPACE
          TOKEN: Ignorable whitespace was just read (this token is available only from nextToken()).
static java.lang.String NO_NAMESPACE
          This constant represents the lack of a namespace or the default namespace (empty string "").
static byte PROCESSING_INSTRUCTION
          TOKEN: XML processing instruction declaration was just read and getText() will return text that is inside processing instruction (this token is available only from nextToken()).
static int START_DOCUMENT
          EVENT TYPE and TOKEN: signals that the parser is at the very beginning of the document and nothing has been read yet; the parser is before the first call to next() or nextToken() (available from next() and nextToken()).
static int START_TAG
          EVENT TYPE and TOKEN: start tag was just read (available from next() and nextToken()).
static int TEXT
          EVENT TYPE and TOKEN: character data was read and will be available by call to getText() (available from next() and nextToken()).
static java.lang.String[] TYPES
          Use this array to convert an event type number (such as START_TAG) to a string giving the event name, e.g. "START_TAG".equals(TYPES[START_TAG]).
 
Method Summary
 void defineEntityReplacementText(java.lang.String entityName, java.lang.String replacementText)
          Set new value for entity replacement text as defined in XML 1.0 Section 4.5 Construction of Internal Entity Replacement Text.
 int getAttributeCount()
          Returns the number of attributes on the current element; -1 if the current event is not START_TAG
 java.lang.String getAttributeName(int index)
          Returns the local name of the specified attribute if namespaces are enabled or just attribute name if namespaces are disabled.
 java.lang.String getAttributeNamespace(int index)
          Returns the namespace URI of the specified attribute number index (starts from 0).
 java.lang.String getAttributePrefix(int index)
          Returns the prefix of the specified attribute, or null if the attribute has no prefix.
 java.lang.String getAttributeValue(int index)
          Returns the value of the given attribute. Throws an IndexOutOfBoundsException if the index is out of range or the current event type is not START_TAG.
 java.lang.String getAttributeValue(java.lang.String namespace, java.lang.String name)
          Returns the value of the attribute identified by namespace URI and local name.
 int getColumnNumber()
          Returns the current column number; numbering starts from 0 (the value returned when the parser is in the START_DOCUMENT state).
 int getDepth()
          Returns the current depth of the element.
 int getEventType()
          Returns the type of the current event (START_TAG, END_TAG, TEXT, etc.)
 boolean getFeature(java.lang.String name)
          Return the current value of the feature with given name.
 int getLineNumber()
          Returns the current line number; numbering starts from 1.
 java.lang.String getName()
          Returns the (local) name of the current element when namespaces are enabled or raw name when namespaces are disabled.
 java.lang.String getNamespace()
          Returns the namespace URI of the current element.
 java.lang.String getNamespace(java.lang.String prefix)
          Returns the namespace URI for the given prefix.
 int getNamespaceCount(int depth)
          Returns the position in the namespace stack of the first namespace slot for the element at the given depth.
 java.lang.String getNamespacePrefix(int pos)
          Returns the namespace prefix at position pos in the namespace stack.
 java.lang.String getNamespaceUri(int pos)
          Returns the namespace URI at position pos in the namespace stack. If pos is out of range, an exception is thrown.
 java.lang.String getPositionDescription()
          Short text describing the parser position, including a description of the current event and the data source if known and, if possible, what the parser last saw in the input.
 java.lang.String getPrefix()
          Returns the prefix of the current element, or null if the element has no prefix (is in the default namespace).
 java.lang.Object getProperty(java.lang.String name)
          Look up the value of a property.
 java.lang.String getText()
          Read text content of the current event as String.
 char[] getTextCharacters(int[] holderForStartAndLength)
          Gets the buffer that contains the text of the current event; the start offset of the text is written to the first slot of the passed int array and its length to the second slot.
 boolean isEmptyElementTag()
          Returns true if the current event is START_TAG and the tag is degenerated (e.g. <foobar/>).
 boolean isWhitespace()
          Checks whether the current TEXT event contains only whitespace characters.
 int next()
          Gets the next parsing event. Element content will be coalesced and only one TEXT event must be returned for the whole element content (comments and processing instructions will be ignored, and entity references must be expanded or an exception must be thrown if an entity reference cannot be expanded).
 int nextToken()
          This method works similarly to next() but will expose additional event types (COMMENT, CDSECT, DOCDECL, ENTITY_REF, PROCESSING_INSTRUCTION, or IGNORABLE_WHITESPACE) if they are available in the input.
 java.lang.String readText()
          If the current event is text, the value of getText() is returned and next() is called.
 void require(int type, java.lang.String namespace, java.lang.String name)
          Tests whether the current event is of the given type and whether the namespace and name match.
 void setFeature(java.lang.String name, boolean state)
          Use this call to change the general behaviour of the parser, such as namespace processing or doctype declaration handling.
 void setInput(java.io.Reader in)
          Set the input for parser.
 void setProperty(java.lang.String name, java.lang.Object value)
          Set the value of a property.
 

Field Detail

NO_NAMESPACE

public static final java.lang.String NO_NAMESPACE
This constant represents the lack of a namespace or the default namespace (empty string "").

START_DOCUMENT

public static final int START_DOCUMENT
EVENT TYPE and TOKEN: signals that the parser is at the very beginning of the document and nothing has been read yet; the parser is before the first call to next() or nextToken() (available from next() and nextToken()).
See Also:
next(), nextToken()

END_DOCUMENT

public static final int END_DOCUMENT
EVENT TYPE and TOKEN: logical end of the XML document (available from next() and nextToken()).

NOTE: calling next() or nextToken() again will result in an exception being thrown.

See Also:
next(), nextToken()

START_TAG

public static final int START_TAG
EVENT TYPE and TOKEN: start tag was just read (available from next() and nextToken()). The name of the start tag is available from getName(); its namespace and prefix are available from getNamespace() and getPrefix() if namespaces are enabled. See the getAttribute* methods to retrieve element attributes. See the getNamespace* methods to retrieve newly declared namespaces.
See Also:
next(), nextToken(), getName(), getPrefix(), getNamespace(java.lang.String), getAttributeCount(), getDepth(), getNamespaceCount(int), getNamespace(java.lang.String), FEATURE_PROCESS_NAMESPACES

END_TAG

public static final int END_TAG
EVENT TYPE and TOKEN: end tag was just read (available from next() and nextToken()). The name of the end tag is available from getName(); its namespace and prefix are available from getNamespace() and getPrefix().
See Also:
next(), nextToken(), getName(), getPrefix(), getNamespace(java.lang.String), FEATURE_PROCESS_NAMESPACES

TEXT

public static final int TEXT
EVENT TYPE and TOKEN: character data was read and will be available by call to getText() (available from next() and nextToken()).

NOTE: next() will (in contrast to nextToken()) accumulate multiple events into one TEXT event, skipping IGNORABLE_WHITESPACE, PROCESSING_INSTRUCTION and COMMENT events.

NOTE: if the state was reached by calling next(), the text value will be normalized; if the token was returned by nextToken(), getText() will return unnormalized content (no end-of-line normalization, so the content is exactly as in the input XML).

See Also:
next(), nextToken(), getText()

CDSECT

public static final byte CDSECT
TOKEN: a CDATA section was just read (this token is available only from nextToken()). The text inside the CDATA section is available by calling getText().
See Also:
nextToken(), getText()

ENTITY_REF

public static final byte ENTITY_REF
TOKEN: Entity reference was just read (this token is available only from nextToken()). The entity name is available by calling getText(), and it is the user's responsibility to resolve the entity reference.
See Also:
nextToken(), getText()

IGNORABLE_WHITESPACE

public static final byte IGNORABLE_WHITESPACE
TOKEN: Ignorable whitespace was just read (this token is available only from nextToken()). For non-validating parsers, this event is only reported by nextToken() when outside the root element. Validating parsers may be able to detect ignorable whitespace at other locations. The value of the ignorable whitespace is available by calling getText().

NOTE: this is different from calling the isWhitespace() method, as element content may be whitespace without being ignorable whitespace.

See Also:
nextToken(), getText()

PROCESSING_INSTRUCTION

public static final byte PROCESSING_INSTRUCTION
TOKEN: XML processing instruction declaration was just read and getText() will return text that is inside processing instruction (this token is available only from nextToken()).
See Also:
nextToken(), getText()

COMMENT

public static final int COMMENT
TOKEN: XML comment was just read and getText() will return value inside comment (this token is available only from nextToken()).
See Also:
nextToken(), getText()

DOCDECL

public static final int DOCDECL
TOKEN: XML DOCTYPE declaration was just read and getText() will return text that is inside DOCDECL (this token is available only from nextToken()).
See Also:
nextToken(), getText()

TYPES

public static final java.lang.String[] TYPES
Use this array to convert an event type number (such as START_TAG) to a string giving the event name, e.g. "START_TAG".equals(TYPES[START_TAG]).
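
The name lookup that TYPES supports can be sketched without the parser itself. The block below is only an illustration: the EventNames class, its nameOf helper and the numeric constant values are assumptions made for this sketch and are not taken from XmlPullParser.

```java
// Illustrative stand-in for the TYPES lookup. The constant values below are
// assumptions made for this sketch, not values taken from XmlPullParser.
final class EventNames {
    static final int START_DOCUMENT = 0;
    static final int END_DOCUMENT = 1;
    static final int START_TAG = 2;
    static final int END_TAG = 3;
    static final int TEXT = 4;

    // Index i holds the name of the event whose constant equals i.
    static final String[] TYPES = {
        "START_DOCUMENT", "END_DOCUMENT", "START_TAG", "END_TAG", "TEXT"
    };

    // Converts an event type number to a readable name, guarding against
    // numbers that have no entry in the array.
    static String nameOf(int eventType) {
        if (eventType < 0 || eventType >= TYPES.length) {
            return "UNKNOWN(" + eventType + ")";
        }
        return TYPES[eventType];
    }

    public static void main(String[] args) {
        System.out.println(nameOf(START_TAG));
        System.out.println(nameOf(42));
    }
}
```

A lookup of this kind is mostly useful for log and error messages, where a bare event number would be hard to read.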

FEATURE_PROCESS_NAMESPACES

public static final java.lang.String FEATURE_PROCESS_NAMESPACES
FEATURE: Processing of namespaces is by default set to false.

NOTE: can not be changed during parsing!

See Also:
getFeature(java.lang.String), setFeature(java.lang.String, boolean)

FEATURE_REPORT_NAMESPACE_ATTRIBUTES

public static final java.lang.String FEATURE_REPORT_NAMESPACE_ATTRIBUTES
FEATURE: Also report namespace attributes; they can be distinguished by looking for prefix == "xmlns", or prefix == null and name == "xmlns". It is off by default and only meaningful when the FEATURE_PROCESS_NAMESPACES feature is on.

NOTE: can not be changed during parsing!

See Also:
getFeature(java.lang.String), setFeature(java.lang.String, boolean)

FEATURE_PROCESS_DOCDECL

public static final java.lang.String FEATURE_PROCESS_DOCDECL
FEATURE: Processing of DOCDECL is by default set to false, and if DOCDECL is encountered it is reported by nextToken() and ignored by next(). If processing is set to true then DOCDECL must be processed by the parser.

NOTE: if the DOCDECL was ignored, a fatal exception may occur later in parsing when an undeclared entity is encountered!

NOTE: can not be changed during parsing!

See Also:
getFeature(java.lang.String), setFeature(java.lang.String, boolean)

FEATURE_VALIDATION

public static final java.lang.String FEATURE_VALIDATION
FEATURE: Report all validation errors as defined by the XML 1.0 specification (implies that FEATURE_PROCESS_DOCDECL is true and that both internal and external DOCDECL will be processed).

NOTE: can not be changed during parsing!

See Also:
getFeature(java.lang.String), setFeature(java.lang.String, boolean)
Method Detail

setFeature

public void setFeature(java.lang.String name,
                       boolean state)
                throws XmlPullParserException
Use this call to change the general behaviour of the parser, such as namespace processing or doctype declaration handling. This method must be called before the first call to next() or nextToken(). Otherwise, an exception is thrown.

Example: call setFeature(FEATURE_PROCESS_NAMESPACES, true) in order to switch on namespace processing. Default settings correspond to the properties requested from the XML Pull Parser factory (if none were requested then all features are false by default).

Throws:
XmlPullParserException - if feature is not supported or can not be set
java.lang.IllegalArgumentException - if feature string is null

getFeature

public boolean getFeature(java.lang.String name)
Return the current value of the feature with given name.

NOTE: unknown features are always returned as false

Parameters:
name - The name of feature to be retrieved.
Returns:
The value of named feature.
Throws:
java.lang.IllegalArgumentException - if feature string is null

setProperty

public void setProperty(java.lang.String name,
                        java.lang.Object value)
                 throws XmlPullParserException
Set the value of a property. The property name is any fully-qualified URI.

getProperty

public java.lang.Object getProperty(java.lang.String name)
Look up the value of a property. The property name is any fully-qualified URI.

NOTE: unknown properties are always returned as null

Parameters:
name - The name of property to be retrieved.
Returns:
The value of named property.

setInput

public void setInput(java.io.Reader in)
              throws XmlPullParserException
Set the input for the parser. The parser event state is set to START_DOCUMENT. Passing a null parameter will stop parsing and reset the parser state, allowing the parser to free internal resources (such as parsing buffers).

defineEntityReplacementText

public void defineEntityReplacementText(java.lang.String entityName,
                                        java.lang.String replacementText)
                                 throws XmlPullParserException
Set a new value for entity replacement text as defined in XML 1.0 Section 4.5 Construction of Internal Entity Replacement Text. If FEATURE_PROCESS_DOCDECL or FEATURE_VALIDATION is set then calling this function will result in an exception, because when processing of DOCDECL is enabled there is no need to set entity replacement text manually.

The motivation for this function is to allow very small implementations of XMLPULL that will work in J2ME environments. Though such implementations may not be able to process DOCDECL, they can still be made to work with predefined DTDs by using this function to define entities that are well known in advance. Additionally, as XML Schemas are replacing DTDs, allowing parsers not to process DTDs makes it possible to create more efficient parser implementations that can be used as the underlying layer for XML Schema validation.

NOTE: this is replacement text and it is not allowed to contain any other entity reference

NOTE: the list of pre-defined entities will always contain the standard XML entities (such as &amp; &lt; &gt; &quot; &apos;) and they cannot be replaced!

See Also:
setInput(java.io.Reader), FEATURE_PROCESS_DOCDECL, FEATURE_VALIDATION

getNamespaceCount

public int getNamespaceCount(int depth)
                      throws XmlPullParserException
Returns the position in the namespace stack of the first namespace slot for the element at the given depth. If namespaces are not enabled it always returns 0.

NOTE: the default namespace is not included in the namespace table; it is available from getNamespace() but not from getNamespace(String)

See Also:
getNamespacePrefix(int), getNamespaceUri(int), getNamespace(), getNamespace(String)

getNamespacePrefix

public java.lang.String getNamespacePrefix(int pos)
                                    throws XmlPullParserException
Returns the namespace prefix at position pos in the namespace stack.

getNamespaceUri

public java.lang.String getNamespaceUri(int pos)
                                 throws XmlPullParserException
Returns the namespace URI at position pos in the namespace stack. If pos is out of range, an exception is thrown.

getNamespace

public java.lang.String getNamespace(java.lang.String prefix)
                              throws XmlPullParserException
Returns the namespace URI for the given prefix. Which namespace URI is mapped to the prefix depends on the current state of the parser. For example, for 'xsi', if the xsi namespace prefix was declared as 'urn:foo', it will return 'urn:foo'.

It will return null if the namespace could not be found.

This is a convenience method for:

  for (int i = getNamespaceCount (getDepth ())-1; i >= 0; i--) {
   if (getNamespacePrefix (i).equals (prefix)) {
     return getNamespaceUri (i);
   }
  }
  return null;
 

However, a parser implementation can be more efficient about it.

See Also:
getNamespaceCount(int), getNamespacePrefix(int), getNamespaceUri(int)
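
The convenience loop shown for getNamespace(String) can be tried out on a toy namespace stack. The sketch below is an assumption-laden model, not an XMLPULL implementation: the NamespaceStack class and its push and pop methods are hypothetical, and getNamespaceCount is modelled as the number of stack slots in use at a given depth, which is how the loop above uses it.

```java
import java.util.ArrayList;
import java.util.List;

// A toy namespace stack, assumed for illustration; it is not part of XMLPULL.
final class NamespaceStack {
    private final List<String> prefixes = new ArrayList<>();
    private final List<String> uris = new ArrayList<>();
    // counts.get(d) = number of stack slots in use once depth d is entered.
    private final List<Integer> counts = new ArrayList<>();

    NamespaceStack() {
        counts.add(0); // depth 0: outside the root element
    }

    // Called when an element is entered; arguments are prefix/URI pairs.
    void push(String... declarations) {
        for (int i = 0; i + 1 < declarations.length; i += 2) {
            prefixes.add(declarations[i]);
            uris.add(declarations[i + 1]);
        }
        counts.add(prefixes.size());
    }

    // Called when the element is left; drops the slots it declared.
    void pop() {
        counts.remove(counts.size() - 1);
        int keep = counts.get(counts.size() - 1);
        while (prefixes.size() > keep) {
            prefixes.remove(prefixes.size() - 1);
            uris.remove(uris.size() - 1);
        }
    }

    int getDepth() {
        return counts.size() - 1;
    }

    int getNamespaceCount(int depth) {
        return counts.get(depth);
    }

    // Same scan as the documented loop: the innermost declaration wins.
    String getNamespace(String prefix) {
        for (int i = getNamespaceCount(getDepth()) - 1; i >= 0; i--) {
            if (prefixes.get(i).equals(prefix)) {
                return uris.get(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NamespaceStack ns = new NamespaceStack();
        ns.push("xsi", "urn:foo");
        System.out.println(ns.getNamespace("xsi"));
    }
}
```

Scanning from the top of the stack downwards is what lets a redeclared prefix on an inner element shadow the outer declaration until that element is left.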

getDepth

public int getDepth()
Returns the current depth of the element. Outside the root element, the depth is 0. The depth is incremented by 1 when a start tag is reached. The depth is decremented AFTER the end tag event was observed.
 <!-- outside -->     0
 <root>               1
   sometext           1
     <foobar>         2
     </foobar>        2
 </root>              1
 <!-- outside -->     0
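
The depth rule in the table above (increment on a start tag, decrement only after the end tag event has been observed) can be sketched as a small piece of bookkeeping. The DepthTracker class, its Kind enum and the depthsFor helper are hypothetical names used for this illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Depth bookkeeping sketch; Kind and depthsFor are hypothetical helper names.
final class DepthTracker {
    enum Kind { START_TAG, END_TAG, OTHER }

    // Returns the depth that would be reported while each event is current.
    static List<Integer> depthsFor(List<Kind> events) {
        List<Integer> reported = new ArrayList<>();
        int depth = 0;
        boolean pendingDecrement = false;
        for (Kind kind : events) {
            // The decrement caused by an end tag takes effect only once the
            // parser has moved past that end tag.
            if (pendingDecrement) {
                depth--;
                pendingDecrement = false;
            }
            if (kind == Kind.START_TAG) {
                depth++;
            } else if (kind == Kind.END_TAG) {
                pendingDecrement = true;
            }
            reported.add(depth);
        }
        return reported;
    }

    public static void main(String[] args) {
        // Comment, <root>, text, <foobar>, </foobar>, </root>, comment.
        List<Kind> events = Arrays.asList(
            Kind.OTHER, Kind.START_TAG, Kind.OTHER, Kind.START_TAG,
            Kind.END_TAG, Kind.END_TAG, Kind.OTHER);
        System.out.println(depthsFor(events));
    }
}
```

With this rule a start tag and its matching end tag report the same depth, which is convenient when pairing them up.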
 

getPositionDescription

public java.lang.String getPositionDescription()
Short text describing the parser position, including a description of the current event and the data source if known and, if possible, what the parser last saw in the input. This method is especially useful for producing more meaningful error messages.

getLineNumber

public int getLineNumber()
Returns the current line number; numbering starts from 1.

getColumnNumber

public int getColumnNumber()
Returns the current column number; numbering starts from 0 (the value returned when the parser is in the START_DOCUMENT state).

isWhitespace

public boolean isWhitespace()
                     throws XmlPullParserException
Checks whether the current TEXT event contains only whitespace characters. For IGNORABLE_WHITESPACE, this is always true. For TEXT and CDSECT, false is returned if the current event text contains at least one non-whitespace character. For any other event type an exception is thrown.

NOTE: non-validating parsers are not able to distinguish whitespace from ignorable whitespace, except for whitespace outside the root element. Ignorable whitespace is reported as a separate event which is exposed via nextToken() only.

NOTE: this function can only be called for element content related events such as TEXT, CDSECT or IGNORABLE_WHITESPACE; otherwise an exception will be thrown!
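
The whitespace-only test itself can be sketched as follows. This is a minimal illustration that assumes the XML 1.0 definition of white space (space, tab, carriage return, line feed); the WhitespaceCheck class and its method names are hypothetical, and a real parser may implement the check differently.

```java
// Whitespace-only check sketch; class and method names are hypothetical.
final class WhitespaceCheck {
    // XML 1.0 white space: space, tab, carriage return and line feed.
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Returns false as soon as one non-whitespace character is found in the
    // slice buf[start .. start + length - 1]; an empty slice yields true.
    static boolean isWhitespaceOnly(char[] buf, int start, int length) {
        for (int i = start; i < start + length; i++) {
            if (!isXmlWhitespace(buf[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] text = "  \n\t".toCharArray();
        System.out.println(isWhitespaceOnly(text, 0, text.length));
    }
}
```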


getText

public java.lang.String getText()
Read text content of the current event as String.

getTextCharacters

public char[] getTextCharacters(int[] holderForStartAndLength)
Gets the buffer that contains the text of the current event; the start offset of the text is written to the first slot of the passed int array and its length to the second slot.

NOTE: this buffer must not be modified and its content MAY change after a call to next() or nextToken().

NOTE: this method must always return the same value as getText(); if getText() returns null then this method returns null as well and the values returned in the holder MUST be -1 (both start and length).

Parameters:
holderForStartAndLength - the 2-element int array into which the start offset and length will be written (first and second slot of the array, respectively).
Returns:
char buffer that contains the text of the current event, or null if the current event has no text associated.
See Also:
getText()
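
The holder convention of getTextCharacters can be illustrated from the caller's side. In the sketch below SliceSource is a hypothetical stand-in for a parser, written only to mirror the contract described above; it is not part of the XMLPULL API.

```java
// Caller-side sketch for the holder convention. SliceSource is a hypothetical
// stand-in for a parser and is not part of the XMLPULL API.
final class SliceSource {
    private final char[] buffer; // null models "current event has no text"
    private final int start;
    private final int length;

    SliceSource(char[] buffer, int start, int length) {
        this.buffer = buffer;
        this.start = start;
        this.length = length;
    }

    // Mirrors the documented contract: offset in slot 0, length in slot 1,
    // and -1 in both slots together with a null return when there is no text.
    char[] getTextCharacters(int[] holderForStartAndLength) {
        if (buffer == null) {
            holderForStartAndLength[0] = -1;
            holderForStartAndLength[1] = -1;
            return null;
        }
        holderForStartAndLength[0] = start;
        holderForStartAndLength[1] = length;
        return buffer;
    }

    // Copies the slice out immediately, since the buffer must not be modified
    // and its content may change once the parser advances.
    static String copyText(SliceSource source) {
        int[] holder = new int[2];
        char[] chars = source.getTextCharacters(holder);
        return chars == null ? null : new String(chars, holder[0], holder[1]);
    }

    public static void main(String[] args) {
        char[] doc = "<foo>Hello World!</foo>".toCharArray();
        System.out.println(copyText(new SliceSource(doc, 5, 12)));
    }
}
```

Reading the slice in place, instead of copying it, avoids an allocation per event, but only as long as the caller is done with the data before the next call to next() or nextToken().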

getNamespace

public java.lang.String getNamespace()
Returns the namespace URI of the current element. If namespaces are NOT enabled, an empty String ("") is always returned. The current event must be START_TAG or END_TAG; otherwise, null is returned.

getName

public java.lang.String getName()
Returns the (local) name of the current element when namespaces are enabled or raw name when namespaces are disabled. The current event must be START_TAG or END_TAG, otherwise null is returned.

NOTE: to reconstruct the raw element name when namespaces are enabled, you will need to add the prefix and a colon to the local name if the prefix is not null.
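
The reconstruction described in the note above amounts to a one-line helper. RawNames and rawName are hypothetical names chosen for this sketch; the two arguments stand for the values returned by getPrefix() and getName().

```java
// Raw-name reconstruction sketch; RawNames and rawName are hypothetical names.
final class RawNames {
    // Joins prefix and local name with a colon, or returns the local name
    // unchanged when there is no prefix.
    static String rawName(String prefix, String localName) {
        return prefix == null ? localName : prefix + ":" + localName;
    }

    public static void main(String[] args) {
        System.out.println(rawName("xsi", "type"));
        System.out.println(rawName(null, "foo"));
    }
}
```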


getPrefix

public java.lang.String getPrefix()
Returns the prefix of the current element, or null if the element has no prefix (is in the default namespace). If namespaces are not enabled it always returns null. If the current event is not START_TAG or END_TAG, null is returned.

isEmptyElementTag

public boolean isEmptyElementTag()
                          throws XmlPullParserException
Returns true if the current event is START_TAG and the tag is degenerated (e.g. <foobar/>).

NOTE: if the parser is not on START_TAG, an exception will be thrown.


getAttributeCount

public int getAttributeCount()
Returns the number of attributes on the current element; -1 if the current event is not START_TAG
See Also:
getAttributeNamespace(int), getAttributeName(int), getAttributePrefix(int), getAttributeValue(int)

getAttributeNamespace

public java.lang.String getAttributeNamespace(int index)
Returns the namespace URI of the attribute at the specified index (starts from 0). Returns an empty string ("") if namespaces are not enabled or the attribute has no namespace. Throws an IndexOutOfBoundsException if the index is out of range or the current event type is not START_TAG.

NOTE:

if FEATURE_REPORT_NAMESPACE_ATTRIBUTES is set then namespace attributes (xmlns:ns='...') must be reported with namespace http://www.w3.org/2000/xmlns/ (visit this URL for description!). The default namespace attribute (xmlns="...") will be reported with empty namespace. The xml prefix is bound, as defined in the Namespaces in XML specification, to "http://www.w3.org/XML/1998/namespace".
Parameters:
index - zero-based index of attribute
Returns:
attribute namespace or "" if namespaces processing is not enabled.

getAttributeName

public java.lang.String getAttributeName(int index)
Returns the local name of the specified attribute if namespaces are enabled, or just the attribute name if namespaces are disabled. Throws an IndexOutOfBoundsException if the index is out of range or the current event type is not START_TAG.
Parameters:
index - zero-based index of attribute
Returns:
attribute name

getAttributePrefix

public java.lang.String getAttributePrefix(int index)
Returns the prefix of the specified attribute, or null if the attribute has no prefix. If namespaces are disabled it will always return null. Throws an IndexOutOfBoundsException if the index is out of range or the current event type is not START_TAG.
Parameters:
index - zero-based index of attribute
Returns:
attribute prefix or null if namespaces processing is not enabled.

getAttributeValue

public java.lang.String getAttributeValue(int index)
Returns the value of the given attribute. Throws an IndexOutOfBoundsException if the index is out of range or the current event type is not START_TAG.
Parameters:
index - zero-based index of attribute
Returns:
value of attribute

getAttributeValue

public java.lang.String getAttributeValue(java.lang.String namespace,
                                          java.lang.String name)
Returns the value of the attribute identified by namespace URI and local name. If namespaces are disabled, namespace must be null. If the current event type is not START_TAG then an IndexOutOfBoundsException will be thrown.
Parameters:
namespace - Namespace of the attribute if namespaces are enabled, otherwise must be null
name - If namespaces are enabled, the local name of the attribute; otherwise just the attribute name
Returns:
value of attribute

getEventType

public int getEventType()
                 throws XmlPullParserException
Returns the type of the current event (START_TAG, END_TAG, TEXT, etc.)
See Also:
next(), nextToken()

next

public int next()
         throws XmlPullParserException,
                java.io.IOException
Gets the next parsing event. Element content will be coalesced and only one TEXT event must be returned for the whole element content (comments and processing instructions will be ignored, and entity references must be expanded or an exception must be thrown if an entity reference cannot be expanded). If element content is empty (content is "") then no TEXT event will be reported.

NOTE: an empty element (such as <tag/>) will be reported with two separate events, START_TAG and END_TAG; it must be so to preserve parsing equivalency of an empty element to <tag></tag> (see isEmptyElementTag()).

See Also:
isEmptyElementTag(), START_TAG, TEXT, END_TAG, END_DOCUMENT

nextToken

public int nextToken()
              throws XmlPullParserException,
                     java.io.IOException
This method works similarly to next() but will expose additional event types (COMMENT, CDSECT, DOCDECL, ENTITY_REF, PROCESSING_INSTRUCTION, or IGNORABLE_WHITESPACE) if they are available in input.

If the special feature FEATURE_XML_ROUNDTRIP (identified by URI: http://xmlpull.org/v1/doc/features.html#xml-roundtrip) is true then it is possible to do an XML document round trip, i.e. reproduce the XML input exactly on output using getText().

Here is the list of tokens that can be returned from nextToken() and what getText() and getTextCharacters() return:

START_DOCUMENT
null
END_DOCUMENT
null
START_TAG
null unless FEATURE_XML_ROUNDTRIP enabled and then returns XML tag, ex: <tag attr='val'>
END_TAG
null unless FEATURE_XML_ROUNDTRIP enabled and then returns XML tag, ex: </tag>
TEXT
return unnormalized element content
IGNORABLE_WHITESPACE
return unnormalized characters
CDSECT
return unnormalized text inside CDATA, e.g. 'fo<o' from <![CDATA[fo<o]]>
PROCESSING_INSTRUCTION
return unnormalized PI content, e.g. 'pi foo' from <?pi foo?>
COMMENT
return comment content, e.g. 'foo bar' from <!--foo bar-->
ENTITY_REF
return unnormalized text of entity_name (&entity_name;)
NOTE: it is the user's responsibility to resolve the entity reference
NOTE: character entities and standard entities such as &amp; &lt; &gt; &quot; &apos; are reported as well; they are not resolved and not reported as TEXT tokens! This requirement is added to allow round-tripping of XML documents!
DOCDECL
return the inside part of DOCDECL, e.g. returns:
 " titlepage SYSTEM "http://www.foo.bar/dtds/typo.dtd"
 [<!ENTITY % active.links "INCLUDE">]"

for input document that contained:

 <!DOCTYPE titlepage SYSTEM "http://www.foo.bar/dtds/typo.dtd"
 [<!ENTITY % active.links "INCLUDE">]>

NOTE: returned text of token is not end-of-line normalized.

See Also:
next(), START_TAG, TEXT, END_TAG, END_DOCUMENT, COMMENT, DOCDECL, PROCESSING_INSTRUCTION, ENTITY_REF, IGNORABLE_WHITESPACE

require

public void require(int type,
                    java.lang.String namespace,
                    java.lang.String name)
             throws XmlPullParserException,
                    java.io.IOException
Tests whether the current event is of the given type and whether the namespace and name match. null will match any namespace and any name. If the current event is TEXT with isWhitespace() == true, and the required type is not TEXT, next() is called prior to the test. If the test is not passed, an exception is thrown. The exception text indicates the parser position, the expected event and the current event (the one not meeting the requirement).

Essentially it does this:

  if (getEventType() == TEXT && type != TEXT && isWhitespace ())
    next ();

  if (type != getEventType()
  || (namespace != null && !namespace.equals (getNamespace ()))
  || (name != null && !name.equals (getName ())))
     throw new XmlPullParserException ( "expected "+ TYPES[ type ]+getPositionDescription());
 

readText

public java.lang.String readText()
                          throws XmlPullParserException,
                                 java.io.IOException
If the current event is text, the value of getText() is returned and next() is called. Otherwise, an empty String ("") is returned. Useful for reading element content without needing to perform an additional check of whether the element is empty.

Essentially it does this:

   if (getEventType() != TEXT) return "";
   String result = getText ();
   next ();
   return result;
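
The readText() logic above can be exercised against a replayed event list. Everything in the sketch below is a stand-in made up for the illustration: the MiniParser interface, the replay helper and the numeric event constants are assumptions and are not part of the XMLPULL API.

```java
// readText() sketch against a replayed event list. MiniParser, replay and the
// constant values are hypothetical stand-ins, not part of the XMLPULL API.
final class ReadTextDemo {
    static final int START_TAG = 2;
    static final int END_TAG = 3;
    static final int TEXT = 4;

    // Minimal view of a pull parser, just enough for readText.
    interface MiniParser {
        int getEventType();
        String getText();
        int next();
    }

    // Same steps as the documented logic: return "" unless on TEXT, otherwise
    // return the text and advance past it.
    static String readText(MiniParser parser) {
        if (parser.getEventType() != TEXT) {
            return "";
        }
        String result = parser.getText();
        parser.next();
        return result;
    }

    // Replays fixed events; texts[i] is null for events that carry no text.
    static MiniParser replay(final int[] types, final String[] texts) {
        return new MiniParser() {
            private int pos = 0;
            public int getEventType() { return types[pos]; }
            public String getText() { return texts[pos]; }
            public int next() { pos++; return types[pos]; }
        };
    }

    public static void main(String[] args) {
        // Models the parser sitting just after the start tag of <a>Hello</a>.
        MiniParser p = replay(new int[] {TEXT, END_TAG}, new String[] {"Hello", null});
        System.out.println(readText(p));
    }
}
```

Because an empty element produces no TEXT event, the same call returns "" when the parser is already on END_TAG, which is the case the method is meant to simplify.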
 


This XMLPULL V1 API is free, enjoy! http://www.xmlpull.org/