Parsing an XML document with the DOM API
In this section we shall parse an XML document (the one created in the previous section) with a DOM parser. DOM parsing builds an in-memory tree structure of the XML document, which may be navigated with the DOM API. We shall iterate over the parsed document and output element and attribute node values.
The DOM parsing API classes are in the oracle.xml.parser.v2 package, and the DOM parser factory and parser classes are in the oracle.xml.jaxp package. First, import these packages into the DOMParserApp.java class in JDeveloper:
import oracle.xml.jaxp.*;
import oracle.xml.parser.v2.*;
Creating the factory
Create a JXDocumentBuilderFactory object with the static method newInstance(). The factory object is used to obtain a parser that may be used to create a DOM document tree from an XML document:
JXDocumentBuilderFactory factory = (JXDocumentBuilderFactory) JXDocumentBuilderFactory.newInstance();
Set the ERROR_STREAM and SHOW_WARNINGS attributes on the factory object with the setAttribute() method. The ERROR_STREAM attribute specifies the error stream, and its value is an OutputStream or a PrintWriter object. The SHOW_WARNINGS attribute specifies whether warnings are shown, and its value is a Boolean, either Boolean.TRUE or Boolean.FALSE. Parsing errors (if any) are written to the OutputStream or PrintWriter specified in the ERROR_STREAM attribute, and with SHOW_WARNINGS set to Boolean.TRUE, warnings are written as well. If an ErrorHandler is also set, ERROR_STREAM is not used:
factory.setAttribute(JXDocumentBuilderFactory.ERROR_STREAM, new FileOutputStream(new File("c:/output/errorStream.txt")));
factory.setAttribute(JXDocumentBuilderFactory.SHOW_WARNINGS, Boolean.TRUE);
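The remark that an ErrorHandler takes precedence over ERROR_STREAM can be illustrated with a minimal sketch. It assumes the standard JAXP classes shipped with the JDK rather than the Oracle XDK classes, so it does not use ERROR_STREAM at all; the class name ErrorHandlerSketch, the method parseAndCollectErrors, and the message layout are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class; standard JAXP only, not the Oracle XDK factory.
class ErrorHandlerSketch {

    /** Parses the XML text and returns what the handler reported ("" if nothing). */
    static String parseAndCollectErrors(String xml) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer, true);
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { report("Warning", e); }
                @Override public void error(SAXParseException e) { report("Error", e); }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    report("Fatal Error", e);
                    throw e; // a fatal error ends the parse
                }
                private void report(String level, SAXParseException e) {
                    out.println("<Line " + e.getLineNumber() + ", Column " + e.getColumnNumber()
                            + ">: (" + level + ") " + e.getMessage());
                }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (SAXException e) {
            // Parse errors were reported through the handler above.
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // The journal element is never closed, so a fatal error is reported.
        System.out.print(parseAndCollectErrors("<catalog><journal></catalog>"));
    }
}
```

The handler writes to a PrintWriter, which plays the role that the stream given in ERROR_STREAM plays in the XDK example; swapping the StringWriter for a FileWriter would send the messages to a file instead.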
Creating a DOM document object
Create a JXDocumentBuilder object from the factory object by first creating a DocumentBuilder object with the newDocumentBuilder() method and subsequently casting the DocumentBuilder object to JXDocumentBuilder. JXDocumentBuilder is the implementation class in Oracle XDK 11g for the abstract class DocumentBuilder:
JXDocumentBuilder documentBuilder = (JXDocumentBuilder) factory.newDocumentBuilder();
The JXDocumentBuilder object is used to create a DOM document object from an XML document. A Document object may be obtained with one of the parse() methods in the JXDocumentBuilder class. The input to the parser may be specified as an InputSource, an InputStream, a File object, or a String URI. Create an InputStream for the example XML document and parse the document with the parse(InputStream) method:
InputStream input = new FileInputStream(new File("catalog.xml"));
XMLDocument xmlDocument = (XMLDocument) documentBuilder.parse(input);
The parse() methods of the JXDocumentBuilder object return a Document object, which may be cast to an XMLDocument object, as the XMLDocument class implements the Document interface.
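As a minimal sketch of the different parser inputs, the following example parses the same XML text once from an InputStream and once from an InputSource. It assumes the standard JAXP DocumentBuilder from the JDK instead of JXDocumentBuilder, so the result is a plain Document rather than an XMLDocument; the class name ParseInputsSketch and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class; standard JAXP only, not the Oracle XDK builder.
class ParseInputsSketch {

    static final String XML = "<catalog><journal title=\"Oracle Magazine\"/></catalog>";

    /** Parses from a byte stream, as parse(InputStream) does in the text. */
    static Document parseFromStream(String xml) {
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(in);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses from an InputSource that wraps a character stream. */
    static Document parseFromSource(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("From stream: " + parseFromStream(XML).getDocumentElement().getTagName());
        System.out.println("From source: " + parseFromSource(XML).getDocumentElement().getTagName());
    }
}
```

Both calls yield an equivalent tree; an InputSource is convenient when the XML is already available as characters rather than bytes.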
Outputting the XML document components' values
Output the encoding of the XML document using the getEncoding() method, and output the version of the XML document using the getVersion() method:
System.out.println("Encoding: " + xmlDocument.getEncoding());
System.out.println("Version: " + xmlDocument.getVersion());
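For comparison, the standard DOM Level 3 Document interface exposes the same information through getXmlEncoding() and getXmlVersion(). The sketch below assumes the JDK's built-in parser rather than Oracle XDK, and the class name XmlDeclarationSketch and the method describe are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical class; uses the DOM Level 3 accessors, not XMLDocument.getEncoding().
class XmlDeclarationSketch {

    /** Returns the encoding and version taken from the XML declaration. */
    static String describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getXmlEncoding() reflects the encoding attribute of the XML declaration.
            return "Encoding: " + doc.getXmlEncoding() + ", Version: " + doc.getXmlVersion();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("<?xml version=\"1.0\" encoding=\"UTF-8\"?><catalog/>"));
    }
}
```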
The XMLDocument class has various getter methods to retrieve elements in a document. Some of these methods are listed in the following table:
Method Name | Description |
---|---|
getDocumentElement() | Returns the root element. |
getElementById(String) | Returns the element for a specified ID. An element that has an ID attribute may be retrieved using this method. An attribute named "id" is not necessarily an ID attribute. An ID attribute is defined in an XML Schema with the xs:ID type and in a DTD with the ID attribute type. |
getElementsByTagName(String) | Returns a NodeList of elements for a specified tag name. The elements are returned in the order defined in the DOM tree. All the elements of the specified tag name are returned, not just the top-level elements. If the tag name is specified as "*", all the elements in the document are returned. |
getElementsByTagNameNS(String, String) | Returns a NodeList of elements for a specified namespace URI and local name. |
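The point that an attribute named "id" is not automatically an ID attribute can be shown with a short sketch. It assumes the standard DOM API from the JDK and, instead of a DTD or an XML Schema, marks the attribute as an ID programmatically with the DOM Level 3 method setIdAttribute(); the class name IdLookupSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical class; standard DOM only, no DTD or schema involved.
class IdLookupSketch {

    static final String XML =
            "<catalog><journal id=\"j1\"><article id=\"a1\"/></journal></catalog>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);

        // No DTD or schema declares "id" as an ID attribute, so the lookup finds nothing.
        System.out.println("Before marking: " + doc.getElementById("a1"));

        // Mark the attribute as an ID in code (a stand-in for a DTD or schema declaration).
        Element article = (Element) doc.getElementsByTagName("article").item(0);
        article.setIdAttribute("id", true);
        System.out.println("After marking: " + doc.getElementById("a1").getTagName());

        // "*" matches every element in the document, not just top-level ones.
        System.out.println("All elements: " + doc.getElementsByTagName("*").getLength());
    }
}
```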
As an example, retrieve title elements in the namespace http://xdk.com/catalog/journal using the getElementsByTagNameNS() method:
NodeList namespaceNodeList = xmlDocument.getElementsByTagNameNS("http://xdk.com/catalog/journal", "title");
Iterate over the NodeList to output the element namespace, element namespace prefix, element tag name, and element text. The getNamespaceURI() method returns the namespace URI of an element. The getPrefix() method returns the prefix of an element in a namespace. The getTagName() method returns the element tag name. Element text is obtained by first obtaining the text node within the element node using the getFirstChild() method and subsequently the value of the text node:
for (int i = 0; i < namespaceNodeList.getLength(); i++) {
    XMLElement namespaceElement = (XMLElement) namespaceNodeList.item(i);
    System.out.println("Namespace URI: " + namespaceElement.getNamespaceURI());
    System.out.println("Namespace Prefix: " + namespaceElement.getPrefix());
    System.out.println("Element Name: " + namespaceElement.getTagName());
    System.out.println("Element text: " + namespaceElement.getFirstChild().getNodeValue());
}
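The namespace lookup can also be sketched with the standard JAXP classes in the JDK, under the assumption that Oracle XDK is not on the classpath. With standard JAXP the factory must be made namespace aware explicitly, otherwise getElementsByTagNameNS() finds nothing. The class name NamespaceLookupSketch and the sample document are hypothetical, and getTextContent() is used as a shortcut for reading the first child's value.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class; standard JAXP only, not the Oracle XDK parser.
class NamespaceLookupSketch {

    static final String NS = "http://xdk.com/catalog/journal";

    static final String XML =
            "<catalog xmlns:journal=\"" + NS + "\">"
            + "<journal:article><journal:title>Declarative Data Filtering</journal:title></journal:article>"
            + "<title>Not in the namespace</title>"
            + "</catalog>";

    /** Returns one "URI | prefix | tag name | text" line per title element in NS. */
    static List<String> describeTitles(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace lookups in standard JAXP
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            NodeList titles = doc.getElementsByTagNameNS(NS, "title");
            for (int i = 0; i < titles.getLength(); i++) {
                Element title = (Element) titles.item(i);
                lines.add(title.getNamespaceURI() + " | " + title.getPrefix() + " | "
                        + title.getTagName() + " | " + title.getTextContent());
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        describeTitles(XML).forEach(System.out::println);
    }
}
```

The unprefixed title element is skipped because it is not in the requested namespace, which is the difference between getElementsByTagNameNS() and getElementsByTagName().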
Obtain the root element in the XML document with the getDocumentElement() method. The getDocumentElement() method returns an Element object, which needs to be cast to an XMLElement object only if methods defined only in the XMLElement class are to be used. We cast the Element object to XMLElement because XMLElement is Oracle XDK 11g's implementation class for the Element interface, and we are discussing Oracle XDK 11g:
XMLElement rootElement = (XMLElement) xmlDocument.getDocumentElement();
System.out.println("Root Element is: " + rootElement.getTagName());
Next, we shall iterate over all the subnodes of the root element. Obtain a NodeList of subnodes of the root element with the getChildNodes() method. Create a method iterateNodeList() to iterate over the subnodes of an Element. Iterate over the NodeList and recursively obtain the subelements of the elements in the NodeList. The method hasChildNodes() tests whether a node has subnodes. Ignorable whitespace is also considered a node, but we are mainly interested in the subelements in a node. The NodeList interface method getLength() returns the length of a node list, and the method item(int) returns the Node at a specified index. As class XMLNode is Oracle XDK 11g's implementation class for the Node interface, cast the Node object to XMLNode:
if (rootElement.hasChildNodes()) {
    NodeList nodeList = rootElement.getChildNodes();
    iterateNodeList(rootElement, nodeList);
}
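The recursive traversal described above can be sketched independently of Oracle XDK. The following example assumes the standard DOM interfaces from the JDK and skips whitespace text nodes by descending only into element nodes; the class name ElementWalkSketch and the method elementOutline are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class; standard DOM only, not XMLNode or XMLElement.
class ElementWalkSketch {

    /** Returns the element tag names in document order, indented by depth. */
    static List<String> elementOutline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> outline = new ArrayList<>();
            walk(doc.getDocumentElement(), 0, outline);
            return outline;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static void walk(Element element, int depth, List<String> outline) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        outline.add(line.append(element.getTagName()).toString());

        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Whitespace between tags shows up as text nodes; only elements are followed.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, depth + 1, outline);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog>\n  <journal>\n    <article/>\n  </journal>\n</catalog>";
        elementOutline(xml).forEach(System.out::println);
    }
}
```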
If a node is of type element, the tag name of the element may be retrieved. Node type is obtained with the getNodeType() method, which returns a short value. The Node interface provides static fields for different types of nodes. The different types of nodes in an XML document are listed in the following table:
Node Type | Description |
---|---|
ELEMENT_NODE | Element node. |
ATTRIBUTE_NODE | Attribute node. |
TEXT_NODE | Text node, for example the text in an element. |
CDATA_SECTION_NODE | CDATA section node. We discussed a CDATA section in an earlier table. |
ENTITY_REFERENCE_NODE | Entity reference node. An entity reference refers to the content of a named entity. |
ENTITY_NODE | Entity node. An entity is defined in a DOCTYPE declaration or an external DTD, and represents an abbreviation for data that is to be used repeatedly. |
PROCESSING_INSTRUCTION_NODE | Processing instruction node. We discussed a processing instruction in an earlier section. |
COMMENT_NODE | Comment node. We discussed a comment node in an earlier section. |
DOCUMENT_NODE | Document node. The document node represents the complete DOM document tree. |
DOCUMENT_TYPE_NODE | Doctype node. It represents the DOCTYPE declaration. |
DOCUMENT_FRAGMENT_NODE | DocumentFragment node. A document fragment is a segment of a document. |
NOTATION_NODE | Notation node. A notation is defined in a DOCTYPE declaration or an external DTD. Notations represent the format of unparsed entities (non-XML data that a parser does not parse), the format of elements with a notation attribute, and the application to which a processing instruction is sent. |
For an element node, cast the node to XMLElement and output the element tag name:
if (node.getNodeType() == XMLNode.ELEMENT_NODE) {
    XMLElement element = (XMLElement) node;
    System.out.println("Element Tag Name:" + element.getTagName());
}
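To see several of the node types from the table side by side, the following sketch lists the node type of each child of a small root element. It assumes the standard DOM Node constants and the JDK parser's default settings (comments and CDATA sections are kept as separate nodes); the class name NodeTypeSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class; standard DOM only.
class NodeTypeSketch {

    static final String XML =
            "<root>text<!--note--><![CDATA[raw]]><child/><?target data?></root>";

    /** Maps the short value returned by getNodeType() to the constant's name. */
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: return "ELEMENT_NODE";
            case Node.TEXT_NODE: return "TEXT_NODE";
            case Node.CDATA_SECTION_NODE: return "CDATA_SECTION_NODE";
            case Node.COMMENT_NODE: return "COMMENT_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE: return "PROCESSING_INSTRUCTION_NODE";
            default: return "OTHER(" + node.getNodeType() + ")";
        }
    }

    /** Returns the type names of the root element's children in document order. */
    static List<String> childNodeTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                names.add(typeName(children.item(i)));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        childNodeTypes(XML).forEach(System.out::println);
    }
}
```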
The attributes in an element node are retrieved with the getAttributes() method, which returns a NamedNodeMap of attributes. The getLength() method of NamedNodeMap returns the number of attributes in the map. The method item(int) returns the attribute node at the specified index. As class XMLAttr implements the Attr interface, cast the returned node to XMLAttr. Iterate over the NamedNodeMap to output the attribute name and value. The hasAttributes() method tests whether an element node has attributes:
if (element.hasAttributes()) {
    NamedNodeMap attributes = element.getAttributes();
    for (int i = 0; i < attributes.getLength(); i++) {
        XMLAttr attribute = (XMLAttr) attributes.item(i);
        System.out.println(" Attribute: " + attribute.getName() + " with value " + attribute.getValue());
    }
}
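The attribute iteration can be sketched with the standard Attr interface as well, assuming the JDK's DOM implementation rather than XMLAttr. Because a NamedNodeMap does not promise any particular order, the sketch collects the attributes into a sorted map; the class name AttributeListSketch and the method attributesOf are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical class; standard DOM only, not XMLAttr.
class AttributeListSketch {

    /** Returns the root element's attributes as a name-sorted map. */
    static Map<String, String> attributesOf(String xml) {
        Map<String, String> result = new TreeMap<>();
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            if (root.hasAttributes()) {
                NamedNodeMap attributes = root.getAttributes();
                for (int i = 0; i < attributes.getLength(); i++) {
                    // Every node in an element's attribute map is an Attr.
                    Attr attribute = (Attr) attributes.item(i);
                    result.put(attribute.getName(), attribute.getValue());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<journal title=\"Oracle Magazine\" publisher=\"Oracle Publishing\"/>";
        attributesOf(xml).forEach((name, value) ->
                System.out.println("Attribute: " + name + " with value " + value));
    }
}
```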
Running the Java application
The complete DOMParserApp.java code listing follows, with notes about the different sections of the Java class:
1. First, we add the package and import statements.
package xmlparser;

import java.io.*;
import oracle.xml.jaxp.*;
import oracle.xml.parser.v2.*;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
2. Next, we add the Java class DOMParserApp.
public class DOMParserApp {
3. Then, we add the parseXMLDocument method to parse an XML document.
    public void parseXMLDocument() {
        try {
4. Now, we create the XMLDocument object by parsing the XML document catalog.xml.
            JXDocumentBuilderFactory factory = (JXDocumentBuilderFactory) JXDocumentBuilderFactory.newInstance();
            factory.setAttribute(JXDocumentBuilderFactory.ERROR_STREAM, new FileOutputStream(new File("c:/output/errorStream.txt")));
            factory.setAttribute(JXDocumentBuilderFactory.SHOW_WARNINGS, Boolean.TRUE);
            JXDocumentBuilder documentBuilder = (JXDocumentBuilder) factory.newDocumentBuilder();
            InputStream input = new FileInputStream(new File("catalog.xml"));
            XMLDocument xmlDocument = (XMLDocument) documentBuilder.parse(input);
5. Here, we output the document character encoding, the XML version, and namespace node values from the parsed XML document.
            System.out.println("Encoding: " + xmlDocument.getEncoding());
            System.out.println("Version: " + xmlDocument.getVersion());
            NodeList namespaceNodeList = xmlDocument.getElementsByTagNameNS("http://xdk.com/catalog/journal", "title");
            for (int i = 0; i < namespaceNodeList.getLength(); i++) {
                XMLElement namespaceElement = (XMLElement) namespaceNodeList.item(i);
                System.out.println("Namespace URI: " + namespaceElement.getNamespaceURI());
                System.out.println("Namespace Prefix: " + namespaceElement.getPrefix());
                System.out.println("Element Name: " + namespaceElement.getTagName());
                System.out.println("Element text: " + namespaceElement.getFirstChild().getNodeValue());
            }
6. Next, we obtain the subnodes of the root element and invoke the iterateNodeList method to iterate over the subnodes.
            XMLElement rootElement = (XMLElement) xmlDocument.getDocumentElement();
            System.out.println("Root Element is: " + rootElement.getTagName());
            if (rootElement.hasChildNodes()) {
                NodeList nodeList = rootElement.getChildNodes();
                iterateNodeList(rootElement, nodeList);
            }
        } catch (ParserConfigurationException e) {
            System.err.println(e.getMessage());
        } catch (FileNotFoundException e) {
            System.err.println(e.getMessage());
        } catch (IOException e) {
            System.err.println(e.getMessage());
        } catch (SAXException e) {
            System.err.println(e.getMessage());
        }
    }
7. The iterateNodeList method has an Element parameter, which represents the element with subnodes. The second parameter is of the type NodeList, which is the NodeList of subnodes of the Element represented by the first parameter.
    public void iterateNodeList(Element elem, NodeList nodeList) {
        if (nodeList.getLength() > 1) {
            System.out.println("Element " + elem.getTagName() + " has sub-elements\n");
        }
8. Iterate over the NodeList.
        for (int i = 0; i < nodeList.getLength(); i++) {
            XMLNode node = (XMLNode) nodeList.item(i);
9. If a node is of type Element, output the Element tag name and element text.
            if (node.getNodeType() == XMLNode.ELEMENT_NODE) {
                XMLElement element = (XMLElement) node;
                System.out.println("Sub-element of " + elem.getNodeName());
                System.out.println("Element Tag Name:" + element.getTagName());
                System.out.println("Element text: " + element.getFirstChild().getNodeValue());
10. If an Element has attributes, output the attributes.
                if (element.hasAttributes()) {
                    System.out.println("Element has attributes\n");
                    NamedNodeMap attributes = element.getAttributes();
                    for (int j = 0; j < attributes.getLength(); j++) {
                        XMLAttr attribute = (XMLAttr) attributes.item(j);
                        System.out.println("Attribute: " + attribute.getName() + " with value " + attribute.getValue());
                    }
                }
11. If an Element has subnodes, obtain the NodeList of subnodes and iterate over the NodeList by invoking the iterateNodeList method again.
                if (element.hasChildNodes()) {
                    iterateNodeList(element, element.getChildNodes());
                }
            }
        }
    }
12. Finally, we add the main method. In the main method, we create an instance of the DOMParserApp class and invoke the parseXMLDocument method.
    public static void main(String[] argv) {
        DOMParserApp domParser = new DOMParserApp();
        domParser.parseXMLDocument();
    }
}
13. To run DOMParserApp.java in JDeveloper, right-click on the DOMParserApp.java node in Application Navigator and select Run.
14. The element and attribute values from the XML document are output.
The complete output from the DOM parsing application is as follows:
Encoding: UTF-8
Version: 1.0
Namespace URI: http://xdk.com/catalog/journal
Namespace Prefix: journal
Element Name: journal:title
Element text: Declarative Data Filtering
Root Element is: catalog
Element catalog has sub-elements
Sub-element of catalog
Element Tag Name:journal:journal
Element text:
Element has attributes
Attribute: journal:title with value Oracle Magazine
Attribute: journal:publisher with value Oracle Publishing
Attribute: journal:edition with value March-April 2008
Attribute: xmlns:journal with value http://xdk.com/catalog/journal
Element journal:journal has sub-elements
Sub-element of journal:journal
Element Tag Name:journal:article
Element text:
Element has attributes
Attribute: journal:section with value Oracle Developer
Element journal:article has sub-elements
Sub-element of journal:article
Element Tag Name:journal:title
Element text: Declarative Data Filtering
Sub-element of journal:article
Element Tag Name:journal:author
Element text: Steve Muench
Sub-element of catalog
Element Tag Name:journal
Element text:
Element has attributes
Attribute: title with value Oracle Magazine
Attribute: publisher with value Oracle Publishing
Attribute: edition with value September-October 2008
Element journal has sub-elements
Sub-element of journal
Element Tag Name:article
Element text:
Element has attributes
Attribute: section with value FEATURES
Element article has sub-elements
Sub-element of article
Element Tag Name:title
Element text: Share 2.0
Sub-element of article
Element Tag Name:author
Element text: Alan Joch
To demonstrate error handling with the ERROR_STREAM attribute, introduce an error into the example XML document. For example, remove a </journal> tag. Run the DOMParserApp.java application in JDeveloper. An error message is written to the file specified in the ERROR_STREAM attribute:
<Line 15, Column 10>: XML-20121: (Fatal Error) End tag does not match start tag 'journal'.
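The same kind of failure can be reproduced without Oracle XDK. The sketch below assumes the JDK's standard JAXP parser, which reports a mismatched end tag as a SAXParseException carrying the line and column of the error; it does not produce the XML-20121 message shown above. The class name ParseErrorLocationSketch and the method firstErrorLine are hypothetical, and because no ErrorHandler is installed the JDK parser may also print its own message to standard error.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical class; standard JAXP only, no ERROR_STREAM attribute.
class ParseErrorLocationSketch {

    /** Returns the line of the first fatal parse error, or -1 if the XML parses cleanly. */
    static int firstErrorLine(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return -1;
        } catch (SAXParseException e) {
            // The exception records where in the input the parser gave up.
            return e.getLineNumber();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The journal element opened on line 2 is never closed.
        String broken = "<catalog>\n<journal>\n</catalog>";
        System.out.println("First error on line: " + firstErrorLine(broken));
    }
}
```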