What is DOM
The DOM (Document Object Model) is an object representation of an XML, HTML or XHTML document. This tutorial deals only with XML. DOM represents the XML as a document tree. JAXP provides the API for working with DOM in Java. It also provides a parsing interface into which different parsers can be plugged (JAXP ships with a default implementation).
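To make the idea of a document tree concrete, here is a minimal sketch that parses a small in-memory XML string and prints one line per node, indented by depth. The class name DomTreeSketch, the sample XML and the helper methods are assumptions made for illustration; only the JAXP and org.w3c.dom calls come from the standard library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTreeSketch {

    // Hypothetical sample document, used only for illustration.
    public static final String SAMPLE_XML =
            "<library><book id=\"1\"><title>DOM Basics</title></book></library>";

    // Parses an XML string into a DOM Document with the default JAXP parser.
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Appends one line per node, indented by its depth in the tree.
    static void dump(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(" \"").append(node.getNodeValue()).append('"');
        }
        out.append('\n');
        // Attributes are not child nodes, so they do not appear in this walk.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            dump(child, depth + 1, out);
        }
    }

    // Returns the indented tree for the given XML string.
    public static String describe(String xml) {
        StringBuilder out = new StringBuilder();
        dump(parse(xml), 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(SAMPLE_XML));
    }
}
```

For the sample string, the output starts with #document at the top, followed by library, book, title and finally a #text node holding "DOM Basics". The id attribute is reachable through the book element but is not part of the child hierarchy.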
Classes
- org.w3c.dom – Contains the interfaces that make up the DOM representation of an XML document and its components. These include:
  - Document – Represents an entire XML or HTML document. It is the root of the document tree.
  - Element – Represents an element in an XML or HTML document. It has methods to access the attributes of an XML element.
  - Attr – Represents an attribute of an Element object.
  - CDATASection – Represents a CDATA section, a block of text that can contain characters that would otherwise be treated as markup.
  - Text – Represents the textual content of an element or an attribute. If the content contains no markup, all of the text is held in a single Text node. If it does contain markup, the content is split into Element and Text nodes that are all children of the enclosing element (a Text node itself has no children).
  - ProcessingInstruction – Represents a processing instruction in an XML document.
  - Comment – Represents a comment in an XML document. Contains the comment text.
- javax.xml.parsers – Contains the abstract factory and parser classes through which DOM and SAX parsers are obtained:
  - DocumentBuilderFactory – A factory that can be used to obtain DOM parsers.
  - DocumentBuilder – Defines the methods that can be used to obtain a DOM object tree from an XML document.
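The node kinds listed above can be observed directly by parsing a small string and inspecting the children of its root element. The following is a minimal sketch, assuming the default factory settings (no coalescing of CDATA sections, comments kept). The class name NodeKindsSketch, the sample XML and the kindOf helper are hypothetical. The labels it prints are the org.w3c.dom interface names, such as Attr for attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeKindsSketch {

    // Hypothetical sample: one element with mixed content of several node kinds.
    public static final String SAMPLE_XML =
            "<note lang=\"en\">Hello <b>world</b><!-- a comment -->"
            + "<![CDATA[1 < 2]]><?render fast?></note>";

    // Parses the string with the default JAXP parser and returns the root element.
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Maps a node's type constant to the name of its org.w3c.dom interface.
    public static String kindOf(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: return "Element";
            case Node.ATTRIBUTE_NODE: return "Attr";
            case Node.TEXT_NODE: return "Text";
            case Node.CDATA_SECTION_NODE: return "CDATASection";
            case Node.COMMENT_NODE: return "Comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            default: return "Other";
        }
    }

    // Lists the kinds of the direct children of the given element, in document order.
    public static List<String> childKinds(Element parent) {
        List<String> kinds = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            kinds.add(kindOf(children.item(i)));
        }
        return kinds;
    }

    public static void main(String[] args) {
        Element note = parseRoot(SAMPLE_XML);
        System.out.println(childKinds(note));
        // Attributes are reached through the element, not through getChildNodes().
        System.out.println(kindOf(note.getAttributeNode("lang")) + " lang=" + note.getAttribute("lang"));
    }
}
```

Under those default settings the first line lists Text, Element, Comment, CDATASection and ProcessingInstruction in that order. The text "Hello " and the b element are siblings under note, which shows how mixed content is split into separate child nodes.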
JAXP DOM in action
Let's now look at an example of a DOM representation of an XML document. The example covers the following:
- Parsing the XML using the default DOM parser
- Obtaining the root element
- Obtaining all elements with a specific name
- Obtaining all elements with a specific name and in a specific namespace
- Reading the attributes of an element
- Iterating through the child nodes of an element and processing them
package com.studytrails.xml.jaxp;

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class JaxpDOMExample1 {

    private static String xmlSource = "http://feeds.bbci.co.uk/news/technology/rss.xml?edition=int";

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        JaxpDOMExample1 example = new JaxpDOMExample1();
        example.startParsing();
    }

    void startParsing() throws ParserConfigurationException, SAXException, IOException {
        // Create the factory for the DocumentBuilder. JAXP ships with Xerces
        // as the default DOM parser.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println(factory.getClass());
        // prints class
        // com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl

        // We want the factory to be namespace aware. This is important if the
        // XML declares and uses additional namespaces.
        factory.setNamespaceAware(true);

        // The actual builder or parser.
        DocumentBuilder builder = factory.newDocumentBuilder();

        // The Document that represents the XML.
        Document bbcDoc = builder.parse(xmlSource);

        // The root element.
        Element rootElement = bbcDoc.getDocumentElement();
        System.out.println(rootElement.getNodeName()); // prints rss

        // Search for an element using its name.
        NodeList list = rootElement.getElementsByTagName("channel");
        // Get the first item in the list.
        Node channel = list.item(0);

        // Get the child nodes.
        NodeList channelChildren = channel.getChildNodes();
        int length = channelChildren.getLength();
        for (int i = 0; i < length; i++) {
            Node node = channelChildren.item(i);
            // Node type 1 is Node.ELEMENT_NODE (text nodes are type 3).
            if (Node.ELEMENT_NODE == node.getNodeType()) {
                if ("title".equals(node.getNodeName())) {
                    // The text node is the child of the title element.
                    System.out.println(node.getFirstChild().getTextContent());
                }
            }
        }

        // Get all elements with the name 'link'. We just print the first link.
        NodeList linkList = rootElement.getElementsByTagName("link");
        System.out.println(linkList.item(0).getFirstChild().getTextContent());

        // <atom:link href="http://feeds.bbci.co.uk/news/technology/rss.xml"
        // rel="self" type="application/rss+xml"/>
        // Get all elements with the name 'link' in a specific namespace.
        NodeList linkList2 = rootElement.getElementsByTagNameNS("http://www.w3.org/2005/Atom", "link");
        Node atomLink = linkList2.item(0);
        System.out.println(atomLink.hasAttributes()); // prints true

        NamedNodeMap atomLinkAttributes = atomLink.getAttributes();
        for (int i = 0; i < atomLinkAttributes.getLength(); i++) {
            Node atomLinkAttribute = atomLinkAttributes.item(i);
            System.out.println(atomLinkAttribute.getNodeName());
            System.out.println(atomLinkAttribute.getNodeValue());
            /*
             * For the element shown above this prints each attribute name
             * followed by its value:
             * href
             * http://feeds.bbci.co.uk/news/technology/rss.xml
             * rel
             * self
             * type
             * application/rss+xml
             */
        }

        // The first child of the root is the whitespace between the tags.
        Node firstChildOfRoot = rootElement.getFirstChild();
        System.out.println(firstChildOfRoot.getNodeName()); // prints #text
        Node siblingOfFirstChild = firstChildOfRoot.getNextSibling();
        System.out.println(siblingOfFirstChild.getNodeName()); // prints channel
    }
}
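Because the example above depends on a live feed, the difference between getElementsByTagName and getElementsByTagNameNS can also be tried offline. The sketch below assumes a hypothetical, trimmed-down feed held in a string (it is not the real BBC feed content). The class name NamespaceLookupSketch and its helper methods are likewise made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceLookupSketch {

    public static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Hypothetical sample feed with one plain <link> and one <atom:link>.
    public static final String SAMPLE_FEED =
            "<rss xmlns:atom=\"" + ATOM_NS + "\" version=\"2.0\"><channel>"
            + "<title>Sample feed</title>"
            + "<link>http://example.com/</link>"
            + "<atom:link href=\"http://example.com/rss.xml\" rel=\"self\"/>"
            + "</channel></rss>";

    // Parses the string with a namespace-aware parser and returns the root element.
    public static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Text of the first element whose tag name is exactly "link", or null if none.
    public static String plainLink(Element root) {
        NodeList links = root.getElementsByTagName("link");
        return links.getLength() == 0 ? null : links.item(0).getTextContent();
    }

    // href of the first "link" element in the Atom namespace, or null if none.
    public static String atomLinkHref(Element root) {
        NodeList links = root.getElementsByTagNameNS(ATOM_NS, "link");
        if (links.getLength() == 0) {
            return null;
        }
        // Looking the attribute up by name avoids relying on attribute order.
        return ((Element) links.item(0)).getAttribute("href");
    }

    public static void main(String[] args) {
        Element root = parseRoot(SAMPLE_FEED);
        System.out.println(plainLink(root));
        System.out.println(atomLinkHref(root));
    }
}
```

The name-only lookup compares against the qualified tag name, so it finds the plain link but not atom:link. The namespace-aware lookup matches on namespace URI plus local name, which keeps working even if the feed chooses a different prefix.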