What is DOM

DOM is an Object representation of an XML, HTML or XHTML document. In this tutorial we will be dealing with only XML. DOM represents the XML as a Document tree. JAXP provides API for DOM implementation in Java. It also provides parsing interface which can be used to plugin different parsers (JAXP provides a default implementation)

Classes

org.w3c.dom – Contains classes that are DOM representation of an XML Document and its components. Classes include :
- Document – Represents an entire XML or HTML Document. It is the root of the Document tree.
- Element – Represents an element in an XML or HTML Document. It has methods to access the attributes of an xml element.
- Attribute – Represents an attribute in an Element object.
- CDATASection – Represents CDATA Section. These are blocks of text that can contain characters that are normally part of markup.
- Text – Represents textual content of an element or an Attribute. If the text does not contain markup then all text is contain in a single node, if it contains markup then the various elements are added as children of the Text element.
- Processing Instruction – Represents a Processing Instruction in an XML document.
- Comment – Represents a comment in an XML Document. Contains comment text.
javax.xml.parsers – Contains interfaces that the DOM and SAX Parsers need to implement :
- DocumentBuilderFactory – Defines a factory that can be used to obtain DOM parsers
- DocumentBuilder – Defines interface methods that can be used to obtain a DOM Object tree from an XML Document

JAXP DOM in action

Lets now see an example of a DOM representation of an XML document. In this example we look at the following:

Parsing the XML using the default DOM Parser.
Obtaining the root element
Obtaining all elements with a specific name
Obtaining all elements with a specific name and in a specific namespace
Iterating through all child nodes and parsing through them.

package com.studytrails.xml.jaxp;

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class JaxpDOMExample1 {

	private static String xmlSource = "http://feeds.bbci.co.uk/news/technology/rss.xml?edition=int";

	public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
		JaxpDOMExample1 example = new JaxpDOMExample1();
		example.startParsing();

	}

	void startParsing() throws ParserConfigurationException, SAXException, IOException {

		// create the factory for the DocumentBuilder. JAXP ships with a xerces
		// as the default DOM parser.
		DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
		System.out.println(factory.getClass());
		// prints class
		// com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
		// we want the factory to be namespace aware. This is important if the
		// XML declares and uses additional namespaces
		factory.setNamespaceAware(true);
		// the actual builder or parser
		DocumentBuilder builder = factory.newDocumentBuilder();

		// the Document that represents the XML
		Document bbcDoc = builder.parse(xmlSource);

		// the root element.
		Element rootElement = bbcDoc.getDocumentElement();
		System.out.println(rootElement.getNodeName());
		// prints rss

		// search for an element using the name
		NodeList list = rootElement.getElementsByTagName("channel");
		// get the first item in the list
		Node channel = list.item(0);
		// get the child nodes
		NodeList channelChildren = channel.getChildNodes();
		int length = channelChildren.getLength();
		for (int i = 0; i < length; i++) {
			Node node = channelChildren.item(i);
			// node type 1 is text
			if (1 == node.getNodeType()) {
				if ("title".equals(node.getNodeName()))
					// the text element is the child node
					System.out.println(node.getFirstChild().getTextContent());
			}
		}

		// get all elements with the name 'link'. We just print the first link
		NodeList linkList = rootElement.getElementsByTagName("link");
		System.out.println(linkList.item(0).getFirstChild().getTextContent());
		// &ltatom:link href="http://feeds.bbci.co.uk/news/technology/rss.xml"
		// rel="self" type="application/rss+xml"/&gt

		// get all elements with the name 'link' and in a specific namespace
		NodeList linkList2 = rootElement.getElementsByTagNameNS("http://www.w3.org/2005/Atom", "link");
		Node atomLink = linkList2.item(0);
		System.out.println(atomLink.hasAttributes()); // prints true
		NamedNodeMap atomLinkAttributes = atomLink.getAttributes();
		for (int i = 0; i < atomLinkAttributes.getLength(); i++) {
			Node atomLinkAttribute = atomLinkAttributes.item(i);
			System.out.println(atomLinkAttribute.getNodeName());
			System.out.println(atomLinkAttribute.getNodeValue());
			/*prints 
			 * href 
			 * http://feeds.bbci.co.uk/news/technology/rss.xml
			 * rel
			 * self
			 */
		}

		Node firstChildOfRoot = rootElement.getFirstChild();
		System.out.println(firstChildOfRoot.getNodeName());
		// prints #text

		Node siblingOfFirstChild = firstChildOfRoot.getNextSibling();
		System.out.println(siblingOfFirstChild.getNodeName());
		// prints channel

	}

}

Java API for XML (JAXP) – DOM

What is DOM

Classes

JAXP DOM in action

Leave a Comment Cancel reply