Aug 1, 2014

Java and XPath tutorial to extract a subset of an XML

Q. What is XPath?
A. XPath is a query language for extracting part of an XML document, much as SQL extracts part of the data in a database or a regular expression (regex) extracts part of a text. In Java, an XPath expression can be evaluated to any of the following return types:

XPathConstants.STRING
XPathConstants.NUMBER
XPathConstants.BOOLEAN
XPathConstants.NODE
XPathConstants.NODESET 
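
To make the five return types concrete, here is a minimal sketch that evaluates expressions against the same small Employee document used in the example further down, asking for a different return type each time. The class name XpathReturnTypes and the eval helper are made up for this illustration and are not part of the JDK.

```java
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XpathReturnTypes {

    // Same sample document as the one used in this post.
    static final String XML =
        "<Employee><name type=\"first\">Peter</name><age>25</age></Employee>";

    // Hypothetical helper (not a JDK method): parses the sample XML and
    // evaluates one expression, asking for the given return type.
    static Object eval(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Each constant maps to a Java type: String, Double, Boolean, Node, NodeList.
        String name = (String) eval("/Employee/name", XPathConstants.STRING);
        Double age = (Double) eval("/Employee/age", XPathConstants.NUMBER);
        Boolean adult = (Boolean) eval("/Employee/age > 18", XPathConstants.BOOLEAN);
        Node first = (Node) eval("/Employee/*[1]", XPathConstants.NODE);
        NodeList children = (NodeList) eval("/Employee/*", XPathConstants.NODESET);

        System.out.println("STRING  : " + name);                 // Peter
        System.out.println("NUMBER  : " + age);                  // 25.0
        System.out.println("BOOLEAN : " + adult);                // true
        System.out.println("NODE    : " + first.getNodeName());  // name
        System.out.println("NODESET : " + children.getLength()); // 2
    }
}
```

STRING, NUMBER and BOOLEAN are handy when you expect a single value, while NODE and NODESET give you DOM objects that you can inspect further.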


Here is a basic example that uses XPath in Java to query a small XML document:

<Employee>
   <name type="first">Peter</name>
   <age>25</age>
</Employee>


package com.xml;

import java.io.ByteArrayInputStream;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XpathQuery {

 public static void main(String[] args) {

  String xml = "<Employee><name type=\"first\">Peter</name><age>25</age></Employee>";
  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  factory.setNamespaceAware(true);
  DocumentBuilder builder;
  Document document = null;

  try {
   builder = factory.newDocumentBuilder();
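   // encode explicitly as UTF-8 rather than relying on the platform default charset
   document = builder.parse(new ByteArrayInputStream(xml.getBytes(java.nio.charset.StandardCharsets.UTF_8)));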

   XPathFactory xpathFactory = XPathFactory.newInstance();
   XPath xpath = xpathFactory.newXPath();

   // get the name of the employee whose age is greater than 18
   XPathExpression expr = xpath.compile("/Employee[age>18]/name/text()");
   NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
   for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println("age > 18 : " + nodes.item(i).getNodeValue());
   }

   // get the age of Peter
   expr = xpath.compile("/Employee[name='Peter']/age/text()");
   nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
   for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println("age of peter : " + nodes.item(i).getNodeValue());
   }

   // get the name whose "type" attribute is "first"
   expr = xpath.compile("/Employee/name[@type='first']/text()");
   nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
   for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println("attr type='first': " + nodes.item(i).getNodeValue());
   }

  } catch (ParserConfigurationException | IOException | SAXException
    | XPathExpressionException e) { // multi-catch requires Java 7 or later
   e.printStackTrace();
  }

 }
}



Output:

age > 18 : Peter
age of peter : 25
attr type='first': Peter


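Note: Like SQL, XPath has its own syntax that you need to learn. For example, "@type" selects the attribute named "type" (names are case-sensitive, so "@Type" is a different attribute), a leading "/" starts the path at the document root, and a predicate in square brackets such as [age>18] filters the selected nodes. Search for "XPath syntax" to learn more.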
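
As a starting point for learning the syntax, the sketch below runs a handful of common XPath 1.0 constructs against the Employee document from this post and prints the string value of each result. The class name XpathSyntaxSamples and the select helper are made up for this illustration; the only JDK call it relies on is XPath.evaluate(String, InputSource), which returns the result as a String.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

public class XpathSyntaxSamples {

    // Same sample document as the one used in this post.
    static final String XML =
        "<Employee><name type=\"first\">Peter</name><age>25</age></Employee>";

    // Hypothetical helper (not a JDK method): evaluates the expression against
    // the sample XML and returns the string value of the result.
    static String select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // "/" starts at the document root and walks down step by step.
        System.out.println(select("/Employee/age"));                  // 25
        // "//" matches at any depth in the document.
        System.out.println(select("//age"));                          // 25
        // "@" selects an attribute.
        System.out.println(select("/Employee/name/@type"));           // first
        // "[...]" is a predicate that filters the selected nodes.
        System.out.println(select("/Employee/name[@type='first']"));  // Peter
        // "*" matches any element; "[2]" picks the second match (1-based).
        System.out.println(select("/Employee/*[2]"));                 // 25
        // Names are case-sensitive: "@Type" matches nothing, so the result is empty.
        System.out.println("[" + select("/Employee/name[@Type='first']") + "]"); // []
    }
}
```

Evaluating to a String is convenient for quick experiments like this one. For results with several nodes, ask for XPathConstants.NODESET as in the main example.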
