XML Processing - Java 8 Recipes, 2nd Edition (2014)

CHAPTER 20. XML Processing

XML APIs have always been available to the Java developer, usually supplied as third-party libraries that could be added to the runtime class path. Over time they moved into the platform itself: the Java API for XML Processing (JAXP) has shipped with the core runtime libraries since J2SE 1.4, and the Java Architecture for XML Binding (JAXB) and the Java API for XML Web Services (JAX-WS) were added in Java SE 6. The most fundamental XML processing tasks that you will encounter involve only a few use cases: writing and reading XML documents, validating those documents, and using JAXB to assist in marshalling/unmarshalling Java objects. This chapter provides recipes for these common tasks.

Note The source code for this chapter’s examples is available in the org.java8recipes.chapter20 package. See the introductory chapters for instructions on how to find and download this book’s sample source code.

20-1. Writing an XML File

Problem

You want to create an XML document to store application data.

Solution

To write an XML document, use the javax.xml.stream.XMLStreamWriter class. The following code iterates over a list of Patient objects and writes the data to an .xml file. This sample code comes from the org.java8recipes.chapter20.recipe20_1.DocWriter example:

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
...
public void run(String outputFile) throws FileNotFoundException, XMLStreamException,
        IOException {
    // Build the sample data
    List<Patient> patients = new ArrayList<>();
    Patient p1 = new Patient();
    Patient p2 = new Patient();
    Patient p3 = new Patient();
    p1.setId(BigInteger.valueOf(1));
    p1.setName("John Smith");
    p1.setDiagnosis("Common Cold");
    p2.setId(BigInteger.valueOf(2));
    p2.setName("Jane Doe");
    p2.setDiagnosis("Broken Ankle");
    p3.setId(BigInteger.valueOf(3));
    p3.setName("Jack Brown");
    p3.setDiagnosis("Food Allergy");
    patients.add(p1);
    patients.add(p2);
    patients.add(p3);
    XMLOutputFactory factory = XMLOutputFactory.newFactory();
    try (FileOutputStream fos = new FileOutputStream(outputFile)) {
        XMLStreamWriter writer = factory.createXMLStreamWriter(fos, "UTF-8");
        writer.writeStartDocument();
        writer.writeCharacters("\n");
        writer.writeStartElement("patients");
        writer.writeCharacters("\n");
        for (Patient p : patients) {
            // <patient id="...">
            writer.writeCharacters("\t");
            writer.writeStartElement("patient");
            writer.writeAttribute("id", String.valueOf(p.getId()));
            writer.writeCharacters("\n\t\t");
            // <name>...</name>
            writer.writeStartElement("name");
            writer.writeCharacters(p.getName());
            writer.writeEndElement();
            writer.writeCharacters("\n\t\t");
            // <diagnosis>...</diagnosis>
            writer.writeStartElement("diagnosis");
            writer.writeCharacters(p.getDiagnosis());
            writer.writeEndElement();
            writer.writeCharacters("\n\t");
            // </patient>
            writer.writeEndElement();
            writer.writeCharacters("\n");
        }
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
    }
}

The previous code writes the following file contents:

<?xml version="1.0" ?>
<patients>
    <patient id="1">
        <name>John Smith</name>
        <diagnosis>Common Cold</diagnosis>
    </patient>
    <patient id="2">
        <name>Jane Doe</name>
        <diagnosis>Broken Ankle</diagnosis>
    </patient>
    <patient id="3">
        <name>Jack Brown</name>
        <diagnosis>Food Allergy</diagnosis>
    </patient>
</patients>

How It Works

The Java standard library provides several ways to work with XML documents. One model is the Simple API for XML (SAX). A newer and simpler model is the Streaming API for XML (StAX). This recipe uses StAX, which is defined in the javax.xml.stream package. Writing an XML document takes five steps:

1. Create a file output stream.

2. Create an XML output factory and an XML output stream writer.

3. Wrap the file stream in the XML stream writer.

4. Use the XML stream writer’s write methods to create the document and write the XML elements.

5. Close the output streams.

Create a file output stream using the java.io.FileOutputStream class. You can use a try-with-resources statement to open this stream and have it closed automatically. Learn more about the try-with-resources syntax in Chapter 9.

The javax.xml.stream.XMLOutputFactory provides a static method that creates an output factory. Use the factory to create a javax.xml.stream.XMLStreamWriter.

Creating the writer wraps the file stream object in the XML writer instance. You will use the writer's various write methods to create the XML document elements and attributes. Finally, you simply close the writer when you finish writing to the file. Some of the more useful methods of the XMLStreamWriter instance are these:

· writeStartDocument()

· writeStartElement()

· writeEndElement()

· writeEndDocument()

· writeAttribute()

After creating the file and XMLStreamWriter, you should always begin the document by calling the writeStartDocument() method. Follow this by writing individual elements using the writeStartElement() and writeEndElement() methods in combination. Of course, elements can have nested elements. You have the responsibility to call these methods in the proper sequence to create well-formed documents. Use the writeAttribute() method to place an attribute name and value into the current element. You should call writeAttribute() immediately after calling the writeStartElement() method. Finally, signal the end of the document with the writeEndDocument() method and close the writer instance.
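The call order described above can be sketched in memory as follows. This is a minimal illustration rather than the book's DocWriter class: the names WriterOrderSketch and writePatient are hypothetical, and the sketch writes to a StringWriter instead of a file so that the result is easy to inspect. It also shows that writeCharacters() escapes reserved characters such as & for you.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class WriterOrderSketch {

    // Hypothetical helper: writes one <patient> element with a nested <name>.
    static String writePatient(String id, String name) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);

        writer.writeStartDocument();          // always first
        writer.writeStartElement("patient");  // opens <patient
        writer.writeAttribute("id", id);      // must directly follow writeStartElement()
        writer.writeStartElement("name");     // nested element
        writer.writeCharacters(name);         // text content, escaped as needed
        writer.writeEndElement();             // closes </name>
        writer.writeEndElement();             // closes </patient>
        writer.writeEndDocument();            // always last
        writer.flush();
        writer.close();
        return out.toString();
    }
}
```

Calling writePatient("1", "Smith & Sons") yields a single unformatted line in which the ampersand appears as the entity &amp;amp; inside the name element.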

One interesting point about XMLStreamWriter is that it does not format the document output. Unless you use the writeCharacters() method to output space and newline characters, the output streams to a single unformatted line. This doesn’t invalidate the resulting XML file, but it does make the file difficult for humans to read. Consider using the writeCharacters() method to output spacing and newline characters as needed to create a human-readable document. You can safely skip this additional whitespace if nobody needs to read the document. Regardless of the format, the XML document will be well formed because it adheres to correct XML syntax.
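If the repeated writeCharacters() calls for whitespace become tedious, one option is to move them into a small helper. The sketch below assumes a hypothetical newLine() method that writes a line break followed by one tab per nesting level; it is not part of the StAX API or of the book's sample code.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class IndentSketch {

    // Hypothetical helper: newline plus one tab per nesting level.
    static void newLine(XMLStreamWriter writer, int depth) throws XMLStreamException {
        StringBuilder sb = new StringBuilder("\n");
        for (int i = 0; i < depth; i++) {
            sb.append('\t');
        }
        writer.writeCharacters(sb.toString());
    }

    static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        writer.writeStartElement("patients");
        newLine(writer, 1);
        writer.writeStartElement("patient");
        newLine(writer, 2);
        writer.writeStartElement("name");
        writer.writeCharacters("Jane Doe");
        writer.writeEndElement();   // </name>
        newLine(writer, 1);
        writer.writeEndElement();   // </patient>
        newLine(writer, 0);
        writer.writeEndElement();   // </patients>
        writer.flush();
        writer.close();
        return out.toString();
    }
}
```

The nesting depth is passed explicitly here to keep the sketch short; a fuller version might track it in a wrapper around the writer.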

The command-line usage pattern for this example code is this:

java org.java8recipes.chapter20.recipe20_1.DocWriter <outputXmlFile>

Invoke this application to create a file named patients.xml in the following way:

java org.java8recipes.chapter20.recipe20_1.DocWriter patients.xml

20-2. Reading an XML File

Problem

You need to parse an XML document, retrieving known elements and attributes.

Solution 1

Use the javax.xml.stream.XMLStreamReader interface to read documents. Using this API, your code pulls XML elements through a cursor-like interface, similar to a cursor in SQL, and processes each element in turn. The following code snippet from org.java8recipes.chapter20.recipe20_2.DocReader demonstrates how to read the patients.xml file that was generated in the previous recipe:

public void cursorReader(String xmlFile)
        throws FileNotFoundException, IOException, XMLStreamException {
    XMLInputFactory factory = XMLInputFactory.newFactory();
    try (FileInputStream fis = new FileInputStream(xmlFile)) {
        XMLStreamReader reader = factory.createXMLStreamReader(fis);
        // Flags that record which element the cursor is currently inside
        boolean inName = false;
        boolean inDiagnosis = false;
        String id = null;
        String name = null;
        String diagnosis = null;

        while (reader.hasNext()) {
            int event = reader.next();
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    String elementName = reader.getLocalName();
                    switch (elementName) {
                        case "patient":
                            id = reader.getAttributeValue(0);
                            break;
                        case "name":
                            inName = true;
                            break;
                        case "diagnosis":
                            inDiagnosis = true;
                            break;
                        default:
                            break;
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    String elementname = reader.getLocalName();
                    if (elementname.equals("patient")) {
                        // A complete patient has been read; print it and reset
                        System.out.printf("Patient: %s\nName: %s\nDiagnosis: %s\n\n",
                                id, name, diagnosis);
                        id = name = diagnosis = null;
                        inName = inDiagnosis = false;
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (inName) {
                        name = reader.getText();
                        inName = false;
                    } else if (inDiagnosis) {
                        diagnosis = reader.getText();
                        inDiagnosis = false;
                    }
                    break;
                default:
                    break;
            }
        }
        reader.close();
    }
}
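The listing above reads the id attribute by position with getAttributeValue(0), which works only while id is the first attribute of the element. As a hedged alternative, the cursor API also offers a lookup by name. The following sketch reads from a string rather than a file; the class AttributeByNameSketch and the method firstPatientId are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AttributeByNameSketch {

    // Returns the id attribute of the first <patient> element, or null if there is none.
    static String firstPatientId(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "patient".equals(reader.getLocalName())) {
                    // A null namespace URI means "match on the local name only"
                    return reader.getAttributeValue(null, "id");
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```

Looking the attribute up by name keeps the code working if the document later gains attributes or lists them in a different order.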

Solution 2

Use the XMLEventReader to read and process events using an event-oriented interface. This API is also called an iterator-oriented API. The following code is much like the code in Solution 1, except that it uses the event-oriented API instead of the cursor-oriented API. This code snippet is available from the same org.java8recipes.chapter20.recipe20_2.DocReader class used in Solution 1:

public void eventReader(String xmlFile)
        throws FileNotFoundException, IOException, XMLStreamException {
    XMLInputFactory factory = XMLInputFactory.newFactory();
    XMLEventReader reader = null;
    try (FileInputStream fis = new FileInputStream(xmlFile)) {
        reader = factory.createXMLEventReader(fis);
        // Flags that record which element the current event belongs to
        boolean inName = false;
        boolean inDiagnosis = false;
        String id = null;
        String name = null;
        String diagnosis = null;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            String elementName = null;
            switch (event.getEventType()) {
                case XMLEvent.START_ELEMENT:
                    StartElement startElement = event.asStartElement();
                    elementName = startElement.getName().getLocalPart();
                    switch (elementName) {
                        case "patient":
                            id = startElement.getAttributeByName(QName.valueOf("id")).getValue();
                            break;
                        case "name":
                            inName = true;
                            break;
                        case "diagnosis":
                            inDiagnosis = true;
                            break;
                        default:
                            break;
                    }
                    break;
                case XMLEvent.END_ELEMENT:
                    EndElement endElement = event.asEndElement();
                    elementName = endElement.getName().getLocalPart();
                    if (elementName.equals("patient")) {
                        // A complete patient has been read; print it and reset
                        System.out.printf("Patient: %s\nName: %s\nDiagnosis: %s\n\n",
                                id, name, diagnosis);
                        id = name = diagnosis = null;
                        inName = inDiagnosis = false;
                    }
                    break;
                case XMLEvent.CHARACTERS:
                    String value = event.asCharacters().getData();
                    if (inName) {
                        name = value;
                        inName = false;
                    } else if (inDiagnosis) {
                        diagnosis = value;
                        inDiagnosis = false;
                    }
                    break;
            }
        }
    }
    if (reader != null) {
        reader.close();
    }
}

How It Works

Java provides several ways to read XML documents. One way is to use StAX, a streaming model. It is better than the older SAX API because it allows you to both read and write XML documents. Although StAX is not quite as powerful as a DOM API, it is an excellent and efficient API that is less taxing on memory resources.

StAX provides two APIs for reading XML documents: a cursor API and an iterator API. The cursor-oriented API utilizes a cursor that can walk an XML document from start to finish, pointing to one element at a time, and always moving forward. The iterator API represents an XML document stream as a set of discrete event objects, provided in the order that they are read in the source XML. The event-oriented, iterator API is preferred over the cursor API at this time because it provides XMLEvent objects with the following benefits:

· The XMLEvent objects are immutable and can persist even though the StAX parser has moved on to subsequent events. You can pass these XMLEvent objects to other processes or store them in lists, arrays, and maps.

· You can create your own subtypes of the XMLEvent interface, defining specialized events as needed.

· You can modify the incoming event stream by adding or removing events, which is more flexible than the cursor API.
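The first benefit in the list above, that events outlive the parser's current position, can be sketched as follows. The example buffers every event in a list and inspects the list only after the reader has been closed. The class EventBufferSketch and its methods are hypothetical names, and the input is a string rather than a file to keep the sketch self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class EventBufferSketch {

    // Reads the whole document and keeps every event for later use.
    static List<XMLEvent> readAll(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newFactory().createXMLEventReader(new StringReader(xml));
        List<XMLEvent> events = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                events.add(reader.nextEvent());
            }
        } finally {
            reader.close();
        }
        return events;
    }

    // Works on the stored events only; the reader is already closed at this point.
    static List<String> elementNames(List<XMLEvent> events) {
        List<String> names = new ArrayList<>();
        for (XMLEvent event : events) {
            if (event.isStartElement()) {
                names.add(event.asStartElement().getName().getLocalPart());
            }
        }
        return names;
    }
}
```

Buffering a whole document this way gives up the memory advantage of streaming, so it suits small documents or short runs of events.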

To use StAX to read documents, create an XML event reader on your file input stream. Check that events are still available with the hasNext() method and read each event using the nextEvent() method. The nextEvent() method will return a specific type of XMLEvent that corresponds to the start and stop elements, attributes, and value data in the XML file. Remember to close your readers and file streams when you’re finished with those objects.
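The hasNext()/nextEvent() loop also combines with event filtering. As a minimal sketch, assuming you only care about non-blank text, XMLInputFactory.createFilteredReader() can wrap an event reader with an EventFilter so that the loop sees only the events the filter accepts. The class FilteredReaderSketch and the method textValues are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

public class FilteredReaderSketch {

    // Collects the text content of the document, skipping whitespace-only text.
    static List<String> textValues(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        XMLEventReader raw = factory.createXMLEventReader(new StringReader(xml));

        // Accept only character events that contain something other than whitespace
        EventFilter nonBlankText = event ->
                event.isCharacters() && !event.asCharacters().getData().trim().isEmpty();
        XMLEventReader reader = factory.createFilteredReader(raw, nonBlankText);

        List<String> values = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                values.add(reader.nextEvent().asCharacters().getData());
            }
        } finally {
            reader.close();
        }
        return values;
    }
}
```

The filter removes start tags, end tags, and indentation from the stream, so the loop body no longer needs a switch statement or state flags.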

You can invoke the example application like this, using the patients.xml file as your <xmlFile> argument:

java org.java8recipes.chapter20.recipe20_2.DocReader <xmlFile>

20-3. Transforming XML

Problem

You want to convert an XML document to another format, for example to HTML.

Solution

Use the javax.xml.transform package to transform an XML document to another document format.

The following code demonstrates how to read a source document, apply an Extensible Stylesheet Language (XSL) transform file, and produce the transformed, new document. Use the sample code from the org.java8recipes.chapter20.recipe20_3.TransformXml class to read the patients.xml file and create a patients.html file. The following snippet shows the important pieces of this class:

import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
...
public void run(String xmlFile, String xslFile, String outputFile)
        throws FileNotFoundException, TransformerConfigurationException, TransformerException {
    // Load the stylesheet and compile it into a Transformer
    InputStream xslInputStream = new FileInputStream(xslFile);
    Source xslSource = new StreamSource(xslInputStream);
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer transformer = factory.newTransformer(xslSource);
    // Apply the transformation to the source XML and write the result
    InputStream xmlInputStream = new FileInputStream(xmlFile);
    StreamSource in = new StreamSource(xmlInputStream);
    StreamResult out = new StreamResult(outputFile);
    transformer.transform(in, out);
    ...
}

How It Works

The javax.xml.transform package contains all the classes you need to transform an XML document into any other document type. The most common use case is to convert data-oriented XML documents into user-readable HTML documents.

Transforming from one document type to another requires three files:

· An XML source document

· An XSL transformation document that maps XML elements to the new document elements

· A target output file

The XML source document is, of course, your source data file. It will most often contain data-oriented content that is easy to parse programmatically. However, people don’t easily read XML files, especially complex, data-rich files. Instead, people are much more comfortable reading properly rendered HTML documents.

The XSL transformation document specifies how an XML document should be transformed into a different format. An XSL file will usually contain an HTML template that specifies dynamic fields that will hold the extracted contents of a source XML file.

In this example’s source code, you’ll find two source documents:

· chapter20/recipe20_3/patients.xml

· chapter20/recipe20_3/patients.xsl

The patients.xml file is short and contains the following data:

<?xml version="1.0" encoding="UTF-8"?>
<patients>
    <patient id="1">
        <name>John Smith</name>
        <diagnosis>Common Cold</diagnosis>
    </patient>
    <patient id="2">
        <name>Jane Doe</name>
        <diagnosis>Broken ankle</diagnosis>
    </patient>
    <patient id="3">
        <name>Jack Brown</name>
        <diagnosis>Food allergy</diagnosis>
    </patient>
</patients>

The patients.xml file defines a root element called patients. It has three nested patient elements. The patient elements contain three pieces of data:

· Patient identifier, provided as the id attribute of the patient element

· Patient name, provided as the name subelement

· Patient diagnosis, provided as the diagnosis subelement

The transformation XSL document (patients.xsl) is quite small as well, and it simply maps the patient data to a more user-readable HTML format using XSL:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Patients</title>
            </head>
            <body>
                <table border="1">
                    <tr>
                        <th>Id</th>
                        <th>Name</th>
                        <th>Diagnosis</th>
                    </tr>
                    <xsl:for-each select="patients/patient">
                        <tr>
                            <td>
                                <xsl:value-of select="@id"/>
                            </td>
                            <td>
                                <xsl:value-of select="name"/>
                            </td>
                            <td>
                                <xsl:value-of select="diagnosis"/>
                            </td>
                        </tr>
                    </xsl:for-each>
                </table>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

Using this stylesheet, the sample code transforms the XML into an HTML table containing all the patients and their data. Rendered in a browser, the HTML table should look like the one in Figure 20-1.

Figure 20-1. A common rendering of an HTML table

The process for using this XSL file to convert the XML to an HTML file is straightforward, but every step can be enhanced with additional error checking and processing. For this example, refer to the previous code in the solution section.

The most basic transformation steps are these:

1. Read the XSL document into your Java application as a Source object.

2. Create a Transformer instance and provide your XSL Source instance for it to use during its operation.

3. Create a StreamSource that represents the source XML contents.

4. Create a StreamResult instance for your output document, which is an HTML file in this case.

5. Use the Transformer object’s transform() method to perform the conversion.

6. Close all the relevant streams and file instances, as needed.
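The steps above can be tried without any files by using in-memory sources and results. The following is a minimal sketch under that assumption: the class TransformSketch, the method transform, and the deliberately tiny stylesheet (which emits a list of names rather than the table from patients.xsl) are all hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformSketch {

    // Illustrative stylesheet: one <li> per patient name.
    static final String XSL =
          "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/\">"
        + "<ul><xsl:for-each select=\"patients/patient\">"
        + "<li><xsl:value-of select=\"name\"/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml) throws TransformerException {
        // Steps 1 and 2: read the XSL as a Source and build a Transformer from it
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        // Steps 3 and 4: wrap the XML input and the output target
        StreamSource in = new StreamSource(new StringReader(xml));
        StringWriter html = new StringWriter();
        // Step 5: perform the conversion
        transformer.transform(in, new StreamResult(html));
        // Step 6: nothing to close here, because no file streams were opened
        return html.toString();
    }
}
```

When you work with files instead, opening the input streams in a try-with-resources statement takes care of step 6.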

If you choose to execute the sample code, you should invoke it in the following way, using patients.xml, patients.xsl, and patients.html as arguments:

java org.java8recipes.chapter20.recipe20_3.TransformXml <xmlFile> <xslFile> <outputFile>

20-4. Validating XML

Problem

You want to confirm that your XML is valid—that it conforms to a known document definition or schema.

Solution

Validate that your XML conforms to a specific schema by using the javax.xml.validation package. The following code snippet from org.java8recipes.chapter20.recipe20_4.ValidateXml demonstrates how to validate against an XML schema file:

import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
...
public void run(String xmlFile, String validationFile) {
    boolean valid = true;
    SchemaFactory sFactory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    try {
        // Parse the schema, then validate the document against it
        Schema schema = sFactory.newSchema(new File(validationFile));
        Validator validator = schema.newValidator();
        Source source = new StreamSource(new File(xmlFile));
        validator.validate(source);
    } catch (SAXException | IOException | IllegalArgumentException ex) {
        // Any failure, including an unreadable file, is reported as invalid
        valid = false;
    }
    System.out.printf("XML file is %s.\n", valid ? "valid" : "invalid");
}
...

How It Works

When utilizing XML, it is important to validate it to ensure that the correct syntax is in place, and to ensure that an XML document is an instance of the specified XML schema. The validation process involves comparing the schema and the XML document to find any discrepancies. The javax.xml.validation package provides all the classes needed to reliably validate an XML file against a variety of schemas. The most common schema languages that you will use for XML validation are defined as constant URIs within the XMLConstants class:

· XMLConstants.W3C_XML_SCHEMA_NS_URI

· XMLConstants.RELAXNG_NS_URI

Begin by creating a SchemaFactory for a specific type of schema definition. A SchemaFactory knows how to parse a particular schema type and prepares it for validation. Use the SchemaFactory instance to create a Schema object. The Schema object is an in-memory representation of the schema definition grammar. You can use the Schema instance to retrieve a Validator instance that understands this grammar. Finally, use the validate() method to check your XML. The method throws an exception if anything goes wrong during the validation: a SAXException when the document does not conform to the schema, or an IOException when the document cannot be read. Otherwise, the validate() method returns quietly, and you can continue to use the XML file.
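The solution code reduces every failure to a simple valid or invalid flag. As a hedged variation, the sketch below keeps the exception message so that you can see why a document was rejected. It validates strings against a small inline schema for a single patient element; the class ValidationSketch, the method validate, and the inline schema are hypothetical and are not the book's patients.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ValidationSketch {

    // Illustrative schema: <patient id="integer"> with <name> then <diagnosis>.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"patient\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"diagnosis\" type=\"xs:string\"/>"
        + "</xs:sequence>"
        + "<xs:attribute name=\"id\" type=\"xs:integer\" use=\"required\"/>"
        + "</xs:complexType></xs:element></xs:schema>";

    // Returns null when the document is valid, otherwise the validator's message.
    static String validate(String xml) throws IOException {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException ex) {
            return ex.getMessage();
        }
    }
}
```

A document with a missing diagnosis element or a non-numeric id returns a message instead of null; the exact wording of that message depends on the parser implementation.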

Note The XML Schema was the first to receive “Recommendation” status from the World Wide Web Consortium (W3C) in 2001. Competing schemas have since become available. One competing schema is the Regular Language for XML Next Generation (RELAX NG) schema. RELAX NG may be a simpler schema and its specification also defines a non-XML, compact syntax. This recipe’s example uses the XML schema.

Run the example code using the following command-line syntax, preferably with the sample .xml file and validation file provided as resources/patients.xml and patients.xsd, respectively:

java org.java8recipes.chapter20.recipe20_4.ValidateXml <xmlFile> <validationFile>

20-5. Creating Java Bindings for an XML Schema

Problem

You want to generate a set of Java classes (Java bindings) that represent the objects in an XML schema.

Solution

The JDK provides a tool that can turn schema documents into representative Java class files. Use the <JDK_HOME>/bin/xjc command-line tool to generate Java bindings for your XML schemas. To create the Java classes for the patients.xsd file from Recipe 20-4, you could issue the following command from within a console:

xjc -p org.java8recipes.chapter20.recipe20_5 patients.xsd

This command will process the patients.xsd file and create all the classes needed to process an XML file that validates with this schema. For this example, the patients.xsd file looks like the following:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="patients">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" name="patient" type="Patient"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:complexType name="Patient">
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="diagnosis" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="id" type="xs:integer" use="required"/>
    </xs:complexType>
</xs:schema>

Executed on the previous xsd file, the xjc command creates the following files in the org.java8recipes.chapter20.recipe20_5 package:

· ObjectFactory.java

· Patients.java

· Patient.java

How It Works

The JDK includes the <JDK_HOME>/bin/xjc utility. The xjc utility is a command-line application that creates Java bindings from schema files. The source schema files can be several types, including XML Schemas, RELAX NG, and others.

The xjc command has several options for performing its work. Some of the most common options specify the source schema file, the package of the generated Java binding files, and the output directory that will receive the Java binding files.

You can get detailed descriptions of all the command-line options by using the tool’s -help option:

xjc -help

A Java binding contains annotated fields that correspond to the fields defined in the XML Schema file. These annotations mark the root element of the schema file and all other subelements. This is useful during the next step of XML processing, which involves either unmarshalling or marshalling these bindings.

20-6. Unmarshalling XML to a Java Object

Problem

You want to unmarshal an XML file and create its corresponding Java object tree.

Solution

Unmarshalling is the process of converting a data format, in this case XML, into an in-memory representation of the object so that it can be used to perform a task. JAXB provides an unmarshalling service that parses an XML file and generates the Java objects from the bindings you created in Recipe 20-5. The following code can read the file patients.xml from the org.java8recipes.chapter20.recipe20_6 package to create a Patients root object and its list of Patient objects:

public void run(String xmlFile, String context)
        throws JAXBException, IOException {
    JAXBContext jc = JAXBContext.newInstance(context);
    Unmarshaller u = jc.createUnmarshaller();
    // try-with-resources closes the input stream once unmarshalling is done
    try (FileInputStream fis = new FileInputStream(xmlFile)) {
        Patients patients = (Patients) u.unmarshal(fis);
        for (Patient p : patients.getPatient()) {
            System.out.printf("ID: %s\n", p.getId());
            System.out.printf("NAME: %s\n", p.getName());
            System.out.printf("DIAGNOSIS: %s\n\n", p.getDiagnosis());
        }
    }
}

If you run the sample code on the chapter20/recipe20_6/patients.xml file and use the org.java8recipes.chapter20 context, the application will print the following to the console as it iterates over the Patient object list:

ID: 1
NAME: John Smith
DIAGNOSIS: Common Cold

ID: 2
NAME: Jane Doe
DIAGNOSIS: Broken ankle

ID: 3
NAME: Jack Brown
DIAGNOSIS: Food allergy

Note The previous output comes directly from instances of the Java Patient class that were created from XML representations. The code does not print the contents of the XML file directly. Instead, it prints the contents of the Java bindings after the XML has been unmarshalled into appropriate Java binding instances.

How It Works

Unmarshalling an XML file into its Java object representation has at least two criteria:

· A well-formed and valid XML file

· A set of corresponding Java bindings

The Java bindings don’t have to be autogenerated from the xjc command. Once you’ve gained some experience with Java bindings and the annotation features, you may prefer to create and control all aspects of Java binding by handcrafting your Java bindings. Whatever your preference, Java’s unmarshalling service utilizes the bindings and their annotations to map XML objects to a target Java object and to map XML elements to target object fields.

Execute the example application for this recipe using this syntax, substituting patients.xml and org.java8recipes.chapter20.recipe20_6 for the respective parameters:

java org.java8recipes.chapter20.recipe20_6.UnmarshalPatients <xmlfile> <context>

20-7. Building an XML Document with JAXB

Problem

You need to write an object’s data to an XML representation.

Solution

Assuming you have created Java binding files for your XML schema as described in Recipe 20-5, you use a JAXBContext instance to create a Marshaller object. You then use the Marshaller object to serialize your Java object tree to an XML document. The following code demonstrates this:

public void run(String xmlFile, String context)
        throws JAXBException, IOException {
    // Build a small object tree to marshal
    Patients patients = new Patients();
    List<Patient> patientList = patients.getPatient();
    Patient p = new Patient();
    p.setId(BigInteger.valueOf(1));
    p.setName("John Doe");
    p.setDiagnosis("Schizophrenia");
    patientList.add(p);

    JAXBContext jc = JAXBContext.newInstance(context);
    Marshaller m = jc.createMarshaller();
    // try-with-resources flushes and closes the output file
    try (FileOutputStream fos = new FileOutputStream(xmlFile)) {
        m.marshal(patients, fos);
    }
}

The previous code produces an unformatted but well-formed and valid XML document. For readability, the XML document is formatted here:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<patients>
<patient id="1">
<name>John Doe</name>
<diagnosis>Schizophrenia</diagnosis>
</patient>
</patients>

Note The getPatient() method in the previous code returns a list of patient objects instead of a single patient. This is a naming oddity of the JAXB code generation from the XSD schema in this example.

How It Works

A Marshaller object understands JAXB annotations. As it processes classes, it uses the JAXB annotations to provide the context needed to create the object tree in XML.

You can run the previous code from the org.java8recipes.chapter20.recipe20_7.MarshalPatients application using the following command line:

java org.java8recipes.chapter20.recipe20_7.MarshalPatients <xmlfile> <context>

The context argument refers to the package of the Java classes that you will marshal. In the previous example, because the code marshals a Patients object tree, the correct context is the package name of the Patients class. In this case, the context is org.java8recipes.chapter20.

Summary

XML is commonly used to transfer data between disparate applications or to store data of some kind to a file. Therefore, it is important to understand the fundamentals for working with XML in your application development platform. This chapter provided an overview of how to perform some key tasks for working with XML using Java.

This chapter began with the basics of writing and reading XML. It then demonstrated how to transform XML into different formats, and how to validate against XML schemas. Lastly, the chapter covered JAXB: generating Java bindings from a schema, unmarshalling XML into Java objects, and marshalling Java objects back to XML.