XML and PHP - PHP Advanced and Object-Oriented Programming (2013) - Visual Quickpro Guide


13. XML and PHP


In This Chapter

What Is XML?

XML Syntax

Attributes, Empty Elements, and Entities

Defining XML Schemas

Parsing XML

Creating an RSS Feed

Review and Pursue


XML, the Extensible Markup Language, is one of the most important technologies in computing. With a range of uses, from sharing data between computers to storing data for a single application, XML provides a format for representing not just information but also information about the information (aka meta-information). XML, like HTML, is based on Standard Generalized Markup Language (SGML), which means that you’ll see numerous similarities between the two.

This chapter begins with a basic introduction to XML: what it is, the proper XML syntax, and how to make your own XML document. From there, PHP will take over, both for reading and creating XML documents. You’ll learn about and use the two primary XML handling methods. The chapter wraps up with a demonstration of creating RSS (Really Simple Syndication) feeds, an increasingly popular feature of Web sites.

What Is XML?

XML, which is governed by the World Wide Web Consortium (W3C), was created with several goals in mind:

• To be a regulated standard, not the proprietary technology of any one company

• To act as a highly flexible way to store nearly any type of information

• To be easily readable by humans and still usable by computers

• To be able to check itself for validity and integrity

While XML itself is not actually a markup language—despite its name—it provides a foundation for you to manufacture your own markup language. A markup language is used to help define or describe pieces of information. For example, the HTML code <strong>Giant</strong> indicates that the word Giant should be displayed in such a way as to suggest strong importance.

With XML you use tags to encapsulate pieces of information in defined chunks. XML tags (or elements as they are formally called) are the opposite of HTML tags in that they define what something is but do not suggest how that something should be displayed. Whereas the purpose of HTML is to present information, the purpose of XML is to identify information.


XML and...

Because XML is all about providing an independent way to store and transmit data, it’s often intertwined with other networking technologies. You’ll see other acronyms, like RPC (Remote Procedure Calls), SOAP (which used to be an acronym but technically isn’t anymore), WSDL (Web Services Description Language), and REST (Representational State Transfer). All of these technologies help create Web services: where part of the content of one Web site is based on data requested from another Web site. Google, eBay, PayPal, Amazon, Yahoo!, and others all offer ways to use their data in your applications.

Every book having its limitations, you won’t find examples of these here. Sadly, this chapter alone can only offer a mere introduction to XML. But that knowledge—basic XML—is critical to implementing Web services in your own site when you’re ready. If you take the knowledge covered in this chapter, and add that to the discussion of Web services in Chapter 10, “Networking with PHP,” you’ll have the fundamental tools to use network-provided XML in your Web applications.


The power of XML is that you are not limited to any predetermined set of tags; you can actually use XML to come up with your own. Once you have created your markup language (your own definition of elements), you can begin to store data formatted within the newly defined tags.

XML documents can be created in any text editor or IDE. And although most of today’s Web browsers can display XML data directly, you’ll normally want to use another technology, such as PHP, to render the XML data into a more user-friendly format, as you’ll see by chapter’s end.

Figure: Google Chrome (here, on Macintosh) automatically parses an XML document to display it in a more meaningful form.

XML Syntax

Before doing anything with XML, you must understand how XML documents are structured. An XML document contains two parts:

• The prolog or XML declaration

• The data

The XML prolog is the first line of every XML file and should be in the form

<?xml version="1.0"?>

The prolog indicates the XML version and, sometimes, the text encoding or similar attributes:

<?xml version="1.0" encoding="utf-8"?>

There are actually two versions of XML—1.0 and 1.1—but the differences aren’t important here and using version 1.0 is fine.

The main part of the XML document is the content itself. This section, like an HTML page, begins and ends with a root element. Each XML document can have only one root.

Within that root element will be more nested elements. Each element contains a start tag, the element data, and an end tag:

<tag>data</tag>

One example might involve products for an e-commerce store (Script 13.1). In this example, store is the root element.

The XML rules for elements insist on the following:

• XML tags must be balanced in that they open and close (for every <tag> there must be a </tag>).

• Elements can be nested but not intertwined. HTML will let you get away with a construct like <strong><em>Soul Mining</strong></em>, but XML will not.
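To see these two rules enforced, you can hand a fragment to any XML parser. The sketch below uses Python’s standard xml.etree.ElementTree module purely for illustration (this chapter does its real XML handling in PHP), and the helper name is_well_formed is hypothetical. A fragment that breaks either rule makes the parser raise ParseError.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Balanced and properly nested: accepted.
print(is_well_formed("<strong><em>Soul Mining</em></strong>"))  # True
# Intertwined tags: rejected.
print(is_well_formed("<strong><em>Soul Mining</strong></em>"))  # False
# A start tag without its matching end tag: rejected.
print(is_well_formed("<product><name>T-Shirt</product>"))  # False
```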

Script 13.1. An XML document representing two products in a virtual store.


<?xml version="1.0" encoding="utf-8"?>
<!-- Script 13.1 - store.xml -->
<!DOCTYPE store SYSTEM "store.dtd">
<store>
<product>
<name>T-Shirt</name>
<size>XL</size>
<color>White</color>
<price>12.00</price>
</product>
<product>
<name>Sweater</name>
<size>M</size>
<color>Blue</color>
<price>25.50</price>
<picture filename="sweater.png" />
</product>
</store>


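As for the tag names, they are case-sensitive. You can use letters, numbers, and a few other characters (such as hyphens, periods, and underscores), but tag names cannot contain spaces or begin with the letters xml, in any combination of upper- and lowercase, because such names are reserved. A name can begin only with a letter or the underscore.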
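The naming rules can be checked the same way. This sketch again uses Python’s xml.etree.ElementTree as a stand-in parser; the helper parses is hypothetical, and the examples cover only case sensitivity, the first character of a name, and spaces.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Names are case-sensitive, so </name> does not close <Name>.
print(parses("<Name>T-Shirt</Name>"))  # True
print(parses("<Name>T-Shirt</name>"))  # False
# A name may start with an underscore...
print(parses("<_size>XL</_size>"))  # True
# ...but not with a digit, and it cannot contain a space.
print(parses("<1size>XL</1size>"))  # False
print(parses("<shirt size>XL</shirt size>"))  # False
```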

Before getting into an example, there are two more things to know. First, you can safely use white space between elements, but not within a tag name (< product> is invalid), and any white space inside an element’s content becomes part of that element’s data (XML, unlike PHP or HTML, is generally sensitive to white space). Second, you can place comments within an XML file, for your own use rather than for any technical purpose, by using the same syntax as HTML:

<!-- This is my comment. -->
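The following sketch, again assuming Python’s xml.etree.ElementTree, shows both points: indentation between elements is harmless, the comment is skipped by the parser’s default tree builder, and the spaces inside the name element survive as data.

```python
import xml.etree.ElementTree as ET

doc = """<store>
  <!-- This is my comment. -->
  <product>
    <name> T-Shirt </name>
  </product>
</store>"""

root = ET.fromstring(doc)

# White space between elements is allowed, and the default tree builder
# drops the comment, so <store> has exactly one child element.
print(len(root))  # 1
# White space inside an element's content is part of its data.
print(repr(root.find("product/name").text))  # ' T-Shirt '
```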

To start down the XML path, you’ll hand-code an XML document: a partial listing of the books I’ve written (with apologies for the self-centeredness of the example).

To write XML

1. Begin a new XML document in your text editor or IDE, to be named books1.xml (Script 13.2):

<?xml version="1.0" encoding="utf-8"?>
<!-- Script 13.2 - books1.xml -->

2. Open with the root element:

<collection>

For a file to be proper XML, it must use a root element. All of the data will be stored between the opening and closing tags of this element. You will make up the name of this element, as is the case with all of your elements.

Script 13.2. This is a basic XML document containing information about three books.


<?xml version="1.0" encoding="utf-8"?>
<!-- Script 13.2 - books1.xml -->
<collection>
<book>
<title>PHP Advanced and Object-Oriented Programming: Visual QuickPro Guide</title>
<author>Larry Ullman</author>
<year>2012</year>
</book>
<book>
<title>Modern JavaScript: Develop and Design</title>
<author>Larry Ullman</author>
<year>2012</year>
<pages>612</pages>
</book>
<book>
<title>C++ Programming: Visual QuickStart Guide</title>
<author>Larry Ullman</author>
<author>Andreas Signer</author>
<year>2006</year>
<pages>500</pages>
</book>
</collection>


3. Add a book to the file:

<book>
<title>PHP Advanced and Object-Oriented Programming: Visual QuickPro Guide</title>
<author>Larry Ullman</author>
<year>2012</year>
</book>

This book is represented by one element, book, with three nested elements: title, author, and year.

4. Add another book:

<book>
<title>Modern JavaScript: Develop and Design</title>
<author>Larry Ullman</author>
<year>2012</year>
<pages>612</pages>
</book>

This book has a fourth nested element: pages. It is perfectly acceptable, even common, for similar elements to have different subelements.

5. Add a third and final book:

<book>
<title>C++ Programming: Visual QuickStart Guide</title>
<author>Larry Ullman</author>
<author>Andreas Signer</author>
<year>2006</year>
<pages>500</pages>
</book>

This record is different from the others in that it has two authors. Each piece of author data (the name) is placed within its own author element (rather than putting both names within one element).

6. Complete the XML document:

</collection>

This closes the root element.

7. Save this file as books1.xml.

If you want, view it in a Web browser.

Figure: How Internet Explorer 9 displays the books1.xml file.
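Because every book uses the same element names, a program can read the collection without knowing in advance which optional elements each book has. The following sketch assumes Python’s xml.etree.ElementTree and inlines the Script 13.2 data (without the prolog) so that it runs on its own; summarize is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The data from books1.xml (Script 13.2), inlined without its prolog.
BOOKS1 = """<collection>
<book>
<title>PHP Advanced and Object-Oriented Programming: Visual QuickPro Guide</title>
<author>Larry Ullman</author>
<year>2012</year>
</book>
<book>
<title>Modern JavaScript: Develop and Design</title>
<author>Larry Ullman</author>
<year>2012</year>
<pages>612</pages>
</book>
<book>
<title>C++ Programming: Visual QuickStart Guide</title>
<author>Larry Ullman</author>
<author>Andreas Signer</author>
<year>2006</year>
<pages>500</pages>
</book>
</collection>"""


def summarize(xml_text):
    """Return a (title, authors, year, pages) tuple for each book element."""
    rows = []
    for book in ET.fromstring(xml_text).findall("book"):
        # findall() gathers every author, so co-authored books need no special case.
        authors = [a.text for a in book.findall("author")]
        # findtext() returns None when the optional pages element is absent.
        rows.append((book.findtext("title"), authors,
                     int(book.findtext("year")), book.findtext("pages")))
    return rows


for row in summarize(BOOKS1):
    print(row)
```

The first book comes back with None for its pages, and the third with a two-item author list.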


Tip

As you can see in Script 13.2, I’ve not used any indentations or extraneous spaces between tags. You can add those if you want to make the data more human readable.


Attributes, Empty Elements, and Entities

The preceding section of the chapter and the books1.xml file demonstrate the basic syntax of an XML document. There are three more concepts to cover before learning how to handle XML with PHP.

An element, as already described, has both tags and data. This is also true in HTML:

<p>some text here</p>

Also like HTML, XML elements can have attributes:

<tag attribute_name="value">data</tag>
<p class="highlight">some text here</p>

XML elements can have any number of attributes, as long as each attribute name is used only once per element and each attribute has a value. You could not do

<p citation>some text here</p>

You can use either single or double quotes for quoting your attribute values, but you must use quotes and you should be consistent about which type you use.
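The following sketch, assuming Python’s xml.etree.ElementTree, reads attribute values quoted either way and shows that the parser rejects an attribute with no value as well as a repeated attribute name. The helper rejected is hypothetical.

```python
import xml.etree.ElementTree as ET

# Double and single quotes are both accepted around attribute values.
p = ET.fromstring("""<p class="highlight" id='intro'>some text here</p>""")
print(p.get("class"), p.get("id"))  # highlight intro
print(p.get("citation"))  # None: the attribute is not present


def rejected(text):
    """Return True if the parser refuses text as malformed."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return True
    return False


# An attribute without a value is an error...
print(rejected("<p citation>some text here</p>"))  # True
# ...and so is repeating an attribute name within one element.
print(rejected('<p class="a" class="b">some text here</p>'))  # True
```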

Attributes are often used with empty elements. An empty element is one that doesn’t encapsulate any data. As an HTML example, <img /> is an empty element. Just as XHTML requires the slash at the end of an empty element, so does XML (the space before the slash is customary but optional):

<tag attribute_name="value" />

For example, you might have this:

<picture image_name="me.jpg" />
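To a parser, the self-closing form and a start tag immediately followed by its end tag describe the same empty element. A sketch, assuming Python’s xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<picture image_name="me.jpg" />')
long_form = ET.fromstring('<picture image_name="me.jpg"></picture>')

for element in (short_form, long_form):
    # No text and no children: the information lives in the attribute.
    print(element.tag, element.get("image_name"), element.text, len(element))

# Serializing either one produces the same self-closing markup.
print(ET.tostring(short_form) == ET.tostring(long_form))  # True
```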

The last little idea to throw out here is the entity. Some characters cannot be used in XML data, as they cause conflicts. Instead, a character combination is used, known as the entity version of the problematic character. Entities always start with the ampersand (&) and end with the semicolon (;). Table 13.1 lists the five entities that XML predeclares; any other named entities must be declared in a DTD.

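Table 13.1. XML Entities

Entity     Character   Meaning
&amp;      &           Ampersand
&lt;       <           Less than
&gt;       >           Greater than
&apos;     '           Apostrophe (single quote)
&quot;     "           Double quote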
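The sketch below, assuming Python’s standard library, shows both directions: the parser turns &amp; back into a literal ampersand, a bare & is rejected, and xml.sax.saxutils.escape() produces entity versions when writing data. escape() converts only &, <, and > unless you pass the extra mapping shown; parses is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Reading: the parser turns &amp; back into a literal ampersand.
chapter = ET.fromstring("<chapter>Namespaces &amp; Modularization</chapter>")
print(chapter.text)  # Namespaces & Modularization

# A bare ampersand is not allowed in XML data.
print(parses("<chapter>Namespaces & Modularization</chapter>"))  # False

# Writing: escape() handles &, <, and >; the mapping adds the two quote entities.
QUOTES = {'"': "&quot;", "'": "&apos;"}
print(escape("PHP's <Command-Line> & \"Interface\"", QUOTES))
# PHP&apos;s &lt;Command-Line&gt; &amp; &quot;Interface&quot;
```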

To use attributes, empty elements, and entities

1. Open books1.xml in your text editor or IDE, if it is not already open.

2. Add an edition number to the first book (Script 13.3):

<title edition="3">PHP Advanced and Object-Oriented Programming: Visual QuickPro Guide</title>

To indicate that this title is a third edition, an attribute is added to its title element, with a value of 3. The other two books, which are both in their first editions, won’t have this attribute, although you could add it with a value of 1.

3. Add a few chapters to the first book:

<chapter number="12">PHP&apos;s Command-Line Interface</chapter>
<chapter number="13">XML and PHP</chapter>
<chapter number="14">Debugging, Testing, and Performance</chapter>

Three chapters are added, each having an attribute of number, with a value of the chapter’s number. The name of Chapter 12, PHP’s Command-Line Interface, uses the apostrophe entity (&apos;). A literal apostrophe is also legal in element content, but the entity works everywhere, including inside attribute values quoted with single quotes.

4. Add a few chapters to the second book:

<chapter number="1" pages="24">(Re-)Introducing JavaScript</chapter>
<chapter number="2" pages="32">JavaScript in Action</chapter>
<chapter number="3" pages="34">Tools of the Trade</chapter>

To demonstrate multiple attributes, these chapters contain information about their number and page count. It is not a problem that the XML file contains both elements called pages and attributes with the same name.

5. Add a chapter to the third book:

<chapter number="12">Namespaces &amp; Modularization</chapter>

I’m entering just this one chapter, to demonstrate the ampersand entity.

Script 13.3. Attributes, empty elements, and entities have been added to the XML document to better describe the data.


<?xml version="1.0" encoding="utf-8"?>
<!-- Script 13.3 - books2.xml -->
<collection>
<book>
<title edition="3">PHP Advanced and Object-Oriented Programming: Visual QuickPro Guide</title>
<author>Larry Ullman</author>
<year>2012</year>
<chapter number="12">PHP&apos;s Command-Line Interface</chapter>
<chapter number="13">XML and PHP</chapter>
<chapter number="14">Debugging, Testing, and Performance</chapter>
<cover filename="phpvqp3.jpg" />
</book>
<book>
<title>Modern JavaScript: Develop and Design</title>
<author>Larry Ullman</author>
<year>2012</year>
<pages>612</pages>
<chapter number="1" pages="24">(Re-)Introducing JavaScript</chapter>
<chapter number="2" pages="32">JavaScript in Action</chapter>
<chapter number="3" pages="34">Tools of the Trade</chapter>
</book>
<book>
<title>C++ Programming: Visual QuickStart Guide</title>
<author>Larry Ullman</author>
<author>Andreas Signer</author>
<year>2006</year>
<pages>500</pages>
<chapter number="12">Namespaces &amp; Modularization</chapter>
</book>
</collection>


6. Add an empty element to the first book:

<cover filename="phpvqp3.jpg" />

The cover element contains no data but does have an attribute, whose value is the name of the cover image file.

7. Save this file as books2.xml.

If you want, view it in a Web browser.

Figure: The updated books2.xml in Safari.
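Attributes are read separately from element content. This sketch, assuming Python’s xml.etree.ElementTree and an inlined copy of the first two books in books2.xml, prints each title with its edition, the chapters with their attributes, and the cover file. Treating a missing edition as 1 is an assumption drawn from step 2’s remark, not something the XML itself states.

```python
import xml.etree.ElementTree as ET

# The first two books from books2.xml (Script 13.3), inlined without the prolog.
BOOKS2 = """<collection>
<book>
<title edition="3">PHP Advanced and Object-Oriented Programming: Visual QuickPro Guide</title>
<author>Larry Ullman</author>
<year>2012</year>
<chapter number="12">PHP&apos;s Command-Line Interface</chapter>
<chapter number="13">XML and PHP</chapter>
<chapter number="14">Debugging, Testing, and Performance</chapter>
<cover filename="phpvqp3.jpg" />
</book>
<book>
<title>Modern JavaScript: Develop and Design</title>
<author>Larry Ullman</author>
<year>2012</year>
<pages>612</pages>
<chapter number="1" pages="24">(Re-)Introducing JavaScript</chapter>
<chapter number="2" pages="32">JavaScript in Action</chapter>
<chapter number="3" pages="34">Tools of the Trade</chapter>
</book>
</collection>"""

root = ET.fromstring(BOOKS2)
for book in root.findall("book"):
    title = book.find("title")
    # get() accepts a default; a missing edition is treated as a first edition.
    edition = int(title.get("edition", "1"))
    cover = book.find("cover")
    cover_file = cover.get("filename") if cover is not None else "none"
    print(f"{title.text} (edition {edition}), cover: {cover_file}")
    for chapter in book.findall("chapter"):
        # The pages attribute is unrelated to the book's own pages element.
        pages = chapter.get("pages", "?")
        print(f"  Chapter {chapter.get('number')}: {chapter.text} ({pages} pages)")
```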


Tip

HTML has dozens upon dozens of pre-declared entities, including the five listed in Table 13.1.



Tip

You can also create your own entities in a Document Type Definition file, discussed next in the chapter.
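As a sketch of what such a custom entity looks like, the example below declares one in an internal DTD subset (the bracketed DOCTYPE form described under “Incorporating the DTD”) and parses it with Python’s xml.etree.ElementTree, which expands declared entities and rejects undeclared ones. The entity name author_name is invented for the example.

```python
import xml.etree.ElementTree as ET

# An internal DTD subset declaring one custom (hypothetical) entity.
DOC = """<!DOCTYPE book [
<!ENTITY author_name "Larry Ullman">
]>
<book><author>&author_name;</author></book>"""

book = ET.fromstring(DOC)
print(book.findtext("author"))  # Larry Ullman

# An entity that was never declared makes the document fail to parse.
try:
    ET.fromstring("<book><author>&coauthor;</author></book>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)  # False
```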



Tip

Whether you use a nested element or an attribute is often a matter of choice. The first book in Script 13.3 could also be represented as (omitting a few elements for brevity):

<book>
<title>PHP Advanced and Object-Oriented Programming: Visual QuickPro Guide</title>
<edition>3</edition>
<chapter>
<number>12</number>
<name>PHP's Command-Line Interface</name>
</chapter>
<cover>phpvqp3.jpg</cover>
</book>



Well-Formed and Valid XML

Two ways of describing an XML document are well formed and valid. A well-formed XML document conforms to the XML standard. These are the rules discussed in the “XML Syntax” section of this chapter. A valid XML document is well formed and also adheres to the rules laid out in its associated schema or DTD.

Most XML data can be used as long as it is well formed, which is why many discussions of XML don’t even go into the topics of schema and DTD. But being valid is, for obvious reasons, better.


Defining XML Schemas

XML files primarily contain data, as you’ve already seen in the first three scripts. That data can also be associated with a schema, which is a guide for the XML document’s contents. Schemas are optional, but if provided, can be used to ensure that the XML data is valid.

A schema can be represented using two approaches:

• DTD, a Document Type Definition

• XML Schema

DTD is the older, more accessible approach, and I’ll discuss it first. XML Schema is a newer method that allows for mapping of XML data to specific types (instead of general data categories, as you’ll see with DTD). I’ll discuss XML Schema second.

Incorporating the DTD

To associate a DTD with an XML file, a reference line is placed after the prolog but before the data itself:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rootelement [definition]>

The syntax begins <!DOCTYPE rootelement. This is similar to HTML documents that begin with <!DOCTYPE html, stating that the root element of the data is the html tag. Within the document type declaration, between the opening and closing square brackets, you can define the elements to be allowed in this XML data.

If you’d rather define the allowed elements in a separate file, your document type declaration would be

<!DOCTYPE rootelement SYSTEM "/path/to/filename.dtd">

where filename.dtd is the included file and the full /path/to/filename.dtd value is a Uniform Resource Identifier, or URI, pinpointing where that file is on the server.

There are two primary benefits to using an external DTD:

• The DTD information will only need to be transmitted once, regardless of how many XML documents that use it are transmitted (when using XML between networked computers).

• External documents are generally easier to edit.

Now that the XML file references the DTD, that file must be created. This process is called document modeling, because you are creating a paradigm for how your XML data should be organized.

Defining elements

A DTD defines every element and attribute that’s to be used in your markup language. The syntax for defining an element is

<!ELEMENT name TYPE>

where name is the name of the new tag and it will contain content of type TYPE.

Table 13.2 lists the four primary element types and their meanings.

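Table 13.2. Element Types

Type                Meaning
(#PCDATA)           Parsed-character data (text only)
EMPTY               No content at all
ANY                 Any content: text and/or any declared elements
(child1, child2)    Only the listed child elements, as described by the content model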

If you apply these rules to the e-commerce XML (Script 13.1), you could define some of the elements like so:

<!ELEMENT name (#PCDATA)>
<!ELEMENT size (#PCDATA)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT picture EMPTY>

The last element, picture, is of type EMPTY because it has no content (it has an attribute of filename).

The rules just defined seem to cover the XML in Script 13.1, but there are still a couple of missing pieces. First, there’s another element used, that of product, which contains all of the other elements. To define it:

<!ELEMENT product (name, size, color, price, picture)>

This states that product contains five other elements in the order of name, size, color, price, and picture. Definitions can be more flexible by using regular expression–like syntax:

<!ELEMENT product (name, size*, color, price, picture?)>

That line indicates that product must contain a name, a color, and a price, in that order. The size element, which goes between name and color, can be listed anywhere from zero to multiple times. The picture element, at the end, is entirely optional, but if present, there can be only one. Table 13.3 lists the pertinent characters for defining elements.

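Table 13.3. Element Type Symbols

Symbol   Meaning                                    Example
?        Optional: zero or one occurrence           picture?
*        Zero or more occurrences                   size*
+        One or more occurrences                    book+
,        Sequence: elements in the listed order     (name, price)
|        Choice: one or the other                   (size | color)
( )      Groups elements                            (name, (size | color)*)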
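Because these symbols work like regular-expression quantifiers, a content model can be checked by translating it into a regular expression over the list of child element names. The Python sketch below is a hand-rolled illustration of that idea, not a DTD validator (Python’s standard library does not validate against DTDs). The names model_to_regex and children_match are hypothetical; the sketch handles element-only models built from commas, |, ?, *, +, and parentheses, but not #PCDATA. It is applied to a product model that also lists the color element used in Script 13.1.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Translate an element-only DTD content model, such as
    "(name, size*, color, price, picture?)", into a compiled regex that
    matches child names written as "name,size,color,...," (one comma each)."""
    parts = []
    for token in re.findall(r"[\w.:-]+|[()|?*+,]", model):
        if token == "(":
            parts.append("(?:")  # non-capturing group
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)  # same meaning as in a regex
        elif token == ",":
            continue  # a sequence is plain concatenation
        else:
            parts.append("(?:" + re.escape(token) + ",)")
    return re.compile("".join(parts))


def children_match(model, children):
    """Return True if the list of child element names satisfies the model."""
    names = "".join(name + "," for name in children)
    return model_to_regex(model).fullmatch(names) is not None


PRODUCT = "(name, size*, color, price, picture?)"

sweater = ET.fromstring(
    '<product><name>Sweater</name><size>M</size><color>Blue</color>'
    '<price>25.50</price><picture filename="sweater.png" /></product>'
)
print(children_match(PRODUCT, [child.tag for child in sweater]))  # True
print(children_match(PRODUCT, ["name", "color", "price"]))  # True: size* allows zero
print(children_match(PRODUCT, ["size", "name", "color", "price"]))  # False: wrong order
print(children_match(PRODUCT, ["name", "color", "price", "picture", "picture"]))  # False
```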

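You can extend this even further by allowing an element to contain parsed-character data, other elements, or nothing at all, using the OR character. In this kind of mixed content, #PCDATA must be listed first and the group must end with an asterisk (which is also what permits the element to be empty):

<!ELEMENT thing (#PCDATA | other_element)*>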

Defining attributes

The second problem with the current model for Script 13.1 is that it doesn’t reflect the picture element’s attribute (the filename). To allow elements to have attributes, make an attribute list within the DTD. By convention, an element’s attribute list is placed after the element’s own definition, although the XML specification does not require that order:

<!ATTLIST element_name
attr_name attr_type attr_description
>

The attr_name field is simply a text string like color or alignment. The attr_type indicates the format of the attribute. Table 13.4 lists the possibilities.

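Table 13.4. Element Attribute Types

Type       Meaning
CDATA      Character data: any text
NMTOKEN    A single name token: letters, digits, and . - _ : with no spaces
NMTOKENS   Several name tokens separated by spaces
ID         An identifier that must be unique within the document
IDREF      A reference to another element’s ID
IDREFS     Several IDREF values separated by spaces
ENTITY     The name of an unparsed entity declared in the DTD
ENTITIES   Several ENTITY names separated by spaces
NOTATION   The name of a notation declared in the DTD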

Another possibility is for an attribute to be an enumerated list of possible values:

<!ATTLIST element_name
attr_name (value1 | value2) "value1"
>

The preceding code says that element_name takes an attribute of attr_name with possible values of value1 or value2, the former being the default.

The third parameter for an attribute—the attribute’s description—allows you to further define how it will function. Possibilities include #REQUIRED, meaning that an element must use that attribute; #IMPLIED, which means that the attribute is optional; and #FIXED, indicating that the attribute will always have the same value. To round out the definition of the picture element for Script 13.1, an attribute should be added:

<!ATTLIST picture
filename NMTOKEN #REQUIRED
>
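To make the three parts of an attribute definition concrete, the sketch below checks an element’s attributes against a hand-written Python dictionary that mirrors the picture declaration above. Everything here is hypothetical: the dictionary format, the check_attributes helper, and the extra align attribute (added only to demonstrate an enumerated list with a default). The NMTOKEN test is simplified to ASCII characters, and #FIXED is not handled.

```python
import re
import xml.etree.ElementTree as ET

# Simplified NMTOKEN: ASCII letters, digits, '.', '-', '_' and ':' only
# (the XML specification also allows many non-ASCII characters).
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")

# Hypothetical dictionary form of an attribute list for picture:
#   filename NMTOKEN #REQUIRED      (from the text above)
#   align (left | right) "left"     (invented, to show an enumerated list)
PICTURE_ATTRS = {
    "filename": ("NMTOKEN", "#REQUIRED"),
    "align": (("left", "right"), "left"),
}


def check_attributes(element, rules):
    """Return a list of problems, filling in defaults for missing attributes.

    Handles #REQUIRED, #IMPLIED, and literal defaults; #FIXED is not covered.
    """
    problems = []
    for name, (attr_type, description) in rules.items():
        value = element.get(name)
        if value is None:
            if description == "#REQUIRED":
                problems.append(f"missing required attribute {name}")
            elif description != "#IMPLIED":
                element.set(name, description)  # apply the default value
            continue
        if attr_type == "NMTOKEN" and not NMTOKEN.fullmatch(value):
            problems.append(f"{name}={value!r} is not an NMTOKEN")
        elif isinstance(attr_type, tuple) and value not in attr_type:
            problems.append(f"{name}={value!r} is not one of {attr_type}")
    for name in element.attrib:
        if name not in rules:
            problems.append(f"undeclared attribute {name}")
    return problems


picture = ET.fromstring('<picture filename="sweater.png" />')
print(check_attributes(picture, PICTURE_ATTRS), picture.get("align"))  # [] left
print(check_attributes(ET.fromstring("<picture />"), PICTURE_ATTRS))
# ['missing required attribute filename']
```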

Now that you’ve seen the foundation of defining elements, you can write a Document Type Definition that corresponds to the books XML.

To write a Document Type Definition

1. Create a new document in your text editor or IDE, to be named collection.dtd (Script 13.4).

<!-- Script 13.4 - collection.dtd -->

2. Define the collection element:

<!ELEMENT collection (book+)>

The first element to be declared is the root element, collection. It consists only of one or more book elements.

Note that the separate DTD file does not begin with <!DOCTYPE, because that goes within the XML file itself.

Script 13.4. The DTD file will establish all the rules by which the book XML pages must abide.


<!-- Script 13.4 - collection.dtd -->

<!ELEMENT collection (book+)>

<!ELEMENT book (title, author+, year, pages?, chapter*, cover?)>

<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT pages (#PCDATA)>
<!ELEMENT chapter (#PCDATA)>
<!ELEMENT cover EMPTY>

<!ATTLIST title
edition NMTOKEN #IMPLIED
>

<!ATTLIST chapter
number NMTOKEN #IMPLIED
pages NMTOKEN #IMPLIED
>

<!ATTLIST cover
filename NMTOKEN #REQUIRED
>


3. Define the book element:

<!ELEMENT book (title, author+, year, pages?, chapter*, cover?)>

This tag will contain six kinds of elements: title, author, and year, which are required; chapter, which is optional and can be listed numerous times; and pages and cover, both of which are optional but can occur only once. The author is also flagged as being allowed multiple times (but at least once).

4. Define the title, author, year, pages, and chapter elements:

<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT pages (#PCDATA)>
<!ELEMENT chapter (#PCDATA)>

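Each of these elements contains only character data, declared as parsed-character data (#PCDATA). Note that CDATA (unparsed-character data) is an attribute type in a DTD, not an element type; if you need unparsed text inside one of these elements, wrap it in a <![CDATA[ ... ]]> section, which #PCDATA content permits.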

5. Define the cover element:

<!ELEMENT cover EMPTY>

This one item is different from the others because the element will always be empty. The information for this element will be stored in the attribute.

6. Define the attributes for title and chapter:

<!ATTLIST title
edition NMTOKEN #IMPLIED
>
<!ATTLIST chapter
number NMTOKEN #IMPLIED
pages NMTOKEN #IMPLIED
>

The title element has one optional attribute, the edition. The chapter element has two attributes—number and pages—both of which are optional.

7. Define the attribute for cover:

<!ATTLIST cover
filename NMTOKEN #REQUIRED
>

The cover element will take one mandatory attribute, the filename, of type NMTOKEN, which means it must be a single name token: a string without spaces, such as image.jpg. Keep in mind that the element itself is not required, as defined in the book element. So the XML file should either include cover with a filename attribute or not include it at all.

8. Save this file as collection.dtd.

Now that the document modeling is done, the DTD needs to be linked to the XML file.

9. Open books2.xml (Script 13.3) in your text editor or IDE.

10. After the prolog but before the root element, add the doctype declaration (Script 13.5):

<!DOCTYPE collection SYSTEM "collection.dtd">

As written, this does assume that the XML file and the DTD will be stored in the same directory.

11. Save the file with these new changes (I’ve also changed its name to books3.xml). Place this file and collection.dtd in your Web directory (in the same folder), and test in your Web browser, if you want.


Tip

One of the great things about XML is that you can write your own DTDs or make use of document models created by others, which are freely available online. Developers have already written models for books, recipes, and more.


Script 13.5. The books file now references the corresponding DTD (Script 13.4).


<?xml version="1.0" encoding="utf-8"?>
<!-- Script 13.5 - books3.xml -->
<!DOCTYPE collection SYSTEM "collection.dtd">
<collection>
<book>
<title edition="3">PHP Advanced and Object-Oriented Programming: Visual QuickPro Guide</title>
<author>Larry Ullman</author>
<year>2012</year>
<chapter number="12">PHP&apos;s Command-Line Interface</chapter>
<chapter number="13">XML and PHP</chapter>
<chapter number="14">Debugging, Testing, and Performance</chapter>
<cover filename="phpvqp3.jpg" />
</book>
<book>
<title>Modern JavaScript: Develop and Design</title>
<author>Larry Ullman</author>
<year>2012</year>
<pages>612</pages>
<chapter number="1" pages="24">(Re-)Introducing JavaScript</chapter>
<chapter number="2" pages="32">JavaScript in Action</chapter>
<chapter number="3" pages="34">Tools of the Trade</chapter>
</book>
<book>
<title>C++ Programming: Visual QuickStart Guide</title>
<author>Larry Ullman</author>
<author>Andreas Signer</author>
<year>2006</year>
<pages>500</pages>
<chapter number="12">Namespaces &amp; Modularization</chapter>
</book>
</collection>
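Python’s standard-library parsers, used here only for illustration, record the DOCTYPE line but neither fetch collection.dtd nor validate the document against it; that takes a validating parser (such as the third-party lxml package). The sketch below, assuming xml.dom.minidom and a trimmed copy of books3.xml, reads the declaration and confirms that it names the actual root element.

```python
from xml.dom import minidom

# A trimmed copy of books3.xml (Script 13.5): prolog removed, one book kept.
BOOKS3 = """<!-- Script 13.5 - books3.xml -->
<!DOCTYPE collection SYSTEM "collection.dtd">
<collection>
<book>
<title>Modern JavaScript: Develop and Design</title>
<author>Larry Ullman</author>
<year>2012</year>
<pages>612</pages>
</book>
</collection>"""

doc = minidom.parseString(BOOKS3)
print(doc.doctype.name, doc.doctype.systemId)  # collection collection.dtd
# Sanity check: the DOCTYPE should name the document's actual root element.
print(doc.doctype.name == doc.documentElement.tagName)  # True
```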


Using XML Schema

XML Schema is a more powerful and complex tool for defining what constitutes acceptable XML data. For example, whereas DTD is vague as to an element’s type—most come down to character data of some sort—XML Schema can require that an element contain an integer, a string, a decimal, a valid country code, and more.

XML Schema (note the capital “S” in Schema) was first formalized in 2001, although it is just one of many possible XML schema languages. Since that time, and to avoid confusion with general schema languages, the proper label is now XML Schema Definition Language (XSD). Version 1.1 of XSD became a W3C recommendation in 2012, although the differences between it and version 1.0 are insignificant to the scope of this chapter.

Incorporating XSD

Unlike DTD, XSD is written in XML. All of the schema information goes within a schema element:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- Schema Information Here -->
</xs:schema>

There’s a bit of extra stuff here in that schema is prefaced by xs. This is a namespace prefix, with the xmlns:xs attribute in the opening schema tag indicating where the namespace definition comes from.

Because an XML document can have only one root element, the schema is normally stored in its own file, given an .xsd extension, rather than placed before the data. To tie the XSD file to your XML, you reference that file in the XML document’s root tag. For the e-commerce example, which uses store as the root element, you would start with

<?xml version="1.0" encoding="utf-8"?>
<store xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="somefile.xsd">

The first attribute, xmlns:xsi, identifies the XML namespace for XML Schema instances (xsi). That code is used without modification, regardless of your server setup. Next, the xsi:noNamespaceSchemaLocation attribute gets assigned the path to the XSD file. You’ll need to make sure this is correct for each XML and XSD combination. Note that the XSD file, being XML, must start with its own prolog and use schema as the root element. You’ll see this in action momentarily.

Wherever you store the schema, you’ll need to know how to define elements in XSD.

Defining elements

Elements are defined within XSD using the format

<xs:element name="some_name" type="some_type"/>

Each element must have a name. This is how you establish valid element names to use in the corresponding XML. Note that the xs identifies the namespace wherein the element definition can be found.

The type can be one of the predefined types, which include

xs:string

xs:integer

xs:boolean

xs:decimal

xs:date

The meanings of these types are obvious, and you can find the full list of possible types online. There are 19 standard types, and a couple dozen derived types, already defined for you.
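Type checking is what separates XSD from DTD: a schema validator would reject <price>cheap</price> if price were declared as xs:decimal. The Python sketch below imitates that with simplified lexical checks for a few built-in types. These are hand-written approximations (for example, the xs:date check ignores time zones and years outside four digits), not a schema validator, and the CHECKS table is a hypothetical name.

```python
import re
from datetime import date


def is_integer(text):
    # xs:integer: optional sign followed by digits.
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None


def is_decimal(text):
    # xs:decimal: optional sign, digits with an optional fractional part.
    return re.fullmatch(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)", text) is not None


def is_boolean(text):
    # xs:boolean accepts exactly these four spellings.
    return text in {"true", "false", "1", "0"}


def is_date(text):
    # Simplified xs:date: exactly YYYY-MM-DD, no time zone, a real calendar day.
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is None:
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


# Hypothetical lookup from an XSD type name to its check.
CHECKS = {
    "xs:string": lambda text: True,
    "xs:integer": is_integer,
    "xs:decimal": is_decimal,
    "xs:boolean": is_boolean,
    "xs:date": is_date,
}

print(CHECKS["xs:decimal"]("25.50"))  # True
print(CHECKS["xs:integer"]("25.50"))  # False
print(CHECKS["xs:date"]("2012-02-30"))  # False: no such day
```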

With this in mind, the name element in the e-commerce example would be defined as

<xs:element name="name" type="xs:string"/>

Many elements will be of a user-defined type, such as in the books and e-commerce examples. In XSD, these types are represented as xs:simpleType or xs:complexType, and I’ll return to this topic momentarily.

You can customize the elements by using the attributes of the element tag. There is default, which declares the default value, and fixed, which indicates the only value the element can have. For example, if the edition element (for books) were to have a default value of 1, that would look like

<xs:element name="edition" type="xs:integer" default="1"/>

As with DTD, you can create limits for how many times an element can appear within its context. This is done using the minOccurs and maxOccurs attributes. These are inclusive:

<xs:element name="size" type="xs:string" minOccurs="1" maxOccurs="10"/>

That code says that the size element has to exist at least once but can be present up to 10 times. This definition would invalidate any XML that contained a product without at least one size.

An important aspect of XSD is that the minOccurs and maxOccurs attributes have default values of 1, meaning that if these attributes are not stated, the corresponding element will have to exist exactly one time.

To allow for any number of occurrences, you would use 0 as the minOccurs value and unbounded as the maxOccurs:

<xs:element name="picture" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
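The occurrence rules amount to an inclusive range check on how many times a child element appears. This Python sketch, with a hypothetical occurs_ok helper, applies XSD’s defaults of 1 for both bounds and treats unbounded as having no upper limit; the counts come from an ElementTree parse of a sample product.

```python
import xml.etree.ElementTree as ET


def occurs_ok(count, min_occurs="1", max_occurs="1"):
    """Check a child-element count against XSD-style minOccurs/maxOccurs.

    Both bounds are inclusive and default to "1"; maxOccurs may be "unbounded".
    """
    if count < int(min_occurs):
        return False
    return max_occurs == "unbounded" or count <= int(max_occurs)


product = ET.fromstring(
    "<product><name>T-Shirt</name><size>S</size><size>XL</size></product>"
)
print(occurs_ok(len(product.findall("size")), "1", "10"))  # True: 2 sizes
print(occurs_ok(len(product.findall("name"))))  # True: exactly one
print(occurs_ok(len(product.findall("picture")), "0", "unbounded"))  # True
print(occurs_ok(len(product.findall("price"))))  # False: price is missing
```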

Simple and complex types

Earlier, I said that you could use xs:simpleType or xs:complexType to create user-defined types. A simple type is typically a restricted version of one of the built-in types. For example, you could define your own simple type that is a number between 0 and 10.

Complex types are user-defined types composed of other types. The syntax is trickier:

<xs:element name="product">
<xs:complexType>
<xs:sequence>
<!-- Subtypes -->
</xs:sequence>
</xs:complexType>
</xs:element>

When defining a complex type, you must indicate how the type is composed. Here are the options:

• Sequence (xs:sequence): the child elements must appear in the given order

• Choice (xs:choice): only one of the child elements can appear

• All (xs:all): any or all of the child elements can appear, in any order
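The three compositors can be sketched as checks on the list of child element names. In this hypothetical Python helper, sequence and choice assume every child keeps the default occurrence of exactly once, while all treats each child as optional (minOccurs="0") but allows it at most once, matching the “any or all” description.

```python
def matches(compositor, declared, children):
    """Check a list of child element names against an XSD compositor.

    Simplified: for "sequence" and "choice", every declared child keeps the
    default occurrence of exactly once; for "all", each child is treated as
    optional (minOccurs="0") but may appear at most once, in any order.
    """
    if compositor == "sequence":
        return children == declared
    if compositor == "choice":
        return len(children) == 1 and children[0] in declared
    if compositor == "all":
        no_repeats = len(children) == len(set(children))
        return no_repeats and set(children) <= set(declared)
    raise ValueError(f"unknown compositor: {compositor}")


fields = ["name", "size", "price"]
print(matches("sequence", fields, ["name", "size", "price"]))  # True
print(matches("sequence", fields, ["size", "name", "price"]))  # False: order
print(matches("choice", fields, ["price"]))  # True
print(matches("choice", fields, ["name", "price"]))  # False: two children
print(matches("all", fields, ["price", "name"]))  # True: any order, subset
print(matches("all", fields, ["name", "name"]))  # False: repeated
```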

An alternative way to create a complex type is to define it outside of any individual element, give it a name attribute, and then use that new type where needed. This would make sense in situations with repeating custom data types, such as an address or person’s full name. You’ll see an example in the subsequent set of steps.

As you can tell, writing XSD is verbose and quickly becomes a challenge, but XSD is much more demanding than DTD for validating XML, which can be a good thing.

Creating attributes

Finally, attributes are declared for elements using xs:attribute, providing the name and type:

<xs:attribute name="price" type="xs:decimal" />

The xs:attribute element has a use attribute, whose possible values are optional, required, and prohibited (optional is the default).

One last little hiccup is in situations where an element has both a value and one or more attributes, such as a title element that can have an edition attribute. To allow for both a value and one or more attributes, you have to use this syntax:

<xs:complexType name="titleType">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="edition" type="xs:int" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>

That code indicates that the content of an element of type titleType is a string but that the element also has an integer attribute for the edition.
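The titleType definition combines string content with an optional integer attribute. The Python sketch below mimics that rule for a parsed element; check_title is a hypothetical helper, and its integer test ignores the 32-bit range limit of xs:int.

```python
import re
import xml.etree.ElementTree as ET


def check_title(title):
    """Mimic titleType: text content only, plus an optional integer edition."""
    if len(title) > 0:
        return False  # simpleContent allows no child elements
    edition = title.get("edition")  # use="optional" is the default
    return edition is None or re.fullmatch(r"[+-]?[0-9]+", edition) is not None


print(check_title(ET.fromstring('<title edition="3">PHP Advanced</title>')))  # True
print(check_title(ET.fromstring("<title>Modern JavaScript</title>")))  # True
print(check_title(ET.fromstring('<title edition="third">PHP Advanced</title>')))  # False
```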

Whew! Let’s now take all this information about XSD to write the proper XSD for the book collection XML file.

To write an XML Schema Definition

1. Create a new document in your text editor or IDE, to be named collection.xsd (Script 13.6):

<?xml version="1.0" encoding="utf-8"?>
<!-- Script 13.6 - collection.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

Again, this is an XML document, so it must begin with the prolog. The root element of the page is schema, which associates the schema definition with the xs namespace.

2. Define the collection element:

<xs:element name="collection">
<xs:complexType>
<xs:sequence>
<xs:element name="book" type="bookType" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>

The first element to be declared is the root element, collection. It consists only of one or more book elements (minOccurs defaults to 1, and maxOccurs="unbounded" removes the upper limit). Each book element will be of type bookType, to be defined next.

3. Define the book element:

<xs:complexType name="bookType">
<xs:sequence>
<xs:element name="title" type="titleType" />
<xs:element name="author" type="xs:string" minOccurs="1" maxOccurs="unbounded" />
<xs:element name="year" type="xs:int" />
<xs:element name="pages" type="xs:int" minOccurs="0" maxOccurs="1" />
<xs:element name="chapter" type="chapterType" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="cover" type="coverType" minOccurs="0" maxOccurs="1" />
</xs:sequence>
</xs:complexType>

The bookType complex type is a bit more complicated. Because it uses xs:sequence, its child elements must appear in this order: title, author, year, pages, chapter, and cover. Of these, author, year, and pages are simple types. The title, chapter, and cover elements use user-defined complex types.

The default is that each element must appear exactly once, which is fine for the title and year. The author can appear any number of times from 1 upward. The chapter element can be present any number of times, including 0. And the pages and cover elements are both optional but can appear at most once.

Script 13.6. This XSD file establishes specific rules by which the book XML data must abide.


<?xml version="1.0" encoding="utf-8"?>
<!-- Script 13.6 - collection.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="collection">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" type="bookType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="titleType" />
      <xs:element name="author" type="xs:string" minOccurs="1" maxOccurs="unbounded" />
      <xs:element name="year" type="xs:int" />
      <xs:element name="pages" type="xs:int" minOccurs="0" maxOccurs="1" />
      <xs:element name="chapter" type="chapterType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="cover" type="coverType" minOccurs="0" maxOccurs="1" />
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="coverType">
    <xs:attribute name="filename" type="xs:string" />
  </xs:complexType>

  <xs:complexType name="chapterType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="number" type="xs:int" />
        <xs:attribute name="pages" type="xs:int" />
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:complexType name="titleType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="edition" type="xs:int" />
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

</xs:schema>


4. Define the coverType complex type:

<xs:complexType name="coverType">
<xs:attribute name="filename" type="xs:string" />
</xs:complexType>

The coverType type has no content, only a filename attribute, which would have a string value.

5. Define the chapterType complex type:

<xs:complexType name="chapterType">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="number" type="xs:int" />
<xs:attribute name="pages" type="xs:int" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>

An element of type chapterType has both a value, which is a string (i.e., the chapter name), and two attributes: number and pages, both of which are integers.

6. Define the titleType complex type:

<xs:complexType name="titleType">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="edition" type="xs:int" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>

Like chapterType, titleType has both a value and an attribute.

7. Close the schema element:

</xs:schema>

8. Save this file as collection.xsd.

Now that the document modeling is done, the XSD needs to be linked to the XML file.

9. Open books2.xml (Script 13.3) in your text editor or IDE.

10. Change the root element to reference the XSD file (Script 13.7):

<collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="collection.xsd">

As written, this does assume that the XML file and the XSD will be stored in the same directory.

11. Save the file with these new changes (I’ve also changed its name to books4.xml).

12. If you want, use an online or command-line validation tool to test the XML against the XSD.

The command-line xmllint utility shows that the XML is valid, per the XSD definition.


Tip

XSD supports a mixed attribute on complex types, which allows elements of that type to contain both child elements and text. This would allow you to, for example, use HTML within an XML element.


Script 13.7. The books file now references the corresponding XSD (Script 13.6).


<?xml version="1.0" encoding="utf-8"?>
<!-- Script 13.7 - books4.xml -->
<collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="collection.xsd">
  <book>
    <title edition="3">PHP Advanced and Object-Oriented Programming: Visual QuickPro Guide</title>
    <author>Larry Ullman</author>
    <year>2012</year>
    <chapter number="12">PHP's Command-Line Interface</chapter>
    <chapter number="13">XML and PHP</chapter>
    <chapter number="14">Debugging, Testing, and Performance</chapter>
    <cover filename="phpvqp3.jpg" />
  </book>
  <book>
    <title>Modern JavaScript: Develop and Design</title>
    <author>Larry Ullman</author>
    <year>2012</year>
    <pages>612</pages>
    <chapter number="1" pages="24">(Re-)Introducing JavaScript</chapter>
    <chapter number="2" pages="32">JavaScript in Action</chapter>
    <chapter number="3" pages="34">Tools of the Trade</chapter>
  </book>
  <book>
    <title>C++ Programming: Visual QuickStart Guide</title>
    <author>Larry Ullman</author>
    <author>Andreas Signer</author>
    <year>2006</year>
    <pages>500</pages>
    <chapter number="12">Namespaces &amp; Modularization</chapter>
  </book>
</collection>
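Step 12 suggests an external validation tool; the same check can also be run from PHP itself. The following is a minimal sketch, assuming PHP's DOM extension (which provides DOMDocument::schemaValidate()) is enabled and that books4.xml and collection.xsd sit in the current directory. The function name validate_against_xsd() is hypothetical.

```php
<?php
// Validates an XML file against an XSD file.
// Returns [bool $valid, string[] $errors]; the error list comes from libxml.
function validate_against_xsd(string $xmlFile, string $xsdFile): array
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of printing warnings
    libxml_clear_errors();

    $doc = new DOMDocument();
    // load() fails if the file is missing or not well formed; validation is then skipped.
    $valid = $doc->load($xmlFile) && $doc->schemaValidate($xsdFile);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message) . " (line {$error->line})";
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return [$valid, $errors];
}

// Usage, assuming both files are in the current directory:
if (is_file('books4.xml') && is_file('collection.xsd')) {
    [$valid, $errors] = validate_against_xsd('books4.xml', 'collection.xsd');
    echo $valid ? "books4.xml is valid\n" : implode("\n", $errors) . "\n";
}
```

Because libxml errors are collected rather than printed, the caller decides how to report them, for example as a list in an HTML page.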


Parsing XML

There’s more to XML than just composing XML documents, DTD files, and XML schemas in text editors, although those steps are the basis of XML. Once XML data exists, applications can then parse it. Parsing XML is a matter of using an application or library to read XML files and...

• Check if they are well formed

• Check if they are valid

• Make use of the stored data

A parser, in short, takes XML files and breaks them down into their various pieces. As an example, the code <artist>Air</artist> consists of the opening tag (<artist>), the content (Air), and the closing tag (</artist>). While this distinction is obvious to the human eye, a computer's ability to pull the same meaning out of a string of characters is what gives XML its power.
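The first of those checks, well-formedness, is easy to try. This minimal sketch relies on simplexml_load_string() returning false when the markup cannot be parsed; is_well_formed() is a hypothetical helper name, and the last sample shows why a bare & must be written as &amp; in XML.

```php
<?php
// Returns true if the string can be parsed as XML at all (well formed).
function is_well_formed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true); // collect errors silently
    $result = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting
    return $result !== false;                     // compare strictly: empty elements are falsy objects
}

var_dump(is_well_formed('<artist>Air</artist>')); // bool(true)
var_dump(is_well_formed('<artist>Air</band>'));   // bool(false): mismatched tags
var_dump(is_well_formed('<chapter>Namespaces & Modularization</chapter>')); // bool(false): bare &
```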

There are two types of XML parsers: event-based and tree-based. The former goes into action when an event occurs. An example of an event would be encountering an opening tag in an XML file. By reading an entire file and doing things at each event, this type of parser (also called a SAX parser, for Simple API for XML) manages the entire XML document. Expat, to be demonstrated next, is an event-based parser.

The second parser type views an XML file and creates a tree-like representation of the entire thing that can then be manipulated. These are primarily DOM (Document Object Model) systems. Later in the chapter, you'll see how to use SimpleXML, which is a DOM parser.

A DOM, or tree, representation of the store XML file (Script 13.1).

Parsing XML with Expat

Using Expat with PHP is a four-step process:

1. Create a new parser.

2. Identify the functions to use for handling events.

3. Parse the file.

4. Free up the resources used by the parser.

The first step is accomplished using xml_parser_create():

$p = xml_parser_create();

The second step is the most important. Because Expat is an event-based parser, it makes use of callback functions when events occur. The primary events that will occur when parsing XML consist of discovering

• An opening tag

• The content between tags

• A closing tag

You need to tell PHP what user-defined functions should be called when each of these events occurs. For the opening and closing tags, use the xml_set_element_handler() function:

xml_set_element_handler($p, 'open_element_fx', 'close_element_fx');

For the element content, use xml_set_character_data_handler() to name the callback function:

xml_set_character_data_handler($p, 'data_fx');

Now, when the parser encounters the different events, it will automatically send that content to the proper function.

Parsing the file requires the use of the xml_parse() function, which takes two arguments (and an optional third):

xml_parse($p, $data, $stop);

This function is first fed the parser reference and then the data to be parsed. The optional third argument, is_final, should be true only when that data is the last piece of the document; it tells the parser that no more data is coming.

Finally, you should free up the resources used by the parser:

xml_parser_free($p);
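Put together, the four steps look like the following minimal sketch. It uses the placeholder callback names from above (open_element_fx(), close_element_fx(), data_fx()) with hypothetical bodies and a hypothetical two-element XML string; the callbacks write to global variables only to keep the sketch short. As of PHP 8.0, xml_parser_free() no longer has any effect (parsers became objects that are freed automatically), but the call is kept here to mirror step 4.

```php
<?php
// A hypothetical, tiny document to parse:
$data = '<band><artist>Air</artist></band>';

$opened = []; // element names, in the order their opening tags were seen
$text   = ''; // all character data found between tags

function open_element_fx($p, $name, $attributes) {
    global $opened;
    $opened[] = $name;
}
function close_element_fx($p, $name) {
    // Nothing to do for closing tags in this sketch.
}
function data_fx($p, $cdata) {
    global $text;
    $text .= $cdata; // data may arrive in several pieces, so append
}

$p = xml_parser_create();                                           // 1. create the parser
xml_set_element_handler($p, 'open_element_fx', 'close_element_fx'); // 2. name the callbacks
xml_set_character_data_handler($p, 'data_fx');
xml_parse($p, $data, true);                                         // 3. parse; true = final chunk
xml_parser_free($p);                                                // 4. free the parser

echo implode(', ', $opened), "\n"; // BAND, ARTIST (names are case-folded)
echo $text, "\n";                  // Air
```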

One common use of PHP and XML is to turn XML documents into formatted HTML so that the information can be displayed in the browser. As an example, I'll write a PHP script that uses Expat to make a legible Web page from an XML file.

To parse XML with PHP

1. Create a new document in your text editor or IDE, to be named expat.php, beginning with the standard HTML (Script 13.8):

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>XML Expat Parser</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<?php # Script 13.8 - expat.php

This script is going to use a bit of CSS, placed in an external file. You can download that from LarryUllman.com.

2. Begin the function for handling opening tags:

function handle_open_element($p, $element, $attributes) {

This function will be called whenever an opening tag is encountered by the parser. It will receive from the parser the parser reference, the name of the element, and an associative array of any attributes that element contains. As an example, the chapter element can have both number and pages attributes. Upon encountering that tag, the parser will send this function the values $p (for the parser), CHAPTER (the name of the element, uppercased as explained in step 3), and an array that could be defined like so (attribute values always arrive as strings):

$attributes = array('NUMBER' => '1', 'PAGES' => '34');

Script 13.8. This script uses PHP in conjunction with the Expat library to parse XML documents, turning them into an HTML page.


<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>XML Expat Parser</title>
  <link rel="stylesheet" href="style.css">
</head>
<body>
<?php # Script 13.8 - expat.php

/* This script will parse an XML file.
 * It uses the Expat library, an event-based parser.
 */

// Function for handling the open tag:
function handle_open_element($p, $element, $attributes) {

    // Do different things based upon the element:
    switch ($element) {

        // Need to address: book, title, author, year, chapter, and pages!

        case 'BOOK': // Books are DIV's:
            echo '<div>';
            break;

        case 'CHAPTER': // Chapters are Ps:
            echo "<p>Chapter {$attributes['NUMBER']}: ";
            break;

        case 'COVER': // Show the image...

            // Get the image info:
            $image = @getimagesize($attributes['FILENAME']);

            // Make the image HTML:
            echo "<img src=\"{$attributes['FILENAME']}\" $image[3] border=\"0\"><br>";
            break;

        case 'TITLE': // Titles are H2s:
            echo '<h2>';
            break;

        // Everything else is just displayed:
        case 'YEAR':
        case 'AUTHOR':
        case 'PAGES':
            echo '<span class="label">' . $element . '</span>: ';
            break;

    } // End of switch.

} // End of handle_open_element() function.

// Function for handling the closing tag:
function handle_close_element($p, $element) {

    // Do different things based upon the element:
    switch ($element) {

        // Close up HTML tags...

        case 'BOOK': // Books are DIV's:
            echo '</div>';
            break;

        case 'CHAPTER': // Chapters are Ps:
            echo '</p>';
            break;

        case 'TITLE': // Titles are H2s:
            echo '</h2>';
            break;

        // Add a break to the others:
        case 'YEAR':
        case 'AUTHOR':
        case 'PAGES':
            echo '<br>';
            break;

    } // End of switch.

} // End of handle_close_element() function.

// Function for printing the content:
function handle_character_data($p, $cdata) {
    echo $cdata;
}

# ---------------------
# End of the functions.
# ---------------------

// Create the parser:
$p = xml_parser_create();

// Set the handling functions:
xml_set_element_handler($p, 'handle_open_element', 'handle_close_element');
xml_set_character_data_handler($p, 'handle_character_data');

// Read the file:
$file = 'books4.xml';
$fp = @fopen($file, 'r') or die("<p>Could not open a file called '$file'.</p></body></html>");
while ($data = fread($fp, 4096)) {
    xml_parse($p, $data, feof($fp));
}

// Free up the parser:
xml_parser_free($p);
?>
</body>
</html>


3. Begin a switch for handling the different elements:

switch ($element) {
case 'BOOK':
echo '<div>';
break;

Depending on the element received, the function will do different things. Each book will be wrapped within DIV tags, and so when the opening book element is encountered, the opening DIV element is created.

One thing to be aware of with Expat is that, by default, every element and attribute name is received in all-uppercase letters, thanks to a parser option called case-folding. Thus, even though the XML data uses book, it must be BOOK here to match.

4. Add a case for chapter elements:

case 'CHAPTER':
echo "<p>Chapter {$attributes['NUMBER']}: ";
break;

For the chapter element, I’ll want to create a paragraph that begins with the chapter’s number. That value is available through the $attributes array, again using all capital letters.

5. Add a case for cover elements:

case 'COVER':
$image = @getimagesize($attributes['FILENAME']);
echo "<img src=\"{$attributes['FILENAME']}\" $image[3] border=\"0\"><br>";
break;

If the element is the cover, I’ll place the image itself in the page in lieu of referring to the textual name of the element or its attributes.

6. Complete the switch and the function:

case 'TITLE':
echo '<h2>';
break;
case 'YEAR':
case 'AUTHOR':
case 'PAGES':
echo '<span class="label">' . $element . '</span>: ';
break;
} // End of switch.
} // End of handle_open_element() function.

The book title will be placed within H2 tags. The year, author, and pages are just written using the syntax

<span class="label">YEAR</span>: 2012

That code is begun here, with the value to follow.

7. Begin defining the function for handling any closing elements:

function handle_close_element($p, $element) {
switch ($element) {
case 'BOOK':
echo '</div>';
break;
case 'CHAPTER':
echo '</p>';
break;
case 'TITLE':
echo '</h2>';
break;

This function is more straightforward than its predecessor. All this does is close the proper HTML tag for each XML element.

8. Complete the function:

case 'YEAR':
case 'AUTHOR':
case 'PAGES':
echo '<br>';
break;
} // End of switch.
} // End of handle_close_element() function.

For the year, author, and pages, a break is added. Nothing needs to be done for cover elements.

9. Add the final function:

function handle_character_data($p, $cdata) {
echo $cdata;
}

The handle_character_data() function will be used for the information between the opening and closing tags—in other words, the data. In this case, all this function has to do is print the received data. With different types of XML data, other steps may be required.

10. Create a new parser and identify the functions to use:

$p = xml_parser_create();
xml_set_element_handler($p, 'handle_open_element', 'handle_close_element');
xml_set_character_data_handler($p, 'handle_character_data');

11. Read and parse the XML file:

$file = 'books4.xml';
$fp = @fopen($file, 'r') or die("<p>Could not open a file called '$file'.</p></body></html>");
while ($data = fread($fp, 4096)) {
xml_parse($p, $data, feof($fp));
}

To parse the file, I first try to open it using fopen(). Then I loop through the file, reading up to 4,096 bytes at a time and sending each chunk to the parser. The loop stops once the entire file has been read, and because feof() supplies the third argument to xml_parse(), the parser is told that it has received the final chunk once the end of the file has been reached.

12. Free up the parser and complete the page:

xml_parser_free($p);
?>
</body>
</html>

13. Save the file as expat.php and place it in your Web directory, along with books4.xml (Script 13.7), collection.xsd (Script 13.6), and the phpvqp3.jpg image file (downloadable from the book's Web site, LarryUllman.com).

14. Test in your Web browser.

Running books4.xml through the PHP-Expat parser generates this HTML page, viewable in any browser.
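The loop in step 11 ignores the return value of xml_parse(), so a malformed file simply stops producing output partway through. A minimal sketch of checking that value follows, assuming the chunks are already in an array (parse_in_chunks() is a hypothetical helper name); xml_get_error_code(), xml_error_string(), and xml_get_current_line_number() are the Expat extension's standard error-reporting functions.

```php
<?php
// Feeds XML to an Expat parser piece by piece, as the fread() loop does,
// and returns either 'OK' or a description of the first parse error.
function parse_in_chunks(array $chunks): string
{
    $chunks = array_values($chunks); // reindex so the last chunk can be found by position
    $p = xml_parser_create();
    $last = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        // The third argument is true only for the final chunk.
        if (!xml_parse($p, $chunk, $i === $last)) {
            $message = sprintf('XML error: %s at line %d',
                xml_error_string(xml_get_error_code($p)),
                xml_get_current_line_number($p));
            xml_parser_free($p);
            return $message;
        }
    }
    xml_parser_free($p);
    return 'OK';
}

echo parse_in_chunks(['<book><title>PHP', ' Advanced</title></book>']), "\n"; // OK
echo parse_in_chunks(["<book>\n<title>PHP</book>"]), "\n"; // reports the error and its line
```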


Tip

Remember when working with XML to always use the formal PHP tags (<?php and ?>). If the short PHP tags (<? and ?>) are enabled, PHP will mistake the XML declaration (<?xml ... ?>) for PHP code.



Tip

For more on the Expat functions, see www.php.net/xml.



Tip

The Expat library can read an XML document, but it cannot validate one.



Tip

You can change the case-folding using xml_parser_set_option().
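For instance, here is a minimal sketch that switches case-folding off with the XML_OPTION_CASE_FOLDING option (the handler names are hypothetical). With this setting, the switch statements in expat.php would need lowercase labels such as 'book' and 'chapter', and attribute keys would keep their original case as well.

```php
<?php
$names = []; // element names as reported by the parser

function collect_name($p, $name, $attributes) {
    global $names;
    $names[] = $name;
}
function ignore_close($p, $name) {
    // Closing tags are not needed for this sketch.
}

$p = xml_parser_create();
// Case-folding is on by default; 0 turns it off so names keep their original case.
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($p, 'collect_name', 'ignore_close');
xml_parse($p, '<book><title edition="3">PHP</title></book>', true);
xml_parser_free($p);

echo implode(', ', $names), "\n"; // book, title
```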


Using SimpleXML

Expat provides an acceptable way to process XML, but it's not without its negatives. For example, attributes are only available when Expat encounters the opening tag. In the previous example, it would not be easy to show the number of pages after a chapter's name or the edition after a book title. Fortunately, as with almost everything in programming, there is an alternative approach.

Added in PHP 5 is a great tool for working with XML documents, called SimpleXML. While not as elaborate as other DOM-based parsers, SimpleXML is terrifically easy to use, with several nice built-in features.

To start the process off, use the simplexml_load_file() function to load an XML file into an object:

$xml = simplexml_load_file('filename.xml');

Alternatively, you could use simplexml_load_string() if you had a bunch of XML stored in a string.

From there, there are many ways you could access the XML data. To refer to specific elements, use the format $xml->elementname. If there are multiple elements with the same name, you can treat them like an array:

echo $xml->elementname[0];

Looking at the DOM tree of the store XML file (Script 13.1) shown earlier, you could use $xml->product[0] and $xml->product[1].

For nested elements, just continue this syntax:

echo $xml->product[0]->name;
echo $xml->product[1]->price;

Using a foreach loop, it’s easy to access every element in a tree:

foreach ($xml->product as $product) {
// Do something with:
// $product->name
// $product->size
// etc.
}

Attributes are easy to access as well, by referring to them like an array:

$xml->elementname['attribute'];
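A short sketch of these access patterns follows. Since the store file (Script 13.1) isn't reproduced here, the XML string below is a hypothetical stand-in with product elements containing name, size, and price children plus an id attribute; adjust the names to match your own file.

```php
<?php
// Hypothetical stand-in for the store XML data:
$source = <<<'XML'
<store>
  <product id="1"><name>T-Shirt</name><size>M</size><price>12.00</price></product>
  <product id="2"><name>Sweater</name><size>L</size><price>30.00</price></product>
</store>
XML;

$xml = simplexml_load_string($source);

echo $xml->product[0]->name, "\n";  // T-Shirt
echo $xml->product[1]->price, "\n"; // 30.00
echo $xml->product[1]['id'], "\n";  // 2 (an attribute, accessed like an array key)

// Loop over every product element:
foreach ($xml->product as $product) {
    echo "{$product->name} ({$product->size}): {$product->price}\n";
}
```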

With just this bit of information in mind, let’s parse the books4.xml file using SimpleXML this time. The output will start by matching that created using Expat, but with a little more information and flair.


Modifying XML

The SimpleXML library also makes it easy to modify the loaded XML data. The addChild() and addAttribute() methods let you add new elements and attributes. You can also change the value in an element by using the assignment operator:

$xml->product->name = 'Heavy T-Shirt';
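A minimal sketch of these modifications, using a hypothetical one-product store document. It also calls asXML() to turn the modified tree back into an XML string.

```php
<?php
// A hypothetical one-product store document:
$xml = simplexml_load_string('<store><product><name>T-Shirt</name></product></store>');

$xml->product->name = 'Heavy T-Shirt';     // change an element's value
$xml->product->addChild('price', '15.00'); // add a new child element
$xml->product->addAttribute('id', '1');    // add an attribute

echo $xml->asXML(); // serialize the modified tree back to an XML string
```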


To use SimpleXML

1. Create a new document in your text editor or IDE, to be named simplexml.php, beginning with the standard HTML (Script 13.9):

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>SimpleXML Parser</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<?php # Script 13.9 - simplexml.php

Script 13.9. The SimpleXML library provides an easy, DOM-based way to access all of the data in an XML file.


<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>SimpleXML Parser</title>
  <link rel="stylesheet" href="style.css">
</head>
<body>
<?php # Script 13.9 - simplexml.php

/* This script will parse an XML file.
 * It uses the simpleXML library, a DOM parser.
 */

// Read the file:
$xml = simplexml_load_file('books4.xml');

// Iterate through each book:
foreach ($xml->book as $book) {

    // Print the title:
    echo "<div><h2>$book->title";

    // Check for an edition:
    if (isset($book->title['edition'])) {
        echo " (Edition {$book->title['edition']})";
    }

    echo '</h2>';

    // Print the author(s):
    foreach ($book->author as $author) {
        echo "<span class=\"label\">Author</span>: $author<br>";
    }

    // Print the other book info:
    echo "<span class=\"label\">Published:</span> $book->year<br>";

    if (isset($book->pages)) {
        echo "<span class=\"label\">Pages:</span> $book->pages<br>";
    }

    // Print each chapter:
    if (isset($book->chapter)) {
        echo 'Table of Contents<ul>';
        foreach ($book->chapter as $chapter) {

            echo '<li>';

            if (isset($chapter['number'])) {
                echo "Chapter {$chapter['number']}: ";
            }

            echo $chapter;

            if (isset($chapter['pages'])) {
                echo " ({$chapter['pages']} Pages)";
            }

            echo '</li>';

        }
        echo '</ul>';
    }

    // Handle the cover:
    if (isset($book->cover)) {

        // Get the image info:
        $image = @getimagesize($book->cover['filename']);

        // Make the image HTML:
        echo "<img src=\"{$book->cover['filename']}\" $image[3] border=\"0\" /><br>";

    }

    // Close the book's DIV tag:
    echo '</div>';

} // End of foreach loop.
?>
</body>
</html>


2. Read the file:

$xml = simplexml_load_file('books4.xml');

This one line is all you need to read in the entire XML document.

3. Create a loop that iterates through each book element:

foreach ($xml->book as $book) {

The XML file contains several book elements. With each iteration of this loop, another of the book elements will be assigned (as an object) to the $book variable. If the XML file is represented as a tree, then $book at this point is one of the branches of the tree.

The books XML file as a DOM tree.

4. Print the book’s title:

echo "<div><h2>$book->title";

Referring to a subelement is this easy. For the first iteration of the loop, this is the equivalent of directly referring to $xml->book[0]->title.

The title will be printed within H2 tags, which are started here, as is the DIV that surrounds each book.

5. Print the book’s edition, if applicable:

if (isset($book->title['edition'])) {
echo " (Edition {$book->title['edition']})";
}
echo '</h2>';

The isset() function can be used to test if an element or attribute exists, as if it were any other variable. If the edition attribute exists, it’ll be printed in parentheses: (Edition X). Then the title’s closing H2 tag is printed.

As you can see, unlike in the Expat example, it’s quite easy to access element attributes whenever you want.

6. Print the author(s):

foreach ($book->author as $author) {
echo "<span class=\"label\">Author</span>: $author<br>";
}

Another foreach loop can iterate through all the authors. Remember that, by definition in the collection.xsd file, each book element has at least one author subelement, but it can also have multiple.

7. Print the year and the page count:

echo "<span class=\"label\">Published:</span> $book->year<br>";
if (isset($book->pages)) {
echo "<span class=\"label\">Pages:</span> $book->pages<br>";
}

Most of this output will exactly match that of the Expat example, with some little differences such as easily being able to use Pages instead of PAGES.

8. Begin the process of printing each chapter:

if (isset($book->chapter)) {
echo 'Table of Contents<ul>';
foreach ($book->chapter as $chapter) {

A book may or may not have any chapter elements in it but could have several. The isset() checks if any exist. If so, the chapters will be printed as an unordered list. Another foreach loop will access each chapter.

9. Print the chapter information:

echo '<li>';
if (isset($chapter['number'])) {
echo "Chapter {$chapter['number']}: ";
}
echo $chapter;
if (isset($chapter['pages'])) {
echo " ({$chapter['pages']} Pages)";
}
echo '</li>';

The chapter’s name will be printed within LI tags. If the chapter has a number or pages attribute, that information should be printed as well.

10. Complete the chapter’s foreach loop and conditional:

}
echo '</ul>';
}

11. Handle the book’s cover:

if (isset($book->cover)) {
$image = @getimagesize($book->cover['filename']);
echo "<img src=\"{$book->cover['filename']}\" $image[3] border=\"0\" /><br>";
}

If a cover element exists, the image’s information is gathered from the file on the server and the appropriate HTML img tag is generated.

12. Close the DIV tag for this book and complete the page:

echo '</div>';
} // End of foreach loop.
?>
</body>
</html>

13. Save the file as simplexml.php and place it in your Web directory, along with books4.xml (Script 13.7), collection.xsd (Script 13.6), and the phpvqp3.jpg image file (downloadable from the book's Web site, LarryUllman.com).

14. Test in your Web browser.

The beginning of the book's output.


Tip

The asXML() method returns the loaded XML data as an XML string.



Tip

Because SimpleXML represents elements and attributes as objects, you'll need to cast them to strings, as in (string) $book->title, if you want to use them for comparisons or with standard string functions.
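A short sketch of where casting matters, using a hypothetical book fragment:

```php
<?php
// SimpleXML returns objects, not strings, for elements and attributes.
$book = simplexml_load_string('<book><author>Larry Ullman</author><year>2012</year></book>');

var_dump($book->author === 'Larry Ullman');          // bool(false): an object is never === a string
var_dump((string) $book->author === 'Larry Ullman'); // bool(true)
var_dump((int) $book->year + 1);                     // int(2013)

// Array keys must be strings or integers, so cast before using a value as a key:
$count = [];
$count[(string) $book->author] = 1;
```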



Tip

SimpleXML also supports XPath, a language used to perform queries (search for data) within XML.
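For example, here is a minimal sketch that uses the xpath() method to find books by a given author. The XML is a hypothetical, trimmed-down version of books4.xml.

```php
<?php
$source = <<<'XML'
<collection>
  <book><title>Modern JavaScript</title><author>Larry Ullman</author></book>
  <book><title>C++ Programming</title><author>Larry Ullman</author><author>Andreas Signer</author></book>
</collection>
XML;
$xml = simplexml_load_string($source);

// Titles of books that list Andreas Signer as an author:
$titles = $xml->xpath('//book[author="Andreas Signer"]/title');
foreach ($titles as $title) {
    echo $title, "\n"; // C++ Programming
}
```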



Tip

DOM parsers like SimpleXML require more memory on the server than event-based (SAX) parsers, because they load the entire XML document into memory at once.


Creating an RSS Feed

RSS, which stands for Really Simple Syndication (it used to mean Rich Site Summary or RDF Site Summary), is a way for Web sites to provide listings of the site’s content. Normally, this list contains at least the titles of articles, plus their descriptions (and by “article,” think of any type of content that a site might offer). Users access these feeds using an RSS client (many Web browsers support RSS as well). If a user wants to read more of an article, there’s a link to click, which takes them to the full Web page. RSS is a great convenience and has become popular for good reasons.

RSS feeds are just XML files that have already-established tags. RSS documents begin with the rss root element, with a mandatory attribute called version. You’ll want to use the latest version of RSS for that value, which is 2.0 as of this writing. So an RSS document starts with

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">

After that, all RSS files contain a single channel element. Nested within this element are others, like title, description, and link, all of which describe the RSS feed.

<channel>
<title>Name of the RSS Feed</title>
<description>Description of the RSS Feed</description>
<link>Link to the Web site</link>

Those three elements are required within channel. There are many optional ones, too, like language (e.g., en-us), copyright, managingEditor (an email address), webMaster (also an email address), and so on. See the formal specifications at www.rssboard.org/rss-specification for more.

The channel also contains multiple item elements, each item being a piece of content (an article). The item elements also have title, description, link, and other nested elements.

<item>
<title>Article Title</title>
<description>Article Desc</description>
<link>Link to this article</link>
</item>

None of the item subelements are required, except that either a title or description must be present. You might also use author (an email address) and pubDate (the article's publication date). This last one is tricky because its value must be in the format specified by RFC 822. If you don't know what that is offhand, it's Thu, 01 Nov 2012 16:45:23 GMT.
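PHP can produce that format for you, so it rarely needs to be built by hand. A minimal sketch, assuming the article's date is stored as a Unix timestamp (as it is in rss.php); rss_date() is a hypothetical helper. It uses gmdate() with the built-in DATE_RSS format constant ("D, d M Y H:i:s O"), so the offset is always +0000.

```php
<?php
// Formats a Unix timestamp as an RFC 822-style date for an RSS pubDate element.
function rss_date(int $timestamp): string
{
    return gmdate(DATE_RSS, $timestamp); // gmdate() always formats in UTC
}

echo rss_date(1337930580), "\n"; // Fri, 25 May 2012 07:23:00 +0000
```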

That’s really all there is to it! Remember, RSS is just formatted XML. If you understand XML, you can create RSS.

For this example, I’ll create a pseudo-RSS feed of some blog postings. Naturally, most blogging software is capable of creating its own RSS feed, but it’s a good use of the concept and will be easy enough for you to alter for your own needs.

To create an RSS feed

1. Begin a new document in your text editor or IDE, to be named rss.php (Script 13.10).

<?php # Script 13.10 - rss.php

This is not a Web page to be viewed in the browser, so it creates no HTML, just XML.

2. Send the Content-type header:

header('Content-type: text/xml');

This page will have a .php extension, because it’s a PHP page that must be properly handled by the Web server. But to create an XML page, a header should be sent with the proper Content-type.

Script 13.10. This PHP script uses an array to generate an RSS feed.


<?php # Script 13.10 - rss.php

/* This script will create an RSS feed.
 * The feed content will be based upon an array.
 */

// Send the Content-type header:
header('Content-type: text/xml');

// Create the initial RSS code:
echo '<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Larry Ullman\'s Important Things</title>
<description>The most recent things Larry has been writing about.</description>
<link>http://LarryUllman.com/</link>
';

// Manufacture the data:
$data = array(
    0 => array('title' => 'SSH Key Authentication', 'description' => 'The wonderful hosting company that I use', 'link' => 'http://www.larryullman.com/2012/05/25/ssh-key-authentication/', 'pubDate' => '1337930580'),
    1 => array('title' => 'What It Means to Be a Writer, Part 1', 'description' => 'A little while back, I had a series of emails', 'link' => 'http://www.larryullman.com/2012/05/23/what-it-means-to-be-a-writer-part-1-defining-your-book/', 'pubDate' => '1337683425'),
    2 => array('title' => 'Learn to Write', 'description' => 'There was a recent posting by', 'link' => 'http://www.larryullman.com/2012/05/18/learn-to-write/', 'pubDate' => '133733103')
);

// Loop through the data:
foreach ($data as $item) {

    // Print each record as an item:
    echo '<item>
<title>' . htmlentities($item['title']) . '</title>
<description>' . htmlentities($item['description']) . '...</description>
<link>' . $item['link'] . '</link>
<guid>' . $item['link'] . '</guid>
<pubDate>' . date('r', $item['pubDate']) . '</pubDate>
</item>
';

}

// Complete the channel and rss elements:
echo '</channel>
</rss>
';


3. Create the initial RSS code:

echo '<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Larry Ullman\'s Important Things</title>
<description>The most recent things Larry has been writing about.</description>
<link>http://LarryUllman.com/</link>
';

These lines of XML get the ball rolling. To start, there's the XML prolog, required in all XML documents. Next is the rss element and the opening channel tag. Within the channel, three tags are used to help describe this feed.

Firefox, which supports RSS, shows the channel element's title and description values at the top of the page.

4. Define the data to use for the feed:

$data = array(
0 => array('title' => 'SSH Key Authentication', 'description' => 'The wonderful hosting company that I use', 'link' => 'http://www.larryullman.com/2012/05/25/ssh-key-authentication/', 'pubDate' => '1337930580'),
1 => array('title' => 'What It Means to Be a Writer, Part 1', 'description' => 'A little while back, I had a series of emails', 'link' => 'http://www.larryullman.com/2012/05/23/what-it-means-to-be-a-writer-part-1-defining-your-book/', 'pubDate' => '1337683425'),
2 => array('title' => 'Learn to Write', 'description' => 'There was a recent posting by', 'link' => 'http://www.larryullman.com/2012/05/18/learn-to-write/', 'pubDate' => '133733103')
);

This data would normally come from a database, but a hard-coded array is enough to demonstrate the core concepts.

5. Print each record as an item:

foreach ($data as $item) {
echo '<item>
<title>' . htmlentities($item['title']) . '</title>
<description>' . htmlentities($item['description']) . '...</description>
<link>' . $item['link'] . '</link>
<guid>' . $item['link'] . '</guid>
<pubDate>' . date('r', $item['pubDate']) . '</pubDate>
</item>
';
}

This is the most important part of the whole script, where each item is generated. First, you have the opening item tag. Then there’s the title, the subject of the posting, which becomes the title of the article in the feed. After that is the description, the text the feed displays to describe the article. Both the title and the description are run through the htmlentities() function so that characters such as & and <, which XML does not allow as literal text, are converted to entities.
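One caveat: htmlentities() converts every character that has a named HTML entity, but XML predefines only five entities (&amp;, &lt;, &gt;, &quot;, and &apos;). A title containing é therefore becomes &eacute;, which a feed parser rejects. The following minimal sketch compares the two functions; xml_escape() is a hypothetical helper built on htmlspecialchars() with the ENT_XML1 flag (PHP 5.4 and later).

```php
<?php
// Hypothetical helper: escape a value for XML text or attribute content.
// ENT_XML1 limits the output to XML's five predefined entities.
function xml_escape($value)
{
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

$text = 'Café & <Bar>';

// htmlentities() also turns é into &eacute;, an entity XML does not define
// unless a DTD declares it, so a feed containing it is not well-formed.
echo htmlentities($text, ENT_QUOTES, 'UTF-8'), "\n"; // Caf&eacute; &amp; &lt;Bar&gt;

// Only &, <, and > are converted here; é stays a plain UTF-8 character.
echo xml_escape($text), "\n"; // Café &amp; &lt;Bar&gt;
```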

Next is the link element, which points to the actual “article” online. After that is the guid element, which isn’t required but is a good idea: it’s a unique identifier for each item. Because each item’s URL is already unique, the URL can serve as the guid as well.
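In RSS 2.0, the guid element accepts an optional isPermaLink attribute, which defaults to true, meaning the guid is itself a URL that points to the item. Using the URL, as this script does, fits that default. An identifier that is not a URL should be marked isPermaLink="false" so readers don’t try to follow it. A small sketch; rss_guid() and the sample ID 'post-1234' are hypothetical.

```php
<?php
// Hypothetical helper: builds a <guid> element for one item.
// With the default isPermaLink (true), the value must be the item's URL.
// A non-URL identifier is flagged with isPermaLink="false".
function rss_guid($id, $isPermaLink = true)
{
    $value = htmlspecialchars($id, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    if ($isPermaLink) {
        return '<guid>' . $value . '</guid>';
    }
    return '<guid isPermaLink="false">' . $value . '</guid>';
}

echo rss_guid('http://www.larryullman.com/2012/05/18/learn-to-write/'), "\n";
echo rss_guid('post-1234', false), "\n"; // hypothetical database-style ID
```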

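Finally, there’s the pubDate, which must use the RFC 822 date format that RSS 2.0 requires (for example, Fri, 25 May 2012 07:23:00 +0000). Fortunately, PHP’s date() function has a format character for exactly this: r, which produces an RFC 2822 date. This makes the formatting a lot easier!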
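A quick way to confirm what r produces: PHP’s DATE_RSS constant yields the same string. The result depends on the server’s default time zone, so this sketch sets UTC explicitly; that choice is an assumption, not something the original script does.

```php
<?php
// date() output depends on the default time zone; UTC is an assumption here.
date_default_timezone_set('UTC');

$timestamp = 1337930580; // the first item's pubDate from the $data array

echo date('r', $timestamp), "\n";      // Fri, 25 May 2012 07:23:00 +0000
echo date(DATE_RSS, $timestamp), "\n"; // Fri, 25 May 2012 07:23:00 +0000

// Timestamps must be complete: the third item's '133733103' has only nine
// digits, so date() reads it as a moment in 1974 rather than May 2012.
echo date('Y', 133733103), "\n";       // 1974
```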

6. Complete the channel and rss elements:

echo '</channel>
</rss>
';

7. Save the file as rss.php, place it in your Web directory, and load it in an application that supports RSS feeds.

Figure: Viewing the RSS feed in Safari.

Not all browsers support RSS natively; look online for RSS reading applications you can use, too.


Tip

If you want to confirm that you’ve generated a valid RSS feed, check out http://feedvalidator.org.
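The validator checks a feed against the RSS 2.0 rules. A quicker local test while developing is whether the output is well-formed XML at all, which catches mistakes such as an unescaped & in a title. A minimal sketch using SimpleXML with libxml’s internal error collection; is_well_formed() is a hypothetical helper, and it does not replace the validator.

```php
<?php
// Hypothetical helper: returns true if $xml parses as well-formed XML.
// Parser messages are collected into $errors instead of printed as warnings.
// This checks XML syntax only, not the RSS 2.0 rules.
function is_well_formed($xml, &$errors = array())
{
    $previous = libxml_use_internal_errors(true); // buffer libxml errors
    libxml_clear_errors();
    $doc = simplexml_load_string($xml);
    $errors = array();
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting
    return $doc !== false;
}

$good = '<rss version="2.0"><channel><title>Tom &amp; Jerry</title></channel></rss>';
$bad  = '<rss version="2.0"><channel><title>Tom & Jerry</title></channel></rss>';

var_dump(is_well_formed($good));         // bool(true)
var_dump(is_well_formed($bad, $errors)); // bool(false)
print_r($errors);
```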



Tip

Because of some perceived issues with RSS, an alternative format called Atom was created. Meant to define a better standard for feeds, Atom is an open standard published by the IETF as RFC 4287, whereas the RSS 2.0 specification is frozen from further development. Although Atom is worth considering, many of the largest Web sites still use RSS 2.0 for their feeds.


Review and Pursue

If you have any problems with these sections, either in answering the questions or pursuing your own endeavors, turn to the book’s supporting forum (www.LarryUllman.com/forums/).

Review

• What does XML stand for? What is it used for?

• What line does XML data begin with?

• What are the rules for XML tags? Is XML case-sensitive or case-insensitive?

• What are the rules for attributes?

• If an element does not require a closing tag, what can you do instead?

• How do you associate a DTD with an XML document (there are two answers)?

• How do you define elements and attributes using DTD?

• What are entities and why are they required in XML data?

• How does XSD differ from DTD?

• How do you associate an XSD with an XML document (again, two answers)?

• What are the two types of XML parsers demonstrated in this chapter? How do they differ?

• What are RSS feeds? How are they created using PHP?

Pursue

• If you’re unfamiliar with HTML entities, research the topic online.

• Check online for existing DTDs that might be useful to you.

• If you want, rewrite books3.xml to use an inline DTD.

• If you think you’ll want to use XSD more, search online for further tutorials and information. You’ll also want to research the possible element types, beyond the basic ones.

• If you want, rewrite books4.xml to use an inline XSD.

• Check out the command-line xmllint (if your system has it) or an online XSD validation tool.

• If you’re interested in XML, learn how to use namespaces.

• Check out possible XML editing applications that will run on your computer.

• For an OOP alternative to Expat, look into PHP’s XMLReader class.

• Learn how to create XML in PHP using the XMLWriter class.

• If you want to learn more about creating RSS feeds, also research the Atom format.
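As a starting point for the XMLWriter suggestion, here is a sketch that rebuilds this chapter’s feed with XMLWriter instead of echo statements. XMLWriter escapes text content as it writes it and closes each element in order, so neither htmlentities() nor hand-typed closing tags are needed. build_feed() is a hypothetical function, the channel values and sample item come from rss.php, and UTC is an assumed time zone.

```php
<?php
// Hypothetical function: builds the RSS 2.0 feed from an array of items
// shaped like the $data array in rss.php, and returns it as a string.
function build_feed(array $items)
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');

    $w->startElement('rss');
    $w->writeAttribute('version', '2.0');
    $w->startElement('channel');
    $w->writeElement('title', "Larry Ullman's Important Things");
    $w->writeElement('description', 'The most recent things Larry has been writing about.');
    $w->writeElement('link', 'http://LarryUllman.com/');

    foreach ($items as $item) {
        $w->startElement('item');
        // writeElement() escapes &, <, and > in the text for us.
        $w->writeElement('title', $item['title']);
        $w->writeElement('description', $item['description'] . '...');
        $w->writeElement('link', $item['link']);
        $w->writeElement('guid', $item['link']);
        $w->writeElement('pubDate', date('r', (int) $item['pubDate']));
        $w->endElement(); // item
    }

    $w->endElement(); // channel
    $w->endElement(); // rss
    $w->endDocument();
    return $w->outputMemory();
}

date_default_timezone_set('UTC'); // assumption: emit dates in UTC
echo build_feed(array(
    array('title' => 'SSH Key Authentication', 'description' => 'The wonderful hosting company that I use', 'link' => 'http://www.larryullman.com/2012/05/25/ssh-key-authentication/', 'pubDate' => '1337930580'),
));
```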