Serialization - C# 5.0 in a Nutshell (2012)

Chapter 17. Serialization

This chapter introduces serialization and deserialization, the mechanism by which objects can be represented in a flat text or binary form. Unless otherwise stated, the types in this chapter all exist in the following namespaces:

System.Runtime.Serialization

System.Xml.Serialization

Serialization Concepts

Serialization is the act of taking an in-memory object or object graph (set of objects that reference each other) and flattening it into a stream of bytes or XML nodes that can be stored or transmitted. Deserialization works in reverse, taking a data stream and reconstituting it into an in-memory object or object graph.

Serialization and deserialization are typically used to:

§ Transmit objects across a network or application boundary.

§ Store representations of objects within a file or database.

Another, less common use is to deep-clone objects. The data contract and XML serialization engines can also be used as general-purpose tools for loading and saving XML files of a known structure.

The .NET Framework supports serialization and deserialization both from the perspective of clients wanting to serialize and deserialize objects, and from the perspective of types wanting some control over how they are serialized.

Serialization Engines

There are four serialization mechanisms in the .NET Framework:

§ The data contract serializer

§ The binary serializer (except in the Metro profile)

§ The (attribute-based) XML serializer (XmlSerializer)

§ The IXmlSerializable interface

Of these, the first three are serialization “engines” that do most or all of the serialization work for you. The last is just a hook for doing the serialization yourself, using XmlReader and XmlWriter. IXmlSerializable can work in conjunction with the data contract serializer or XmlSerializer to handle the more complicated XML serialization tasks.

Table 17-1 compares each of the engines.

Table 17-1. Serialization engine comparison

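| Feature                                  | Data contract serializer | Binary serializer | XmlSerializer | IXmlSerializable |
|------------------------------------------|--------------------------|-------------------|---------------|------------------|
| Level of automation                      | ***                      | *****             | ****          | *                |
| Type coupling                            | Choice                   | Tight             | Loose         | Loose            |
| Version tolerance                        | *****                    | ***               | *****         | *****            |
| Preserves object references              | Choice                   | Yes               | No            | Choice           |
| Can serialize nonpublic fields           | Yes                      | Yes               | No            | Yes              |
| Suitability for interoperable messaging  | *****                    | **                | ****          | ****             |
| Flexibility in reading/writing XML files | **                       | -                 | ****          | *****            |
| Compact output                           | **                       | ****              | **            | **               |
| Performance                              | ***                      | ****              | * to ***      | ***              |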

The scores for IXmlSerializable assume you’ve (hand) coded optimally using XmlReader and XmlWriter. The XML serialization engine requires that you recycle the same XmlSerializer object for good performance.
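The advice about recycling an XmlSerializer can be sketched as follows. This is a minimal illustration rather than a prescribed pattern: the Person type and the PersonXml helper are hypothetical names, and the serializer is simply held in a static field so that every call reuses the same instance.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical type for illustration. XmlSerializer needs a public type
// with a public parameterless constructor.
public class Person
{
    public string Name;
    public int Age;
}

public static class PersonXml
{
    // Constructed once and shared by every call below, instead of
    // creating a new XmlSerializer per operation.
    static readonly XmlSerializer _serializer = new XmlSerializer (typeof (Person));

    public static string ToXml (Person p)
    {
        using (var writer = new StringWriter())
        {
            _serializer.Serialize (writer, p);
            return writer.ToString();
        }
    }

    public static Person FromXml (string xml)
    {
        using (var reader = new StringReader (xml))
            return (Person) _serializer.Deserialize (reader);
    }
}
```

Calling PersonXml.FromXml (PersonXml.ToXml (person)) round-trips a Person through the shared serializer.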

Why three engines?

The reason for there being three engines is partly historical. The Framework started out with two distinct goals in serialization:

§ Serializing .NET object graphs with type and reference fidelity

§ Interoperating with XML and SOAP messaging standards

The first was led by the requirements of Remoting; the second, by Web Services. The job of writing one serialization engine to do both was too daunting, so Microsoft wrote two engines: the binary serializer and the XML serializer.

When Windows Communication Foundation (WCF) was later written, as part of Framework 3.0, part of the goal was to unify Remoting and Web Services. This required a new serialization engine—hence, the data contract serializer. The data contract serializer unifies the features of the older two engines relevant to (interoperable) messaging. Outside of this context, however, the two older engines are still important.

The data contract serializer

The data contract serializer is the newest and the most versatile of the three serialization engines and is used by WCF. The serializer is particularly strong in two scenarios:

§ When exchanging information through standards-compliant messaging protocols

§ When you need good version tolerance, plus the option of preserving object references

The data contract serializer supports a data contract model that helps you decouple the low-level details of the types you want to serialize from the structure of the serialized data. This provides excellent version tolerance, meaning you can deserialize data that was serialized from an earlier or later version of a type. You can even deserialize types that have been renamed or moved to a different assembly.

The data contract serializer can cope with most object graphs, although it can require more assistance than the binary serializer. It can also be used as a general-purpose tool for reading/writing XML files, if you’re flexible on how the XML is structured. (If you need to store data in attributes or cope with XML elements appearing in a random order, you cannot use the data contract serializer.)

The binary serializer

The binary serialization engine is easy to use, highly automatic, and well supported throughout the .NET Framework. Remoting uses binary serialization—including when communicating between two application domains in the same process (see Chapter 24).

The binary serializer is highly automated: quite often, a single attribute is all that’s required to make a complex type fully serializable. The binary serializer is also faster than the data contract serializer when full type fidelity is needed. However, it tightly couples a type’s internal structure to the format of the serialized data, resulting in poor version tolerance. (Prior to Framework 2.0, even adding a simple field was a version-breaking change.) The binary engine is also not really designed to produce XML, although it offers a formatter for SOAP-based messaging that provides limited interoperability with simple types.

XmlSerializer

The XML serialization engine can only produce XML, and it is less powerful than other engines in saving and restoring a complex object graph (it cannot restore shared object references). It’s the most flexible of the three, however, in following an arbitrary XML structure. For instance, you can choose whether properties are serialized to elements or attributes and the handling of a collection’s outer element. The XML engine also provides excellent version tolerance.

XmlSerializer is used by ASMX Web Services.

IXmlSerializable

Implementing IXmlSerializable means doing the serialization yourself with an XmlReader and XmlWriter. The IXmlSerializable interface is recognized both by XmlSerializer and by the data contract serializer, so it can be used selectively to handle the more complicated types. (It also can be used directly by WCF and ASMX Web Services.) We describe XmlReader and XmlWriter in detail in Chapter 11.
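As a rough sketch of what hand-coded serialization looks like, the following hypothetical Address type implements IXmlSerializable and is round-tripped through XmlSerializer. The element names and the AddressXml helper are assumptions made for illustration, and the sketch assumes that both fields are non-null.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

public class Address : IXmlSerializable
{
    public string Street, Postcode;

    // Returning null is the conventional implementation of GetSchema.
    public XmlSchema GetSchema() { return null; }

    public void ReadXml (XmlReader reader)
    {
        reader.ReadStartElement();      // consume the wrapper's start tag
        reader.MoveToContent();         // skip any whitespace before <Street>
        Street = reader.ReadElementContentAsString ("Street", "");
        reader.MoveToContent();         // skip any whitespace before <Postcode>
        Postcode = reader.ReadElementContentAsString ("Postcode", "");
        reader.ReadEndElement();        // consume the wrapper's end tag
    }

    public void WriteXml (XmlWriter writer)
    {
        // The serializer writes the wrapper element; we write only its content.
        writer.WriteElementString ("Street", Street);
        writer.WriteElementString ("Postcode", Postcode);
    }
}

public static class AddressXml
{
    public static Address RoundTrip (Address a)
    {
        var xs = new XmlSerializer (typeof (Address));
        using (var sw = new StringWriter())
        {
            xs.Serialize (sw, a);
            using (var sr = new StringReader (sw.ToString()))
                return (Address) xs.Deserialize (sr);
        }
    }
}
```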

Formatters

The output of the data contract and binary serializers is shaped by a pluggable formatter. The role of a formatter is the same with both serialization engines, although they use completely different classes to do the job.

A formatter shapes the final presentation to suit a particular medium or context of serialization. In general, you can choose between XML and binary formatters. An XML formatter is designed to work within the context of an XML reader/writer, text file/stream, or SOAP messaging packet. A binary formatter is designed to work in a context where an arbitrary stream of bytes will do—typically a file/stream or proprietary messaging packet. Binary output is usually smaller than XML—sometimes radically so.

NOTE

The term “binary” in the context of a formatter is unrelated to the “binary” serialization engine. Each of the two engines ships with both XML and binary formatters!

In theory, the engines are decoupled from their formatters. In practice, the design of each engine is geared toward one kind of formatter. The data contract serializer is geared toward the interoperability requirements of XML messaging. This is good for the XML formatter but means its binary formatter doesn’t always achieve the gains you might hope. In contrast, the binary engine provides a relatively good binary formatter, but its XML formatter is highly limited, offering only crude SOAP interoperability.

Explicit Versus Implicit Serialization

Serialization and deserialization can be initiated in two ways.

The first is explicitly, by requesting that a particular object be serialized or deserialized. When you serialize or deserialize explicitly, you choose both the serialization engine and the formatter.

In contrast, implicit serialization is initiated by the Framework. This happens when:

§ A serializer recursively serializes a child object.

§ You use a feature that relies on serialization, such as WCF, Remoting, or Web Services.

WCF always uses the data contract serializer, although it can interoperate with the attributes and interfaces of the other engines.

Remoting always uses the binary serialization engine.

Web Services always uses XmlSerializer.

The Data Contract Serializer

Here are the basic steps in using the data contract serializer:

1. Decide whether to use the DataContractSerializer or the NetDataContractSerializer.

2. Decorate the types and members you want to serialize with [DataContract] and [DataMember] attributes, respectively.

3. Instantiate the serializer and call WriteObject or ReadObject.

If you chose the DataContractSerializer, you will also need to register “known types” (subtypes that can also be serialized), and decide whether to preserve object references.

You may also need to take special action to ensure that collections are properly serialized.

NOTE

Types for the data contract serializer are defined in the System.Runtime.Serialization namespace, in an assembly of the same name.

DataContractSerializer Versus NetDataContractSerializer

There are two data contract serializers:

DataContractSerializer

Loosely couples .NET types to data contract types

NetDataContractSerializer

Tightly couples .NET types to data contract types

The DataContractSerializer can produce interoperable standards-compliant XML such as this:

<Person xmlns="...">

...

</Person>

It requires, however, that you explicitly register serializable subtypes in advance so that it can map a data contract name such as “Person” to the correct .NET type. The NetDataContractSerializer requires no such assistance, because it writes the full type and assembly names of the types it serializes, rather like the binary serialization engine:

<Person z:Type="SerialTest.Person" z:Assembly=

"SerialTest, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null">

...

</Person>

Such output, however, is proprietary. It also relies on the presence of a specific .NET type in a specific namespace and assembly in order to deserialize.

If you’re saving an object graph to a “black box,” you can choose either serializer, depending on what benefits are more important to you. If you’re communicating through WCF, or reading/writing an XML file, you’ll most likely want the DataContractSerializer.

Another difference between the two serializers is that NetDataContractSerializer always preserves referential equality; DataContractSerializer does so only upon request.

We’ll go into each of these topics in more detail in the following sections.

Using the Serializers

After choosing a serializer, the next step is to attach attributes to the types and members you want to serialize. At a minimum:

§ Add the [DataContract] attribute to each type.

§ Add the [DataMember] attribute to each member that you want to include.

Here’s an example:

namespace SerialTest

{

[DataContract] public class Person

{

[DataMember] public string Name;

[DataMember] public int Age;

}

}

These attributes are enough to make a type implicitly serializable through the data contract engine.

You can then explicitly serialize or deserialize an object instance by instantiating a DataContractSerializer or NetDataContractSerializer and calling WriteObject or ReadObject:

Person p = new Person { Name = "Stacey", Age = 30 };

var ds = new DataContractSerializer (typeof (Person));

using (Stream s = File.Create ("person.xml"))

ds.WriteObject (s, p); // Serialize

Person p2;

using (Stream s = File.OpenRead ("person.xml"))

p2 = (Person) ds.ReadObject (s); // Deserialize

Console.WriteLine (p2.Name + " " + p2.Age); // Stacey 30

DataContractSerializer’s constructor requires the root object type (the type of the object you’re explicitly serializing). In contrast, NetDataContractSerializer does not:

var ns = new NetDataContractSerializer();

// NetDataContractSerializer is otherwise the same to use

// as DataContractSerializer.

...

Both types of serializer use the XML formatter by default. With an XmlWriter, you can request that the output be indented for readability:

Person p = new Person { Name = "Stacey", Age = 30 };

var ds = new DataContractSerializer (typeof (Person));

XmlWriterSettings settings = new XmlWriterSettings() { Indent = true };

using (XmlWriter w = XmlWriter.Create ("person.xml", settings))

ds.WriteObject (w, p);

System.Diagnostics.Process.Start ("person.xml");

Here’s the result:

<Person xmlns="http://schemas.datacontract.org/2004/07/SerialTest"

xmlns:i="http://www.w3.org/2001/XMLSchema-instance">

<Age>30</Age>

<Name>Stacey</Name>

</Person>

The XML element name <Person> reflects the data contract name, which, by default, is the .NET type name. You can override this and explicitly state a data contract name as follows:

[DataContract (Name="Candidate")]

public class Person { ... }

The XML namespace reflects the data contract namespace, which, by default, is http://schemas.datacontract.org/2004/07/, plus the .NET type namespace. You can override this in a similar fashion:

[DataContract (Namespace="http://oreilly.com/nutshell")]

public class Person { ... }

NOTE

Specifying a name and namespace decouples the contract identity from the .NET type name. It ensures that, should you later refactor and change the type’s name or namespace, serialization is unaffected.
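To illustrate the decoupling, here is a small sketch in which data written from one .NET type is read back into a differently named type that declares the same contract name and namespace. Applicant and RenameDemo are hypothetical names standing in for a type that has been renamed during refactoring.

```csharp
using System.IO;
using System.Runtime.Serialization;

[DataContract (Name="Candidate", Namespace="http://oreilly.com/nutshell")]
public class Person
{
    [DataMember] public string Name;
    [DataMember] public int Age;
}

// Stands in for Person after a rename: the contract identity is unchanged.
[DataContract (Name="Candidate", Namespace="http://oreilly.com/nutshell")]
public class Applicant
{
    [DataMember] public string Name;
    [DataMember] public int Age;
}

public static class RenameDemo
{
    public static Applicant ReadAsRenamedType (Person p)
    {
        var writer = new DataContractSerializer (typeof (Person));
        var reader = new DataContractSerializer (typeof (Applicant));
        using (var s = new MemoryStream())
        {
            writer.WriteObject (s, p);             // serialized as <Candidate>
            s.Position = 0;
            return (Applicant) reader.ReadObject (s);
        }
    }
}
```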

You can also override names for data members:

[DataContract (Name="Candidate", Namespace="http://oreilly.com/nutshell")]

public class Person

{

[DataMember (Name="FirstName")] public string Name;

[DataMember (Name="ClaimedAge")] public int Age;

}

Here’s the output:

<?xml version="1.0" encoding="utf-8"?>

<Candidate xmlns="http://oreilly.com/nutshell"

xmlns:i="http://www.w3.org/2001/XMLSchema-instance" >

<ClaimedAge>30</ClaimedAge>

<FirstName>Stacey</FirstName>

</Candidate>

[DataMember] supports both fields and properties—public and private. The field or property’s data type can be any of the following:

§ Any primitive type

§ DateTime, TimeSpan, Guid, Uri, or an Enum value

§ Nullable versions of the above

§ byte[] (serializes in XML to base 64)

§ Any “known” type decorated with DataContract

§ Any IEnumerable type (see the section Data Contracts and Collections later in this chapter)

§ Any type with the [Serializable] attribute or implementing ISerializable (see the section Extending Data Contracts later in this chapter)

§ Any type implementing IXmlSerializable

Specifying a binary formatter

You can use a binary formatter with DataContractSerializer or NetDataContractSerializer. The process is the same:

Person p = new Person { Name = "Stacey", Age = 30 };

var ds = new DataContractSerializer (typeof (Person));

var s = new MemoryStream();

using (XmlDictionaryWriter w = XmlDictionaryWriter.CreateBinaryWriter (s))

ds.WriteObject (w, p);

var s2 = new MemoryStream (s.ToArray());

Person p2;

using (XmlDictionaryReader r = XmlDictionaryReader.CreateBinaryReader (s2,

XmlDictionaryReaderQuotas.Max))

p2 = (Person) ds.ReadObject (r);

The output varies between being slightly smaller than that of the XML formatter, and radically smaller if your types contain large arrays.
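One way to see the size difference for yourself is to serialize the same object with both formatters and compare byte counts. The sketch below uses a hypothetical Blob type holding a byte array; the FormatterSizes helper is likewise an assumption for illustration. Note that disposing the binary writer also closes the underlying MemoryStream, so the length is taken from ToArray.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

[DataContract] public class Blob
{
    [DataMember] public byte[] Data;
}

public static class FormatterSizes
{
    public static int XmlSize (Blob b)
    {
        var ds = new DataContractSerializer (typeof (Blob));
        using (var s = new MemoryStream())
        {
            ds.WriteObject (s, b);            // XML formatter (the default)
            return (int) s.Length;
        }
    }

    public static int BinarySize (Blob b)
    {
        var ds = new DataContractSerializer (typeof (Blob));
        var s = new MemoryStream();
        using (XmlDictionaryWriter w = XmlDictionaryWriter.CreateBinaryWriter (s))
            ds.WriteObject (w, b);            // binary formatter
        return s.ToArray().Length;            // ToArray still works after disposal
    }
}
```

With a Blob whose Data is a large array, BinarySize should come out well below XmlSize, because the XML formatter writes the bytes as base 64 text.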

Serializing Subclasses

You don’t need to do anything special to handle the serializing of subclasses with the NetDataContractSerializer. The only requirement is that subclasses have the DataContract attribute. The serializer will write the fully qualified names of the actual types that it serializes as follows:

<Person ... z:Type="SerialTest.Person" z:Assembly=

"SerialTest, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null">

A DataContractSerializer, however, must be informed about all subtypes that it may have to serialize or deserialize. To illustrate, suppose we subclass Person as follows:

[DataContract] public class Person

{

[DataMember] public string Name;

[DataMember] public int Age;

}

[DataContract] public class Student : Person { }

[DataContract] public class Teacher : Person { }

and then write a method to clone a Person:

static Person DeepClone (Person p)

{

var ds = new DataContractSerializer (typeof (Person));

MemoryStream stream = new MemoryStream();

ds.WriteObject (stream, p);

stream.Position = 0;

return (Person) ds.ReadObject (stream);

}

which we call as follows:

Person person = new Person { Name = "Stacey", Age = 30 };

Student student = new Student { Name = "Stacey", Age = 30 };

Teacher teacher = new Teacher { Name = "Stacey", Age = 30 };

Person p2 = DeepClone (person); // OK

Student s2 = (Student) DeepClone (student); // SerializationException

Teacher t2 = (Teacher) DeepClone (teacher); // SerializationException

DeepClone works if called with a Person but throws an exception with a Student or Teacher, because the deserializer has no way of knowing what .NET type (or assembly) a “Student” or “Teacher” should resolve to. This also helps with security, in that it prevents the deserialization of unexpected types.

The solution is to specify all permitted or “known” subtypes. You can do this either when constructing the DataContractSerializer:

var ds = new DataContractSerializer (typeof (Person),

new Type[] { typeof (Student), typeof (Teacher) } );

or in the type itself, with the KnownType attribute:

[DataContract, KnownType (typeof (Student)), KnownType (typeof (Teacher))]

public class Person

...

Here’s what a serialized Student now looks like:

<Person xmlns="..."

xmlns:i="http://www.w3.org/2001/XMLSchema-instance"

i:type="Student" >

...

</Person>

Because we specified Person as the root type, the root element still has that name. The actual subclass is described separately—in the type attribute.
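Putting the pieces of this section together, a self-contained sketch of the cloning example with the known types declared might look like this. The Cloner class name is an assumption; the rest follows the types shown above.

```csharp
using System.IO;
using System.Runtime.Serialization;

[DataContract, KnownType (typeof (Student)), KnownType (typeof (Teacher))]
public class Person
{
    [DataMember] public string Name;
    [DataMember] public int Age;
}

[DataContract] public class Student : Person { }
[DataContract] public class Teacher : Person { }

public static class Cloner
{
    public static Person DeepClone (Person p)
    {
        // Person is the root type; Student and Teacher are found
        // through the KnownType attributes on Person.
        var ds = new DataContractSerializer (typeof (Person));
        using (var stream = new MemoryStream())
        {
            ds.WriteObject (stream, p);
            stream.Position = 0;
            return (Person) ds.ReadObject (stream);
        }
    }
}
```

With the known types in place, cloning a Student yields a Student rather than throwing.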

NOTE

The NetDataContractSerializer suffers a performance hit when serializing subtypes—with either formatter. It seems that when it encounters a subtype, it has to stop and think for a while!

Serialization performance matters on an application server that’s handling many concurrent requests.

Object References

References to other objects are serialized, too. Consider the following classes:

[DataContract] public class Person

{

[DataMember] public string Name;

[DataMember] public int Age;

[DataMember] public Address HomeAddress;

}

[DataContract] public class Address

{

[DataMember] public string Street, Postcode;

}

Here’s the result of serializing this to XML using the DataContractSerializer:

<Person...>

<Age>...</Age>

<HomeAddress>

<Postcode>...</Postcode>

<Street>...</Street>

</HomeAddress>

<Name>...</Name>

</Person>

NOTE

The DeepClone method we wrote in the preceding section would clone HomeAddress, too—distinguishing it from a simple MemberwiseClone.

If you’re using a DataContractSerializer, the same rules apply when subclassing Address as when subclassing the root type. So, if we define a USAddress class, for instance:

[DataContract]

public class USAddress : Address { }

and assign an instance of it to a Person:

Person p = new Person { Name = "John", Age = 30 };

p.HomeAddress = new USAddress { Street="Fawcett St", Postcode="02138" };

p could not be serialized. The solution is either to apply the KnownType attribute to Address:

[DataContract, KnownType (typeof (USAddress))]

public class Address

{

[DataMember] public string Street, Postcode;

}

or to tell DataContractSerializer about USAddress in construction:

var ds = new DataContractSerializer (typeof (Person),

new Type[] { typeof (USAddress) } );

(We don’t need to tell it about Address because it’s the declared type of the HomeAddress data member.)

Preserving object references

The NetDataContractSerializer always preserves referential equality. The DataContractSerializer does not, unless you specifically ask it to.

This means that if the same object is referenced in two different places, a DataContractSerializer ordinarily writes it twice. So, if we modify the preceding example so that Person also stores a work address:

[DataContract] public class Person

{

...

[DataMember] public Address HomeAddress, WorkAddress;

}

and then serialize an instance as follows:

Person p = new Person { Name = "Stacey", Age = 30 };

p.HomeAddress = new Address { Street = "Odo St", Postcode = "6020" };

p.WorkAddress = p.HomeAddress;

we would see the same address details twice in the XML:

...

<HomeAddress>

<Postcode>6020</Postcode>

<Street>Odo St</Street>

</HomeAddress>

...

<WorkAddress>

<Postcode>6020</Postcode>

<Street>Odo St</Street>

</WorkAddress>

When this was later deserialized, WorkAddress and HomeAddress would be different objects. The advantage of this system is that it keeps the XML simple and standards-compliant. The disadvantages of this system include larger XML, loss of referential integrity, and the inability to cope with cyclical references.

You can request referential integrity by specifying true for preserveObjectReferences when constructing a DataContractSerializer:

var ds = new DataContractSerializer (typeof (Person),

null, 1000, false, true, null);

The third argument is mandatory when preserveObjectReferences is true: it indicates the maximum number of object references that the serializer should keep track of. The serializer throws an exception if this number is exceeded (this prevents a denial of service attack through a maliciously constructed stream).

Here’s what the XML then looks like for a Person with the same home and work addresses:

<Person xmlns="http://schemas.datacontract.org/2004/07/SerialTest"

xmlns:i="http://www.w3.org/2001/XMLSchema-instance"

xmlns:z="http://schemas.microsoft.com/2003/10/Serialization/"

z:Id="1">

<Age>30</Age>

<HomeAddress z:Id="2">

<Postcode z:Id="3">6020</Postcode>

<Street z:Id="4">Odo St</Street>

</HomeAddress>

<Name z:Id="5">Stacey</Name>

<WorkAddress z:Ref="2" i:nil="true" />

</Person>

The cost of this is in reduced interoperability (notice the proprietary namespace of the Id and Ref attributes).
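The effect on referential equality can be checked with a short experiment. The sketch below is one possible way to do it: it uses DataContractSerializerSettings (available from Framework 4.5) in place of the six-argument constructor shown earlier, and ReferenceDemo is a hypothetical helper name.

```csharp
using System.IO;
using System.Runtime.Serialization;

[DataContract] public class Address
{
    [DataMember] public string Street, Postcode;
}

[DataContract] public class Person
{
    [DataMember] public string Name;
    [DataMember] public int Age;
    [DataMember] public Address HomeAddress, WorkAddress;
}

public static class ReferenceDemo
{
    // Returns true if HomeAddress and WorkAddress are still the same
    // object after a serialize/deserialize round trip.
    public static bool AddressesStillShared (bool preserveReferences)
    {
        var p = new Person { Name = "Stacey", Age = 30 };
        p.HomeAddress = new Address { Street = "Odo St", Postcode = "6020" };
        p.WorkAddress = p.HomeAddress;

        var settings = new DataContractSerializerSettings
        {
            PreserveObjectReferences = preserveReferences
        };
        var ds = new DataContractSerializer (typeof (Person), settings);

        using (var s = new MemoryStream())
        {
            ds.WriteObject (s, p);
            s.Position = 0;
            var p2 = (Person) ds.ReadObject (s);
            return ReferenceEquals (p2.HomeAddress, p2.WorkAddress);
        }
    }
}
```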

Version Tolerance

You can add and remove data members without breaking forward or backward compatibility. By default, the data contract deserializers do the following:

§ Skip over data for which there is no [DataMember] in the type.

§ Don’t complain if any [DataMember] is missing in the serialization stream.

Rather than skipping over unrecognized data, you can instruct the deserializer to store unrecognized data members in a black box, and then replay them should the type later be reserialized. This allows you to correctly round-trip data that’s been serialized by a later version of your type. To activate this feature, implement IExtensibleDataObject. This interface really means “IBlackBoxProvider.” It requires that you implement a single property, to get/set the black box:

[DataContract] public class Person : IExtensibleDataObject

{

[DataMember] public string Name;

[DataMember] public int Age;

ExtensionDataObject IExtensibleDataObject.ExtensionData { get; set; }

}
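The round-tripping scenario can be sketched with two hypothetical versions of the same contract. PersonV2 adds an Email member that PersonV1 knows nothing about; because PersonV1 implements IExtensibleDataObject, the value should survive a pass through the older type. The type names, the RoundTripDemo helper, and the example.com namespace are all assumptions made for illustration.

```csharp
using System.IO;
using System.Runtime.Serialization;

// Older version: no Email member, but it keeps unrecognized data.
[DataContract (Name="Person", Namespace="http://example.com/demo")]
public class PersonV1 : IExtensibleDataObject
{
    [DataMember] public string Name;
    ExtensionDataObject IExtensibleDataObject.ExtensionData { get; set; }
}

// Later version of the same contract, with an extra member.
[DataContract (Name="Person", Namespace="http://example.com/demo")]
public class PersonV2
{
    [DataMember] public string Name;
    [DataMember] public string Email;
}

public static class RoundTripDemo
{
    static byte[] Write<T> (T obj)
    {
        var ds = new DataContractSerializer (typeof (T));
        using (var s = new MemoryStream())
        {
            ds.WriteObject (s, obj);
            return s.ToArray();
        }
    }

    static T Read<T> (byte[] data)
    {
        var ds = new DataContractSerializer (typeof (T));
        using (var s = new MemoryStream (data))
            return (T) ds.ReadObject (s);
    }

    // V2 data is read by V1 code, written out again, then read as V2.
    public static PersonV2 ThroughOlderVersion (PersonV2 original)
    {
        PersonV1 older = Read<PersonV1> (Write (original));
        return Read<PersonV2> (Write (older));
    }
}
```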

Required members

If a member is essential for a type, you can demand that it be present with IsRequired:

[DataMember (IsRequired=true)] public int ID;

If that member is not present, an exception is then thrown upon deserialization.
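A quick way to observe this is to deserialize data that lacks the required member. In the sketch below, PersonWithoutId and PersonWithId are hypothetical types sharing one contract identity, and RequiredDemo is an illustrative helper.

```csharp
using System.IO;
using System.Runtime.Serialization;

[DataContract (Name="Person", Namespace="http://example.com/demo")]
public class PersonWithoutId
{
    [DataMember] public string Name;
}

[DataContract (Name="Person", Namespace="http://example.com/demo")]
public class PersonWithId
{
    [DataMember] public string Name;
    [DataMember (IsRequired=true)] public int ID;
}

public static class RequiredDemo
{
    // Returns true if deserialization fails because ID is absent.
    public static bool MissingIdIsRejected()
    {
        var writer = new DataContractSerializer (typeof (PersonWithoutId));
        var reader = new DataContractSerializer (typeof (PersonWithId));
        using (var s = new MemoryStream())
        {
            writer.WriteObject (s, new PersonWithoutId { Name = "Stacey" });
            s.Position = 0;
            try
            {
                reader.ReadObject (s);
                return false;                 // no exception: ID was not enforced
            }
            catch (SerializationException)
            {
                return true;                  // the required member was missing
            }
        }
    }
}
```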

Member Ordering

The data contract serializers are extremely fussy about the ordering of data members. The deserializers, in fact, skip over any members considered out of sequence.

Members are written in the following order when serializing:

1. Base class to subclass

2. Low Order to high Order (for data members whose Order is set)

3. Alphabetical order (using ordinal string comparison)

So, in the preceding examples, Age comes before Name. In the following example, Name comes before Age:

[DataContract] public class Person

{

[DataMember (Order=0)] public string Name;

[DataMember (Order=1)] public int Age;

}

If Person has a base class, the base class’s data members would all serialize first.

The main reason to specify an order is to comply with a particular XML schema. XML element order equates to data member order.

If you don’t need to interoperate with anything else, the easiest approach is not to specify a member Order and rely purely on alphabetical ordering. A discrepancy will then never arise between serialization and deserialization as members are added and removed. The only time you’ll come unstuck is if you move a member between a base class and a subclass.
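The ordering rules are easy to confirm by inspecting the XML. The following sketch serializes two hypothetical types, one relying on alphabetical order and one setting Order explicitly, and returns the XML as a string so that the element positions can be compared. OrderDemo and both type names are assumptions for illustration.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Text;

[DataContract] public class AlphabeticalPerson
{
    [DataMember] public string Name;
    [DataMember] public int Age;
}

[DataContract] public class OrderedPerson
{
    [DataMember (Order=0)] public string Name;
    [DataMember (Order=1)] public int Age;
}

public static class OrderDemo
{
    public static string ToXml (object o)
    {
        var ds = new DataContractSerializer (o.GetType());
        using (var s = new MemoryStream())
        {
            ds.WriteObject (s, o);
            return Encoding.UTF8.GetString (s.ToArray());
        }
    }

    // True if the <Name> element appears before the <Age> element.
    public static bool NameComesFirst (object o)
    {
        string xml = ToXml (o);
        return xml.IndexOf ("<Name>", System.StringComparison.Ordinal)
             < xml.IndexOf ("<Age>", System.StringComparison.Ordinal);
    }
}
```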

Null and Empty Values

There are two ways to deal with a data member whose value is null or empty:

1. Explicitly write the null or empty value (the default).

2. Omit the data member from the serialization output.

In XML, an explicit null value looks like this:

<Person xmlns="..."

xmlns:i="http://www.w3.org/2001/XMLSchema-instance">

<Name i:nil="true" />

</Person>

Writing null or empty members can waste space, particularly on a type with lots of fields or properties that are usually left empty. More importantly, you may need to follow an XML schema that expects the use of optional elements (e.g., minOccurs="0") rather than nil values.

You can instruct the serializer not to emit data members for null/empty values as follows:

[DataContract] public class Person

{

[DataMember (EmitDefaultValue=false)] public string Name;

[DataMember (EmitDefaultValue=false)] public int Age;

}

Name is omitted if its value is null; Age is omitted if its value is 0 (the default value for the int type). If we were to make Age a nullable int, then it would be omitted if (and only if) its value was null.

NOTE

The data contract deserializer, in rehydrating an object, bypasses the type’s constructors and field initializers. This allows you to omit data members as described without breaking fields that are assigned nondefault values through an initializer or constructor. To illustrate, suppose we set the default Age for a Person to 30 as follows:

[DataMember (EmitDefaultValue=false)]

public int Age = 30;

Now suppose that we instantiate Person, explicitly set its Age from 30 to 0, and then serialize it. The output won’t include Age, because 0 is the default value for the int type. This means that in deserialization, Age will be ignored and the field will remain at its default value—which fortunately is 0, given that field initializers and constructors were bypassed.
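The scenario just described can be reproduced with a few lines. This is only a sketch: DefaultValueDemo is a hypothetical helper, and the XML is passed around as a string purely to make the omitted element easy to check.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Text;

[DataContract] public class Person
{
    [DataMember (EmitDefaultValue=false)] public string Name;

    // The initializer runs for "new Person()" but not during deserialization.
    [DataMember (EmitDefaultValue=false)] public int Age = 30;
}

public static class DefaultValueDemo
{
    public static string ToXml (Person p)
    {
        var ds = new DataContractSerializer (typeof (Person));
        using (var s = new MemoryStream())
        {
            ds.WriteObject (s, p);
            return Encoding.UTF8.GetString (s.ToArray());
        }
    }

    public static Person FromXml (string xml)
    {
        var ds = new DataContractSerializer (typeof (Person));
        using (var s = new MemoryStream (Encoding.UTF8.GetBytes (xml)))
            return (Person) ds.ReadObject (s);
    }
}
```

Serializing a Person whose Age has been set to 0 produces XML with no Age element, and reading that XML back gives an Age of 0 rather than 30.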

Data Contracts and Collections

The data contract serializers can save and repopulate any enumerable collection. For instance, suppose we define Person to have a List<> of addresses:

[DataContract] public class Person

{

...

[DataMember] public List<Address> Addresses;

}

[DataContract] public class Address

{

[DataMember] public string Street, Postcode;

}

Here’s the result of serializing a Person with two addresses:

<Person ...>

...

<Addresses>

<Address>

<Postcode>6020</Postcode>

<Street>Odo St</Street>

</Address>

<Address>

<Postcode>6152</Postcode>

<Street>Comer St</Street>

</Address>

</Addresses>

...

</Person>

Notice that the serializer doesn’t encode any information about the particular type of collection it serialized. If the Addresses field was instead of type Address[], the output would be identical. This allows the collection type to change between serialization and deserialization without causing an error.

Sometimes, though, you need your collection to be of a more specific type than you expose. An extreme example is with interfaces:

[DataMember] public IList<Address> Addresses;

This serializes correctly (as before), but a problem arises in deserialization. There’s no way the deserializer can know which concrete type to instantiate, so it chooses the simplest option—an array. The deserializer sticks to this strategy even if you initialize the field with a different concrete type:

[DataMember] public IList<Address> Addresses = new List<Address>();

(Remember that the deserializer bypasses field initializers.) The workaround is to make the data member a private field and add a public property to access it:

[DataMember (Name="Addresses")] List<Address> _addresses;

public IList<Address> Addresses { get { return _addresses; } }

In a nontrivial application, you would probably use properties in this manner anyway. The only unusual thing here is that we’ve marked the private field as the data member, rather than the public property.
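A compact sketch of the workaround follows. Because the data member is declared as List<Address>, the deserializer creates a List<Address>, and the public IList<Address> property remains usable for adding items afterward. CollectionDemo is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

[DataContract] public class Address
{
    [DataMember] public string Street, Postcode;
}

[DataContract] public class Person
{
    // The private field is the data member, so its concrete type is known.
    [DataMember (Name="Addresses")] List<Address> _addresses = new List<Address>();

    public IList<Address> Addresses { get { return _addresses; } }
}

public static class CollectionDemo
{
    public static Person RoundTrip (Person p)
    {
        var ds = new DataContractSerializer (typeof (Person));
        using (var s = new MemoryStream())
        {
            ds.WriteObject (s, p);
            s.Position = 0;
            return (Person) ds.ReadObject (s);
        }
    }
}
```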

Subclassed Collection Elements

The serializer handles subclassed collection elements transparently. You must declare the valid subtypes just as you would if they were used anywhere else:

[DataContract, KnownType (typeof (USAddress))]

public class Address

{

[DataMember] public string Street, Postcode;

}

[DataContract] public class USAddress : Address { }

Adding a USAddress to a Person’s address list then generates XML like this:

...

<Addresses>

<Address i:type="USAddress">

<Postcode>02138</Postcode>

<Street>Fawcett St</Street>

</Address>

</Addresses>

Customizing Collection and Element Names

If you subclass a collection class itself, you can customize the XML name used to describe each element by attaching a CollectionDataContract attribute:

[CollectionDataContract (ItemName="Residence")]

public class AddressList : Collection<Address> { }

[DataContract] public class Person

{

...

[DataMember] public AddressList Addresses;

}

Here’s the result:

...

<Addresses>

<Residence>

<Postcode>6020</Postcode>

<Street>Odo St</Street>

</Residence>

...

CollectionDataContract also lets you specify a Namespace and Name. The latter is not used when the collection is serialized as a property of another object (such as in this example), but it is when the collection is serialized as the root object.

You can also use CollectionDataContract to control the serialization of dictionaries:

[CollectionDataContract (ItemName="Entry",

KeyName="Kind",

ValueName="Number")]

public class PhoneNumberList : Dictionary <string, string> { }

[DataContract] public class Person

{

...

[DataMember] public PhoneNumberList PhoneNumbers;

}

Here’s how this formats:

...

<PhoneNumbers>

<Entry>

<Kind>Home</Kind>

<Number>08 1234 5678</Number>

</Entry>

<Entry>

<Kind>Mobile</Kind>

<Number>040 8765 4321</Number>

</Entry>

</PhoneNumbers>

Extending Data Contracts

This section describes how you can extend the capabilities of the data contract serializer through serialization hooks, [Serializable] and IXmlSerializable.

Serialization and Deserialization Hooks

You can request that a custom method be executed before or after serialization, by flagging the method with one of the following attributes:

[OnSerializing]

Indicates a method to be called just before serialization

[OnSerialized]

Indicates a method to be called just after serialization

Similar attributes are supported for deserialization:

[OnDeserializing]

Indicates a method to be called just before deserialization

[OnDeserialized]

Indicates a method to be called just after deserialization

The custom method must have a single parameter of type StreamingContext. This parameter is required for consistency with the binary engine, and it is not used by the data contract serializer.

[OnSerializing] and [OnDeserialized] are useful in handling members that are outside the capabilities of the data contract engine, such as a collection that has an extra payload or that does not implement standard interfaces. Here’s the basic approach:

[DataContract] public class Person

{

public SerializationUnfriendlyType Addresses;

[DataMember (Name="Addresses")]

SerializationFriendlyType _serializationFriendlyAddresses;

[OnSerializing]

void PrepareForSerialization (StreamingContext sc)

{

// Copy Addresses-> _serializationFriendlyAddresses

// ...

}

[OnDeserialized]

void CompleteDeserialization (StreamingContext sc)

{

// Copy _serializationFriendlyAddresses-> Addresses

// ...

}

}

An [OnSerializing] method can also be used to conditionally serialize fields:

public DateTime DateOfBirth;

[DataMember] public bool Confidential;

[DataMember (Name="DateOfBirth", EmitDefaultValue=false)]

DateTime? _tempDateOfBirth;

[OnSerializing]

void PrepareForSerialization (StreamingContext sc)

{

if (Confidential)

_tempDateOfBirth = null; // Null is the default, so nothing is emitted

else

_tempDateOfBirth = DateOfBirth;

}

Recall that the data contract deserializers bypass field initializers and constructors. An [OnDeserializing] method acts as a pseudoconstructor for deserialization, and it is useful for initializing fields excluded from serialization:

[DataContract] public class Test

{

bool _editable = true;

public Test() { _editable = true; }

[OnDeserializing]

void Init (StreamingContext sc)

{

_editable = true;

}

}

If it weren’t for the Init method, _editable would be false in a deserialized instance of Test, despite the other two attempts at making it true.

Methods decorated with these four attributes can be private. If subtypes need to participate, they can define their own methods with the same attributes, and they will get executed, too.
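As a minimal sketch of that last point, the following assumes a hypothetical Person/Student pair in which each type declares its own private [OnDeserialized] method. The flag fields and the HookDemo helper are invented purely for illustration:

```csharp
using System.IO;
using System.Runtime.Serialization;

[DataContract] public class Person
{
  [DataMember] public string Name;
  public bool PersonHookRan;                 // Not a data member

  [OnDeserialized]
  void AfterPerson (StreamingContext sc) { PersonHookRan = true; }
}

[DataContract] public class Student : Person
{
  [DataMember] public string Course;
  public bool StudentHookRan;                // Not a data member

  [OnDeserialized]
  void AfterStudent (StreamingContext sc) { StudentHookRan = true; }
}

public static class HookDemo
{
  // Serializes a Student to memory and reads it straight back.
  public static Student RoundTrip (Student s)
  {
    var ds = new DataContractSerializer (typeof (Student));
    using (var ms = new MemoryStream())
    {
      ds.WriteObject (ms, s);
      ms.Position = 0;
      return (Student) ds.ReadObject (ms);
    }
  }
}
```

Calling HookDemo.RoundTrip on a Student should yield a copy on which both flags are set, because the hook on each level of the hierarchy fires.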

Interoperating with [Serializable]

The data contract serializer can also serialize types marked with the binary serialization engine’s attributes and interfaces. This ability is important, since support for the binary engine has been woven into much of what was written prior to Framework 3.0—including the .NET Framework itself!

NOTE

The following things flag a type as being serializable for the binary engine:

§ The [Serializable] attribute

§ Implementing ISerializable

Binary interoperability is useful in serializing existing types as well as new types that need to support both engines. It also provides another means of extending the capability of the data contract serializer, because the binary engine’s ISerializable is more flexible than the data contract attributes. Unfortunately, the data contract serializer is inefficient in how it formats data added via ISerializable.

A type wanting the best of both worlds cannot define attributes for both engines. This creates a problem for types such as string and DateTime, which for historical reasons cannot shed their binary engine attributes. The data contract serializer works around this by filtering out these basic types and processing them specially. For all other types marked for binary serialization, the data contract serializer applies rules similar to those the binary engine would use. This means it honors attributes such as NonSerialized and calls ISerializable if implemented. It does not thunk to the binary engine itself; this ensures that output is formatted in the same style as if data contract attributes were used.

WARNING

Types designed to be serialized with the binary engine expect object references to be preserved. You can enable this option through the DataContractSerializer (or by using the NetDataContractSerializer).
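One way to switch on reference preservation is sketched below. It assumes Framework 4.5 or later, where DataContractSerializerSettings exposes a PreserveObjectReferences property. The BillingAddress member and the ReferenceDemo helper are hypothetical additions for the sake of the example:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[Serializable] public class Address
{
  public string Postcode, Street;
}

[DataContract] public class Person
{
  [DataMember] public Address MailingAddress;
  [DataMember] public Address BillingAddress;   // Hypothetical second reference
}

public static class ReferenceDemo
{
  // Round-trips a Person whose two members refer to one Address, and
  // reports whether they still refer to a single object afterward.
  public static bool SharedAfterRoundTrip (bool preserveReferences)
  {
    var shared = new Address { Postcode = "6020", Street = "Odo St" };
    var p = new Person { MailingAddress = shared, BillingAddress = shared };

    var settings = new DataContractSerializerSettings
    {
      PreserveObjectReferences = preserveReferences
    };
    var ds = new DataContractSerializer (typeof (Person), settings);

    using (var s = new MemoryStream())
    {
      ds.WriteObject (s, p);
      s.Position = 0;
      var p2 = (Person) ds.ReadObject (s);
      return ReferenceEquals (p2.MailingAddress, p2.BillingAddress);
    }
  }
}
```

With the option off, the shared Address is written twice and comes back as two separate objects; with it on, the deserialized Person should again hold two references to one Address.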

The rules for registering known types also apply to objects and subobjects serialized through the binary interfaces.

The following example illustrates a class with a [Serializable] data member:

[DataContract] public class Person

{

...

[DataMember] public Address MailingAddress;

}

[Serializable] public class Address

{

public string Postcode, Street;

}

Here’s the result of serializing it:

<Person ...>

...

<MailingAddress>

<Postcode>6020</Postcode>

<Street>Odo St</Street>

</MailingAddress>

...

Had Address implemented ISerializable, the result would be less efficiently formatted:

<MailingAddress>

<Street xmlns:d3p1="http://www.w3.org/2001/XMLSchema"

i:type="d3p1:string" xmlns="">str</Street>

<Postcode xmlns:d3p1="http://www.w3.org/2001/XMLSchema"

i:type="d3p1:string" xmlns="">pcode</Postcode>

</MailingAddress>

Interoperating with IXmlSerializable

A limitation of the data contract serializer is that it gives you little control over the structure of the XML. In a WCF application this can actually be beneficial, in that it makes it easier for the infrastructure to comply with standard messaging protocols.

If you do need precise control over the XML, you can implement IXmlSerializable and then use XmlReader and XmlWriter to manually read and write the XML. The data contract serializer allows you to do this just on the types for which this level of control is required. We describe the IXmlSerializable interface further in the final section of this chapter.

The Binary Serializer

The binary serialization engine is used implicitly by Remoting. It can also be used to perform such tasks as saving and restoring objects to disk. Binary serialization is highly automated and can handle complex object graphs with minimum intervention. It’s not available, however, in the Metro profile.

There are two ways to make a type support binary serialization. The first is attribute-based; the second involves implementing ISerializable. Adding attributes is simpler; implementing ISerializable is more flexible. You typically implement ISerializable to:

§ Dynamically control what gets serialized.

§ Make your serializable type friendly to being subclassed by other parties.

Getting Started

A type can be made serializable with a single attribute:

[Serializable] public sealed class Person

{

public string Name;

public int Age;

}

The [Serializable] attribute instructs the serializer to include all fields in the type. This includes both private and public fields (but not properties). Every field must itself be serializable; otherwise, an exception is thrown. Primitive .NET types such as string and int support serialization (as do many other .NET types).

NOTE

The Serializable attribute is not inherited, so subclasses are not automatically serializable, unless also marked with this attribute.

With automatic properties, the binary serialization engine serializes the underlying compiler-generated field. The name of this field, unfortunately, can change when its type is recompiled, breaking compatibility with existing serialized data. The workaround is either to avoid automatic properties in [Serializable] types or to implement ISerializable.

To serialize an instance of Person, you instantiate a formatter and call Serialize. There are two formatters for use with the binary engine:

BinaryFormatter

This is the more efficient of the two, producing smaller output in less time. Its namespace is System.Runtime.Serialization.Formatters.Binary.

SoapFormatter

This supports basic SOAP-style messaging when used with Remoting. Its namespace is System.Runtime.Serialization.Formatters.Soap.

BinaryFormatter is contained in mscorlib; SoapFormatter is contained in System.Runtime.Serialization.Formatters.Soap.dll.

WARNING

The SoapFormatter is less functional than the BinaryFormatter. The SoapFormatter doesn’t support generic types or the filtering of extraneous data necessary for version tolerant serialization.

The two formatters are otherwise exactly the same to use. The following serializes a Person with a BinaryFormatter:

Person p = new Person() { Name = "George", Age = 25 };

IFormatter formatter = new BinaryFormatter();

using (FileStream s = File.Create ("serialized.bin"))

formatter.Serialize (s, p);

All the data necessary to reconstruct the Person object is written to the file serialized.bin. The Deserialize method restores the object:

using (FileStream s = File.OpenRead ("serialized.bin"))

{

Person p2 = (Person) formatter.Deserialize (s);

Console.WriteLine (p2.Name + " " + p2.Age); // George 25

}

WARNING

The deserializer bypasses all constructors when re-creating objects. Behind the scenes, it calls FormatterServices.GetUninitializedObject to do this job. You can call this method yourself to implement some very grubby design patterns!
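To see what “bypassing constructors” means in practice, here is a small sketch that calls FormatterServices.GetUninitializedObject directly. The Widget type and the UninitializedDemo helper are hypothetical:

```csharp
using System;
using System.Runtime.Serialization;

public class Widget
{
  public bool Valid = true;              // Field initializer
  public int Count;

  public Widget() { Count = 42; }        // Constructor
}

public static class UninitializedDemo
{
  public static Widget CreateRaw()
  {
    // Allocates a Widget without running its constructor or its field
    // initializers, so every field keeps its default value.
    return (Widget) FormatterServices.GetUninitializedObject (typeof (Widget));
  }
}
```

The object returned by CreateRaw has Valid set to false and Count set to 0, which is exactly the state in which the deserializer starts before it populates the fields.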

The serialized data includes full type and assembly information, so if we tried to cast the result of deserialization to a matching Person type in a different assembly, an error would result. The deserializer fully restores object references to their original state upon deserialization. This includes collections, which are just treated as serializable objects like any other (all collection types in System.Collections.* are marked as serializable).
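The following sketch illustrates reference restoration with a collection. It assumes the .NET Framework described in this chapter (BinaryFormatter is obsolete in later .NET releases), and the Player, Match, and GraphDemo types are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable] public sealed class Player { public string Name; }

[Serializable] public sealed class Match
{
  public Player Captain;
  public List<Player> Roster = new List<Player>();
}

public static class GraphDemo
{
  // Serializes the graph to memory and deserializes it again.
  public static Match RoundTrip (Match m)
  {
    var formatter = new BinaryFormatter();
    using (var s = new MemoryStream())
    {
      formatter.Serialize (s, m);
      s.Position = 0;
      return (Match) formatter.Deserialize (s);
    }
  }
}
```

If the same Player is assigned to Captain and also added to Roster, the deserialized Match should again hold a single Player that is reachable through both paths, rather than two copies.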

NOTE

The binary engine can handle large, complex object graphs without special assistance (other than ensuring that all participating members are serializable). One thing to be wary of is that the serializer’s performance degrades in proportion to the number of references in your object graph. This can become an issue in a Remoting server that has to process many concurrent requests.

Binary Serialization Attributes

[NonSerialized]

Unlike data contracts, which have an opt-in policy in serializing fields, the binary engine has an opt-out policy. You must explicitly mark fields that you don’t want serialized, such as those used for temporary calculations or for storing file or window handles, with the [NonSerialized] attribute:

[Serializable] public sealed class Person

{

public string Name;

public DateTime DateOfBirth;

// Age can be calculated, so there's no need to serialize it.

[NonSerialized] public int Age;

}

This instructs the serializer to ignore the Age member.

WARNING

Nonserialized members are always empty or null when deserialized—even if field initializers or constructors set them otherwise.

[OnDeserializing] and [OnDeserialized]

Deserialization bypasses all your normal constructors as well as field initializers. This is of little consequence if every field partakes in serialization, but it can be problematic if some fields are excluded via [NonSerialized]. We can illustrate this by adding a bool field called Valid:

[Serializable] public sealed class Person

{

public string Name;

public DateTime DateOfBirth;

[NonSerialized] public int Age;

[NonSerialized] public bool Valid = true;

public Person() { Valid = true; }

}

A deserialized Person will not be Valid—despite the constructor and field initializer.

The solution is the same as with the data contract serializer: to define a special deserialization “constructor” with the [OnDeserializing] attribute. A method that you flag with this attribute gets called just prior to deserialization:

[OnDeserializing]

void OnDeserializing (StreamingContext context)

{

Valid = true;

}

We could also write an [OnDeserialized] method to update the calculated Age field (this fires just after deserialization):

[OnDeserialized]

void OnDeserialized (StreamingContext context)

{

TimeSpan ts = DateTime.Now - DateOfBirth;

Age = ts.Days / 365; // Rough age in years

}

[OnSerializing] and [OnSerialized]

The binary engine also supports the [OnSerializing] and [OnSerialized] attributes. These flag a method for execution before or after serialization. To see how they can be useful, we’ll define a Team class that contains a generic List of players:

[Serializable] public sealed class Team

{

public string Name;

public List<Person> Players = new List<Person>();

}

This class serializes and deserializes correctly with the binary formatter but not the SOAP formatter. This is because of an obscure limitation: the SOAP formatter refuses to serialize generic types! An easy solution is to convert Players to an array just prior to serialization, then convert it back to a generic List upon deserialization. To make this work, we can add another field for storing the array, mark the original Players field as [NonSerialized], and then write the conversion code as follows:

[Serializable] public sealed class Team

{

public string Name;

Person[] _playersToSerialize;

[NonSerialized] public List<Person> Players = new List<Person>();

[OnSerializing]

void OnSerializing (StreamingContext context)

{

_playersToSerialize = Players.ToArray();

}

[OnSerialized]

void OnSerialized (StreamingContext context)

{

_playersToSerialize = null; // Allow it to be freed from memory

}

[OnDeserialized]

void OnDeserialized (StreamingContext context)

{

Players = new List<Person> (_playersToSerialize);

}

}

[OptionalField] and Versioning

By default, adding a field breaks compatibility with data that’s already serialized, unless you attach the [OptionalField] attribute to the new field.

To illustrate, suppose we start with a Person class that has just one field. Let’s call it Version 1:

[Serializable] public sealed class Person // Version 1

{

public string Name;

}

Later, we realize we need a second field, so we create Version 2 as follows:

[Serializable] public sealed class Person // Version 2

{

public string Name;

public DateTime DateOfBirth;

}

If two computers were exchanging Person objects via Remoting, deserialization would go wrong unless they both updated to Version 2 at exactly the same time. The OptionalField attribute gets around this problem:

[Serializable] public sealed class Person // Version 2 Robust

{

public string Name;

[OptionalField (VersionAdded = 2)] public DateTime DateOfBirth;

}

This tells the deserializer not to panic if it sees no DateOfBirth in the data stream, and instead to treat the missing field as nonserialized. This means you end up with an empty DateTime (you can assign a different value in an [OnDeserializing] method).

The VersionAdded argument is an integer that you increment each time you augment a type’s fields. This serves as documentation, and it has no effect on serialization semantics.

WARNING

If versioning robustness is important, avoid renaming and deleting fields and avoid retrospectively adding the NonSerialized attribute. Never change a field’s type.

So far we’ve focused on the backward-compatibility problem: the deserializer failing to find an expected field in the serialization stream. But with two-way communication, a forward-compatibility problem can also arise whereby the deserializer encounters an extraneous field with no knowledge of how to process it. The binary formatter is programmed to automatically cope with this by throwing away the extraneous data; the SOAP formatter instead throws an exception! Hence, you must use the binary formatter if two-way versioning robustness is required; otherwise, manually control the serialization by implementing ISerializable.

Binary Serialization with ISerializable

Implementing ISerializable gives a type complete control over its binary serialization and deserialization.

Here’s the ISerializable interface definition:

public interface ISerializable

{

void GetObjectData (SerializationInfo info, StreamingContext context);

}

GetObjectData fires upon serialization; its job is to populate the SerializationInfo object (a name-value dictionary) with data from all fields that you want serialized. Here’s how we would write a GetObjectData method that serializes two fields, called Name and DateOfBirth:

public virtual void GetObjectData (SerializationInfo info,

StreamingContext context)

{

info.AddValue ("Name", Name);

info.AddValue ("DateOfBirth", DateOfBirth);

}

In this example, we’ve chosen to name each item according to its corresponding field. This is not required; any name can be used, as long as the same name is used upon deserialization. The values themselves can be of any serializable type; the Framework will recursively serialize as necessary. It’s legal to store null values in the dictionary.

NOTE

It’s a good idea to make the GetObjectData method virtual—unless your class is sealed. This allows subclasses to extend serialization without having to reimplement the interface.

SerializationInfo also contains properties that you can use to control the type and assembly that the instance should deserialize as. The StreamingContext parameter is a structure that contains, among other things, an enumeration value indicating to where the serialized instance is heading (disk, Remoting, etc., although this value is not always populated).
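As a hedged illustration of reading the StreamingContext, the sketch below writes a machine-specific value only when the context says the data is heading to a file. The Document type, its LocalCachePath field, and the ContextDemo helper are hypothetical. The example also relies on the BinaryFormatter constructor that accepts a StreamingContext, and on null being a legal value in the SerializationInfo dictionary:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable] public class Document : ISerializable
{
  public string Title;
  public string LocalCachePath;     // Hypothetical machine-specific data

  public Document() {}

  public virtual void GetObjectData (SerializationInfo si,
                                     StreamingContext sc)
  {
    si.AddValue ("Title", Title);
    // Store the path only if the destination is a file; otherwise store null.
    bool toFile = (sc.State & StreamingContextStates.File) != 0;
    si.AddValue ("LocalCachePath", toFile ? LocalCachePath : null);
  }

  protected Document (SerializationInfo si, StreamingContext sc)
  {
    Title = si.GetString ("Title");
    LocalCachePath = si.GetString ("LocalCachePath");
  }
}

public static class ContextDemo
{
  // Round-trips a Document with a formatter that declares the given target.
  public static Document RoundTrip (Document d, StreamingContextStates target)
  {
    var formatter = new BinaryFormatter (null, new StreamingContext (target));
    using (var s = new MemoryStream())
    {
      formatter.Serialize (s, d);
      s.Position = 0;
      return (Document) formatter.Deserialize (s);
    }
  }
}
```

Passing StreamingContextStates.File should preserve LocalCachePath, whereas passing StreamingContextStates.Remoting should leave it null on the deserialized copy.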

In addition to implementing ISerializable, a type controlling its own serialization needs to provide a deserialization constructor that takes the same two parameters as GetObjectData. The constructor can be declared with any accessibility and the runtime will still find it. Typically, though, you would declare it protected so that subclasses can call it.

In the following example, we implement ISerializable in the Team class. When it comes to handling the List of players, we serialize the data as an array rather than a generic list, so as to offer compatibility with the SOAP formatter:

[Serializable] public class Team : ISerializable

{

public string Name;

public List<Person> Players;

public virtual void GetObjectData (SerializationInfo si,

StreamingContext sc)

{

si.AddValue ("Name", Name);

si.AddValue ("PlayerData", Players.ToArray());

}

public Team() {}

protected Team (SerializationInfo si, StreamingContext sc)

{

Name = si.GetString ("Name");

// Deserialize Players to an array to match our serialization:

Person[] a = (Person[]) si.GetValue ("PlayerData", typeof (Person[]));

// Construct a new List using this array:

Players = new List<Person> (a);

}

}

For commonly used types, the SerializationInfo class has typed “Get” methods such as GetString, in order to make writing deserialization constructors easier. If you specify a name for which no data exists, an exception is thrown. This happens most often when there’s a version mismatch between the code doing the serialization and deserialization. You’ve added an extra field, for instance, and then forgotten about the implications of deserializing an old instance. To work around this problem, you can either:

§ Add exception handling around code that retrieves a data member added in a later version.

§ Implement your own version numbering system. For example:

public string MyNewField;

public virtual void GetObjectData (SerializationInfo si,

StreamingContext sc)

{

si.AddValue ("_version", 2);

si.AddValue ("MyNewField", MyNewField);

...

}

protected Team (SerializationInfo si, StreamingContext sc)

{

int version = si.GetInt32 ("_version");

if (version >= 2) MyNewField = si.GetString ("MyNewField");

...

}
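The first option, exception handling, might look like the following sketch. The Coach field, the fallback value, and the FromInfo and VersionDemo helpers are hypothetical. The helpers exist only so that the missing-field case can be exercised with a hand-built SerializationInfo instead of a real Version 1 stream:

```csharp
using System;
using System.Runtime.Serialization;

[Serializable] public class Team : ISerializable
{
  public string Name;
  public string Coach;      // Hypothetical field added in a later version

  public Team() {}

  public virtual void GetObjectData (SerializationInfo si,
                                     StreamingContext sc)
  {
    si.AddValue ("Name", Name);
    si.AddValue ("Coach", Coach);
  }

  protected Team (SerializationInfo si, StreamingContext sc)
  {
    Name = si.GetString ("Name");
    try
    {
      Coach = si.GetString ("Coach");
    }
    catch (SerializationException)
    {
      Coach = "Unknown";    // Data written before Coach existed
    }
  }

  // Demo-only helper: invokes the deserialization constructor directly.
  public static Team FromInfo (SerializationInfo si)
  {
    return new Team (si, new StreamingContext (StreamingContextStates.All));
  }
}

public static class VersionDemo
{
  // Simulates data from an older version that knows nothing about Coach.
  public static Team LoadOldData()
  {
    var si = new SerializationInfo (typeof (Team), new FormatterConverter());
    si.AddValue ("Name", "Lions");
    return Team.FromInfo (si);
  }
}
```

Here the constructor falls back to a default when the entry is absent, so the Team returned by LoadOldData has its Name restored and Coach set to the fallback.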

Subclassing Serializable Classes

In the preceding examples, we sealed the classes that relied on attributes for serialization. To see why, consider the following class hierarchy:

[Serializable] public class Person

{

public string Name;

public int Age;

}

[Serializable] public sealed class Student : Person

{

public string Course;

}

In this example, both Person and Student are serializable, and both classes use the default runtime serialization behavior since neither class implements ISerializable.

Now imagine that the developer of Person decides for some reason to implement ISerializable and provide a deserialization constructor to control Person serialization. The new version of Person might look like this:

[Serializable] public class Person : ISerializable

{

public string Name;

public int Age;

public virtual void GetObjectData (SerializationInfo si,

StreamingContext sc)

{

si.AddValue ("Name", Name);

si.AddValue ("Age", Age);

}

protected Person (SerializationInfo si, StreamingContext sc)

{

Name = si.GetString ("Name");

Age = si.GetInt32 ("Age");

}

public Person() {}

}

Although this works for instances of Person, this change breaks serialization of Student instances. Serializing a Student instance would appear to succeed, but the Course field in the Student type isn’t saved to the stream because the implementation of ISerializable.GetObjectData on Person has no knowledge of the members of the Student-derived type. Additionally, deserialization of Student instances throws an exception since the runtime is looking (unsuccessfully) for a deserialization constructor on Student.

The solution to this problem is to implement ISerializable from the outset for serializable classes that are public and nonsealed. (With internal classes, it’s not so important because you can easily modify the subclasses later if required.)

If we started out by writing Person as in the preceding example, Student would then be written as follows:

[Serializable]

public class Student : Person

{

public string Course;

public override void GetObjectData (SerializationInfo si,

StreamingContext sc)

{

base.GetObjectData (si, sc);

si.AddValue ("Course", Course);

}

protected Student (SerializationInfo si, StreamingContext sc)

: base (si, sc)

{

Course = si.GetString ("Course");

}

public Student() {}

}

XML Serialization

The Framework provides a dedicated XML serialization engine called XmlSerializer in the System.Xml.Serialization namespace. It’s suitable for serializing .NET types to XML files and is also used implicitly by ASMX Web Services.

As with the binary engine, there are two approaches you can take:

§ Sprinkle attributes throughout your types (defined in System.Xml.Serialization).

§ Implement IXmlSerializable.

Unlike with the binary engine, however, implementing the interface (i.e., IXmlSerializable) eschews the engine completely, leaving you to code the serialization yourself with XmlReader and XmlWriter.

Getting Started with Attribute-Based Serialization

To use XmlSerializer, you instantiate it and call Serialize or Deserialize with a Stream and object instance. To illustrate, suppose we define the following class:

public class Person

{

public string Name;

public int Age;

}

The following saves a Person to an XML file, and then restores it:

Person p = new Person();

p.Name = "Stacey"; p.Age = 30;

XmlSerializer xs = new XmlSerializer (typeof (Person));

using (Stream s = File.Create ("person.xml"))

xs.Serialize (s, p);

Person p2;

using (Stream s = File.OpenRead ("person.xml"))

p2 = (Person) xs.Deserialize (s);

Console.WriteLine (p2.Name + " " + p2.Age); // Stacey 30

Serialize and Deserialize can work with a Stream, XmlWriter/XmlReader, or TextWriter/TextReader. Here’s the resultant XML:

<?xml version="1.0"?>

<Person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<Name>Stacey</Name>

<Age>30</Age>

</Person>
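The TextWriter and TextReader overloads are convenient when you want the XML as a string rather than a file. A minimal sketch, in which the XmlStringDemo helper is a hypothetical name:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
  public string Name;
  public int Age;
}

public static class XmlStringDemo
{
  // Serializes a Person to an XML string via a StringWriter.
  public static string ToXml (Person p)
  {
    var xs = new XmlSerializer (typeof (Person));
    using (var sw = new StringWriter())
    {
      xs.Serialize (sw, p);
      return sw.ToString();
    }
  }

  // Restores a Person from an XML string via a StringReader.
  public static Person FromXml (string xml)
  {
    var xs = new XmlSerializer (typeof (Person));
    using (var sr = new StringReader (xml))
      return (Person) xs.Deserialize (sr);
  }
}
```

The string produced by ToXml has the same elements as the file shown above, and passing it to FromXml reconstructs an equivalent Person.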

XmlSerializer can serialize types without any attributes—such as our Person type. By default, it serializes all public fields and properties on a type. You can exclude members you don’t want serialized with the XmlIgnore attribute:

public class Person

{

...

[XmlIgnore] public DateTime DateOfBirth;

}

Unlike the other two engines, XmlSerializer does not recognize the [OnDeserializing] attribute and relies instead on a parameterless constructor for deserialization, throwing an exception if one is not present. (In our example, Person has an implicit parameterless constructor.) This also means field initializers execute prior to deserialization:

public class Person

{

public bool Valid = true; // Executes before deserialization

}

Although XmlSerializer can serialize almost any type, it recognizes the following types and treats them specially:

§ The primitive types, DateTime, TimeSpan, Guid, and nullable versions

§ byte[] (which is converted to base 64)

§ An XmlAttribute or XmlElement (whose contents are injected into the stream)

§ Any type implementing IXmlSerializable

§ Any collection type

The deserializer is version tolerant: it doesn’t complain if elements or attributes are missing or if superfluous data is present.
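The following sketch shows this tolerance by deserializing XML that lacks an Age element and contains a Nickname element unknown to the type. The Nickname element and the ToleranceDemo helper are invented for illustration:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
  public string Name;
  public int Age;
  public bool Valid = true;     // Set by the field initializer
}

public static class ToleranceDemo
{
  public static Person Load()
  {
    // Age is missing; Nickname is superfluous.
    string xml =
      "<Person><Name>Stacey</Name><Nickname>Stace</Nickname></Person>";

    var xs = new XmlSerializer (typeof (Person));
    using (var sr = new StringReader (xml))
      return (Person) xs.Deserialize (sr);
  }
}
```

Load returns a Person with Name populated, Age left at zero, and Valid still true, because the parameterless constructor and field initializers ran before the XML was read.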

Attributes, names, and namespaces

By default, fields and properties serialize to an XML element. You can request an XML attribute be used instead as follows:

[XmlAttribute] public int Age;

You can control an element or attribute’s name as follows:

public class Person

{

[XmlElement ("FirstName")] public string Name;

[XmlAttribute ("RoughAge")] public int Age;

}

Here’s the result:

<Person RoughAge="30" ...>

<FirstName>Stacey</FirstName>

</Person>

The default XML namespace is blank (unlike the data contract serializer, which uses the type’s namespace). To specify an XML namespace, [XmlElement] and [XmlAttribute] both accept a Namespace argument. You can also assign a name and namespace to the type itself with [XmlRoot]:

[XmlRoot ("Candidate", Namespace = "http://mynamespace/test/")]

public class Person { ... }

This names the person element “Candidate” as well as assigning a namespace to this element and its children.

XML element order

XmlSerializer writes elements in the order that they’re defined in the class. You can change this by specifying an Order in the XmlElement attribute:

public class Person

{

[XmlElement (Order = 2)] public string Name;

[XmlElement (Order = 1)] public int Age;

}

If you use Order at all, you must use it throughout.

The deserializer is not fussy about the order of elements—they can appear in any sequence and the type will properly deserialize.

Subclasses and Child Objects

Subclassing the root type

Suppose your root type has two subclasses as follows:

public class Person { public string Name; }

public class Student : Person { }

public class Teacher : Person { }

and you write a reusable method to serialize the root type:

public void SerializePerson (Person p, string path)

{

XmlSerializer xs = new XmlSerializer (typeof (Person));

using (Stream s = File.Create (path))

xs.Serialize (s, p);

}

To make this method work with a Student or Teacher, you must inform XmlSerializer about the subclasses. There are two ways to do this. The first is to register each subclass with the XmlInclude attribute:

[XmlInclude (typeof (Student))]

[XmlInclude (typeof (Teacher))]

public class Person { public string Name; }

The second is to specify each of the subtypes when constructing XmlSerializer:

XmlSerializer xs = new XmlSerializer (typeof (Person),

new Type[] { typeof (Student), typeof (Teacher) } );

In either case, the serializer responds by recording the subtype in the type attribute (just like with the data contract serializer):

<Person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:type="Student">

<Name>Stacey</Name>

</Person>

The deserializer then knows from this attribute to instantiate a Student and not a Person.

NOTE

You can control the name that appears in the XML type attribute by applying [XmlType] to the subclass:

[XmlType ("Candidate")]

public class Student : Person { }

Here’s the result:

<Person xmlns:xsi="..."

xsi:type="Candidate">

Serializing child objects

XmlSerializer automatically recurses object references such as the HomeAddress field in Person:

public class Person

{

public string Name;

public Address HomeAddress = new Address();

}

public class Address { public string Street, PostCode; }

To demonstrate:

Person p = new Person(); p.Name = "Stacey";

p.HomeAddress.Street = "Odo St";

p.HomeAddress.PostCode = "6020";

Here’s the XML to which this serializes:

<Person ... >

<Name>Stacey</Name>

<HomeAddress>

<Street>Odo St</Street>

<PostCode>6020</PostCode>

</HomeAddress>

</Person>

WARNING

If you have two fields or properties that refer to the same object, that object is serialized twice. If you need to preserve referential equality, you must use another serialization engine.
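A short sketch of this behavior follows. The WorkAddress field and the DuplicateDemo helper are hypothetical additions to the Person type used above:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Address { public string Street, PostCode; }

public class Person
{
  public string Name;
  public Address HomeAddress, WorkAddress;   // May refer to the same object
}

public static class DuplicateDemo
{
  // Round-trips a Person through an XML string.
  public static Person RoundTrip (Person p)
  {
    var xs = new XmlSerializer (typeof (Person));
    using (var sw = new StringWriter())
    {
      xs.Serialize (sw, p);
      using (var sr = new StringReader (sw.ToString()))
        return (Person) xs.Deserialize (sr);
    }
  }
}
```

If HomeAddress and WorkAddress refer to one Address before serialization, the deserialized Person holds two distinct Address objects with equal content.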

Subclassing child objects

Suppose you need to serialize a Person that can reference subclasses of Address as follows:

public class Address { public string Street, PostCode; }

public class USAddress : Address { }

public class AUAddress : Address { }

public class Person

{

public string Name;

public Address HomeAddress = new USAddress();

}

There are two distinct ways to proceed, depending on how you want the XML structured. If you want the element name always to match the field or property name with the subtype recorded in a type attribute:

<Person ...>

...

<HomeAddress xsi:type="USAddress">

...

</HomeAddress>

</Person>

you use [XmlInclude] to register each of the subclasses with Address as follows:

[XmlInclude (typeof (AUAddress))]

[XmlInclude (typeof (USAddress))]

public class Address

{

public string Street, PostCode;

}

If, on the other hand, you want the element name to reflect the name of the subtype, to the following effect:

<Person ...>

...

<USAddress>

...

</USAddress>

</Person>

you instead stack multiple [XmlElement] attributes onto the field or property in the parent type:

public class Person

{

public string Name;

[XmlElement ("Address", typeof (Address))]

[XmlElement ("AUAddress", typeof (AUAddress))]

[XmlElement ("USAddress", typeof (USAddress))]

public Address HomeAddress = new USAddress();

}

Each XmlElement maps an element name to a type. If you take this approach, you don’t require the [XmlInclude] attributes on the Address type (although their presence doesn’t break serialization).

NOTE

If you omit the element name in [XmlElement] (and specify just a type), the type’s default name is used (which is influenced by [XmlType] but not [XmlRoot]).

Serializing Collections

XmlSerializer recognizes and serializes concrete collection types without intervention:

public class Person

{

public string Name;

public List<Address> Addresses = new List<Address>();

}

public class Address { public string Street, PostCode; }

Here’s the XML to which this serializes:

<Person ... >

<Name>...</Name>

<Addresses>

<Address>

<Street>...</Street>

<PostCode>...</PostCode>

</Address>

<Address>

<Street>...</Street>

<PostCode>...</PostCode>

</Address>

...

</Addresses>

</Person>

The [XmlArray] attribute lets you rename the outer element (i.e., Addresses).

The [XmlArrayItem] attribute lets you rename the inner elements (i.e., the Address elements).

For instance, the following class:

public class Person

{

public string Name;

[XmlArray ("PreviousAddresses")]

[XmlArrayItem ("Location")]

public List<Address> Addresses = new List<Address>();

}

serializes to this:

<Person ... >

<Name>...</Name>

<PreviousAddresses>

<Location>

<Street>...</Street>

<PostCode>...</PostCode>

</Location>

<Location>

<Street>...</Street>

<PostCode>...</PostCode>

</Location>

...

</PreviousAddresses>

</Person>

The XmlArray and XmlArrayItem attributes also allow you to specify XML namespaces.

To serialize collections without the outer element, for example:

<Person ... >

<Name>...</Name>

<Address>

<Street>...</Street>

<PostCode>...</PostCode>

</Address>

<Address>

<Street>...</Street>

<PostCode>...</PostCode>

</Address>

</Person>

instead add [XmlElement] to the collection field or property:

public class Person

{

...

[XmlElement ("Address")]

public List<Address> Addresses = new List<Address>();

}

Working with subclassed collection elements

The rules for subclassing collection elements follow naturally from the other subclassing rules. To encode subclassed elements with the type attribute, for example:

<Person ... >

<Name>...</Name>

<Addresses>

<Address xsi:type="AUAddress">

...

add [XmlInclude] attributes to the base (Address) type as we did before. This works whether or not you suppress serialization of the outer element.

If you want subclassed elements to be named according to their type, for example:

<Person ... >

<Name>...</Name>

<!-- start of optional outer element -->

<AUAddress>

<Street>...</Street>

<PostCode>...</PostCode>

</AUAddress>

<USAddress>

<Street>...</Street>

<PostCode>...</PostCode>

</USAddress>

<!-- end of optional outer element -->

</Person>

you must stack multiple [XmlArrayItem] or [XmlElement] attributes onto the collection field or property.

Stack multiple [XmlArrayItem] attributes if you want to include the outer collection element:

[XmlArrayItem ("Address", typeof (Address))]

[XmlArrayItem ("AUAddress", typeof (AUAddress))]

[XmlArrayItem ("USAddress", typeof (USAddress))]

public List<Address> Addresses = new List<Address>();

Stack multiple [XmlElement] attributes if you want to exclude the outer collection element:

[XmlElement ("Address", typeof (Address))]

[XmlElement ("AUAddress", typeof (AUAddress))]

[XmlElement ("USAddress", typeof (USAddress))]

public List<Address> Addresses = new List<Address>();
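The sketch below exercises the stacked [XmlArrayItem] form in a round trip. The subclass definitions, helper names, and sample values are assumed for illustration; swapping the three attributes for their [XmlElement] counterparts should give the variant without the outer element:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Address { public string Street, PostCode; }
public class AUAddress : Address { }
public class USAddress : Address { }

public class Person
{
  public string Name;

  // One attribute per concrete type: the element name identifies the type,
  // so no xsi:type attribute is needed for these items.
  [XmlArrayItem ("Address",   typeof (Address))]
  [XmlArrayItem ("AUAddress", typeof (AUAddress))]
  [XmlArrayItem ("USAddress", typeof (USAddress))]
  public List<Address> Addresses = new List<Address>();
}

public static class NamedByTypeDemo
{
  // Hypothetical helpers for a string round trip.
  public static string ToXml (Person p)
  {
    var xs = new XmlSerializer (typeof (Person));
    using (var sw = new StringWriter())
    {
      xs.Serialize (sw, p);
      return sw.ToString();
    }
  }

  public static Person FromXml (string xml)
  {
    var xs = new XmlSerializer (typeof (Person));
    using (var sr = new StringReader (xml))
      return (Person) xs.Deserialize (sr);
  }

  public static void Main()
  {
    var p = new Person { Name = "Stacey" };
    p.Addresses.Add (new AUAddress { Street = "Odo St",  PostCode = "6020" });
    p.Addresses.Add (new USAddress { Street = "Main St", PostCode = "90210" });

    string xml = ToXml (p);
    Console.WriteLine (xml);

    Person back = FromXml (xml);
    foreach (Address a in back.Addresses)
      Console.WriteLine (a.GetType().Name);   // runtime type of each item
  }
}
```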

IXmlSerializable

Although attribute-based XML serialization is flexible, it has limitations. For instance, you cannot add serialization hooks—nor can you serialize nonpublic members. It’s also awkward to use if the XML might present the same element or attribute in a number of different ways.

On that last issue, you can push the boundaries somewhat by passing an XmlAttributeOverrides object into XmlSerializer’s constructor. There comes a point, however, when it’s easier to take an imperative approach. This is the job of IXmlSerializable:

public interface IXmlSerializable

{

XmlSchema GetSchema();

void ReadXml (XmlReader reader);

void WriteXml (XmlWriter writer);

}

Implementing this interface gives you total control over the XML that’s read or written.
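For comparison, the XmlAttributeOverrides route mentioned above might look like the following sketch. It renames the Name element at runtime without touching the type's attributes; the element name "FullName" and the helper methods are assumptions chosen for illustration:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person { public string Name; }

public static class OverridesDemo
{
  // Hypothetical helper: builds a serializer that writes Name as <FullName>.
  public static XmlSerializer CreateSerializer()
  {
    var attrs = new XmlAttributes();
    attrs.XmlElements.Add (new XmlElementAttribute ("FullName"));

    var overrides = new XmlAttributeOverrides();
    overrides.Add (typeof (Person), "Name", attrs);   // applies to Person.Name only

    return new XmlSerializer (typeof (Person), overrides);
  }

  public static string ToXml (Person p)
  {
    XmlSerializer xs = CreateSerializer();
    using (var sw = new StringWriter())
    {
      xs.Serialize (sw, p);
      return sw.ToString();
    }
  }

  public static void Main()
  {
    Console.WriteLine (ToXml (new Person { Name = "Stacey" }));
  }
}
```

This keeps the approach declarative, but every variation needs its own set of overrides, which is where an imperative ReadXml/WriteXml pair tends to become simpler.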

NOTE

A collection class that implements IXmlSerializable bypasses XmlSerializer’s rules for serializing collections. This can be useful if you need to serialize a collection with a payload—in other words, additional fields or properties that would otherwise be ignored.

The rules for implementing IXmlSerializable are as follows:

§ ReadXml should read the outer start element, then the content, and then the outer end element.

§ WriteXml should write just the content.

For example:

using System;

using System.Xml;

using System.Xml.Schema;

using System.Xml.Serialization;

public class Address : IXmlSerializable

{

public string Street, PostCode;

public XmlSchema GetSchema() { return null; }

public void ReadXml(XmlReader reader)

{

reader.ReadStartElement();

Street = reader.ReadElementContentAsString ("Street", "");

PostCode = reader.ReadElementContentAsString ("PostCode", "");

reader.ReadEndElement();

}

public void WriteXml (XmlWriter writer)

{

writer.WriteElementString ("Street", Street);

writer.WriteElementString ("PostCode", PostCode);

}

}

Serializing and deserializing an instance of Address via XmlSerializer automatically calls the WriteXml and ReadXml methods. Further, if Person were defined as follows:

public class Person

{

public string Name;

public Address HomeAddress;

}

IXmlSerializable would be called upon selectively to serialize the HomeAddress field.
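To see this in action, here is a small round-trip sketch. It repeats the Address and Person definitions from above so that it compiles on its own; the helper methods and sample values are assumptions for illustration:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

public class Address : IXmlSerializable
{
  public string Street, PostCode;

  public XmlSchema GetSchema() { return null; }

  public void ReadXml (XmlReader reader)
  {
    reader.ReadStartElement();          // outer start element
    Street   = reader.ReadElementContentAsString ("Street", "");
    PostCode = reader.ReadElementContentAsString ("PostCode", "");
    reader.ReadEndElement();            // outer end element
  }

  public void WriteXml (XmlWriter writer)
  {
    // Content only: XmlSerializer writes the enclosing element itself.
    writer.WriteElementString ("Street", Street);
    writer.WriteElementString ("PostCode", PostCode);
  }
}

public class Person
{
  public string Name;
  public Address HomeAddress;
}

public static class RoundTripDemo
{
  // Hypothetical helpers for a string round trip.
  public static string ToXml (Person p)
  {
    var xs = new XmlSerializer (typeof (Person));
    using (var sw = new StringWriter())
    {
      xs.Serialize (sw, p);
      return sw.ToString();
    }
  }

  public static Person FromXml (string xml)
  {
    var xs = new XmlSerializer (typeof (Person));
    using (var sr = new StringReader (xml))
      return (Person) xs.Deserialize (sr);
  }

  public static void Main()
  {
    var p = new Person
    {
      Name = "Stacey",
      HomeAddress = new Address { Street = "Odo St", PostCode = "6020" }
    };

    string xml = ToXml (p);
    Console.WriteLine (xml);

    Person back = FromXml (xml);
    Console.WriteLine (back.Name + ": " + back.HomeAddress.Street
                                 + ", " + back.HomeAddress.PostCode);
  }
}
```

Name is handled by the attribute-based engine as usual, while the content of HomeAddress is produced and consumed by the WriteXml and ReadXml methods.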

We describe XmlReader and XmlWriter at length in the first section of Chapter 11. Also in Chapter 11, in "Patterns for Using XmlReader/XmlWriter", we provide examples of IXmlSerializable-ready classes.