PowerShell in Depth, Second Edition (2015)

Part 2. PowerShell management

Chapter 14. Working with HTML and XML data

This chapter covers

· Working with HTML

· Creating HTML output

· Persisting data with XML

· Working with XML files

PowerShell includes some great capabilities for working with two common forms of structured data: HTML and XML. Why is this important? Because HTML is a great way to produce professional-looking reports and you can use XML in so many places within your environment. If you use PowerShell, the help, format, and type files are XML. The “cmdlet over objects” functionality introduced in PowerShell v3 is based on XML. The HTML- and XML-related functionality hasn’t had any major changes in PowerShell v4. We’ll cover the various capabilities and provide some concise examples of how you might want to use them.

14.1. Working with HTML

HTML—the Hypertext Markup Language—is a similar-to-XML language used to construct the content for web pages. Like XML, HTML documents consist of sets of nested tags, which form a document hierarchy:

<Body>

<H1>This is a heading</H1>

<p>This is some text</p>

</Body>

A full discussion of HTML and all its many features is beyond the scope of this book, but you can find some excellent tutorials at www.w3schools.com/html/ if you want to learn more. PowerShell v3 introduced unique new ways to work with HTML, which we’ll cover in this chapter.

14.1.1. Retrieving an HTML page

Your first step will be to get some HTML into the shell. To do that, you’ll ask PowerShell to retrieve a page from a web server, in much the same way that a web browser would do the same task. PowerShell won’t draw, or render, the page, but it’ll let you work with the raw HTML.

Note

In this chapter we’re concerned with the HTML only. In chapter 40 you’ll learn more about using Invoke-WebRequest and other web-based cmdlets for interacting with websites and web services.

PowerShell v3 introduced the Invoke-WebRequest cmdlet, which you can use with the following command:

PS C:\> $html = Invoke-WebRequest -uri http://bing.com

Nice and simple. The Invoke-WebRequest command has a lot of additional parameters that you can use, many of which require a bit of understanding of how HTTP requests are formed and sent to a web server. Let’s review a few of the major ones:

· -Credential lets you attach a credential to the request, which is useful when you’re accessing a server that requires authentication.

· -Headers is a dictionary (or hash table) of request headers that need to be sent. This can include any valid HTTP headers—a full list of which is outside the scope of this book, but http://en.wikipedia.org/wiki/List_of_HTTP_headers contains a list of valid options.

· -MaximumRedirection lets you specify the maximum number of times your request can be redirected from one server to another before the request fails.

· -Method specifies the type of request you’re sending, and you’ll usually specify either GET or POST. GET is the default and lets you use URLs that have embedded parameters, such as the following: http://www.bing.com/search?q=cmdlet&go=&qs=n&form=QBLH&pq=cmdlet&sc=8-6&sp=-1&sk=

· -OutFile accepts a file path and name and saves the resulting HTML to that file. This creates a static, local copy of the web page you requested. In our example, we captured the HTML to a variable instead.

· -Proxy accepts the URI for an HTTP proxy server, which will proxy the request for you. Depending on your network, you may need to also use -ProxyCredential or -ProxyUseDefaultCredentials to specify credentials for the proxy server.

· -UseBasicParsing is necessary when you’re running the command on a computer that doesn’t have Internet Explorer (IE) installed, such as on Server Core computers. This causes the command to skip the HTML Document Object Model (DOM) parsing, because IE is needed to perform that step.

· -UserAgent lets you specify a custom user agent string for the request. Web servers use this to identify the type of web browser you’re using, and they may change their content based on this value. For example, mobile browsers might get different content than a desktop browser. The user agent string for IE11 on Windows 8.1 is “Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko”.
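To show how several of these parameters might be combined, here is a minimal sketch that wraps a request in a small function. The function name Get-PageContent, the redirect limit, and the Accept-Language header value are our own placeholders rather than anything the cmdlet requires; adjust them to suit the server you’re calling.

```powershell
# Sketch only: Get-PageContent and its settings are illustrative placeholders.
function Get-PageContent {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Uri
    )

    # Splatting keeps a long parameter list readable.
    $requestParams = @{
        Uri                = $Uri
        Method             = 'Get'
        MaximumRedirection = 3
        UserAgent          = 'Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko'
        Headers            = @{ 'Accept-Language' = 'en-US' }
        UseBasicParsing    = $true   # skip the IE-based DOM parsing
    }

    Invoke-WebRequest @requestParams
}
```

You could then run $html = Get-PageContent -Uri 'http://bing.com' and work with the result exactly as you would with a direct call to Invoke-WebRequest.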

Those parameters will get you through basic requests to most websites. Things get tricky when you need to automate several back-and-forth requests with a web server; we’ll cover this in detail in chapter 40. Typically, web servers send your browser a small piece of data called a cookie, which is used to identify your particular browser to the server.

Note

Cookie use in Europe is diminishing as a result of recent legislation.

That way, the server can, for example, maintain the state of things like a shopping cart. PowerShell isn’t a web browser, though, and doesn’t automatically handle cookies like a web browser would. So each request the shell sends is a fresh new relationship with the server, and that might not work for some scenarios.

Invoke-WebRequest does have the ability to maintain state information, but you have to help it do so. Two parameters, -SessionVariable and -WebSession, support this capability. You’ll use one or the other but never both.

You’ll generally use -SessionVariable when you’re sending an initial, or first, request to the server. For example, if you’re retrieving a server’s login page, use this:

PS C:\> $r = Invoke-WebRequest http://www.facebook.com/login.php -SessionVariable fb

This code will result in the creation of a variable $fb as the session variable. PowerShell will populate $fb with a WebRequestSession object, and you’ll use that for subsequent requests sent to that server. To do this, you’d pass the entire variable to the -WebSession parameter:

PS C:\> $r = Invoke-WebRequest http://whatever.com -WebSession $fb
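If you’re curious about what a session variable holds, you can look at its cookie container. The following sketch builds a WebRequestSession by hand so that it runs without contacting a server; the cookie name, value, and domain are invented for illustration, and in real use the -SessionVariable parameter creates and fills the object for you.

```powershell
# Sketch only: the cookie name, value, and domain are invented for illustration.
Import-Module Microsoft.PowerShell.Utility

# Normally -SessionVariable creates this object for you.
$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession

# A server would set cookies like this one in its response.
$cookie = New-Object System.Net.Cookie -ArgumentList 'sessionid', 'abc123', '/', 'example.com'
$session.Cookies.Add($cookie)

# List the cookies that would be sent back to this site on the next request.
$stored = $session.Cookies.GetCookies('http://example.com/')
$stored | Select-Object -Property Name, Value, Domain
```

Passing $session to -WebSession on a later request would send those cookies along, which is how the server recognizes the conversation as a continuing one.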

Note

You’ll find a complete walkthrough on how to log into Facebook from PowerShell in the help file for Invoke-WebRequest. It includes examples of how to construct the form contents to submit. We also suggest reading it, because it’s a good example of how to maintain a multirequest, back-and-forth conversation with a web server.

14.1.2. Working with the HTML results

On a machine with IE installed, the result of your web request is a parsed HTML document. For example, our request for the home page of Bing.com resulted in the following being stored in our $html variable:

PS C:\> $html

StatusCode : 200

StatusDescription : OK

Content : <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0

Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xht

ml1-transitional.dtd"><html lang="en" xml:lang="en"

xmlns="http://www.w3.org/1999/xhtml"><head><meta ...

RawContent : HTTP/1.1 200 OK

Connection: keep-alive

Content-Length: 32595

Cache-Control: private, max-age=0

Content-Type: text/html; charset=utf-8

Date: Tue, 13 Mar 2013 17:55:21 GMT

P3P: CP="NON UNI COM NAV...

Forms : {sb_form}

Headers : {[Connection, keep-alive], [Content-Length, 32595],

[Cache-Control, private, max-age=0], [Content-Type,

text/html; charset=utf-8]...}

Images : {}

InputFields : {@{innerHTML=; innerText=; outerHTML=<INPUT

id=sb_form_q title="Enter your search term"

class=sw_qbox name=q autocomplete="off">; outerText=;

tagName=INPUT; id=sb_form_q; title=Enter your search

term; class=sw_qbox; name=q; autocomplete=off},

@{innerHTML=; innerText=; outerHTML=<INPUT tabIndex=0

id=sb_form_go title=Search class="sw_qbtn sw_sb"

type=submit name=go>; outerText=; tagName=INPUT;

tabIndex=0; id=sb_form_go; title=Search;

class=sw_qbtn sw_sb; type=submit; name=go},

@{innerHTML=; innerText=; outerHTML=<INPUT id=sa_qs

type=hidden value=bs name=qs>; outerText=;

tagName=INPUT; id=sa_qs; type=hidden; value=bs;

name=qs}, @{innerHTML=; innerText=; outerHTML=<INPUT

type=hidden value=QBLH name=form>; outerText=;

tagName=INPUT; type=hidden; value=QBLH; name=form}}

Links : {@{innerHTML=Explore ; innerText=Explore ;

outerHTML=<A onmousedown="return

si_T('&ID=SERP,5002.1')"

href="/explore?FORM=BXLH">Explore </A>;

outerText=Explore ; tagName=A; onmousedown=return

si_T('&ID=SERP,5002.1');

href=/explore?FORM=BXLH}, @{innerHTML=Images;

innerText=Images; outerHTML=<A

onclick="selectScope(this, 'images');"

onmousedown="return si_T('&ID=SERP,5013.1')"

href="/images?FORM=Z9LH">Images</A>;

outerText=Images; tagName=A;

onclick=selectScope(this, 'images');;

onmousedown=return si_T('&ID=SERP,5013.1');

href=/images?FORM=Z9LH}, @{innerHTML=Videos;

innerText=Videos; outerHTML=<A

onclick="selectScope(this, 'video');"

onmousedown="return si_T('&ID=SERP,5014.1')"

href="/videos?FORM=Z9LH1">Videos</A>;

outerText=Videos; tagName=A;

onclick=selectScope(this, 'video');;

onmousedown=return si_T('&ID=SERP,5014.1');

href=/videos?FORM=Z9LH1}, @{innerHTML=Shopping;

innerText=Shopping; outerHTML=<A

onclick="selectScope(this, 'commerce');"

onmousedown="return si_T('&ID=SERP,5015.1')"

href="/shopping?FORM=Z9LH2">Shopping</A>;

outerText=Shopping; tagName=A;

onclick=selectScope(this, 'commerce');;

onmousedown=return si_T('&ID=SERP,5015.1');

href=/shopping?FORM=Z9LH2}...}

ParsedHtml : System.__ComObject

RawContentLength : 32595

You can see that the resulting object contains the following properties:

· StatusCode—The HTTP status code for our request. A “200” is good news, indicating a successful request.

· StatusDescription—A textual version of the status code.

· Content—The raw, unparsed HTML content.

· RawContent—The entire response, including various headers.

· Forms—A collection of HTML forms on the page (which will be empty if there are no forms).

· Headers—A collection of HTTP headers sent with the response.

· Images—A collection of the <IMG> tags from the page.

· InputFields—A collection of the <INPUT> tags from the page.

· Links—A collection of the <A> tags from the page.

· ParsedHTML—A DOM object with the page’s tag hierarchy. We’re not going to dive into this in detail because it’s “developer-y,” but if you’d like to explore further, you’ll find a decent tutorial at www.javascriptkit.com/javatutors/dom.shtml.

· RawContentLength—The number of bytes in the response.

Of this set, the StatusCode, Forms, Headers, Images, InputFields, and Links properties are probably the easiest to use. StatusCode states the obvious; the others are all collections of tags. Each tag is presented as an object. For example, this is a link:

PS C:\> $html.links[0]

innerHTML : Explore

innerText : Explore

outerHTML : <A onmousedown="return si_T('&ID=SERP,5002.1')"

href="/explore?FORM=BXLH">Explore </A>

outerText : Explore

tagName : A

onmousedown : return si_T('&ID=SERP,5002.1')

href : /explore?FORM=BXLH

As you can see, there are subproperties for the link’s HTML, text, destination URL, and so forth. This means you could create a list of all destination URLs on this page, as follows:

PS C:\> $links = $html.links | select -expand href

PS C:\> $links

/explore?FORM=BXLH

/images?FORM=Z9LH

/videos?FORM=Z9LH1

/shopping?FORM=Z9LH2

/news?FORM=Z9LH3

/maps/?FORM=Z9LH4

/travel/?cid=homenav&FORM=Z9LH5

/entertainment?FORM=Z9LH6

/profile/history?FORM=Z9LH7

This technique created a collection of simple String objects, making it easy to enumerate through those, if you wanted to do so for some reason. Similarly, let’s look at the one and only form on the page:

PS C:\> $html.forms[0] | format-list *

Id : sb_form

Method : get

Action : /search

Fields : {[sb_form_q, ], [sb_form_go, ], [sa_qs, bs], [form, QBLH]}

This tells us there’s a form named sb_form, which uses the GET request method, submits to a page called /search, and contains four form fields.

Obviously, one of the big tricks in working with HTML is to understand HTML itself, which isn’t something we’re going to teach you in this book. PowerShell only provides a means of getting to the HTML data, once you know what it is you’re working with.

14.1.3. Practical example

As a simple yet practical example, you’ll write a command that searches Bing.com for the term “cmdlet” and returns the URLs of the first 10 links on the results page:

PS C:\> Invoke-WebRequest -uri 'http://www.bing.com/search?q=cmdlet&form=APMCS1' |
select -expand links | select -expand href -first 10

/?scope=web&FORM=Z9FD

/images/search?q=cmdlet&FORM=BIFD

/videos/search?q=cmdlet&FORM=BVFD

/shopping/search?q=cmdlet&mkt=en-US&FORM=BPFD

/news/search?q=cmdlet&FORM=BNFD

/maps/default.aspx?q=cmdlet&mkt=en-US&FORM=BYFD

/explore?q=cmdlet&FORM=BXFD

http://www.msn.com/

http://mail.live.com/

Being able to easily send web requests and deal with the results opens up a wide range of possibilities in PowerShell. This capability was available in earlier versions of the shell but required you to use low-level .NET Framework classes that are now nicely wrapped up into a single handy cmdlet.
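Many of the href values returned from a page are relative paths, which aren’t much use on their own. As a minimal sketch, the .NET System.Uri class can resolve them against the site’s base address. The $baseUri value and the two sample hrefs here are stand-ins for real request results.

```powershell
# Sketch only: $baseUri and the sample hrefs stand in for real request results.
$baseUri = [System.Uri]'http://www.bing.com'
$hrefs = @('/images/search?q=cmdlet&FORM=BIFD', 'http://www.msn.com/')

$absolute = foreach ($href in $hrefs) {
    # Uri(baseUri, relativeString) resolves relative paths and
    # passes URLs that are already absolute through unchanged.
    (New-Object System.Uri -ArgumentList $baseUri, $href).AbsoluteUri
}
$absolute
```

In practice you’d replace $hrefs with the output of select -expand href, giving you a list of URLs that can be fed straight back into Invoke-WebRequest.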

14.1.4. Creating HTML output

PowerShell can also create HTML output from almost any command (except commands that produce no output, or the Format- commands, which produce a specialized form of output). Here’s a simple example:

PS C:\> Get-Service | where Status -eq 'Running' | ConvertTo-HTML |

Out-File RunningServices.html

This code produces the HTML shown in figure 14.1. You can open the HTML file by passing its path to Invoke-Item:

PS C:\> Invoke-Item -Path ./RunningServices.html

Figure 14.1. ConvertTo-HTML creates simple HTML tables from the command output.

Tip

Any file that’s linked to an application through a file association can be opened using Invoke-Item. This includes Word documents, Excel spreadsheets, TXT files (Notepad), or CSV files (Excel).

Figure 14.1 shows a simple HTML table, but you can tweak it quite a bit. Notice that we explicitly needed to pipe the HTML content to Out-File in order to get it into a file; the ConvertTo verb changes the format of something (to HTML in this case), but it leaves the converted data in the shell’s pipeline.

The ConvertTo-HTML command has a number of useful parameters:

· -Property lets you specify the properties you want displayed. You could also do this by piping the output to Select-Object first. If you’re bringing data across the network from a remote machine, then filter it when you retrieve the data.

· -Head lets you specify HTML-formatted text to be included in the <HEAD> section of the final HTML.

· -Title lets you specify a title for the page, which will appear in the browser’s window title bar. Don’t use this and -Head at the same time, because they modify the same section of the HTML page and -Title will be ignored.

· -CssUri lets you specify the URL of a Cascading Style Sheet (CSS) file, which can specify better-looking formatting directives for the page. A browser combines the CSS and HTML to render the final output. Figure 14.2 shows our example HTML page with CSS applied.

· You can use -PostContent and -PreContent to add textual content after or before the main HTML table constructed by the cmdlet. You can use them to briefly explain what’s being shown or add other information to the page.

Figure 14.2. Adding a CSS style sheet to change the appearance of the HTML page
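Here’s a short sketch of a few of these parameters working together. The sample objects and the report text are made up so that the example runs anywhere; in practice you’d pipe in the output of a command such as Get-Service.

```powershell
# Sketch only: the sample objects stand in for real command output.
$services = @(
    [pscustomobject]@{ Name = 'Spooler'; Status = 'Running'; StartType = 'Automatic' }
    [pscustomobject]@{ Name = 'WinRM';   Status = 'Running'; StartType = 'Manual' }
)

$htmlParams = @{
    Property    = 'Name', 'Status'                       # StartType is left out
    Title       = 'Running Services'
    PreContent  = '<h2>Running services</h2>'
    PostContent = '<p>Report generated by PowerShell.</p>'
}
$page = $services | ConvertTo-Html @htmlParams

# ConvertTo-Html emits an array of strings; join them to inspect the page.
$pageText = $page -join "`n"
$pageText
```

Because the result is still in the pipeline, you’d pipe $page to Out-File if you wanted to save it and view it in a browser.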

ConvertTo-HTML normally produces an entire HTML web page, including the outer <HTML> tags, the initial <HEAD> section, and so forth. But you can also use the -Fragment parameter to produce only an HTML fragment. Such fragments aren’t intended for stand-alone use but can be used to construct a multisection HTML page. Combine -Fragment with the -As parameter, which lets you change the output from the default table form into a list, and you can create some impressive-looking reports in HTML. The following listing shows an example.

Listing 14.1. Creating an HTML report

$computername = 'WIN8'

$b = Get-WmiObject -class Win32_ComputerSystem -Computer $computername |

Select-Object -Property Manufacturer,Model,

@{name='Memory(GB)';expression={$_.TotalPhysicalMemory / 1GB -as [int]}},

@{name='Architecture';expression={$_.SystemType}},

@{name='Processors';expression={$_.NumberOfProcessors}} |

ConvertTo-HTML -Fragment -As LIST `
-PreContent "<h2>Computer Hardware:</h2>" |

Out-String

$b += Get-WmiObject -class Win32_LogicalDisk -Computer $computername |

Select-Object -Property @{n='DriveLetter';e={$_.DeviceID}},

@{name='Size(GB)';expression={$_.Size / 1GB -as [int]}},

@{name='FreeSpace(GB)';expression={$_.FreeSpace / 1GB -as [int]}} |

ConvertTo-Html -Fragment -PreContent "<h2>Disks:</h2>" |

Out-String

$b += Get-WmiObject -class Win32_NetworkAdapter -Computer $computername |

Where { $_.PhysicalAdapter } |

Select-Object -Property MACAddress,AdapterType,DeviceID,Name |

ConvertTo-Html -Fragment `
-PreContent "<h2>Physical Network Adapters:</h2>" |

Out-String

$head = @'

<style>

body { background-color:#dddddd;

font-family:Tahoma;

font-size:12pt; }

td, th { border:1px solid black;

border-collapse:collapse; }

th { color:white;

background-color:black; }

table, tr, td, th { padding: 2px; margin: 0px }

table { margin-left:50px; }

</style>

'@

ConvertTo-HTML -head $head -PostContent $b `
-Body "<h1>Hardware Inventory for $ComputerName</h1>" |

Out-File -FilePath "$computername.html"

Invoke-Item -Path "$computername.html"

The code in listing 14.1 queries three different things from WMI, creates some custom output properties, and converts the results to HTML fragments. The first one is created as a list rather than the usual HTML table. All the HTML is converted to a string (Out-String) and appended to a variable, $b.

The $head variable is created to contain an embedded HTML style sheet, eliminating the need to put the CSS into a separate file. Everything is then fed to ConvertTo-HTML one last time to combine it all into a completed HTML page, which is shown in figure 14.3.

Figure 14.3. Creating a multipart HTML report in PowerShell

This is a powerful technique and one you can easily expand to include additional sections of information.

14.2. Using XML to persist data

Before we jump into using XML, let’s explore a feature of PowerShell that’s been available since PowerShell v1—persisting PowerShell data as XML. One common use of XML is to preserve complex, hierarchical data in a simple, text-based format that’s easily transmitted across networks, copied as files, and so forth. XML’s other advantage is that it can be read by humans if required. Objects, PowerShell’s main form of command output, are one common kind of complex, hierarchical data, and a pair of PowerShell cmdlets can help convert objects to and from XML. This process is called serializing (converting objects to XML) and deserializing (converting XML back into objects), and it’s almost exactly what happens in PowerShell Remoting (covered in chapter 10) when objects need to be transmitted over the network.

In this case, you’re using XML as a format in which to save the data in the PowerShell objects, from which it can be reconstructed. Here’s a quick example:

PS C:\> $proc = Get-Process

PS C:\> $proc | Export-Clixml proc_baseline.xml

Note

The Export verb, unlike the ConvertTo verb, combines the acts of converting the objects into another data format and writing them out to a file.

This code creates a static, text-based representation of the processes currently running on the computer. The Export-Clixml cmdlet produces XML that’s specifically designed to be read back in by PowerShell:

PS C:\> $rproc = Import-Clixml .\proc_baseline.xml

PS C:\> $rproc | sort -property pm -Descending | select -First 10

Handles NPM(K) PM(K) WS(K) VM(M) CPU(s) Id ProcessName

------- ------ ----- ----- ----- ------ -- -----------

783 77 336420 285772 819 43.69 2204 powershell

544 41 196500 166980 652 13.41 2660 powershell

348 24 91156 39032 600 1.28 92 wsmprovhost

186 18 52024 35472 170 5.56 716 dwm

329 28 24628 24844 213 0.30 2316 iexplore

311 26 24276 22308 213 0.30 108 iexplore

210 14 20628 26228 69 5.95 1828 WmiPrvSE

1327 41 19608 33164 126 49.45 764 svchost

398 15 19164 21120 56 3.95 728 svchost

722 47 17992 23080 1394 13.45 924 svchost

The previous example demonstrates that the objects are imported from XML and placed, as objects, into the pipeline, where they can again be sorted, selected, filtered, and so forth.

If you now run:

PS C:\> $proc | sort -property pm -Descending | select -First 10

the data will appear the same whether you use $proc or $rproc. But there are differences. Try:

$proc | Get-Member

$rproc | Get-Member

Comparing the results shows that $rproc is a Deserialized.System.Diagnostics.Process object and $proc is a System.Diagnostics.Process object, as you’d expect from Get-Process. Deserialized objects are static, and their methods have been removed because they’re no longer “live” objects against which actions can be taken. You can see this by comparing the outputs of Get-Member produced earlier. But because XML captures a hierarchy of object data, it’s an excellent tool for capturing complex objects.
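If you’d rather not compare two long Get-Member listings by eye, a small sketch like the following makes the same point. It round-trips only the current PowerShell process through a temporary file; the file name is arbitrary, and Kill() is simply one convenient method to check for.

```powershell
# Sketch only: the temp file name is arbitrary.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'proc_sketch.xml'

$live = Get-Process -Id $PID          # the current PowerShell process
$live | Export-Clixml -Path $path
$restored = Import-Clixml -Path $path

# The first type name shows whether the object is live or deserialized.
$live.PSObject.TypeNames[0]
$restored.PSObject.TypeNames[0]

# Kill() exists on the live object but not on the restored copy.
$liveHasKill     = [bool]($live     | Get-Member -Name Kill -MemberType Method)
$restoredHasKill = [bool]($restored | Get-Member -Name Kill -MemberType Method)
"Live has Kill(): $liveHasKill; restored has Kill(): $restoredHasKill"

Remove-Item -Path $path
```

The property values are all still there on the restored object, which is what makes the format useful for baselines and later comparisons.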

If you open proc_baseline.xml with Notepad or an XML editor, you’ll see that it’s incredibly verbose. We don’t recommend that you use CliXml for anything but persisting PowerShell objects. If you want to work directly with XML, you need to use the techniques in the following sections.

JSON

PowerShell v3 introduced another potential intermediary form: JavaScript Object Notation (JSON). Two cmdlets are provided:

ConvertTo-Json

ConvertFrom-Json

When converting a PowerShell object to JSON, property names are converted to field names, property values are converted to field values, and the methods are removed. This last point is important because it means you end up with an inert object when you convert back. In the CliXML example, you’d get a Deserialized.System.Diagnostics.Process object returned, that is, a Process object with the methods removed.

Try performing the same actions with JSON:

Get-Process | ConvertTo-Json | ConvertFrom-Json |

sort -property pm -Descending | select -first 10

This won’t give the same results, because the type returned by ConvertFrom-Json is a System.Management.Automation.PSCustomObject, and as such the default formatting for the Process object won’t apply.
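A smaller round trip makes the type change easier to see. In this sketch a hand-built object with three invented property values stands in for the much larger output of Get-Process.

```powershell
# Sketch only: a hand-built object stands in for Get-Process output.
$original = [pscustomobject]@{ ProcessName = 'powershell'; Id = 2204; PM = 336420 }

$json = $original | ConvertTo-Json
$json

$roundTripped = $json | ConvertFrom-Json

# Property values survive the round trip, but the result is a generic object.
$roundTripped.ProcessName
$roundTripped.PSObject.TypeNames[0]
```

The last line reports a PSCustomObject type name no matter what kind of object you started with, which is why type-specific formatting is lost.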

We recommend using the CliXML format as an intermediary rather than JSON. JSON as an output from web services is covered in chapter 40.

14.3. XML basics

PowerShell’s XML abilities are no less amazing than its HTML abilities. XML (Extensible Markup Language) is perhaps one of the most useful text-based formats for storing a static copy of data. PowerShell provides rich functionality for working with XML data. Unlike comma-separated values (CSV) files, which can store only “flat” data, XML can store rich, hierarchical representations of data, yet it’s still easy to import into the shell, modify, save, attach to emails, and so forth.

It’s easy to overthink XML, when in fact XML has only a couple of rules. XML isn’t technically a language; it’s a grammar: a small set of rules for creating your own language. In other words, you get to make up most of the rules, which makes XML easy to work with. Take a look at this short example of an XML document:

<servers>
  <server name="SERVER2">
    <OSVersion>2012R2</OSVersion>
    <BIOS Version="1.2.662" Maker="Dell" />
  </server>
  <server name="SERVER1" />
</servers>

Let’s walk through some of the important parts, most of which correspond to the official XML rules, and some of which are options that you get to decide on:

· The document starts with a single, top-level, root element. In this case, it’s the <servers> element. We chose that—there’s no rule that made us pick it, and no rule that forced us to make it plural. We could’ve called it <fred> and the document would still work the same way. But by choosing <servers> we make the document a bit more human-readable. It contains information on servers, and so it makes sense to have the root element indicate that.

· The XML elements, called tags, are case sensitive. The tag <Server> isn’t the same as <server>.

· All elements have both an opening tag and a closing tag, such as <server> </server>. But it’s common to use a self-closing tag when the element doesn’t contain any information. That’s what we chose to do with SERVER1’s tag. We could also have written that as <server name="SERVER1"></server> and it’d have been just as valid.

· All elements must be completely nested within their parent. Because <servers> is our root element, everything appears between that opening tag and its closing counterpart. We indented each element a bit so that it was visually clearer to us how the nesting worked, but that indentation is purely for human convenience.

· There are two ways to attach data to an element, and we’ve used both. In the <OSVersion> element, “2012R2” is the value of the element. That’s useful when you have only one piece of data that goes with the element. However, for <BIOS>, we included two attributes, Version and Maker, and attached values to those. Both approaches are equally valid. In fact, we could’ve broken the <BIOS> element out as follows:

<BIOS>
  <Version>1.2.662</Version>
  <Maker>Dell</Maker>
</BIOS>

The only reason we didn’t do so is because, from a programming perspective, it’s a little easier to use our first approach. We only have to get that <BIOS> element, which is a single operation, and then we get access to its attributes. Using the more expanded approach, we’d have to retrieve each element to access the values. But it’s not that much extra work—how you decide to go about it is up to you.

Tip

The PowerShell ISE “understands” XML. If you create a new document in it, and then save that document with an .xml or .ps1xml filename extension, the ISE will properly color-code the XML. It’s a lot easier to work in than, say, Notepad.

For the running example in this chapter, we’re going to use the XML in listing 14.2 as a starting point. We’re running our code against two computers: the local computer, LOCALHOST, and a second one named MEMBER. If you want to follow along, replace LOCALHOST with your local computer name and MEMBER with the name of another computer.

Listing 14.2. ComputerData.xml

<computers>

<computer name='localhost'>

<biosserial />

<osversion />

</computer>

<computer name='member'>

<biosserial />

<osversion />

</computer>

</computers>

14.4. Reading XML files

PowerShell makes it easy to import an XML file. Now, we’re not talking about the Import-CliXML cmdlet here. That cmdlet is designed to read a specific kind of XML—the kind produced by Export-CliXML. With that particular XML language, PowerShell makes the rules, and the result isn’t meant to be especially human-readable. No, we’re talking about importing any ol’ XML you want, such as the snippet we offered earlier. Just do this:

[xml]$xml = Get-Content C:\ComputerData.xml

The [xml] part tells PowerShell to parse the text file as an XML document, so if your document isn’t properly formed XML, you’ll get an error. The resulting XML document is then stored in the $xml variable. We could’ve called that variable anything, but $xml seemed reasonable.
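The following sketch shows both outcomes of the [xml] cast. To keep it self-contained, it uses two inline strings instead of a file; the second string is deliberately malformed, and the variable names are our own.

```powershell
# Sketch only: inline XML text stands in for the contents of a file.
$goodText = "<computers><computer name='localhost' /><computer name='member' /></computers>"
$badText  = "<computers><computer name='localhost'></computers>"   # missing </computer>

[xml]$good = $goodText
$good.computers.computer.Count    # number of <computer> elements found

try {
    [xml]$bad = $badText
    $parsed = $true
}
catch {
    # The cast throws when the text is not well-formed XML.
    $parsed = $false
    "Could not parse: $($_.Exception.Message)"
}
```

Wrapping the cast in try/catch like this is one way a script might report a damaged file gracefully instead of stopping with a raw conversion error.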

Once you’ve got the XML document in a variable—and mind you, this is an XML document now, not just a big chunk of text—you can start manipulating it. For example, let’s say we didn’t know in advance how many <computer> elements the document contained and we wanted to enumerate them. We could do this in a script:

foreach ($computer in $xml.computers.computer) {

Write-Output " Computer $($computer.name)"

}

Here, we’ve accessed the XML document in $xml, asked for the <computers> node, and then asked for the collection of <computer> nodes. We can access the attributes of a <computer> node by simply referring to it, as we’ve done with the name attribute. Because we’re getting only a single attribute, we could also do something like this:

PS C:\> $xml.computers.computer.name

localhost

member

What are we enumerating?

Enumerating XML documents can seem a bit confusing. In our sample XML document, the <computers> node contains one or more <computer> child nodes, right? So why didn’t we run foreach ($computer in $xml.computers)?

The answer requires that you realize XML doesn’t restrict child nodes to being of a single type. That is, we could’ve created a <computers> root node (which we did), and underneath that could have put <client>, <server>, <phone>, and many other types of nodes. So in order to enumerate something—at least, the way PowerShell does it—you specify the kind of node you’re enumerating:

foreach ($thing in $xml.computers.computer)

Or, if you prefer a more abstract example:

foreach ($item in $xml.root_node.child_node_type)

PowerShell treats the XML document like an object, with each node or tag as a nested object property.

You can also use an XPath query to access individual nodes. A full discussion is beyond the scope of this book (although you can find numerous XPath tutorials online, starting at www.w3schools.com), but here’s a quick example:

$node = $xml.SelectSingleNode("//computer[@name='localhost']")

The $node variable will now contain the <computer> node whose name attribute is localhost:

PS C:\> $node

name biosserial osversion

---- ---------- ---------

localhost

We can also access properties of the new object:

PS C:\> $node.name

localhost

PS C:\> $node.osversion

PS C:\>
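SelectSingleNode() returns only the first match. As a brief sketch of two related options, SelectNodes() returns every matching node, and the Select-Xml cmdlet runs the same kind of XPath query from the pipeline. The inline XML mirrors listing 14.2 so that the example stands on its own.

```powershell
# Sketch only: the XML mirrors listing 14.2 but is defined inline.
[xml]$xml = @'
<computers>
  <computer name='localhost'><biosserial /><osversion /></computer>
  <computer name='member'><biosserial /><osversion /></computer>
</computers>
'@

# SelectNodes() returns every match rather than just the first.
$names = $xml.SelectNodes('//computer') | ForEach-Object { $_.name }
$names

# Select-Xml wraps each match; the node itself is in the Node property.
$member = Select-Xml -Xml $xml -XPath "//computer[@name='member']" |
    Select-Object -ExpandProperty Node
$member.name
```

As with the single-node query, the <biosserial> and <osversion> elements of every node returned here are still empty.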

Obviously we’re missing some data, so let’s correct that.

14.5. Modifying XML

Once you have a node, you can modify it easily. Starting with our original example, let’s try to populate the BIOS serial numbers, as shown in listing 14.3.

Listing 14.3. Modifying XML data

foreach ($computer in $xml.computers.computer) {

$bios = Get-WmiObject -Class Win32_BIOS -ComputerName ($computer.name)

$computer.biosserial = $bios.SerialNumber

}

Using the code in listing 14.3, we’ve enumerated the computers, queried each one by using WMI, and inserted the queried BIOS serial number into each computer’s <biosserial> node. The resulting XML might look like this:

<computers>

<computer name="localhost">

<biosserial>VMware-56 4d bb 4e e8 ec 08 e</biosserial>

<osversion />

</computer>

<computer name="member">

<biosserial>VMware-56 4b d8 09 35 c4 f8 02 21</biosserial>

<osversion />

</computer>

</computers>

After modifying the XML, we used $xml.InnerXML to display the modified XML document, although it won’t be as nicely formatted as we show here. Note that this doesn’t save the XML back to disk. We’ve only modified the XML data currently in memory. To update the file, we can run this command, assuming the XML file is in the current directory:

$xml.Save(".\ComputerData.xml")
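One caution about relative paths: Save() is a .NET method, and .NET resolves a relative path against its own working directory, which isn’t always the folder shown in your PowerShell prompt. A cautious sketch is to build a full path first. The temp folder and the one-computer document used here are placeholders so that the example can run anywhere.

```powershell
# Sketch only: the temp folder and the sample document are placeholders.
$folder = [System.IO.Path]::GetTempPath()
[xml]$xml = "<computers><computer name='localhost' /></computers>"

# Build a full path so Save() does not depend on .NET's working directory.
$fullPath = Join-Path $folder 'ComputerData.xml'
$xml.Save($fullPath)

# Reload the file to confirm that the document was written.
[xml]$check = Get-Content -Path $fullPath
$check.computers.computer.name

Remove-Item -Path $fullPath
```

To save into the shell’s current location instead, you could pass (Get-Location).Path as the folder.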

So it’s easy to modify the existing elements, populating them with data as you see fit. What about adding new ones? Let’s create a new element for each computer that shows the computer’s manufacturer (listing 14.4).

Listing 14.4. Adding the manufacturer

foreach ($computer in $xml.computers.computer) {

$bios = Get-WmiObject -Class Win32_BIOS -ComputerName ($computer.name)

$sys = Get-WmiObject -Class Win32_ComputerSystem `
-ComputerName ($computer.name)

$computer.biosserial = $bios.SerialNumber

$new_node = $xml.CreateNode('element','manufacturer','')

$new_node.InnerText = $sys.Manufacturer

$computer.AppendChild($new_node) | Out-Null

}

Here’s what the resulting XML might look like:

<computers>

<computer name="localhost">

<biosserial>VMware-56 4d bb 4e e8 ec 08 e</biosserial>

<osversion />

<manufacturer>VMware</manufacturer>

</computer>

<computer name="member">

<biosserial>VMware-56 4b d8 09 35 c4 f8 02 21</biosserial>

<osversion />

<manufacturer>VMware</manufacturer>

</computer>

</computers>

As you can see, an all-new node has been created for each computer, and we’ve populated its inner text with the manufacturer value that we queried through the WMI Win32_ComputerSystem class.

What about adding a new attribute to an existing element? For example, suppose we wanted to add the operating system build number as an attribute of the <computer> element (listing 14.5)?

Listing 14.5. Adding a new attribute

foreach ($computer in $xml.computers.computer) {

$bios = Get-WmiObject -Class Win32_BIOS -ComputerName ($computer.name)

$sys = Get-WmiObject -Class Win32_ComputerSystem -ComputerName ($computer.name)

$os = Get-WmiObject -Class Win32_OperatingSystem -ComputerName ($computer.name)

$computer.biosserial = $bios.SerialNumber

$new_node = $xml.CreateNode('element','manufacturer','')

$new_node.InnerText = $sys.Manufacturer

$computer.AppendChild($new_node) | Out-Null

$attr = $xml.CreateAttribute('build')

$attr.Value = $os.BuildNumber

$computer.SetAttributeNode($attr) | Out-Null

}

The resulting XML might look like this:

<computers>

<computer build="3900" name="localhost">

<biosserial>VMware-56 4d bb 4e e8 ec 08 e</biosserial>

<osversion />

<manufacturer>VMware</manufacturer>

</computer>

<computer build="3900" name="member">

<biosserial>VMware-56 4b d8 09 35 c4 f8 02 21</biosserial>

<osversion />

<manufacturer>VMware</manufacturer>

</computer>

</computers>

Notice that we used Out-Null in two places. That’s because the methods SetAttributeNode() and AppendChild() both produce an output object, and we didn’t want to see it. Sending it to null effectively suppresses it. This is a useful technique that you can apply in any place where you want your script to run silently.
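Piping to Out-Null isn't the only way to discard that output. The sketch below shows two common alternatives, a [void] cast and assignment to $null, against a small stand-in document. The checked attribute is a made-up name used only for illustration.

```powershell
[xml]$xml = '<computers><computer name="localhost" /></computers>'
$computer = $xml.SelectSingleNode('/computers/computer')

# Cast the result to [void] to discard it.
[void]$computer.AppendChild($xml.CreateElement('manufacturer'))

# Or assign the result to $null.
$attr = $xml.CreateAttribute('build')
$attr.Value = '3900'
$null = $computer.SetAttributeNode($attr)

# SetAttribute() returns nothing, so there's nothing to suppress.
$computer.SetAttribute('checked', 'yes')

$xml.InnerXml
```

Which form you pick is largely a matter of taste; all three keep the returned node out of the pipeline.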

14.6. Creating XML

Now all of that’s fine if you’re starting with an existing XML document. But what about taking data from PowerShell and turning it into XML? Yes, there is Export-CliXML, which is great for storing data you intend to reuse in PowerShell. But that cmdlet writes a PowerShell-specific serialization format, so its XML is of little use outside of PowerShell. What you can use instead is ConvertTo-XML.

Like ConvertTo-CSV, ConvertTo-XML takes objects and serializes them:

PS C:\> Get-Service | ConvertTo-Xml

xml Objects

--- -------

version="1.0" Objects

The cmdlet writes an XML document to the pipeline, so you’ll need to save it to a variable:

PS C:\> $svc = Get-Service | ConvertTo-Xml

PS C:\> $svc.GetType().name

XmlDocument

Because the XML is in memory, you can use the techniques we demonstrated earlier in the chapter, including saving the results to a file:

PS C:\> $svc.Save("c:\work\services.xml")
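If you'd rather look at the XML text directly than work with an XmlDocument, ConvertTo-Xml has an -As parameter that can return a string instead. Here's a small sketch, with a made-up object standing in for real data:

```powershell
# A made-up object; any pipeline input would work the same way.
$item = [pscustomobject]@{ Name = 'CHI-DC01'; NumberOfProcessors = 1 }

# -As String returns the whole document as a single string.
$text = $item | ConvertTo-Xml -As String -NoTypeInformation
$text
```

The string form is handy for a quick look at the structure before you decide how to process it.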

ConvertTo-Xml will convert every property of every object it receives, but usually you only need a subset. In listing 14.6 we’re gathering some WMI information from a few computers.

Listing 14.6. Creating an XML document

$computers = 'chi-dc01','chi-fp02','chi-dc04'

$data = Get-CimInstance -ClassName Win32_ComputerSystem -ComputerName $computers |

Select Name,TotalPhysicalMemory,NumberofProcessors,

NumberofLogicalProcessors,Manufacturer,Model,SystemType |

ConvertTo-Xml -NoTypeInformation

You’ll notice we used the –NoTypeInformation parameter because we intend to use the resulting XML in something other than PowerShell. Here’s what the XML looks like:

<?xml version="1.0"?>

<Objects>

<Object>

<Property Name="Name">CHI-DC01</Property>

<Property Name="TotalPhysicalMemory">1073274880</Property>

<Property Name="NumberofProcessors">1</Property>

<Property Name="NumberofLogicalProcessors">1</Property>

<Property Name="Manufacturer">Microsoft Corporation</Property>

<Property Name="Model">Virtual Machine</Property>

<Property Name="SystemType">x64-based PC</Property>

</Object>

<Object>

<Property Name="Name">CHI-FP02</Property>

<Property Name="TotalPhysicalMemory">750309376</Property>

<Property Name="NumberofProcessors">1</Property>

<Property Name="NumberofLogicalProcessors">2</Property>

<Property Name="Manufacturer">Microsoft Corporation</Property>

<Property Name="Model">Virtual Machine</Property>

<Property Name="SystemType">x64-based PC</Property>

</Object>

<Object>

<Property Name="Name">CHI-DC04</Property>

<Property Name="TotalPhysicalMemory">1073270784</Property>

<Property Name="NumberofProcessors">1</Property>

<Property Name="NumberofLogicalProcessors">1</Property>

<Property Name="Manufacturer">Microsoft Corporation</Property>

<Property Name="Model">Virtual Machine</Property>

<Property Name="SystemType">x64-based PC</Property>

</Object>

</Objects>

Technically there’s nothing wrong with this. But it might be nicer to revise so that we can see a collection of computer nodes. To accomplish that we need to rename nodes so that <Objects> becomes <Computers> and <Object> becomes <Computer>. We created a simple PowerShell function to get this done (listing 14.7).

Listing 14.7. Rename-XMLNode

The Rename-XMLNode function creates a new node, with the new name, copies data from the specified node, and replaces it. It won’t write anything to the pipeline, but it’ll update the XML document in memory:

PS C:\> Rename-XMLNode -Node $data.objects -NewName Computers

PS C:\> $data

xml Computers

--- ---------

version="1.0" Computers

We renamed the outer <Objects> node to <Computers>. Next we need to do the same for each of the child <Object> nodes with code like this:

foreach ($node in $data.computers.object) {

Rename-XMLNode -Node $node -NewName Computer

}
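For readers who want to try the two renaming steps, here's a minimal stand-in for the function. It is our own sketch and not the book's listing 14.7. It assumes only the -Node and -NewName parameters seen in the commands above, ignores XML namespaces, and runs against a trimmed copy of the data.

```powershell
# A stand-in for the function; this is not the book's listing 14.7.
function Rename-XMLNode {
    param(
        [Parameter(Mandatory = $true)][System.Xml.XmlNode]$Node,
        [Parameter(Mandatory = $true)][string]$NewName
    )
    # Create the replacement element in the same document.
    $new = $Node.OwnerDocument.CreateElement($NewName)

    # Copy the attributes, then move the child nodes across.
    foreach ($attr in $Node.Attributes) {
        $new.SetAttribute($attr.Name, $attr.Value)
    }
    while ($Node.HasChildNodes) {
        [void]$new.AppendChild($Node.FirstChild)
    }

    # Swap the new element into the old one's place.
    [void]$Node.ParentNode.ReplaceChild($new, $Node)
}

# Trimmed stand-in for the ConvertTo-Xml output.
[xml]$data = '<Objects><Object><Property Name="Name">CHI-DC01</Property></Object><Object><Property Name="Name">CHI-FP02</Property></Object></Objects>'

Rename-XMLNode -Node $data.Objects -NewName Computers
foreach ($node in $data.Computers.Object) {
    Rename-XMLNode -Node $node -NewName Computer
}
$data.InnerXml
```

An XML node can't be renamed in place, which is why the sketch builds a new element and moves the content into it rather than changing a name property.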

Finally, we can save the modified XML document:

PS C:\> $data.Save("c:\work\mydata.xml")

Here’s the new XML:

<?xml version="1.0"?>

<Computers>

<Computer>

<Property Name="Name">CHI-DC01</Property>

<Property Name="TotalPhysicalMemory">1073274880</Property>

<Property Name="NumberofProcessors">1</Property>

<Property Name="NumberofLogicalProcessors">1</Property>

<Property Name="Manufacturer">Microsoft Corporation</Property>

<Property Name="Model">Virtual Machine</Property>

<Property Name="SystemType">x64-based PC</Property>

</Computer>

<Computer>

<Property Name="Name">CHI-FP02</Property>

<Property Name="TotalPhysicalMemory">750309376</Property>

<Property Name="NumberofProcessors">1</Property>

<Property Name="NumberofLogicalProcessors">2</Property>

<Property Name="Manufacturer">Microsoft Corporation</Property>

<Property Name="Model">Virtual Machine</Property>

<Property Name="SystemType">x64-based PC</Property>

</Computer>

<Computer>

<Property Name="Name">CHI-DC04</Property>

<Property Name="TotalPhysicalMemory">1073270784</Property>

<Property Name="NumberofProcessors">1</Property>

<Property Name="NumberofLogicalProcessors">1</Property>

<Property Name="Manufacturer">Microsoft Corporation</Property>

<Property Name="Model">Virtual Machine</Property>

<Property Name="SystemType">x64-based PC</Property>

</Computer>

</Computers>

To us, this is clearer and now ready to use outside of PowerShell.
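Keep in mind that each value is still a generic <Property> element identified by its Name attribute, so whatever consumes the file has to select on that attribute. Here's a rough sketch of doing so in PowerShell itself, using a trimmed copy of the data above. The MemoryMB column is our own invention for illustration.

```powershell
[xml]$data = @'
<Computers>
  <Computer>
    <Property Name="Name">CHI-DC01</Property>
    <Property Name="TotalPhysicalMemory">1073274880</Property>
  </Computer>
  <Computer>
    <Property Name="Name">CHI-FP02</Property>
    <Property Name="TotalPhysicalMemory">750309376</Property>
  </Computer>
</Computers>
'@

# Turn each <Computer> node back into an object with named properties.
$report = foreach ($computer in $data.SelectNodes('/Computers/Computer')) {
    # Pick each <Property> by the value of its Name attribute.
    $name   = $computer.SelectSingleNode("Property[@Name='Name']").InnerText
    $memory = $computer.SelectSingleNode("Property[@Name='TotalPhysicalMemory']").InnerText
    [pscustomobject]@{
        Name     = $name
        MemoryMB = [math]::Round([double]$memory / 1MB)
    }
}
$report
```

The sketch uses XPath rather than dotted property access because each <Property> element has an attribute called Name, and the shell's XML adaptation would return that attribute's value for .Name instead of the element name.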

14.7. Select-XML

PowerShell v3 and later also include a Select-Xml cmdlet. This cmdlet is designed to find text within an XML string or within an XML document. Specifically, it’s designed to execute XPath queries (the same ones we mentioned earlier).

Let’s start with a simple XML document:

<top>

<mid attrib="1">Value 1</mid>

<mid attrib="2">Value B</mid>

</top>

We’ll save that in test.xml to make it easy to use. We might then run:

[xml]$xml = Get-Content test.xml

Select-Xml -Xml $xml -XPath "//mid[@attrib='1']"

Doing so would return the first <mid> node from the document, because it has an attribute equal to “1.” Again, you have to know XPath for this to work. If you’re still looking for a starting point on XPath, you’ll find a good one at www.w3schools.com/xpath/xpath_syntax.asp.

The cmdlet can accept XML in one of several ways:

· Use the –Xml parameter to pass an XML node or document.

· Use the –Content parameter to pass a string that contains XML.

· Use the –Path or –LiteralPath parameter to specify file paths.
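As a quick sketch of the second and third options, the following queries the same XML first as a string and then as a file. The temp-folder filename is an assumption made for this example.

```powershell
$text = '<top><mid attrib="1">Value 1</mid><mid attrib="2">Value B</mid></top>'

# -Content: query a string that contains XML.
# Each result's Node property holds the matching element.
$fromString = Select-Xml -Content $text -XPath "//mid[@attrib='2']"
$fromString.Node.InnerText

# -Path: query a file on disk (written to the temp folder for this sketch).
$file = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'select-xml-sketch.xml'
Set-Content -Path $file -Value $text
$fromFile = Select-Xml -Path $file -XPath '//mid'
$fromFile | ForEach-Object { $_.Node.InnerText }
```

With -Path there's no need to load the file into an [xml] variable first; the cmdlet reads it for you.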

The result of the cmdlet is a result object, and it will have a Node property that provides access to the desired chunk of XML. For example:

PS C:\> [xml]$xml = Get-Content test.xml

PS C:\> $result = Select-Xml -Xml $xml -XPath "//mid[@attrib='1']"

PS C:\> $result.node.InnerXML

Value 1

PS C:\> $result.node.attrib

1

In that example, we used the Node property to access the XML node that was found by the cmdlet, and then accessed its value via the InnerXML property and the value of its attrib attribute. Select-Xml isn’t all that different from the SelectSingleNode() method that we showed you earlier; it’s just done with a cmdlet, instead of accessing a method of the XML document itself.
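One practical difference is that SelectSingleNode() returns only the first match, whereas Select-Xml returns a result object for every match. A short sketch using the same sample document:

```powershell
[xml]$xml = '<top><mid attrib="1">Value 1</mid><mid attrib="2">Value B</mid></top>'

# One match: the method and the cmdlet find the same node.
$viaMethod = $xml.SelectSingleNode("//mid[@attrib='1']")
$viaCmdlet = (Select-Xml -Xml $xml -XPath "//mid[@attrib='1']").Node
$viaMethod.OuterXml -eq $viaCmdlet.OuterXml

# Several matches: SelectSingleNode() stops at the first one,
# while Select-Xml returns one result object per match.
$first = $xml.SelectSingleNode('//mid')
$all   = @(Select-Xml -Xml $xml -XPath '//mid')
$all.Count
```

If you expect several matches and prefer the method style, the document's SelectNodes() method is the closer equivalent.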

14.8. Summary

With so much of the world’s data in XML and/or HTML, being able to work in those formats can be handy. PowerShell provides a variety of capabilities that should be able to address most common situations; obviously, the more knowledge you have of those formats and how they’re used, the more effective you’ll be with PowerShell’s ability to handle them.

HTML is the data format of the web and being able to work directly with the format opens up a number of possibilities. More importantly, from your viewpoint, being able to easily create reports in HTML gives you the opportunity to produce impressive-looking output with minimal effort on your part.

XML is easy to work with. You’ve seen how to manually set up a starting document, load it into the shell, and enumerate its elements. We’ve shown you how to add nodes and attributes, and how to write the final thing back to disk. It isn’t a lot of work, and XML makes a wonderful, simple, text-based data file that’s a lot more flexible than CSV. It’s also a lot easier than working with something like an Excel spreadsheet.