PowerShell in Depth, Second Edition (2015)

Part 4. Advanced PowerShell

Chapter 40. Working with the web

This chapter covers

· Getting data from a website

· Working with JSON

· Working with REST services

Web protocols have given us incredible access to data. Whether you’re consuming an XML feed from an internal web service, scraping HTML pages from the public internet, or interacting with REST services on a partner extranet, web technologies help make it happen. PowerShell is well connected to these technologies, too, meaning you can use it as a tool for automating those interactions.


This chapter in particular focuses on Windows PowerShell v3 and v4. Much of what we’re discussing can be performed in older versions, but the shell itself lacks the commands. Instead, you end up working with the raw .NET Framework to accomplish these tasks. Cmdlets are always easier than raw .NET code.

At first glance this may seem to be a developer-oriented topic, but the administration tools of many software products within our environments are increasingly exposed as web services. As an administrator, you need to know how to access those services. First, though, how do you access a simple website?

40.1. Getting data from the web

PowerShell’s Invoke-WebRequest command is designed to send an HTTP request to a web server and to download the results. For example, if you run Invoke-WebRequest -Uri followed by the address of a web page, you’ll get back a result object. That object includes a number of useful properties:

· StatusCode contains 200 if the request completed normally; other HTTP status codes, such as 404 (not found), are possible. Published lists of HTTP status codes give their descriptions and meanings. Individual web servers won’t necessarily implement every status code; Internet Information Services (IIS), for instance, documents the subset it implements.

· StatusDescription contains OK if the request is successful. StatusDescription is tightly coupled with StatusCode and is covered by the same status code references.

· Content contains the actual content of the response, which in this case would be the HTML for the web page. PowerShell doesn’t render the web page because it isn’t a browser, but it does give you access to the raw HTML. The default display will show you only the beginning of the HTML; to see the whole page’s worth, pipe the Invoke-WebRequest output to select -ExpandProperty Content.

· RawContent includes not only the HTML but also the raw HTTP response headers. Those headers can include information on the web server, cookies, cache information, and more.

· Forms is a collection of objects that contain the HTML for any input forms defined in the document. Similarly, Images contains the HTML for any images on the page, InputFields contains input fields, Links contains HTML hyperlinks, and so on.

· Headers provides a collection of HTTP response headers. These are parsed from the RawContent property, and they’re easier to work with because they’re all nicely broken out into a collection.

· ParsedHTML is a reference to a Component Object Model (COM) object that provides access to the Document Object Model (DOM) of the page. We’ll work with this property a bit later in this chapter.

Keep in mind that the command’s job is to send a request to a URL and then save whatever comes back. What you do with those results depends on your goals, and how you work with the results depends on what the web server sent you. We’ll explore those topics later in this chapter.
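As a rough sketch of how those properties might be used together, the following helper summarizes a response object. The function name Get-ResponseSummary and the stand-in object are illustrative assumptions of ours; in real use you would pass in whatever Invoke-WebRequest returned.

```powershell
# Hypothetical helper: summarizes the properties discussed above.
function Get-ResponseSummary {
    param(
        [Parameter(Mandatory = $true)]
        $Response
    )

    New-Object -TypeName PSObject -Property @{
        Succeeded     = ($Response.StatusCode -eq 200)
        Status        = '{0} {1}' -f $Response.StatusCode, $Response.StatusDescription
        ContentType   = $Response.Headers['Content-Type']
        ContentLength = $Response.Content.Length
    }
}

# Stand-in for a real response so the sketch runs without a network connection.
$fake = New-Object -TypeName PSObject -Property @{
    StatusCode        = 200
    StatusDescription = 'OK'
    Headers           = @{ 'Content-Type' = 'text/html; charset=UTF-8' }
    Content           = '<h1>Enter Details</h1>'
}

$summary = Get-ResponseSummary -Response $fake
$summary
```

Checking StatusCode before touching Content is a simple way to avoid parsing an error page by mistake.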


PowerShell uses the term Uniform Resource Identifier (URI) in most of its web-friendly cmdlets. We tend to use the term Uniform Resource Locator (URL) in this book. A URL is one kind of URI: in addition to identifying a resource, the URL tells the computer how to get to that resource, often by providing a protocol handler like http://. We won’t be working with any examples that use URIs other than URLs.

40.2. Using web sessions

The back-and-forth between web servers and browsers (or PowerShell) isn’t technically a conversation, although we often refer to it that way. In a conversation, someone says something to you, and you reply. They remember your reply (unless maybe you’re at a really wild party), and craft their next response based on that reply.

When a browser (or PowerShell) sends a request to a web server, though, the web server has no idea who it’s talking to—even if it just sent that same web browser a response a few seconds ago. Every request-response exchange starts from scratch, with no context as to what has happened in the past. Imagine the difficulty this presents: You send your username and password to a website in order to log in and the website sends you a “logged in” web page. You then send a request to access, say, your account details, and the web server says, “Wait, who are you again?”

Cookies help solve that problem. A cookie is a small piece of information, such as a unique identifier, sent by the web server to you. You’re supposed to send it back to the web server with each subsequent request, helping the server remember who you are and what you were both talking about. Cookies help create a conversational thread, or context, that lets you work with a web server. Together with some other pieces of information, cookies form a web session. If you need to automate some back-and-forth with a web server by using PowerShell, you’ll need to manage web sessions.

To help demonstrate this, we’ve set up a test page, cookietest.php. This page attempts to send two cookies to you and also displays any cookies you sent it. Try this:

PS C:\> Invoke-WebRequest -Uri |

Select –expandproperty content

<h1>Enter Details</h1>

<form name="testform" action="cookietest.php" method="post">

<input type="text" name="field1">

<input type="text" name="field2">


The test page has a form named testform, which contains two input fields, named field1 and field2, which is what you see as returned content. Eventually, you should see cookie information. Our goal will be to capture the web session so that subsequent attempts will show that we correctly re-sent the cookies back to the server. To simulate the process of logging into a web page, we’re pretending that the input fields in the form are for username and password.

We start by retrieving the page and capturing the response to a variable. That’ll make it easier to work with:

$response = Invoke-WebRequest -Uri

-SessionVariable session

Our $response variable contains data like this:

StatusCode : 200

StatusDescription : OK

Content : <h1>Enter Details</h1>

<form name="testform" action="cookietest.php"


<input type="text" name="field1">

<input type="text" name="field2">


RawContent : HTTP/1.1 200 OK

Transfer-Encoding: chunked

Connection: keep-alive

Content-Type: text/html; charset=UTF-8

Date: Sat, 14 Dec 2013 09:48:38 GMT

Set-Cookie: __cfduid=ddb23833cf32b002cb11ba618b3283304...

Forms : {testform}

Headers : {[Transfer-Encoding, chunked], [Connection,

keep-alive], [Content-Type, text/html;


[Date, Sat, 14 Dec 2013 09:48:38 GMT]...}

Images : {}

InputFields : {@{innerHTML=; innerText=;

outerHTML=<INPUT name=field1>; outerText=;

tagName=INPUT; name=field1},

@{innerHTML=; innerText=;

outerHTML=<INPUT name=field2>;

outerText=; tagName=INPUT;


Links : {}

ParsedHtml : mshtml.HTMLDocumentClass

RawContentLength : 165

Next we want to get the forms so that we can log in. For now, we’re pretending that we don’t care about whatever cookies were sent. We know in our human brains that this is our first visit to the website. You’ll notice, however, that we did capture the web session in $session. There’s no typo there: When you provide a variable name to –SessionVariable, it only wants the variable name, which doesn’t include a dollar sign. The contents of the $session variable are as follows:

PS C:\> $session

Headers : {}

Cookies : System.Net.CookieContainer

UseDefaultCredentials : False

Credentials :

Certificates :

UserAgent : Mozilla/5.0 (Windows NT; Windows NT 6.3; en-GB)


Proxy :

MaximumRedirection : -1

So, to get that form:

$form = $response.forms[0]

We did need to analyze the web page ahead of time to determine that the first form (index 0) is the one we want. (You can view the available forms using $response.Forms.) Now that we have that form, we can fill in the fields:

$form.fields['field1'] = 'myname'

$form.Fields['field2'] = 'mypassword'

We can now send the form back to the server:

$r = Invoke-WebRequest -Uri

-WebSession $session -Method POST -Body $form.fields

A lot went on there.


We used the –WebSession parameter with our session variable, including the $ prefix. –SessionVariable is used for the first request with just the variable name (no $). Subsequent requests use –WebSession with the variable including $.

We sent our web session object back, and we needed to specify the POST method as opposed to the default GET. A GET request is a simple URL, perhaps with parameters embedded in that URL as a query string. A POST request, on the other hand, can contain an entire body of information, which in this case is the filled-in form fields.
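To make the GET side of that contrast concrete, here is a minimal sketch of how field values end up embedded in a URL. The host name example.test and the field values are placeholders of ours, not the real test page address.

```powershell
# Field names and values to send; the values are illustrative.
$fields = [ordered]@{ field1 = 'myname'; field2 = 'my password' }

# URL-encode each name and value and join them as name=value pairs.
$pairs = foreach ($name in $fields.Keys) {
    '{0}={1}' -f [uri]::EscapeDataString($name), [uri]::EscapeDataString($fields[$name])
}
$query = $pairs -join '&'

# A GET request carries the data in the URL itself (placeholder host).
$getUrl = 'http://example.test/cookietest.php?' + $query
$getUrl
```

With -Method POST and -Body, the same name and value pairs travel in the request body instead of the URL.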

HTTP verbs

A number of standard HTTP verbs (also known as method definitions) are defined in the HTTP specification. The common verbs you’ll meet are as follows:

· GET—Used for reading information

· POST—Used for creating information

· PUT—Used for changing information

· DELETE—Used for removing information

Other verbs you may come across include:

· OPTIONS—Gets information on communication options

· HEAD—Same as GET but contains no message body in the response

· TRACE—Invokes a remote, application-layer loopback of the request

· CONNECT—Reserved for use with a proxy

When you’re using PowerShell, the HTTP verbs can be used in a case-insensitive manner—that is, GET and get will both work. Other clients may not be so generous, so we recommend that you capitalize the verb.

Let’s run the following to see what we got back:


In our run-through, here’s what we got:

You sent cookie 'test1' containing 'value1' <br>

You sent cookie 'test2' containing 'value2' <br>

You sent cookie '__cfduid' containing

'd43e5b7c0e2de81022570ddeb3f54ff8a1386698974876' <br>

You sent field 'field1' containing 'myname' <br>

You sent field 'field2' containing 'mypassword' <br>

<h1>Enter Details</h1>

<form name="testform" action="cookietest.php" method="post">

<input type="text" name="field1">

<input type="text" name="field2">


Great! Our cookies made it to the server, and so did our form fields. Now we can keep passing those cookies back with each request. Keep in mind that the server might well send new cookies or change the ones it sent us; we’d need to use –WebSession to capture the new web session with each request, while at the same time sending the last session we got:

$r2 = Invoke-WebRequest -Uri

-WebSession $session -Method Post -Body $form.fields

That way, $session is being continually updated with whatever the server sent back.
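If you ever need to seed a session yourself, for example to replay a cookie you already know, a session object can also be created by hand. The following is a minimal sketch; the host name example.test and the cookie values are placeholders of ours.

```powershell
# Create an empty web session (the same type that -SessionVariable produces).
$session = New-Object -TypeName Microsoft.PowerShell.Commands.WebRequestSession

# Build a cookie: name, value, path, domain. All four values are illustrative.
$cookie = New-Object -TypeName System.Net.Cookie -ArgumentList 'test1', 'value1', '/', 'example.test'
$session.Cookies.Add($cookie)

# List the cookies that would be sent to a given URL on that host.
$toSend = $session.Cookies.GetCookies([uri]'http://example.test/cookietest.php')
$toSend | Select-Object -Property Name, Value, Domain
```

The resulting $session could then be passed to -WebSession on a later request, just like a session captured with -SessionVariable.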

40.3. Working with web responses

So you’ve sent a request, and you’ve gotten a response, and you want to work with the content. What do you do? Well, that depends on what the content is. If it’s XML, you can just cast it as XML:

[xml]$xml_response = $response.content

and then work with the XML as normal XML data (see chapter 14 for details). But sometimes, you might be “scraping” a web page and need to work with the raw HTML; other times, you might be getting a JavaScript Object Notation (JSON) result that you’ll need to work with.
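As a small illustration of the XML cast, the sketch below uses a hand-written XML string in place of $response.content; the element and attribute names are invented for the example.

```powershell
# A small XML document standing in for $response.Content.
[xml]$xml_response = '<services><service name="bits" status="Running" /><service name="spooler" status="Stopped" /></services>'

# Elements and attributes are exposed as properties.
$running = $xml_response.services.service | Where-Object { $_.status -eq 'Running' }
$running.name
```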

40.3.1. Working with HTML documents

There are two ways to work with HTML. Some web pages are in XHTML, which is an XML-friendly version of HTML. Those pages can be cast as XML and you can treat them as normal XML documents. Other HTML pages might not be explicitly XML-friendly, but you can try casting them as XML anyway. If it works, manipulating XML is the easiest way to go. If it doesn’t work, you’ll just get an error.

Your last choice—when the HTML page can’t be cast as XML—is to work with the web page’s DOM. Unfortunately, this requires that you have Internet Explorer (IE) installed locally, which might be a problem on a server (especially a Server Core server). IE provides the COM code that works with the DOM, but if you have IE, the DOM can be an interesting and straightforward way to manipulate a web page.


This isn’t a book on HTML DOM, which is a massive topic all by itself. Suffice it to say that the DOM takes a structured document and creates a hierarchical object model out of it, which you can manipulate programmatically. Microsoft publishes an HTML DOM reference online. We’re going to focus on a simple example that shows how PowerShell can access the DOM, but then you’re on your own.

Let’s take our cookietest.php page as an example. It contains a heading that’s styled with the HTML <H1> tag. How can we extract its contents so that we can display the actual heading text?

First, get the page and its DOM object:

PS C:\> $page = Invoke-WebRequest -Uri

PS C:\> $dom = $page.ParsedHtml

PS C:\> $dom.getElementsByTagName('H1') | Select –Expand innerText

This code selects the <H1> tag (it’d select them all, if there were more than one) and expands its innerText property, which contains the text within the tag. Of course, given that this is PowerShell, there’s always more than one way to solve a problem: We could’ve found that heading text by using a regular expression with the page content:

$page.Content -match "<H1>(?<heading>.*)?</H1>"


The point is that HTML is just text. You can work with it via regular expressions, treat it as XML, or treat it like an HTML DOM. But PowerShell isn’t a browser—it’s not going to run embedded JavaScript, display graphics, or anything else.
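The regular expression approach can be tried without a web server at all. This sketch runs against a literal copy of the test page markup and reads the named capture group from the automatic $Matches variable; the lazy quantifier (.*?) is our choice, made so the match stops at the first closing tag.

```powershell
# Literal HTML standing in for $page.Content.
$html = @'
<h1>Enter Details</h1>
<form name="testform" action="cookietest.php" method="post">
<input type="text" name="field1">
</form>
'@

# -match is case-insensitive, so <h1> and <H1> are treated alike.
if ($html -match '<h1>(?<heading>.*?)</h1>') {
    $heading = $Matches['heading']
}
$heading
```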

40.3.2. Working with JSON data

When PowerShell needs to save a static representation of an object in a textual format, it uses XML. Actually, the underlying .NET Framework does so, in a process called serialization. For example, when PowerShell Remoting sends result objects across the network, those are serialized into XML, because XML is easy to transmit across a network—it’s just text! Turning the XML back into a programmatic object is called deserialization.

JavaScript, the language of web browsers and client-side web programming, doesn’t use XML as much. Instead, it serializes to something called JSON. Here’s a snippet of JSON, created by piping a Windows service (the BITS service) to ConvertTo-JSON:

PS C:\> Get-Service -Name bits | ConvertTo-Json


"CanPauseAndContinue": false,

"CanShutdown": false,

"CanStop": false,

"DisplayName": "Background Intelligent Transfer Service",

"DependentServices": [


"MachineName": ".",

"ServiceName": "bits",

There’s a lot more data, including the required and dependent services. JSON isn’t as verbose as XML, but you still get a lot of text. You’d use JSON if you needed to send something to a web server that was expecting JSON, or if you wanted to consume an object that a web server sent to you in JSON format. JSON would be the content of your web response, so you’d use Invoke-WebRequest to send the request and get the response, and then run the content of the response through ConvertFrom-Json. The result would be an object you could manipulate in PowerShell.
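The round trip through JSON can be seen in miniature with a hand-made object. This is only a sketch; the property names are illustrative and not taken from any particular web service.

```powershell
# Build a simple object, serialize it to JSON text, then turn it back into an object.
$original = [pscustomobject]@{
    ServiceName = 'bits'
    CanStop     = $false
    Tags        = @('transfer', 'background')
}

$jsonText = $original | ConvertTo-Json
$copy     = $jsonText | ConvertFrom-Json

$jsonText
$copy.ServiceName
$copy.Tags.Count
```

The object that comes back is a new custom object with the same property values, not the original object.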

Jeff’s blog has a function that returns JSON, so we can use that to play with. We’ll start by asking his server for the data we want:

$json = Invoke-WebRequest -DisableKeepAlive -UseBasicParsing


This code is supposed to return a list of tags from his blog, including their names, internal ID numbers, the number of posts using each tag, and so on. We’ve told PowerShell to not try and parse the resulting HTML (-UseBasicParsing) because the result isn’t HTML; it’s JSON. $json now contains a web response; let’s convert the JSON content to PowerShell objects:

$tags = $json.content | ConvertFrom-Json | select –ExpandProperty tags

That gives us a bunch of tag objects. Looking in $tags, we’ll find objects like the following:

id : 443

slug : sid

title : SID

description :

post_count : 1

id : 338

slug : smbit

title : smbit

description :

post_count : 2

id : 432

slug : snapshot

title : snapshot

description :

post_count : 1

You could then do whatever you wanted with the objects:

PS C:\> $tags | sort post_count -Descending | select -First 5 |

Format-Table -AutoSize

id slug title description post_count

-- ---- ----- ----------- ----------

4 powershell PowerShell 380

8 scripting Scripting 181

19 wmi WMI 63

32 functions functions 62

10 books Books 43

Not surprisingly, Jeff writes a lot about PowerShell. This was just meant to be a quick example; remember, JSON is just a data format and not a service discovery mechanism. In other words, there’s no way to tell whether a website accepts or produces JSON, or even what that JSON means, except for trial and error or previous knowledge of the website.

As a further example, the feature on Jeff’s blog that provides JSON can return data based on a search result, but you have to know in advance how to construct the URI. In this example, we’ll get some recent blog posts on PowerShell:



Using Invoke-WebRequest we’ll retrieve the content:

$results = Invoke-WebRequest -Uri $uri -DisableKeepAlive

As before, the results are converted from JSON:

$converted = $results.Content | ConvertFrom-Json

If we pipe $converted to Get-Member, we can discover what type of object we have to work with:

TypeName: System.Management.Automation.PSCustomObject

Name MemberType Definition

---- ---------- ----------

Equals Method bool Equals(System.Object obj)

GetHashCode Method int GetHashCode()

GetType Method type GetType()

ToString Method string ToString()

count NoteProperty System.Int32 count=5

count_total NoteProperty System.Int32 count_total=486

pages NoteProperty System.Int32 pages=98

posts NoteProperty System.Object[] posts=System.Object[]

status NoteProperty System.String status=ok

The posts property appears to be a collection of posts. What does this object look like?

PS C:\> $converted.posts | Get-Member

TypeName: System.Management.Automation.PSCustomObject

Name MemberType Definition

---- ---------- ----------

Equals Method bool Equals(System.Object obj)

GetHashCode Method int GetHashCode()

GetType Method type GetType()

ToString Method string ToString()

attachments NoteProperty System.Object[] attachments=System.Object[]

author NoteProperty System.Management.Automation.PSCustomObject

author=@{id=1; slug=administrator; name=Jeffery Hicks;...

categories NoteProperty System.Object[] categories=System.Object[]

comments NoteProperty System.Object[] comments=System.Object[]

comment_count NoteProperty System.Int32 comment_count=4

comment_status NoteProperty System.String comment_status=open

content NoteProperty System.String content=<!--

google_ad_section_start --><p>The other day Distinguished

Engineer and ...

custom_fields NoteProperty System.Management.Automation.PSCustomObject

custom_fields=@{tt_auto_tweet=System.Object[]; tt_auto...

date NoteProperty System.String date=2013-12-09 11:59:15

excerpt NoteProperty System.String excerpt=<!--

google_ad_section_start --><p>The other day Distinguished

Engineer and ...

id NoteProperty System.Int32 id=3573

modified NoteProperty System.String modified=2013-12-09 11:59:15

slug NoteProperty System.String slug=updated-console-graphing-in-


status NoteProperty System.String status=publish

tags NoteProperty System.Object[] tags=System.Object[]

title NoteProperty System.String title=Updated Console Graphing in


title_plain NoteProperty System.String title_plain=Updated Console

Graphing in PowerShell

type NoteProperty System.String type=post

url NoteProperty System.String



As you’re developing scripts or tools that rely on the cmdlets we’re covering in this chapter, you’ll need to spend some time exploring results with Get-Member.

Now that we have the data, we can use it like we would any other object from PowerShell.

#a regex to strip out html tags

[regex]$rx="<(.|\n)+?>"
$converted.posts | Select Title,


@{Name="Date";Expression={[datetime]$_.Date}} | Format-List

You can see the result in figure 40.1.

Figure 40.1. Converted JSON data
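The same kind of projection can be tried offline. In this sketch the JSON is hand-written in roughly the shape of the search result, and the Excerpt expression is an assumption on our part about how a tag-stripping regex might be applied.

```powershell
# Hand-written JSON in the same general shape as a search result.
$sample = @'
{
  "status": "ok",
  "count": 1,
  "posts": [
    {
      "title": "Updated Console Graphing in PowerShell",
      "excerpt": "<p>The other day <b>someone</b> posted an article.</p>",
      "date": "2013-12-09 11:59:15"
    }
  ]
}
'@

# A regex that matches anything that looks like an HTML tag.
[regex]$rx = '<(.|\n)+?>'

$converted = $sample | ConvertFrom-Json
$posts = $converted.posts | Select-Object -Property Title,
    @{Name = 'Excerpt'; Expression = { $rx.Replace($_.excerpt, '').Trim() }},
    @{Name = 'Date';    Expression = { [datetime]$_.date }}
$posts | Format-List
```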

Most of the time, though, you probably won’t need to use the JSON cmdlets much. Let’s look at another web cmdlet, Invoke-RestMethod, that attempts to do much of the heavy lifting for you.

40.4. Using REST services

Invoke-RestMethod is designed to interact with Representational State Transfer (REST) websites. For IT pros, all you need to know is that this cmdlet will return rich and often hierarchical content from a website.


REST is an architecture, not a technology, so expect a lot of variation in the way URIs are constructed between different REST web services.

The cmdlet will attempt to “decode” the content and give you an appropriate PowerShell object. If you use the cmdlet to get an RSS feed, you might get XML. If you query a site that returns JSON, the cmdlet will attempt to convert it from JSON for you, so you shouldn’t need to use ConvertFrom-Json. As a last resort, the cmdlet should give you the same type of content you’d get with Invoke-WebRequest. Let’s look at a few examples.

First, let’s get an RSS feed:

$feed = Invoke-RestMethod -Method GET -Uri

You don’t have to use the –Method parameter because it defaults to GET, but if you develop the habit of using it you won’t forget when you need to use another HTTP verb, such as POST, with a REST web service.

If you pipe $feed to Get-Member, you’ll discover it’s an XML object. Here’s a sample:

PS C:\> $feed[0]

title : Episode 250 – PowerScripting Podcast – Julian Dunn from Ch...

link :

comments : {

pubDate : Thu, 12 Dec 2013 02:16:51 +0000

creator : creator

category : {category, category, category, category}

guid : guid

description : description

encoded : encoded

commentRss :

enclosure : enclosure

Because this is XML, we have a hierarchical object:

PS C:\> $feed[0].description



A Podcast about Windows PowerShell. Listen: In This Episode Tonight on the

PowerScripting Podcast, we talk to Julian Dunn abou...

So with a little work, we can transform $feed into something useful:

$feed | Select Title,



@{Name="Published";Expression={$_.PubDate -as [datetime]}},


@{Name="Category";Expression={$_.Category.innertext -join ","}}

We’re using the same regex pattern from earlier to strip out any HTML tags in the description (see figure 40.2).

Figure 40.2. An RSS feed from Invoke-RestMethod
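To experiment with this kind of projection without a live feed, you can cast a literal RSS fragment to XML. The feed below is invented for illustration. Because its category elements hold plain text, they are joined directly, whereas a real feed may need a property such as innertext, as in the expression above.

```powershell
# A made-up RSS fragment standing in for the items Invoke-RestMethod returns.
[xml]$rss = @'
<rss version="2.0">
  <channel>
    <item>
      <title>Episode 1</title>
      <pubDate>2013-12-12 02:16:51</pubDate>
      <category>PowerShell</category>
      <category>Podcast</category>
    </item>
    <item>
      <title>Episode 2</title>
      <pubDate>2013-12-19 02:00:00</pubDate>
      <category>Scripting</category>
    </item>
  </channel>
</rss>
'@

# Convert the date text to a real DateTime and flatten the categories.
$items = $rss.rss.channel.item | Select-Object -Property Title,
    @{Name = 'Published'; Expression = { $_.pubDate -as [datetime] }},
    @{Name = 'Category';  Expression = { $_.category -join ',' }}
$items | Sort-Object -Property Published -Descending
```

Converting Published to a DateTime is what makes the final sort chronological rather than alphabetical.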

We could’ve achieved a similar result with Invoke-WebRequest, but we would’ve either had to parse the DOM data or taken additional steps to turn the results into XML. Invoke-RestMethod did all of the hard work for us. The following listing puts all of this together into a function called Get-RSS.

Listing 40.1. The Get-RSS function

RSS feeds can vary, and we’ve tried to accommodate as much as possible, but there’s no guarantee the function will work perfectly. In those cases, you’ll need to examine the results from Invoke-RestMethod to discover the correct property names and types. In the meantime, you can try it out with these commands:

Get-RSS -Path

Get-RSS -Path

Get-RSS -Path

Get-RSS -Path ""

Finally, we’ll take the JSON queries we discussed earlier but modify them to use Invoke-RestMethod:

PS C:\> (Invoke-RestMethod

"").tags |

sort Post_count -desc | select -First 5 |

Format-Table ID,Title,Post_Count

id title post_count

-- ----- ----------

4 PowerShell 380

8 Scripting 181

19 WMI 63

32 functions 62

10 Books 43

PS C:\> [regex]$rx="<(.|\n)+?>"

PS C:\> (Invoke-RestMethod


powershell").posts | Select Title,


@{Name="Date";Expression={[datetime]$_.Date}} | Format-List

title : Updated Console Graphing in PowerShell

Excerpt : The other day Distinguished Engineer and PowerShell Godfather

Jeffrey Snover posted a blog article about the evils of

Write-Host. His take, which many agree with, is that Write-Host

is a special case cmdlet. In his article he mentions

console graphing as an example. I wrote such a script earlier

this year. Mr. Snover’s post drove […]

Share this:EmailPrintPinterest

url :


Date : 12/9/2013 11:59:15 AM


We get the same results as before, but without having to deal with converting JSON data.

By now you’re probably wondering when you should use Invoke-WebRequest and when you should use Invoke-RestMethod. The answer: It depends. If you definitely know you’re dealing with a RESTful website, then Invoke-RestMethod is the way to go. If you’re pulling information from an RSS feed, use Invoke-RestMethod first. Otherwise, you’ll need to spend some time testing both cmdlets and deciding how to best parse or use the results. But there’s one more type of web resource you might want to take advantage of in PowerShell: SOAP.

40.5. Using SOAP web services

The Simple Object Access Protocol (SOAP) puts a wrapper around the whole serialization and deserialization thing. Essentially, SOAP allows PowerShell to “see” a web service as if it was a locally installed piece of software. You get an instance of the object—much like running Get-Service to get instances of service objects—and then you can play with the properties and methods of that object. Under the hood, it’s all XML and HTTP, but that’s all handled for you.

Let’s experiment with an IP-to-location web service. The URL of such a service is referred to as its endpoint. We have to start by creating a local web service proxy, which will serve as the translator between us and the web service. Doing so is easy:

$px = New-WebServiceProxy ""

This code asks the service for its Web Services Description Language (WSDL), which describes how the service wants to be used. PowerShell constructs a local proxy, which we’ve saved in $px. We can easily see what the service is capable of now by simply piping $px to Get-Member:

$px | Get-Member -MemberType Method

We’re only interested in the methods, but you could run this yourself and see what else is available. We got back a list with several methods, including GetGeoIP() among others (the list is a bit long; run the commands yourself to see the full output). We also see that GetGeoIP() accepts a string. Let’s try it, using the IP address of Google’s public DNS server:



We get this result:

ReturnCode : 1

IP :

ReturnCodeDetails : Success

CountryName : United States

CountryCode : USA

Neat! So basically, $px—our local web service proxy—is acting like a piece of locally installed software. In reality, it’s doing all the under-the-hood magic needed to query information from the web server. It gives us an object that works like any other PowerShell object. We can, for example, select properties:

$px.GetGeoIP($IPAddress) | Select IP,Country*

Unfortunately, there’s no central directory of every SOAP-enabled web service out there. The website publishes a lot of public web services, and they maintain a list of them all. You can also use search engines to find them, and you may have use for SOAP when working with internal web services inside your organization.

40.6. Just in case

We referred to a cookietest.php page several times in this chapter; in the event it becomes unavailable online, the following listing contains the PHP source code for it.

Listing 40.2. CookieTest.php




foreach ($_COOKIE as $key=>$value) {

echo "You sent cookie '$key' containing '$value' <br>\n";


foreach ($_POST as $key=>$value) {

echo "You sent field '$key' containing '$value' <br>\n";



<h1>Enter Details</h1>

<form name="testform" action="cookietest.php" method="post">

<input type="text" name="field1">

<input type="text" name="field2">


You can drop the code on any web server that supports PHP (including Windows’ own IIS) and it should work with the examples in this chapter.

40.7. Summary

Using the web cmdlets opens up many possibilities for IT pros. You’ll have to set aside some time to figure out the correct approach given the web resource, data, and what you intend to do with it. Despite the concept of web standards, we’ve seen enough variation in web content that it’s hard to come up with a one-size-fits-all script or function to consume web resources.

We didn’t cover every single parameter or feature of these cmdlets because some of them are for special-use cases. But we wanted to give you enough of a taste to whet your appetite. As with everything in this book, be sure to read the full cmdlet help and examples.