Mining the Social Web (2014)

Part III. Appendixes

Appendix B. OAuth Primer

Just as each chapter in this book has a corresponding IPython Notebook, so does each appendix. All notebooks, regardless of purpose, are maintained in the book’s GitHub source code repository. This appendix, as you are reading it here “in print,” serves as a special cross-reference to the IPython Notebook that provides example code demonstrating interactive OAuth flows involving explicit user authorization, which you need if you implement a user-facing application.

The remainder of this appendix provides a terse discussion of OAuth as a basic orientation. The sample code for OAuth flows for popular websites such as Twitter, Facebook, and LinkedIn is in the corresponding IPython Notebook that is available with this book’s source code.

NOTE

Like the other appendixes, this appendix has a corresponding IPython Notebook entitled Appendix B: OAuth Primer that you can view online.

Overview

OAuth stands for “open authorization” and provides a means for users to authorize an application to access their account data through an API without needing to hand over sensitive credentials such as a username and password combination. Although OAuth is presented here in the context of the social web, keep in mind that it’s a specification with wide applicability in any context in which users would like to authorize an application to take certain actions on their behalf. In general, users can control the level of access granted to a third-party application (subject to the degree of API granularity that the provider implements) and revoke it at any time. Facebook, for example, implements extremely fine-grained permissions that let users allow third-party applications to access very specific pieces of sensitive account information.

Given the nearly ubiquitous popularity of platforms such as Twitter, Facebook, LinkedIn, and Google+, and the vast utility of third-party applications that are developed on these social web platforms, it’s no surprise that they’ve adopted OAuth as a common means of opening up their platforms. However, like any other specification or protocol, OAuth implementations across social web properties currently vary with regard to the version of the specification that’s implemented, and there are sometimes a few idiosyncrasies that come up in particular implementations. The remainder of this section provides a brief overview of OAuth 1.0a, as defined by RFC 5849, and OAuth 2.0, as defined by RFC 6749, that you’ll encounter as you mine the social web and engage in other programming endeavors involving platform APIs.

OAuth 1.0a

OAuth 1.0^[36^] defines a protocol that enables a web client to access a resource owner’s protected resource on a server and is described in great detail in the OAuth 1.0 Guide. As you already know, the reason for its existence is to avoid the problem of users (resource owners) sharing passwords with web applications, and although it is fairly narrow in scope, it does the one thing it claims to do very well. As it turns out, one of the primary developer complaints about OAuth 1.0 that initially hindered adoption was that it was very tedious to implement because of the various cryptographic details involved (such as HMAC signature generation), given that OAuth 1.0 does not assume that credentials are exchanged over a secure SSL connection using HTTPS. In other words, OAuth 1.0 cryptographically signs each request as part of its flow, so that with a method such as HMAC-SHA1 the shared secrets never travel over the wire and a tampered request fails verification.
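To make the “tedious” HMAC signature generation concrete, here is a minimal sketch of how an OAuth 1.0a client computes an HMAC-SHA1 signature along the lines of RFC 5849 (Section 3.4): percent-encode and sort the parameters into a signature base string, then sign it with a key built from the consumer secret and token secret. The function names (oauth_params, signature_base_string, hmac_sha1_signature) are hypothetical, the code assumes the caller passes a base URL without a query string, and it simplifies the spec by using a dict (so duplicate parameter names are not supported). In practice, libraries such as oauthlib handle all of this for you.

```python
import base64
import hashlib
import hmac
import secrets
import time
from urllib.parse import quote


def percent_encode(value):
    # RFC 5849 Section 3.6: encode everything except unreserved characters
    return quote(value, safe="-._~")


def oauth_params(consumer_key, token=None):
    # Protocol parameters sent with every signed request; the nonce and
    # timestamp let the provider reject replayed requests
    params = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": secrets.token_hex(16),
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(int(time.time())),
        "oauth_version": "1.0",
    }
    if token:
        params["oauth_token"] = token
    return params


def signature_base_string(method, base_url, params):
    # Encode each name and value, sort, and join as name=value pairs with "&"
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    # METHOD & encoded base URL & encoded normalized parameter string
    return "&".join([method.upper(), percent_encode(base_url), percent_encode(normalized)])


def hmac_sha1_signature(base_string, consumer_secret, token_secret=""):
    # Signing key: encoded consumer secret and token secret joined by "&";
    # the secrets themselves are never transmitted
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Hypothetical credentials and endpoint, for illustration only
    params = oauth_params("my-consumer-key", "my-access-token")
    params["status"] = "Hello, world!"
    base = signature_base_string("POST", "https://api.example.com/1/statuses/update.json", params)
    params["oauth_signature"] = hmac_sha1_signature(base, "my-consumer-secret", "my-token-secret")
    print(params["oauth_signature"])
```

The provider recomputes the same signature from the received parameters and its own copy of the secrets; if even one character of the request changed in transit, the signatures will not match.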

Although we’ll be fairly informal in this discussion, you might care to know that in OAuth parlance, the application that is requesting access is often known as the client (sometimes called the consumer), the social website or service that houses the protected resources is the server (sometimes called the service provider), and the user who is granting access is the resource owner. Since there are three parties involved in the process, the series of redirects among them is often referred to as a three-legged flow, or more colloquially, the “OAuth dance.” Although the implementation and security details are a bit messy, there are essentially just a few fundamental steps involved in the OAuth dance that ultimately enable a client application to access protected resources on the resource owner’s behalf from the service provider:

1. The client obtains an unauthorized request token from the service provider.

2. The resource owner authorizes the request token.

3. The client exchanges the request token for an access token.

4. The client uses the access token to access protected resources on behalf of the resource owner.
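The four steps of the dance can be made concrete with a toy, in-memory simulation. ToyServiceProvider and its methods are hypothetical; the sketch leaves out signatures, HTTP, and browser redirects entirely and only models how tokens change state. It does include the verifier that OAuth 1.0a returns after the resource owner authorizes the request token, which the client must present when exchanging it. Real providers expose these steps as HTTP endpoints (a request token URL, an authorization URL, and an access token URL).

```python
import secrets


class ToyServiceProvider:
    """Hypothetical in-memory stand-in for a service provider; not a real API."""

    def __init__(self):
        self._pending = {}  # request token -> {"consumer_key", "owner", "verifier"}
        self._granted = {}  # access token -> (consumer_key, owner)

    def issue_request_token(self, consumer_key):
        # Step 1: the client obtains an unauthorized request token
        token = secrets.token_urlsafe(16)
        self._pending[token] = {"consumer_key": consumer_key, "owner": None, "verifier": None}
        return token

    def authorize(self, request_token, owner):
        # Step 2: the resource owner approves the token on the provider's site;
        # OAuth 1.0a hands back a verifier that the client must present in step 3
        entry = self._pending[request_token]
        entry["owner"] = owner
        entry["verifier"] = secrets.token_urlsafe(8)
        return entry["verifier"]

    def exchange(self, request_token, verifier):
        # Step 3: an authorized request token (plus verifier) becomes an access token
        entry = self._pending.get(request_token)
        if entry is None or entry["owner"] is None or entry["verifier"] != verifier:
            raise PermissionError("request token is unknown, unauthorized, or unverified")
        del self._pending[request_token]  # request tokens are single-use
        access_token = secrets.token_urlsafe(16)
        self._granted[access_token] = (entry["consumer_key"], entry["owner"])
        return access_token

    def fetch_protected_resource(self, access_token):
        # Step 4: the access token, not the owner's password, unlocks the data
        if access_token not in self._granted:
            raise PermissionError("invalid access token")
        consumer_key, owner = self._granted[access_token]
        return f"{owner}'s data, accessed by {consumer_key}"
```

Note that the client never sees the resource owner’s password at any point; it only ever holds tokens that the provider can revoke.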

In terms of particular credentials, a client starts with a consumer key and consumer secret and by the end of the OAuth dance winds up with an access token and access token secret that can be used to access protected resources. All things considered, OAuth 1.0 sets out to enable client applications to securely obtain authorization from resource owners to access account resources from service providers, and despite some arguably tedious implementation details, it provides a broadly accepted protocol that makes good on this intention. It is likely that OAuth 1.0 will be around for a while.

NOTE

“Introduction to OAuth (in Plain English)” illustrates how an end user (as a resource owner) could authorize a link-shortening service such as bit.ly (as a client) to automatically post links to Twitter (as a service provider). It is worth reviewing and drives home the abstract concepts presented in this section.

OAuth 2.0

Whereas OAuth 1.0 enables a useful, albeit somewhat narrow, authorization flow for web applications, OAuth 2.0 was originally intended to significantly simplify implementation details for web application developers by relying completely on SSL for security aspects, and to satisfy a much broader array of use cases. Such use cases ranged from support for mobile devices to the needs of the enterprise, and even somewhat futuristically considered the needs of the “Internet of Things,” such as devices that might appear in your home.
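Because OAuth 2.0 leans on SSL rather than per-request signatures, calling an API with an OAuth 2.0 access token is often as simple as attaching it as a bearer token in the Authorization header, as described in RFC 6750 (the bearer token companion to RFC 6749). The following minimal sketch builds such a request with the standard library; the helper name bearer_request and the example URL are hypothetical, and the HTTPS check reflects the point that a bearer token is only as safe as the transport carrying it.

```python
import urllib.request


def bearer_request(url, access_token):
    # A bearer token carries no signature: anyone who sees it can use it,
    # so only ever send it over an encrypted (HTTPS) connection
    if not url.lower().startswith("https://"):
        raise ValueError("bearer tokens must only be sent over HTTPS")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})


if __name__ == "__main__":
    req = bearer_request("https://api.example.com/me", "hypothetical-access-token")
    print(req.get_method(), req.full_url, req.get_header("Authorization"))
    # urllib.request.urlopen(req) would actually send the request
```

Compare this with the OAuth 1.0a signing sketch: there is no nonce, timestamp, or base string, which is exactly the simplification OAuth 2.0 was aiming for.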

Facebook was an early adopter, with migration plans dating back to early drafts of OAuth 2.0 in 2011 and a platform that quickly relied exclusively on a portion of the OAuth 2.0 specification, while LinkedIn waited to implement support for OAuth 2.0 until early 2013. Although Twitter’s standard user-based authentication is still based squarely on OAuth 1.0a, it implemented application-based authentication in early 2013 that’s modeled on the Client Credentials Grant flow of the OAuth 2.0 spec. Finally, Google currently implements OAuth 2.0 for services such as Google+, and has deprecated support for OAuth 1.0 as of April 2012. As you can see, the reaction was somewhat mixed in that not every social website immediately scrambled to implement OAuth 2.0 as soon as it was announced.
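Twitter’s application-only authentication follows the shape of the Client Credentials Grant: the application authenticates with HTTP Basic auth, using its URL-encoded consumer key and consumer secret joined by a colon, and POSTs grant_type=client_credentials to a token endpoint. The sketch below only builds that request without sending it. The endpoint URL is the one Twitter documented for this flow and may have changed since; the function name is hypothetical.

```python
import base64
import urllib.parse
import urllib.request

# Token endpoint as documented by Twitter for application-only auth; may change
TOKEN_URL = "https://api.twitter.com/oauth2/token"


def client_credentials_request(consumer_key, consumer_secret, token_url=TOKEN_URL):
    # The client authenticates itself with HTTP Basic using "key:secret",
    # each part URL-encoded first and the pair then base64-encoded
    pair = f"{urllib.parse.quote(consumer_key, safe='')}:{urllib.parse.quote(consumer_secret, safe='')}"
    basic = base64.b64encode(pair.encode("utf-8")).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,  # supplying a body makes urllib send a POST
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        },
    )
```

Sending the request (for example, with urllib.request.urlopen over HTTPS) is expected to return JSON containing an access_token, which the application then attaches to subsequent API calls as a bearer token. No user is involved, so this flow only grants access to data the application itself may read.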

Still, it’s a bit unclear whether OAuth 2.0 as originally envisioned will ever become the new industry standard. One popular blog post, entitled “OAuth 2.0 and the Road to Hell” (and its corresponding Hacker News discussion), is worth reviewing and summarizes a lot of the issues. The post was written by Eran Hammer, who resigned his role as lead author and editor of the OAuth 2.0 specification as of mid-2012 after working on it for several years. It appears as though “design by committee” around large open-ended enterprise problems suffocated some of the enthusiasm and progress of the working group, and although the specification was published in late 2012, it is unclear whether it provides an actual specification or a blueprint for one. Fortunately, over the past several years, many terrific OAuth frameworks have emerged to allay most of the OAuth 1.0 development pains associated with accessing APIs, and developers have continued innovating despite the initial stumbling blocks with OAuth 1.0. As a case in point, in working with Python packages in earlier chapters of this book, you haven’t had to know or care about any of the complex details involved with OAuth 1.0a implementations; you’ve just had to understand the gist of how it works. What does seem clear despite some of the analysis paralysis and “good intentions” associated with OAuth 2.0, however, is that several of its flows seem well-defined enough that large social web providers are moving forward with them.

As you now know, unlike OAuth 1.0 implementations, which consist of a fairly rigid set of steps, OAuth 2.0 implementations can vary somewhat depending on the particular use case. A typical OAuth 2.0 flow, however, takes advantage of SSL and essentially consists of just a few redirects that, at a high enough level, don’t look all that different from the previously mentioned set of steps involving an OAuth 1.0 flow. For example, Twitter’s recent application-only authentication involves little more than an application exchanging its consumer key and consumer secret for an access token over a secure SSL connection. Again, implementations will vary based on the particular use case, and although it’s not exactly light reading, Section 4 of the OAuth 2.0 spec is fairly digestible if you’re interested in some of the details. If you choose to review it, just keep in mind that some of the terminology differs between OAuth 1.0 and OAuth 2.0, so it may be easier to focus on understanding one specification at a time rather than learning them both simultaneously.
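For a taste of Section 4, here is a minimal sketch of the three client-side pieces of the Authorization Code Grant (Section 4.1): building the URL that the resource owner’s browser is redirected to, validating the redirect that comes back, and preparing the body of the token request. The endpoint URLs, client ID, and helper names are hypothetical, the scope format is provider-specific (a provider with fine-grained permissions, such as Facebook, defines its own scope names), and the sketch omits client authentication (for example, a client secret) on the token request, which confidential clients also need.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def authorization_url(authorize_endpoint, client_id, redirect_uri, scope, state=None):
    # Section 4.1.1: where to send the resource owner's browser to approve access;
    # "state" is an unguessable value used to detect forged (CSRF) redirects
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


def parse_callback(callback_url, expected_state):
    # Section 4.1.2: the provider redirects back with ?code=...&state=...
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch; possible forged redirect")
    return params["code"][0]


def token_request_body(code, redirect_uri, client_id):
    # Section 4.1.3: form body to POST (over HTTPS) in exchange for an access token
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
    })
```

At a high level this mirrors the OAuth 1.0a dance: the authorization code plays roughly the role of the authorized request token, but nothing is signed, because the flow relies on SSL to protect the code and the resulting token.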

NOTE

Chapter 9 of Jonathan LeBlanc’s Programming Social Applications (O’Reilly) provides a nice discussion of OAuth 1.0 and OAuth 2.0 in the context of building social web applications.

The idiosyncrasies of OAuth and the underlying implementations of OAuth 1.0 and OAuth 2.0 are generally not going to be all that important to you as a social web miner. This discussion was tailored to provide some surrounding context so that you have a basic understanding of the key concepts involved, and to provide some starting points for further study and research should you like to do so. As you may have already gathered, the devil really is in the details. Fortunately, nice third-party libraries largely eliminate the need to know much about those details on a day-to-day basis, although knowing them can sometimes come in handy. The online code for this appendix features both OAuth 1.0 and OAuth 2.0 flows, and you can dig into as much detail with them as you’d like.

^[36^]Throughout this discussion, use of the term “OAuth 1.0” is technically intended to mean “OAuth 1.0a,” given that OAuth 1.0 revision A obsoleted OAuth 1.0 and is the widely implemented standard.