Build APIs You Won't Hate: Everyone and their dog wants an API, so you should probably learn how to build them (2014)

13. API Versioning

13.1 Introduction

Once you have built your wonderful new API, at some point it will need to be replaced or have new features added. Sadly there is no real consensus on which approach is the “best”.

The general advice you will find most experts giving is: try to limit change as much as possible. That is a very fair statement to make, but also seems like a bit of a cop-out. Regardless of how well planned your API is, your business requirements will likely be what kills you.

In the startup world where things are less structured this can be a killer. Kapture started off with “Opportunities”, which became “Photo Opps” and ended up being called “Campaigns.” You can laugh at that and say it will never happen to you, but it will. When you are least expecting it, business requirements will come at you like a wet mackerel to the face. When that happens, API versioning is your only solution.

Sure you could say that your API needs to maintain backward compatibility - but that is not very realistic when you are properly reusing your API across your product line. To demonstrate further, let’s say you have 30 applications (and maybe a handful of external companies using the API), all of which are relying on the “customer” REST resource - your choices now are:

1.) Keep it backward compatible (and lose the million dollar sale because you couldn’t implement cool feature X)
2.) Change all 30 applications simultaneously to handle the new data (you likely don’t have enough resource to do this and deliver on time)
3.) Make the change, breaking the apps you don’t have time to upgrade, but get the sale. (of course you will fix the remaining apps in the future, right?)
– Source: Jeremy Highley, “Versioning and Types in REST/HTTP API Resources”

13.2 Different Approaches to API Versioning

As has been done in several other chapters, this chapter will outline several different approaches and list their pros and cons. In other chapters the final suggestion is generally implied to be a “better” solution, but in this chapter they are all compromises. Some are technically RESTful but incredibly complicated to implement, and are also complicated for your users to use. This means you have to put some real thought into the approach.

Throughout this chapter will be references to various popular services with public APIs and the type of API versioning they use. Credit goes to Tim Wood for compiling an extensive list in “How are REST APIs versioned?”, which will be used for reference in this chapter.

Approach #1: URI

Throwing a version number in the URI is a very common practice amongst popular public APIs.

Essentially all you do here is put a v1 or 1 in the URL, so that the next version can be easily changed.

https://api.example.com/v1/places

Due to being so prolific throughout various public APIs, this is often the first approach API developers take when building their own. It is by far the easiest and it does the job.

Twitter have two current versions: /1/ and /1.1/, which are both live at the time of writing. This gives developers a chance to update any code that is referencing the old endpoints, so they can use the new ones. Most APIs would have called it /2/, but it was not a drastic change so perhaps they wanted a more subtle number.

Some say that URI versioning allows for a more “copy & paste”-friendly URL than other approaches (many of which involve HTTP headers) and this is supposedly better for support.

That might be true in some ways but seems like a bit of a red herring. No RESTful API is ever going to be entirely “copy & paste”-friendly because there will always be headers involved: Cache-Control, Accept, Content-Type, Authorization, etc. Trying to make an entire API request fit in a URL just seems like a fool’s errand.

While the copy-paste argument is simply a lack of a positive, this versioning approach does have some potentially frustrating downsides.

The first thing people will say is that it is not technically RESTful. The idea is that a resource is meant to be more like a permalink. This link should never change, and it should always be there - just like a blog post. If the Internet is built around linking together and those links are changing all the time then… well things break. This might not be something you are too concerned about - especially if the API is internal - but it can be annoying for others.

For example, if you store the URL of an endpoint in your database for later reference, it might look like this:

https://api.example.com/v1/places/213

One day, you get an email from example.com saying that their v1 API is going to be deprecated in 3 months, and you need to start using the v2 API as soon as you can.

If you update your code to match the new format, with whatever new or renamed fields the new version contains, then great: your code will be ready to work with the new API version and you can start saving the new URL whenever you add a record to your database. That works for new records, but you cannot leave the old records in there referencing the old v1 URL.

So what do you do? One solution would be to string replace the old URL and hope the new URL is right:

https://api.example.com/v2/places/213

That might have worked, if it was not for the fact that you missed the note in the email that says they no longer use auto-increment IDs in their URLs (they read that it was a bad idea somewhere) and have decided to use slugs instead:

https://api.example.com/v2/places/taksim-bunk-hostel

Now what? The only solution here is to create a script that goes through each and every record in your database, hits their v1 API and gets information (hopefully that slug is available) and then constructs a v2 compatible URL to store.
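
A script like that might look roughly like the sketch below. It is only an illustration: the fetch_slug callable stands in for a real call to the v1 API (which is assumed, not guaranteed, to expose the slug), and the lookup table at the bottom is made up so the example runs on its own.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit


def migrate_place_url(
    v1_url: str, fetch_slug: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Build a v2 URL for a stored v1 place URL, or None if it cannot be done."""
    parts = urlsplit(v1_url)
    segments = parts.path.strip("/").split("/")
    # Only handle paths shaped exactly like /v1/places/<id>.
    if len(segments) != 3 or segments[0] != "v1" or segments[1] != "places":
        return None
    slug = fetch_slug(segments[2])  # in real life: one API request per record
    if slug is None:
        return None
    return urlunsplit((parts.scheme, parts.netloc, f"/v2/places/{slug}", "", ""))


# Hypothetical stand-in for the v1 API lookup.
fake_lookup = {"213": "taksim-bunk-hostel"}.get

print(migrate_place_url("https://api.example.com/v1/places/213", fake_lookup))
```

Every record costs at least one request to the old API, which is where the trouble starts.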

If you do that with a few million records then you will probably hit some API limits fairly quickly. Twitter for example limits applications to 15 requests per endpoint per 15 minutes in some situations, which works out to one request per minute. At one record per request, updating 1 million records would take roughly two years.
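
As a rough check on how slow such a migration is, the sketch below works out the duration under the quoted limit. It assumes one request per record, no batch endpoints and a single endpoint being hit, all of which are simplifications.

```python
REQUESTS_PER_WINDOW = 15
WINDOW_MINUTES = 15
RECORDS = 1_000_000


def migration_minutes(records: int, per_window: int, window_minutes: int) -> int:
    """Minutes needed if every record costs one rate-limited request."""
    windows = -(-records // per_window)  # ceiling division
    return windows * window_minutes


minutes = migration_minutes(RECORDS, REQUESTS_PER_WINDOW, WINDOW_MINUTES)
days = minutes / (60 * 24)
print(f"{minutes} minutes, about {days:.0f} days")
```

That comes out at about 694 days for a million records, so a batch lookup endpoint (if the API offers one) makes an enormous difference.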

Maybe that sounds like an edge-case, but putting the API version in the URL is asking for all sorts of obscure problems down the line, and asking your developers to manually construct resource URLs with string replacement is just rude. Peter Williams pointed this out in an article titled “Versioning REST Web Services” back in 2008, but everyone has been consistently ignoring him it seems.

Another downside with this approach is that pointing v1 and v2 to different servers can be difficult, unless you use some sort of Apache Proxy feature or nginx-as-a-proxy trickery. Generally speaking most systems expect the same path to be on the same server and doing otherwise can lead to overhead, so if v1 is PHP and v2 is Scala you can run into some trouble having them all set up on the same server.

The opposite of the “putting them on the same server can be hard” problem, is when API developers try and let one single code-base take care of this versioning internally in their web app. They simply make routes with the prefix /v1/places then when they want to make v2 they copy the routes, copy the controllers and tweak things. This can be done if you also version your transformers (to maintain structure and data types), and you are confident that all shared code (libraries, packages, etc) will maintain a consistent output throughout. This is rarely the case, and people putting v1 in their URLs are just doing it because it is the only solution they know.
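
To make the “version your transformers” point concrete, here is a minimal sketch of what that could look like in a single code-base. The field names and the differences between v1 and v2 are invented for illustration.

```python
from typing import Callable, Dict

Place = Dict[str, object]


def transform_place_v1(place: Place) -> Place:
    """The v1 output: numeric id and a 'name' field."""
    return {"id": place["id"], "name": place["name"]}


def transform_place_v2(place: Place) -> Place:
    """The v2 output: hypothetical switch to slugs and a renamed 'title' field."""
    return {"slug": place["slug"], "title": place["name"]}


# One transformer per version, so changing v2 can never alter v1's output.
TRANSFORMERS: Dict[str, Callable[[Place], Place]] = {
    "v1": transform_place_v1,
    "v2": transform_place_v2,
}


def render_place(version: str, place: Place) -> Place:
    try:
        transformer = TRANSFORMERS[version]
    except KeyError:
        raise ValueError(f"Unsupported API version: {version}") from None
    return transformer(place)


record = {"id": 213, "slug": "taksim-bunk-hostel", "name": "Taksim Bunk Hostel"}
print(render_place("v1", record))
print(render_place("v2", record))
```

The transformers are the easy part. The shared code underneath them (models, libraries, packages) is what tends to drift and quietly change the v1 output.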

Instead, consider making each version its own code-base. This means the code is totally separate, executed separately, with different web server vhosts or maybe even on different servers.

If the APIs are very similar (same language, same framework, etc) then you can simply share a Git history - be it a different branch in the same repository, or a different repository entirely. Some people take the Git Flow model and prepend version numbers, so one repository may have the following branches:

· 1.0/master

· 1.0/develop

· 2.0/master

· 2.0/develop

As long as you share a Git history you can pull from the other repository or branch, and merge changes up from older versions to newer versions. This lets you fix bugs in multiple versions easily, instead of copying and pasting between all of your controllers in the same code-base.

Popular APIs

· Bitly

· Disqus

· Dropbox

· Bing (lol)

· Etsy

· Foursquare

· Tumblr

· Twitter

· Yammer

· YouTube

Pros

· Incredibly simple for API developers

· Incredibly simple for API consumers

· “Copy-and-paste-able” URLs

Cons

· Not technically RESTful

· Tricky to separate onto different servers

· Forces API consumers to do weird stuff to keep links up-to-date

Approach #2: Hostname

Some API developers try to avoid the issues with server setup found with putting the version in the URI and simply put the version number in the hostname (or sub-domain) instead:

https://api-v1.example.com/places

This does not really solve any of the other problems. Having it in the URL in general (URI or sub-domain) shares all the same problems for API consumers, but it does at least reduce the chances of API developers trying to let one code-base handle it all.

Pros

· Incredibly simple for API developers

· Incredibly simple for API consumers

· “Copy-and-paste-able” URLs

· Easy to use DNS to split versions over multiple servers

Cons

· Not technically RESTful

· Forces API consumers to do weird stuff to keep links up-to-date

Approach #3: Body and Query Params

If you are going to take the URI version out of the URL, then one of the two other places to put it is the HTTP Body itself:

POST /places HTTP/1.1
Host: api.example.com
Content-Type: application/json

{
    "version" : "1.0"
}

This solves the problem of URLs changing over time, but can lead to inconsistent experiences. If the API developer is posting JSON or a similar data structure then that is easy, but if they are posting with a Content-Type of image/png or even text/csv then this becomes very complicated very quickly.

Some suggest the solution to that problem is to move the parameter to the query string, but now the API version is in the URL again! Immediately many of the problems of the first two approaches are back.

POST /places?version=1.0 HTTP/1.1
Host: api.example.com

header1,header2
value1,value2

This… just do something else. Many PHP frameworks ignore the query string under anything other than a GET request, which goes against the HTTP specification but is still common. Having this parameter that moves around inside different content types in the body or sometimes in the URL or even always in the URL regardless of the HTTP Verb being used is just confusing.
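
To show how awkward this gets on the server side, here is a rough sketch of the lookup an API would need if the version can live in either place. The function name and the “query string wins” rule are assumptions made for the example, not something any particular framework does.

```python
import json
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def find_version(target: str, content_type: str, body: bytes) -> Optional[str]:
    """Look for a 'version' param in the query string, then in a JSON body."""
    query = parse_qs(urlsplit(target).query)
    if "version" in query:
        return query["version"][0]
    if content_type == "application/json" and body:
        payload = json.loads(body)
        if isinstance(payload, dict) and "version" in payload:
            return str(payload["version"])
    # CSV, images and friends have nowhere sensible to carry the param.
    return None


print(find_version("/places", "application/json", b'{"version": "1.0"}'))
print(find_version("/places?version=1.0", "text/csv", b"header1,header2\nvalue1,value2"))
print(find_version("/places", "text/csv", b"header1,header2\nvalue1,value2"))
```

Every new content type means another branch in that function, and the last call shows a request that cannot state its version at all.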

Popular APIs

· Netflix

· Google Data

· PayPal

· Amazon SQS

Pros

· Simple for API developers

· Simple for API consumers

· Keeps URLs the same when param is in the body

· Technically a bit more RESTful than putting version in the URI

Cons

· Different content-types require different params, and some (like CSV) just do not fit

· Forces API consumers to do weird stuff to keep links up-to-date when the param is in the query string

Approach #3: Custom Request Header

So if the URL and the HTTP body are bad places to put API version information, where else is left? Well, headers of course!

GET /places HTTP/1.1
Host: api.example.com
BadApiVersion: 1.0

This example was lifted from Mark Nottingham, who is the chair of the IETF HTTPbis Working Group at the time of writing. That group is in charge of revising HTTP 1.1 and working on HTTP 2.0. He has this to say about custom version headers:

This is broken and wrong for a whole mess of reasons. Why?

First, because the server’s response depends on the version in the request header, it means that the response really needs to be:

HTTP/1.1 200 OK
BadAPIVersion: 1.1
Vary: BadAPIVersion

Otherwise, intervening caches can give clients the wrong response (e.g., a 1.2 response to a 1.1 client, or vice versa).
– Source: Mark Nottingham, “Bad HTTP API Smells: Version Headers”

Without the Vary header, it is hard for a cache system like Varnish to know that somebody asking for 1.0 is any different from somebody asking for 1.1 or 2.0, because the URL is exactly the same. That can be a big problem as API consumers asking for a specific version need to get that version, not a different one.
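
The toy cache below illustrates the problem. It is nothing like how Varnish is actually implemented; it only mimics the idea that a cache keys entries on the URL plus whichever request headers the response listed in Vary. The class and function names are made up, and real caches compare header names case-insensitively.

```python
from typing import Callable, Dict, Tuple

Headers = Dict[str, str]
Origin = Callable[[Headers], Tuple[str, Tuple[str, ...]]]


class TinyCache:
    """Toy HTTP cache keyed on the URL plus any headers named in Vary."""

    def __init__(self) -> None:
        self.vary_by_url: Dict[str, Tuple[str, ...]] = {}
        self.entries: Dict[Tuple[str, ...], str] = {}

    def _key(self, url: str, headers: Headers) -> Tuple[str, ...]:
        vary = self.vary_by_url.get(url, ())
        return (url,) + tuple(headers.get(name, "") for name in vary)

    def fetch(self, url: str, headers: Headers, origin: Origin) -> str:
        key = self._key(url, headers)
        if key in self.entries:
            return self.entries[key]  # cache hit
        body, vary = origin(headers)  # cache miss: ask the API
        self.vary_by_url[url] = vary
        self.entries[self._key(url, headers)] = body
        return body


def api_without_vary(headers: Headers) -> Tuple[str, Tuple[str, ...]]:
    return f"places in {headers['BadAPIVersion']} format", ()


def api_with_vary(headers: Headers) -> Tuple[str, Tuple[str, ...]]:
    return f"places in {headers['BadAPIVersion']} format", ("BadAPIVersion",)


v10, v11 = {"BadAPIVersion": "1.0"}, {"BadAPIVersion": "1.1"}

careless = TinyCache()
careless.fetch("/places", v10, api_without_vary)
print(careless.fetch("/places", v11, api_without_vary))  # wrong: the 1.0 body

careful = TinyCache()
careful.fetch("/places", v10, api_with_vary)
print(careful.fetch("/places", v11, api_with_vary))  # right: the 1.1 body
```

Without Vary the second client is handed the cached 1.0 body even though it asked for 1.1.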

Beyond that rather tricky caching issue, it is just generally annoying. If you use a custom header then API consumers need to go and look at your documentation to remember which it is. Maybe it is API-Version or Foursquare-Version or X-Api-Version or Dave. Who knows, and who can remember.

Popular APIs

· Azure

Pros

· Simple for API consumers (if they know about headers)

· Keeps URLs the same

· Technically a bit more RESTful than putting version in the URI

Cons

· Cache systems can get confused

· API developers can get confused (if they do not know about headers)

Approach #4: Content Negotiation

The Accept header is designed to ask the server to respond with a specific resource in a different format. Traditionally many developers think of this in terms of only (X)HTML, JSON, Images, etc, but it can be more generic than that. If we can RESTfully ask for our data to come back with different content-types having different syntax, then why not reuse this exact same header for versions too?

GitHub follow the advice of many of the people named in this chapter so far, and use the Accept header to return different Media Types.

All GitHub media types look like this:

application/vnd.github[.version].param[+json]

The most basic media types the API supports are:

application/json
application/vnd.github+json

– Source: GitHub, “Media Types”

Basically if you ask for application/json or application/vnd.github+json then you are going to get JSON. Without specifying further, they will show you the current default response, which at the time of writing is v3 but could at any time change to v4. They warn that if you do not specify the version then your apps will break! Fair enough.

To specify the version, you must use Accept: application/vnd.github.v3+json, then if the default switches to v4 at some point in the future, your application will continue to use v3.
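
On the server side, picking the version out of that header could be sketched like this. It is not GitHub’s implementation: it handles only a single media type with no q-values, ignores the optional param segment, and hard-codes v3 as the default purely because that is the default mentioned above.

```python
import re
from typing import Optional

# Matches application/vnd.github+json and application/vnd.github.v3+json
VENDOR_TYPE = re.compile(r"^application/vnd\.github(?:\.(?P<version>v\d+))?\+json$")
DEFAULT_VERSION = "v3"  # assumption: whatever the API currently defaults to


def requested_version(accept: str) -> Optional[str]:
    """Return the API version an Accept value asks for, or None if unsupported."""
    accept = accept.strip().lower()
    if accept == "application/json":
        return DEFAULT_VERSION
    match = VENDOR_TYPE.match(accept)
    if match is None:
        return None
    return match.group("version") or DEFAULT_VERSION


print(requested_version("application/vnd.github.v3+json"))
print(requested_version("application/vnd.github+json"))
print(requested_version("text/html"))
```

A client that pins application/vnd.github.v3+json keeps getting v3 even after DEFAULT_VERSION moves on, which is the whole point.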

This solves the caching problem, solves the URL manipulation problems of the URL-based versioning approaches, is considered rather RESTful, but can confuse some developers. Maybe train them to get used to it, or maybe stick with URL-based versioning, but it is semantically more correct and generally works very well. This was done at Kapture for the internal API and it worked without problems.

The only downside is one that is found with all of the approaches mentioned so far, which is: If you version the entire API as a whole, it becomes very hard for API developers to upgrade their applications. It could be that only 10% of the API has changed between versions, but changing the version of the entire API can scare developers. Even with a changelog, it is hard for the developer to know if their entire application is going to completely break when they switch over. Even an extensive test-suite is not going to catch every issue with a third-party service like this because most developers use hard-coded JSON responses in their unit-tests to mock interactions.

If changing the version of the whole API is too much, the only other option is to version parts of the API.

Popular APIs

· GitHub

Pros

· Simple for API consumers (if they know about headers)

· Keeps URLs the same

· HATEOAS-friendly

· Cache-friendly

· Sturgeon-approved

Cons

· API developers can get confused (if they do not know about headers)

· Versioning the WHOLE thing can confuse users (but all previous approaches are the same in this)

Approach #5: Content Negotiation for Resources

Generally accepted to be the proper HATEOAS approach, content negotiation for specific resources using media-types is one of the most complex solutions, but is a very scalable way to approach things. It solves the all or nothing approach of versioning the entire API, but still lets breaking changes be made to the API in a manageable way.

Basically, if GitHub were to do this, they would take their current media-type and add an extra item:

Accept: application/vnd.github.user.v4+json

Alternatively, the Accept header is capable of containing arbitrary parameters.

Accept: application/vnd.github.user+json; version=4.0

This was suggested by Avdi Grimm and written about in an article by Steve Klabnik called “Nobody Understands REST or HTTP”. That whole article is a great rant containing lots of useful advice which was written in 2011. Again, most API developers seem to have ignored this advice or simply not known about it.

Picking between application/vnd.example.place.v1+json and application/vnd.example.place+json; version=1 will no doubt have pros and cons of its own. Apparently Rails is not able to pick up the latter - or at least could not in 2011 - but that should not be considered much of a reason.

The other argument against using the latter media type is that arbitrary parameter names can cause the same confusion as arbitrary version header names, but developers can all just agree to call it version. Right?

Whichever way you end up specifying the header, the advantage is not just specifying “I want the v4 API” but instead saying “I would like the v4 version of a place(s).” Services that provide an API can email their API consumers saying “We are updating the way places work, here is an example of the resource, here is what you need to change, specify the new version when you are ready.”
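
As a sketch of the parameter style, the snippet below pulls both the resource name and its version out of a single media type. The vnd.example vendor prefix, the function names and the default version are assumptions for the example, and a real Accept header can list several media types with q-values, which this ignores.

```python
from typing import Dict, Tuple


def parse_media_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'type/subtype; a=b; c=d' into the media type and its parameters."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params: Dict[str, str] = {}
    for raw in raw_params:
        name, sep, val = raw.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


def resource_and_version(accept: str, default_version: str = "1") -> Tuple[str, str]:
    """Return (resource, version) for a vendor media type like the ones above."""
    media_type, params = parse_media_type(accept)
    prefix, suffix = "application/vnd.example.", "+json"
    resource = media_type[len(prefix):-len(suffix)]
    if not (media_type.startswith(prefix) and media_type.endswith(suffix) and resource):
        raise ValueError(f"Unsupported media type: {media_type}")
    return resource, params.get("version", default_version)


print(resource_and_version("application/vnd.example.place+json; version=4.0"))
print(resource_and_version("application/vnd.example.place+json"))
```

The application can then pick a transformer per resource and version, so places can move to a new version while users stay put.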

Partial updates like this ease third-party efforts to upgrade applications, and the chances of leaving developers stranded on an older version become far lower.

Popular APIs

· GitHub

Pros

· HATEOAS-friendly

· Cache-friendly

· Keeps URLs the same

· Easier upgrades for API consumers

· Can be one codebase or multiple

Cons

· API consumers need to pay attention to versions

· Splitting across multiple code-bases is not impossible, but is hard

· Putting it in the same code-base leads to accidental breakage, if transformers are not versioned

Approach #6: Feature Flagging

This approach is something that so far I have only seen done by Facebook and its Graph API. Their approach is interesting, but not as common as some of these other approaches.

Facebook do not version their entire API with simple numbers like anybody else does. They do not version their resources, and they do not allow you to request different versions with headers, parameters or anything else.

They essentially make a custom version for each single client application. The way this works is there are various feature flags, which they call “Migrations.” They put out a migration every few months, write a blog, email API developers about it, and ask those developers to log into the developer area on the Facebook platform to manage their application.

Basically, they warn you that things are going to break in a few months. They list all the changes and give you the chance to see if this will affect your application. If your application does not use an endpoint that is being changed, or they are removing a field your application does not use, then you can click “Enable” for the migration. From that point on any interaction your application has with the Facebook Graph API will use the new format.

If you wait, eventually they will flip that switch regardless. This is considered a fair warning, and means they never have to support an old version for years. Facebook simply maintain one version with a few feature flags and those flags exist for a few months before that old code is removed. If your application still uses the old format then… tough.
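
Facebook’s internals are not public, so the following is only a guess at the general shape of such a system, with an invented migration name, deadline and user format. The idea is that each client application either opts in early or gets switched over when the deadline passes.

```python
from datetime import date
from typing import Dict, Set

# Hypothetical migration: name -> date on which it is switched on for everyone.
FORCED_ON: Dict[str, date] = {"split_name_fields": date(2014, 1, 1)}


def migration_active(name: str, opted_in: Set[str], today: date) -> bool:
    """A migration applies if the app enabled it early or the deadline passed."""
    return name in opted_in or today >= FORCED_ON[name]


def render_user(user: Dict[str, str], opted_in: Set[str], today: date) -> Dict[str, str]:
    if migration_active("split_name_fields", opted_in, today):
        return {"first_name": user["first_name"], "last_name": user["last_name"]}
    # Old format, kept only until the deadline passes.
    return {"name": f"{user['first_name']} {user['last_name']}"}


user = {"first_name": "Ada", "last_name": "Lovelace"}
print(render_user(user, set(), date(2013, 11, 1)))                  # old format
print(render_user(user, {"split_name_fields"}, date(2013, 11, 1)))  # opted in early
print(render_user(user, set(), date(2014, 2, 1)))                   # switch flipped
```

Once the deadline has passed for everybody, the old branch and the flag can simply be deleted.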

This system to me has the most benefits, but one tricky part is that getting the timing right for that changeover is hard on API consumers. If your code is live looking at the old style, then you cannot push new code for the new style, because it will be broken until you flip the switch. That might only be seconds, but if you have multiple applications then you have to update and deploy all of them within minutes (or seconds) and then flip the switch.

Realistically speaking, that is very hard to do, so you end up with code having a lot of if statements, ready to look for fields that may or may not be there depending on the version. That leads to lots of extra code and you have to remember to remove it afterwards by shoving comment blocks throughout your code:

# @TODO Kill this when Facebook September 13 Migration is confirmed working

This is not insanely hard, but it can be complicated sometimes.
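
On the consumer side, that defensive code tends to look something like this sketch. The field names are hypothetical and not taken from any real Facebook migration.

```python
from typing import Dict


def display_name(user: Dict[str, str]) -> str:
    """Cope with a user resource in either the old or the new format."""
    # @TODO Remove the old-format branch once the migration is confirmed working
    if "first_name" in user and "last_name" in user:
        return f"{user['first_name']} {user['last_name']}"  # new format
    return user["name"]  # old format


print(display_name({"name": "Ada Lovelace"}))
print(display_name({"first_name": "Ada", "last_name": "Lovelace"}))
```

Code like this can be deployed before the switch is flipped and keeps working afterwards, at the cost of a branch somebody has to remember to delete.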

Generally speaking, the Feature Flag solution is the easiest for API consumers if the changes happen to hit a part of the API they do not care about. They do not need to be scared of changing to a whole new version of the API, they know their code will work, and things seem safer. If they do require changes then… well a few if statements never really hurt anyone.

13.3 Ask Your Users

None of these will have a drastic impact on your business, especially if your API is internal. If you are creating a platform as big as Facebook then maybe you need a solution as complex as theirs, but that is probably not the case.

My advice with versioning (as with most aspects of your API) is to know your audience. Twitter gets away with flagrant disregard for almost every single concept or principle that ever makes something RESTful whilst still calling it a REST API, so you can probably break a few rules too.

If I may leave others considering how to version their APIs with a final thought: Nobody will use your API until you’ve built it. Stop procrastinating. None of these are “bad” in any tangible sense, they’re just different.
They are all easily consumable, they all return the same result and none of them are likely to have any real impact on the success of your project.
– Source: Troy Hunt, “Your API versioning is wrong, which is why I decided to do it 3 different wrong ways”

The real truth is that all of the approaches are annoying in some ways, or technically “unRESTful” in some respects, or difficult, or a combination of it all. You have to pick what is realistic for your project in both the difficulty of the implementation and the skill/knowledge level of your target audience.

Conclusion

Thank you for reading the whole way through this book. This was a large and complex topic which I tried to turn into an interesting read with some humor.

It has been a really enjoyable experience and I have been blown away with the positive feedback. I have also received plenty of constructive criticism for which I am also grateful.

I will continue to maintain this book and I have some requests, suggestions and general ideas for improvements to make. But, if you have read this then the meat of the book is now the same as it will always be.

For example, I plan to:

· Improve the Behat test coverage on the sample application.

· Implement league\oauth2-server for the PHP sample app.

A dilemma I am having currently is that any further explanation of RESTful API development is just going to be paraphrasing content in the HTTP 1.1 Specification. RESTful APIs respect as many aspects of the HTTP spec as possible, so headers like Accept-Language, Expires, ETag, Retry-After, etc could be catered for. A whole book could be written about the HTTP spec itself, so it seems somewhat outside the scope of this book. I will probably add in one last chapter on caching at a later point and leave it there.

This has been a great project, and a much needed break from writing code non-stop 24/7. Back to it I guess!

Further Reading

Here are some books that you should consider reading. While they are not directly about API development, they are about related subjects. APIs must be secure. APIs need to be tested. APIs need virtual machines to run on, and they need provisioning tools to keep those virtual machines in check.

Building Secure PHP Apps - Is your PHP app truly secure? Let’s make sure you get home on time and sleep well at night.

The Grumpy Programmer’s PHPUnit Cookbook - Learning how to use PHPUnit doesn’t have to suck. Your code is untested and fixing bugs is tedious. You know you need something better, but time just doesn’t seem to be on your side. Making things “right” is costly and you need to deliver working code NOW.

Vagrant Cookbook - Learn how to create effective Vagrant development environments. This book will cover from basic to advanced concepts on Vagrant, including important ProTips to improve your Vagrant projects and avoid common mistakes. The book was updated to cover the new features on Vagrant 1.5, which are substantial compared to previous versions.

Laravel 4 Cookbook - This book contains various projects built in the Laravel 4 framework, written by a well known Laravel 4 developer from sunny South Africa.