Responsive Web Design, Part 2 (2015)

Performance Optimization Roadmap

Deferring Non-Critical JavaScript

When the prospect of removing jQuery altogether became tangible as a long-term goal, we started working step by step to decouple jQuery dependencies from the library. We rewrote the script to generate footnotes for the print style sheet (later replacing it with a PHP solution), rewrote the functionality for rating comments, and rewrote a few other scripts. Actually, with our savvy user base and a solid share of smart browsers, we were able to move to vanilla JavaScript quite quickly. Moreover, we could move scripts from the header to the footer to avoid blocking construction of the DOM tree. In mid-July, we removed jQuery from our code base entirely.

We wanted full control of what was loaded on the page and when. Specifically, we wanted to ensure that no JavaScript blocked the rendering of content at any point. So, we used the Defer Loading JavaScript³² technique to load JavaScript after the load event, injecting the scripts once the DOM and CSSOM have been constructed and the page has been painted. Here’s the snippet that we use on the website; it loads the defer.js script asynchronously after the load event:

// Inject defer.js only once the page has finished loading.
function downloadJSAtOnload() {
    var element = document.createElement("script");
    element.src = "defer.js";
    document.body.appendChild(element);
}

// Register the handler, with fallbacks for legacy browsers.
if (window.addEventListener)
    window.addEventListener("load", downloadJSAtOnload, false);
else if (window.attachEvent)
    window.attachEvent("onload", downloadJSAtOnload);
else
    window.onload = downloadJSAtOnload;

However, because script-injected asynchronous scripts are considered harmful³³ and slow (they are hidden from browsers’ speculative parsers, so their download starts late), we might look into using the defer and async attributes instead. In the past, we couldn’t use async for every script because we needed jQuery to load before the scripts that depended on it; so we used defer, which respects the loading order of scripts. With jQuery out of the picture, we can now load scripts asynchronously, and fast. Actually, by the time you read this chapter, we might already be using async.

Another issue we had to consider early on was the performance of our article pages. In fact, even after all these optimizations, they performed remarkably poorly. Articles with many comments were very slow because we use Gravatar.com for loading readers’ images when they leave replies to articles. Because each Gravatar URL is unique, each comment generates one HTTP request, and also blocks the rendering of the entire page. Therefore, we load a blank profile image at first, load the Gravatar images asynchronously once the page has started rendering, and then replace the blank images with the correct ones. The same approach works not only for comments but for pretty much any “skeleton” screen that displays the content right away and loads secondary or tertiary resources later.
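The placeholder swap described above could be sketched roughly as follows. This is a minimal illustration, not the code used on the website: it assumes each avatar image initially points at a blank file and carries the real Gravatar URL in a hypothetical data-src attribute, and it waits for the load event as one possible trigger.

```javascript
// Sketch: replace blank placeholder avatars with the real images.
// Assumption: the real URL is stored in a hypothetical data-src attribute.
function swapDeferredImages(images) {
  var swapped = 0;
  for (var i = 0; i < images.length; i++) {
    var realSrc = images[i].getAttribute("data-src");
    if (realSrc) {
      images[i].setAttribute("src", realSrc);
      images[i].removeAttribute("data-src");
      swapped++;
    }
  }
  return swapped; // number of images that were switched to their real source
}

// In a browser, run the swap after the load event so avatars never block rendering.
if (typeof window !== "undefined" && window.addEventListener) {
  window.addEventListener("load", function () {
    swapDeferredImages(document.querySelectorAll("img[data-src]"));
  }, false);
}
```

Images without a data-src attribute are left untouched, so the same function can be pointed at any set of secondary images on a skeleton screen.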

Put simply, we deferred the loading of all of the JavaScript that we identified previously, such as the syntax highlighter and comment ratings, and cleared a path in the header for HTML and CSS.

Inlining Critical CSS

That still wasn’t good enough, though. Performance had improved dramatically; but even with all of these optimizations in place, we didn’t hit that magical Speed Index value below 1,000. In light of the ongoing discussion about inline CSS and above-the-fold CSS, as recommended by Google³⁴, we looked into more radical ways to deliver content quickly. To avoid an HTTP request when loading CSS, we measured how fast the website would be if we were to load critical CSS inline and then load the rest of the CSS once the page had rendered.

But what exactly is critical CSS? And how do you extract it from a potentially complex code base? As Scott Jehl points out³⁵, critical CSS is the subset of CSS that is needed to render the top portion of the page across all breakpoints. What does that mean? Well, you would decide on a certain height that you would consider to be “above the fold” content — it could be 600, 800 or 1,200 pixels or anything else — and you would collect into their own style sheet all of the styles that specify how to render content within that height across all screen widths.

Then you inline those styles in the <head>, and thus give browsers everything they need to start rendering that visible portion of the page — within one single HTTP request. You’ve heard it a few times by now: everything else is deferred after the initial rendering. You avoid an HTTP request, and you load the full CSS asynchronously, so once the user starts scrolling, the full CSS will (hopefully) already have loaded.

Content will appear to render more quickly, but there will also be more reflowing and jumping on the page. If a user has followed a link to a particular comment below the initially loaded screen, they will see a few reflows as the website is constructed; the page is rendered with critical CSS first (there is only so much we can fit within 14KB!) and adjusted later with the complete CSS. Of course, inline CSS isn’t cached. If you have critical CSS and load the complete CSS on rendering, it’s useful to set a cookie so that inline styles aren’t inlined with every single load. The drawback is that you might have duplicate CSS because styles would be defined both inline and in the full CSS, unless you’re able to strictly separate them.
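One way to picture the cookie check is the small server-side helper below. It is only a sketch under stated assumptions: the cookie name fullcss and its value cached are hypothetical and not taken from the website, and the function simply reads the raw Cookie request header.

```javascript
// Sketch: decide whether to inline critical CSS for this request.
// Assumption: the client sets a hypothetical cookie "fullcss=cached"
// once the complete style sheet has been downloaded.
function shouldInlineCriticalCss(cookieHeader) {
  var pairs = (cookieHeader || "").split(";");
  for (var i = 0; i < pairs.length; i++) {
    var pair = pairs[i].trim().split("=");
    if (pair[0] === "fullcss" && pair[1] === "cached") {
      return false; // full CSS is presumably in the browser cache; link to it instead
    }
  }
  return true; // first visit, or cookies unavailable: inline the critical CSS
}
```

On the client, the matching step would be to set the cookie after the full CSS has loaded, for example with document.cookie = "fullcss=cached; path=/". The cookie is only a hint, because the browser cache can be cleared independently of it.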

Because we had just refactored our CSS code base, identifying critical CSS wasn’t very difficult. There are smart tools³⁶ that analyze the markup and CSS, identify critical CSS styles and export them into a separate file during the build process; but we were able to do it manually. Again, you have to keep in mind that 14KB is your budget for HTML and CSS, so in the end we had to rename a few classes here and there, and compress CSS as well.
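The 14KB figure follows from how a new TCP connection ramps up: with an initial congestion window of 10 segments and roughly 1,460 bytes of payload per segment, a server can send about 14,600 bytes before it has to wait for the first acknowledgement. The arithmetic can be sketched as below; both constants are common defaults rather than values measured on our servers, and the function name is illustrative.

```javascript
// Assumptions: 10-segment initial congestion window,
// about 1,460 bytes of payload per segment.
var INITIAL_WINDOW_SEGMENTS = 10;
var BYTES_PER_SEGMENT = 1460;
var FIRST_ROUND_TRIP_BUDGET = INITIAL_WINDOW_SEGMENTS * BYTES_PER_SEGMENT; // 14600 bytes

// Takes the compressed size of the HTML document (including inline CSS) in bytes.
function fitsFirstRoundTrip(compressedHtmlBytes) {
  return compressedHtmlBytes <= FIRST_ROUND_TRIP_BUDGET;
}
```

Response headers count against the same window, so in practice the usable budget for markup and styles is somewhat smaller than the raw number suggests.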

We analyzed the first 800px, checking the inspector for the CSS that was needed and separating our style sheet into two files, and actually, that was pretty much it. One of those files, above-the-fold.css, is minified and compressed, and its content is placed inline in the <head> of our document as early as possible, so as not to block rendering. The other file, our full CSS file, is then loaded by JavaScript after the content has loaded. If JavaScript isn’t available for some reason, or the user has a legacy browser, a reference to the full CSS file inside <noscript> tags at the end of the <head> ensures that the user doesn’t get an unstyled HTML page.
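Loading the full style sheet from JavaScript might look roughly like the following. This is a sketch in the spirit of the deferred-script snippet earlier in the chapter, not the exact loader we use; the file name full.css is a placeholder, and passing the document in as a parameter is just a choice that keeps the function easy to exercise outside a browser.

```javascript
// Sketch: append the complete style sheet once the page has loaded.
// "full.css" is a placeholder name, not the actual file on the website.
function loadFullCSS(doc, href) {
  var link = doc.createElement("link");
  link.rel = "stylesheet";
  link.href = href;
  doc.getElementsByTagName("head")[0].appendChild(link);
  return link;
}

// In a browser, wait for the load event so the request never competes
// with the initial rendering of the page.
if (typeof window !== "undefined" && window.addEventListener) {
  window.addEventListener("load", function () {
    loadFullCSS(document, "full.css");
  }, false);
}
```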

Finally, we gradually tweaked the amount of inline CSS to avoid drastic reflows as the page is constructed. We do load some CSS for styling the comments area, in case the displayed page is a permalink to one of the comments. We keep refining and prioritizing what does and what doesn’t have to be inlined; with every refactoring session, the amount of CSS in the <head> gets smaller — which obviously is a pretty good sign.

Improving First-Byte Time and Delivery Time

Once you have a clear separation of concerns and are loading assets in a modular way, you have a healthy foundation to deliver those assets quickly. However, as we discovered early in 2013, our servers had major hiccups and massive performance bottlenecks causing high latency and quite slow first-byte time. Obviously, good front-end optimization doesn’t really help unless you are able to send data down the line quickly.

So we started looking closely at how we could improve the delivery performance on the back-end, and decided to move to new servers. Over time, we’ve set up solid state drives for pretty much everything that writes and reads out data, to avoid any latency or delays when processing that data. We’ve also set up a content delivery network (CDN) for serving images and static assets.

The load testing showed significant improvements, with an average response time of 0.07 seconds for 1,500 simultaneous requests for the home page, and an average response time of 0.24 seconds for a standard article page. This was pretty good, but could we also improve how we delivered our assets? Well, let’s take a step back first.

Given the constraints and diversity of today’s web, it’s remarkable to see the HTTP protocol working so well, even on slow mobile connections. Nevertheless, HTTP/1.1 was designed and implemented for connections and bandwidth very different from those we use today: times of flourishing bulletin board systems, exploding FidoNet use and noisy modem handshakes with at best 56Kbps speed. By design, HTTP/1.1 does have a few shortcomings, such as a single outstanding request per connection, exclusively client-initiated requests, and a maximum of six to eight connections per domain (first introduced in browsers in the early 2000s). Furthermore, HTTP is slow, but HTTPS is (usually) even slower.

These issues will persist and cause latency no matter how much better the bandwidth, on cable and mobile networks alike, is going to get. In fact, according to Google’s research, page load time will improve by just a single-digit percentage once bandwidth exceeds 5Mbps. At the same time, if we manage to reduce latency, we should expect a linear improvement in page load time. And to do that, we have to look into ways of improving the communication between the server and the client.

The good news is that the issues are being actively addressed right now, with HTTP/2 currently on the roadmap for 2015. Among other things, the protocol provides a number of goodies for better performance: an unlimited number of parallel requests, quicker slow start, and better compression. Developers can also prioritize resources and push assets to the client. By design, the protocol is based on SSL/TLS, so it manages HTTPS traffic as well. The predecessor of HTTP/2, the SPDY (speedy) protocol, has been actively developed by Google over the last few years, and is well-supported in major browsers³⁷. And since it’s backwards-compatible with HTTP/1.1, browsers that don’t support SPDY will run with the older protocol, while modern browsers will benefit from the advantages of the new one.

This sounded very promising, so we decided to look into how we could further improve asset delivery using SPDY as well. As a protocol, SPDY requires client-side and server-side implementations. Since it’s well-supported in modern browsers (and we do have a very comfortable browser base to deal with, of course), we would need to set up the server-side implementation and get it running. In fact, SPDY modules are available for Apache 2.2 (mod_spdy) and Nginx (ngx_http_spdy_module); with Nginx in our back-end, it should have been a matter of hours to set it up.

Well, unfortunately, it wasn’t as straightforward as we thought it would be. Since SPDY is based on SSL/TLS, the entire asset delivery is also required to go through SSL/TLS, and this includes both our own content and third-party content. It’s common with third-party content that assets are out of your hands and there isn’t much you can do to influence what they look like or how they are delivered. One of our major advertisers was running large multisite advertising campaigns through their own ad management network (over HTTP), and a change of infrastructure wasn’t an option for them.

Now, if we moved to SPDY but had even just one resource fail to deliver data over SSL/TLS, our readers would keep seeing a mixed content error message every time an ad was delivered from that single advertiser. Despite all the configuration settings and server adjustments and the green light from pretty much all the parties involved, a move to SPDY wasn’t possible after all: we would have had to lose a major advertiser. In our tests, however, we did see an improvement in page load delivery times of around 6–8%, again mostly due to an unlimited number of parallel connections. But SPDY was a no-go at the time of writing; hopefully this will change very soon, especially with Google now using HTTPS as a ranking signal³⁸ as well.
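To get a feel for how much third-party content stands in the way of such a move, one could list the resource URLs on a page that are hard-coded to plain HTTP. The helper below is a simple sketch with an illustrative name; it only inspects URL strings and does not reproduce the full mixed content rules that browsers apply.

```javascript
// Sketch: find resource URLs that would be requested over plain HTTP
// even when the page itself is served over HTTPS.
function findInsecureResources(urls) {
  return urls.filter(function (url) {
    // Relative and protocol-relative URLs inherit the page's scheme,
    // so only explicit http:// URLs are flagged.
    return /^http:\/\//i.test(url);
  });
}
```

Any URL this returns, such as an ad script served from an HTTP-only network, would have to move to HTTPS before the whole page could be delivered securely.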

In spite of that, the new server was set up and ready. We switched DNS, fixed a few minor bugs, monitored traffic, and started looking into another issue that had been in our backlog for quite some time: responsive images.