Heroku: Up and Running (2013)
Chapter 3. Understanding Performance and Scale
When talking about performance in apps, two factors affect your users: speed and throughput. Speed is how fast your application can take a request and return a single response. Performance, when most people talk about it, refers to raw speed. Throughput is how many requests your application can handle at the same time. If speed is a drag racer screaming down the track, then throughput is thousands of runners crossing the finish line of a marathon.
Speed and throughput are closely related. If the throughput limit of your application isn’t high enough, then individual users will likely see slower response times, or no response at all. Likewise, speeding up individual requests with no other changes by definition means your app can process more requests per second. It’s important to account for both of these factors when considering performance. So, how can you make your app faster while handling more load at the same time? You’ll just have to keep reading to find out.
Horizontal Scaling and You
In the past, if you had an application that needed to handle more concurrent connections, you could increase the RAM, upgrade the hard drive, and buy a more expensive processor. This is known as vertical scaling; it is very easy to do but is limited. Eventually, you’ll max out your RAM slots or some other component. Instead of adding more capacity to a single computer, you can horizontally scale out by networking multiple computers together to form a horizontal cluster. Although this allows virtually unlimited scaling, it is difficult to set up, maintain, and develop on. Luckily for you, Heroku takes care of most of the difficult parts for you.
Each dyno on Heroku is an identical stateless container, so when your app starts getting too many requests at a time and needs more throughput, you can horizontally scale by provisioning more dynos. Heroku takes care of the networking, setup, and maintenance. What has in the past taken hours of setup can be accomplished on Heroku with a simple command:
$ heroku ps:scale web=4 worker=2
You just need to make sure that your app is architected so that it can run on multiple machines.
Stateless Architecture
Although Heroku maintains spare capacity for your application so you can scale, you still have to build your app so that it can run across multiple machines at the same time. To do this, your app needs to be “stateless,” meaning that each and every request can be served by any dyno.
In the early days of the Web, it was easy to store files and “state” (logged-in user information, for example) in an application’s memory or on its disk. This makes it much harder to horizontally scale, and it can break user expectations. Imagine you sign up for a new web service, upload your picture, and click Save. The picture gets written to the disk of one of the four machines used for this service. The next time you visit the site, your request goes to one of the three machines without your photo on it, which makes the website appear broken.
Many “modern” frameworks support and encourage stateless architecture by default. Even so, understanding the features of stateless architecture is crucial to using it correctly. There is a long list, but the two most common are using a distributed file store and managing session state. Let’s take a quick look at them.
Distributed file stores
Continuing with the website example, a file written to one machine that is not available to the others is a problem, so how do you get around it? By using a distributed file store that is shared and accessed by all the application’s machines. The most commonly used product for this purpose is Amazon’s S3. As a file, such as a user’s avatar image, comes into your app, it is written to S3, where it can be read from all machines. Doing this also prevents your app from running out of disk space if too many files are uploaded. It is common to keep a record of each file in a database shared between dynos so that files can be located easily. There is also an add-on called bucket that provisions a distributed file store for you.
It is inadvisable to write data directly to disk on a Heroku dyno. Not only is it not scalable, but Heroku also uses “ephemeral” disks that are cleared at least once every 24 hours. Most frameworks support easy integration with third-party distributed file stores, so take advantage of them. Now that you don’t have to worry about file state being different on different machines, let’s take a look at session state.
Session state
When a user logs in to a website, information about that user is captured (e.g., email, name, when she last logged in, etc.). It is common to store some identifying information about the user in an encrypted cookie on the user’s machine; this cookie usually contains just enough information to look up the rest. If your application stores the related details (name, email, and so on) on a single machine’s disk or in its application memory, then a user will appear logged out whenever a request lands on a different machine. To fix this, we need a distributed data store to hold that information so all dynos have access to it. It is common to store this information in a relational data store such as Postgres, or in a distributed cache store such as Memcache.
Many popular frameworks, such as Rails, store session state this way out of the box, so you don’t need to worry about it. If you’re using a lighter framework, or some custom code, then you should double-check that session state is safely shared between dynos.
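For example, in a Rails app of this era the session store is set in an initializer. The following is only a minimal sketch: MyApp is a placeholder for your application’s module name, and exact option names vary between framework versions.

# config/initializers/session_store.rb
# MyApp is a placeholder for your application's module name.
# Default: the session lives in an encrypted cookie on the user's machine,
# so any dyno can serve any request.
MyApp::Application.config.session_store :cookie_store, key: '_myapp_session'

# Alternative: keep session data in a shared cache store (such as memcache),
# visible to every dyno rather than to a single machine.
# MyApp::Application.config.session_store :cache_store, key: '_myapp_session'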
Now that you understand why your application should be running statelessly, let’s take a look at the Dyno Manifold. This is the component in Heroku’s stack that allows us to easily scale our dynos.
Dyno Manifold
The Dyno Manifold always maintains spare capacity that is ready and waiting to receive new running dynos at a moment’s notice. Let’s say your application is new and running on a single dyno, having just been freshly deployed, and now you want to scale it up to six dynos. A simple request to the Heroku API issues commands to the platform to create five new dynos within the spare capacity and deploys your application to them. Within seconds you now have six dynos running and therefore six times the capacity that you had only a few seconds ago.
PERFORMANCE VERSUS SCALING
Let’s clear up a common misconception about what scaling will do for your application. A common belief is that adding more servers (or in this case, dynos) increases the performance of your application. This is incorrect. Scaling your application gives you increased concurrency (the ability to serve more than one request at the same time).
Imagine, for a moment, an application that has a mean request time of 100 ms. In an ideal world, in any given second, this application, when serving a single request at a time, can thus serve 10 requests a second. Adding a second dyno will not change this, as your code will always take a finite time to process; only software engineering and optimization will reduce this time. This second dyno, though, does mean that your application, across the two dynos, is now able to serve twice the number of requests at a time.
Autoscaling
Autoscaling is a myth. Heroku does not autoscale your app for you, but not to worry—it’s pretty simple to learn when to scale and even easier to do the scaling.
Why doesn’t Heroku just build an autoscaler? It is super easy, I have a friend who has an algorithm that…
— Everyone ever
Scaling is a surprisingly complex problem. No two applications are the same. Although scaling usually means adding extra dynos for extra capacity, that’s not always the case. If the bottleneck is with your database, adding extra frontend dynos would actually increase the number of requests and make things even slower. Maybe your Memcache instance is too small, or, well, you get the point really—it could be a thousand other tiny application-specific details.
One concern to be aware of is how your backing services may be crushing your ability to scale. One easily overlooked limitation of horizontal scalability is database connections. Even the largest of databases are limited in the number of connections they can accept. Provisioning additional dynos will give your application more processing power, but each of these dynos will require one or more connections. It is important to be aware of this maximum number and to begin to take appropriate preventative measures, such as sharding your database or using a connection pooling service such as PgBouncer.
At the end of the day, the best autoscaler is the one reading this book. You know your application better than any algorithm ever could. Heroku can give you access to the tools to measure and make scaling easier than ever, but there should be a few eyes on the process. External services can provide autoscaling behavior, but when it comes to capacity planning and performance, there is no substitute for a human. Also, if any of your friends solve this problem, they could make a good chunk of change making an add-on.
Let’s take a look at how you can estimate your required resources.
Estimating Resource Requirements
By understanding your mean request time, and knowing what sort of traffic levels you receive at a particular time of day, it is possible to predict approximately how many dynos you require to suit your application needs.
For example, imagine you have a peak load of 1,500 concurrent users generating traffic at 500 requests a second. You can check your logs or use a monitoring tool such as New Relic to find that your mean request time is 75 ms. You can then calculate the number of required dynos by working out how many requests per second a single dyno can handle (1,000 ms divided by 75 ms is roughly 13 requests per second) and how many dynos at that rate are required to serve 500 requests per second (500 divided by 13.3, rounded up). This gives us the final answer of 38 dynos to handle your traffic:
Dynos Required = Requests per Second / (1,000 / Mean Request Time in ms)
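To make the arithmetic concrete, here is a small Ruby sketch of the same calculation, using the numbers from the example above:

# Rough dyno estimate, assuming one dyno serves one request at a time.
requests_per_second = 500       # peak traffic from the example above
mean_request_time_ms = 75       # taken from logs or a monitoring tool

requests_per_dyno = 1000.0 / mean_request_time_ms             # ~13.3 requests/sec per dyno
dynos_needed = (requests_per_second / requests_per_dyno).ceil

puts dynos_needed  # => 38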
Note, though, that it’s not quite that simple. While prediction and some simple math will get you so far, some application containers, such as Unicorn and Tomcat, are able to serve multiple requests concurrently, so in these cases one dyno may be able to serve three or four requests at a time on its own. Traffic also fluctuates massively, so predicting a throughput for your application is generally quite difficult unless you have a wealth of historical trends for your particular application. The only sure-fire way to ascertain the required resources for your application is to bring up the approximate number of dynos you believe you require, and then monitor your application as the traffic comes in.
Request Queuing
It is worth mentioning at this point the concept of request queuing. When a request arrives, it is randomly sent to one of your application’s available dynos for processing. Should a dyno be unavailable for some reason, it is automatically dropped from the pool of dynos receiving requests. However, remember that while this is happening, the router processing the request is counting down from 30 seconds before returning an error to the user. It is therefore important to monitor your application’s queue length to see if more resources are required to handle the load. Because of this, it can be helpful to offload non-HTTP-specific tasks to a background worker for asynchronous processing.
Speeding Up Your App
Now that you understand how to scale your app for throughput, we will take a look at a few different ways to speed up your response time. After all, everything today is measured in time, and when it comes to applications, the quicker the better.
Because speed refers to the amount of time it takes to get a response to the user, we will cover a few backend performance optimizations as well as some on the frontend. Although there are big gains to be made in the backend, a browser still takes time to render content, and to visitors of your site, it’s all the same.
We’ll start off with some very simple frontend or “client-side” performance optimizations and then move on to more invasive backend improvements. Let’s get started with how we serve static files to your visitors.
Expires Headers
When you visit sites like Facebook and Google and view the source, you might notice large logos, lots of CSS, and a good amount of JavaScript. It would be almost insane to have to redownload all of these files every time you clicked a link or refreshed the page; instead, your browser is smart enough to cache content it knows will not change on the next page load. To help out browsers, Google’s servers add a header to each static file that lets the browser know how many seconds it can safely keep the file in its cache:
Cache-Control: public, max-age=2592000
This tells the browser that the file is public and that the same cached copy can be served for the next 2,592,000 seconds (30 days). Most frameworks will allow you to set the cache control header globally.
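For example, a Rails app from this era might set a far-future header for its static files in config/environments/production.rb. This is only a sketch; exact option names vary between framework versions:

# config/environments/production.rb
# Serve static files from the dyno and mark them cacheable for 30 days.
# Option names differ between Rails versions; check your framework's docs.
config.serve_static_assets  = true
config.static_cache_control = "public, max-age=2592000"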
While setting an expires header far into the future can be good for decreasing repeated page loads, you need to make sure that when a static file is altered (i.e., when a color is changed from red to blue), the cache is properly invalidated. One option is to add a query parameter to the end of the filename containing the date the file was last saved to disk, so a file called smile.png would be served as smile.png?1360382555. Another option is to take a cryptographic hash of the file to “fingerprint” it. Taking an MD5 hash of your smile.png might produce a hash of 908e25f4bf641868d8683022a5b62f54, so the output filename would be something along the lines of smile-908e25f4bf641868d8683022a5b62f54.png. The key here is that the URL to the static file changes whenever the file changes, so the browser doesn’t use the old cached copy.
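Most asset pipelines do this fingerprinting for you, but the idea is simple enough to sketch by hand in a few lines of Ruby (the file path here is hypothetical):

require 'digest/md5'

# Fingerprint a static file by hashing its contents; the name changes
# whenever the contents change, so stale cached copies are never reused.
contents    = File.binread("public/smile.png")   # hypothetical path
fingerprint = Digest::MD5.hexdigest(contents)
puts "smile-#{fingerprint}.png"
# => something like smile-908e25f4bf641868d8683022a5b62f54.png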
Different frameworks support different cache invalidating strategies, so check the documentation of your favorite frameworks for more information.
Now that you know how to reduce repeated page load time, how can we reduce all page loads? One option is to use a CDN.
Faster Page Response with a CDN
Heroku’s Cedar stack has no Nginx or Apache layer to serve static files. While this leaves more of a dyno’s system resources free to run your application, and gives you more control, it also means that your app has to spend some of those resources serving static files or assets. Instead of using valuable dyno time to serve these files, you can serve them from an edge-caching CDN.
CDN stands for Content Delivery Network. A CDN is made up of many different machines that are spread all across the globe and are geolocation aware. By moving your assets to a CDN, you not only reduce load on your app, you also decrease the latency of each individual file request, because a CDN node is likely closer to your visitor’s location than your app running on Heroku. The math is pretty simple: the less distance data has to travel, the quicker it gets there. In the past, using a CDN to serve these static files meant having to copy assets to a third-party server. An “edge cache” CDN instead pulls assets straight from your application and caches them, so there is no syncing involved.
In general, you will set up a CDN to pull from your app example.herokuapp.com and in return you will get a dedicated CDN subdomain such as jbn173nxy4m821.cdndomain.com. Once in place, you can then get to your assets via that subdomain, so an image assets/smile.png would be located at http://jbn173nxy4m821.cdndomain.com/assets/smile.png. This will tell the CDN to download that image and serve it directly from the edge caches. There are various services that provide edge cache CDNs, such as CloudFront and CloudFlare, that can be manually integrated with your Heroku application.
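In a Rails app, for instance, pointing asset URLs at that subdomain is a one-line configuration. This is a sketch using the hypothetical CDN subdomain from the example above:

# config/environments/production.rb
# Asset URLs are rewritten to the CDN subdomain; the edge cache pulls each
# file from example.herokuapp.com the first time it is requested.
config.action_controller.asset_host = "http://jbn173nxy4m821.cdndomain.com"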
As more and more Internet traffic goes global, your visitors depend on you to provide them with the quickest experience possible. Using a CDN in your application is one way you can help to meet their needs.
One of the most notorious causes of slow apps, as we’ve mentioned, is the database. Let’s take a look at what we can do to speed things up there.
Postgres Database Performance
Improper use of any data store will lead to slow performance and a slow application experience for any of your visitors; Postgres is no exception. This relational database offers many ways to tweak settings and measure performance. Here we’ll look at just the most common causes of slow app database performance: N+1 queries and queries in need of an index.
N+1 queries
One of the most common database problems comes not from improper setup but from accidentally making many more queries than are needed. Imagine you just started a new social network where a user can have friends, and those friends can all have posts. When a user loads her user page, you want to show all of her friends’ posts. So, you grab the list of the user’s friends in one query, then iterate over that list and query the database for each friend’s posts. If a user has 100 friends, you just had to make 100 + 1 = 101 database queries. This problem is called N+1 because we’ve had to make N queries (one per friend) to get the data we want, plus the first query to grab the friends. Even if each of these queries takes a fraction of a second, they quickly add up, and the more friends a user has, the longer it will take for the page to load.
Instead of doing these calculations manually, it is much easier to tell the database about all of the data we want. Postgres is a relational data store, so we can do this using joins to grab the exact values we need. This common pattern is known as eager loading since we’re using a join to preemptively grab the extra data we know we will eventually need.
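In ActiveRecord, for example, the difference looks roughly like this. It is only a sketch, assuming a loaded user record with friends and posts associations:

# N+1: one query for the friends, plus one query per friend for her posts.
user.friends.each do |friend|
  friend.posts.each { |post| puts post.title }
end

# Eager loading: the posts are fetched up front alongside the friends,
# so the loop below triggers no additional database queries.
user.friends.includes(:posts).each do |friend|
  friend.posts.each { |post| puts post.title }
end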
The best way to look out for this cause of slowness is to check your application logs in development, assuming that queries to the database are logged. If you have this problem, you’ll likely see hundreds of SQL queries per request where you should only see a few.
Properly using relations can help tremendously in allowing the database to do its job. However, you still may find that your application is slow; perhaps it could benefit by adding an index.
Queries in need of an index
Relational databases have been around for decades, which is plenty of time for optimizations and speed improvements, but that doesn’t mean they can read your mind. If you know ahead of time which columns you will be querying a table with, you can dramatically speed up those queries by using an index. When you ask a database to find the row in the users table that has a value of foo@example.com in the email column, the database performs what is called a “sequential scan”: it looks at each row in turn, checking the email field, until it finds the value you’re looking for and returns it. By adding an index to a column, the database keeps metadata about the structure of the column so that it can find the value you’re looking for with far fewer data reads.
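Continuing the example, adding an index on the email column is a single statement. It is shown here as raw SQL (the index name is up to you); most frameworks also provide a migration helper for this:

-- Index the column used in the WHERE clause (index name is illustrative):
CREATE INDEX index_users_on_email ON users (email);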
So, if adding indexes to a column can speed up query performance, how do we know if we need to add an index to a column? Get the database to explain it to you.
Explaining Postgres Performance
Postgres has a built-in EXPLAIN keyword that can be prepended to a query to return the query plan and its estimated costs (EXPLAIN ANALYZE will also run the query and report actual execution times).
For example, let’s say we have a query that is taking longer than we would like:
SELECT * FROM users WHERE id = 1 LIMIT 1;
We can get more information by adding the EXPLAIN keyword before the query like this:
EXPLAIN SELECT * FROM users WHERE id = 1 LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------
Limit (cost=0.00..8.27 rows=1 width=1936)
-> Seq Scan on users (cost=0.00..9.68 rows=1 width=1936)
Here you can see that Postgres chose to sequentially scan the users table (i.e., it looks at each of the rows in the table until it finds the one you’re looking for).
Sequential scans are expensive for large datasets and should be avoided. Frameworks such as Rails have the ability to “autoexplain” any queries that run over a given threshold while in development.
If you consistently see sequential scans over tables with many rows, consider adding indexes to speed up performance. When you do, Postgres will tell you when it is scanning using an index:
-> Index Scan using users_pkey on users (cost=0.00..8.27 rows=1 width=1936)
So, when your data store starts getting slow, use logs and EXPLAIN to isolate and fix the problem. Once you’ve tuned your database, your queries may still have some calls to your database that will continue to be slow. Because the quickest database call is the one you never have to make, you can cache the results of long-running queries into a distributed cache store.
Caching Expensive Queries
Heroku offers many different add-ons that provide quick and efficient key/value stores that can be used to cache queries. Memcache and MemCachier are two add-ons that provide access to a volatile cache store, memcache. It is common for developers to use memcache to store “marshaled” data. (Marshaled data is persisted in a way that allows for simple transmission between systems: e.g., as a JSON or YAML serialization.) This gives developers the ability to store not only primitive values like integers and strings, but also complex custom data objects such as users, products, and any other class imaginable (as long as your language provides the ability to marshal and dump those data structures).
There are other key/value data stores known for their performance, such as Redis; we can’t compare the costs and benefits of all of them here. The important thing to remember is that for system-intensive calculations, caching the results can dramatically speed up your application.
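As a concrete illustration, here is a minimal sketch using the Dalli Ruby client for memcache. The server address, cache key, expiry, and the Post query are all illustrative; on Heroku the add-on supplies its connection details through config vars.

require 'dalli'

# The server address is a local stand-in; use your add-on's connection details.
cache = Dalli::Client.new('localhost:11211')

# Serve the expensive result from memcache when present; otherwise run the
# slow query, marshal the result into the cache for an hour, and return it.
top_posts = cache.get('top_posts')
if top_posts.nil?
  top_posts = Post.order('comments_count DESC').limit(10).to_a  # hypothetical slow query
  cache.set('top_posts', top_posts, 3600)
end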
Back that Task Up
Not everything your application does needs to be within the request/response cycle of an HTTP request. If a new user signs up to your application, she doesn’t need to sit around and wait for a few milliseconds while your SMTP server responds because you’re sending her a welcome email. Your visitor probably doesn’t care if that email gets sent out immediately, so why are we making her wait for it before rendering the next page? Instead, kick this action off to a “background task” that can be handled by a separate “worker” dyno.
On Heroku, you can start a separate worker dyno by declaring it in your Procfile. If you’re using Resque with Ruby, your Procfile might have something in it like this:
worker: bundle exec rake resque:work VERBOSE=1 QUEUE=*
Here you’re telling Heroku to run the task resque:work in a set of dynos classified as worker, watching all of the queues (QUEUE=*). There are a number of these worker libraries in any number of different languages, and the general concepts are the same. You store a task (such as sending out email, performing an expensive calculation, or making an API request) in some standard format in a shared data store such as Redis or PostgreSQL. The worker dyno reads the task data in from the data store and performs the necessary action. For some tasks, like email, the worker just needs to send an outbound communication, but for others it might need to write a value back to a database to save a calculation.
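Sticking with the Resque example, a welcome-email task might look roughly like this (WelcomeEmail, UserMailer, and the queue name are hypothetical):

# A Resque job: the class name, queue, and arguments are stored in Redis,
# then picked up and executed by the worker dyno defined in the Procfile.
class WelcomeEmail
  @queue = :mailers  # hypothetical queue name; matched by QUEUE=* above

  def self.perform(user_id)
    user = User.find(user_id)
    UserMailer.welcome(user).deliver  # UserMailer is a stand-in for your mailer;
                                      # the slow SMTP call happens off the web dyno
  end
end

# In the web request, enqueueing is nearly instant:
Resque.enqueue(WelcomeEmail, user.id)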
This will speed up individual requests, but it also improves your throughput, since each request now requires less computation. If you find that your background worker becomes slow or is not capable of catching up to all of the tasks given to it, you can scale it independently of your web dynos.
Another common component of applications is some kind of full-text search. Let’s look at how we can move beyond like queries to go faster and return more relevant results.
Full-Text Search Apps
The dirty secret of most new web applications is that the search box prominently featured on the top of the page is in fact backed by a simple like database query:
select * from users where name like '%schneem%';
This strategy can work well for development and prototyping, but it scales horribly. The more entries you have in your database, the worse speed and relevancy will be.
A like query performs a sequential scan over your desired text, and it doesn’t help you rank the relevancy of your results. Modern full-text search engines, by contrast, are capable of “stemming” words, so they know that “cars” and “car” refer to the same root item. They also get rid of common low-value words like “an,” “the,” “and,” and so on.
On Heroku, you have two possibilities if you want to use full-text search. You can use a full-text search add-on such as Websolr, Bonsai Elasticsearch, or Sphinx; or you can use the built-in full-text search in the PostgreSQL database.
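For the built-in Postgres route, a minimal sketch of a stemmed, indexed search over the users table from earlier might look like this (the index name is illustrative):

-- Full-text search with stemming, instead of a like scan:
SELECT * FROM users
 WHERE to_tsvector('english', name) @@ to_tsquery('english', 'schneem');

-- A GIN index over the same expression keeps the query from reading every row
-- (the index name is up to you):
CREATE INDEX index_users_on_name_fts ON users
 USING gin (to_tsvector('english', name));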
If your database already has heavy read traffic, adding full-text search will only increase the load; instead, you may want to move this functionality to a full-text search add-on. None of these add-ons will turn you into Google overnight, but many of the underlying technologies have had years of optimization. Using default configurations makes it possible to get pretty good performance, though tuning can produce huge gains no matter where you’re doing your full-text search. The downside to using a full-text search add-on is that it adds an extra piece of complexity, and your app has to sync the text you want to be searchable.
Whichever option you end up choosing, don’t ship your “search” feature powered by a like query.
Performance Testing
We’ve talked a lot about backend optimizations and frontend performance boosts, background workers, and full-text search engines, but at the end of the day the only thing that matters is that your site responds faster and remains scalable. It is important to measure your application’s performance so you’re able to determine how effective your improvements have been and to detect any regressions. This is easier said than done, as speed can be heavily influenced by network effects and the load the app is under during testing.
If you’re interested in getting real-world performance measurements, there are two ways to keep track of your app: use in-app performance analytics services or use external load and performance testing tools.
In-App Performance Analysis
Wouldn’t it be great if we could get our SQL log data and other backend performance characteristics in near real time with easy-to-understand formatted graphs? That’s exactly what services like New Relic aim to do. Many languages and frameworks have instrumentation that allows a client running on production machines to measure and report to these services.
With these services, you can see details such as time spent fetching results from Memcache, running SQL queries, performing garbage collection, and anything else that your framework is instrumented to measure. You may choose to collect these numbers all the time in production, or only while you’re actively working on performance tuning.
External Performance Tools: Backend
Aside from measuring your application performance from within your app, you can analyze the frontend and backend performance characteristics using third-party tools.
To test the throughput and response time of your app, you can use a service that generates traffic and hits your staging server. These services are best used for measuring backend speed because they don’t take into account the time a browser takes to render a page. Heroku has several add-ons that can load and test your application, including BlitzIO and Loader.io.
Keep in mind that the performance of each run can vary, so you may want to perform these tests several times to get a good baseline. Once you have that data, you can use it to determine the effectiveness of speed-related changes. You can also use a load testing tool like this in combination with in-app analytics to determine which parts of your app are the first to slow down under increased load.
Load testing tools are a great way to get more visibility into your backend, but what about the page render time on the client side?
External Performance Tools: Frontend
Most browsers ship with some form of request inspector that can be used to measure the performance of page loads. For example, Chrome’s “Network” tab will show you exactly how long an individual asset takes to download.
The Chrome inspector and similar tools for other browsers will show you where time is being spent, but they won’t necessarily tell you what can be improved. One simple, yet useful, tool for this is YSlow, which will attempt to “rank” the frontend performance of your app and give you general advice on how to improve.
Being Fast
Applications aren’t fast by accident; their developers take the time to understand and measure, and then improve areas found to be lacking. Performance is gained by careful work and can be lost with negligence. Setting aside time dedicated to measuring and improving speed is crucial. Consider taking a few hours at the end of the week with your team to look at application performance. Institutionalizing “Fast Fridays” can be a great way to shave off some of the performance debt that gets built up between rushed deadlines and shipped features.
If the performance of your application is crucial enough, you can consider instrumenting performance tests that benchmark and warn of dips in speed.
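A naive sketch of such a guard, using Ruby’s Benchmark module against a hypothetical get_homepage test helper (the request count and the 200 ms threshold are illustrative), might look like this:

require 'benchmark'

# Fail loudly if the home page handler gets dramatically slower.
# get_homepage stands in for however your test suite exercises the endpoint.
elapsed    = Benchmark.realtime { 50.times { get_homepage } }
average_ms = (elapsed / 50) * 1000

raise "Homepage regression: #{average_ms.round} ms average" if average_ms > 200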
Finally, don’t be afraid of application performance. Take slow, measured steps. Gather data, find the slowdowns, and then implement the fix. Applications don’t become slow in a day, so expect to take some time to speed yours up. Performance is incremental. Enjoy the process of discovering more about your application, and be pleasantly surprised when your users start raving about how fast your app runs.