Case Study: Scaling MySQL Vertically (10 to 3 MySQL Servers w/ SSDs)

If scaling out horizontally is all the rage these days, why would you ever want to scale vertically? Because hardware keeps getting faster and cheaper, which makes scaling vertically more and more affordable, up to a point.

Scaling horizontally is when you scale by adding more servers to your cluster.

Scaling vertically is when you scale by upgrading your existing servers with faster or bigger hardware (CPUs, RAM, disks, etc).

There’s a fine balance between the two; it’s an art. The goal is to maximize the cost/performance ratio, not to choose one mantra over the other. If you scale horizontally without doing any vertical scaling, you end up with more complexity. If you scale vertically without doing any horizontal scaling, you end up with very expensive hardware and single points of failure: all of your eggs are in one basket.

It’s getting way cheaper to build out huge systems these days, even compared to two years ago. Basecamp purchased 864GB of memory in January. Total cost: $12,000.

Even Amazon EC2 now offers servers with ridiculously fast SSD drives and 240GB of memory.

I’m not saying you should always scale vertically, but it’s worth weighing the cost of “yet another low-powered server” against investing in better hardware.

At Twitpic, we found ourselves evaluating that very question earlier this year. One of our very read-heavy database clusters had grown over the past few years from just a handful of MySQL servers to 10. Each of those 10 servers had an expensive RAID-10 array with a $1,000 Adaptec hardware RAID card, several 15,000 RPM SAS hard drives, and a meager amount of RAM. Because of the RAID card and drives, these servers were not cheap.

Anyway, we found ourselves maxing out the capacity of all 10 servers. During busy hours, they were running at 100% I/O capacity, driving up response times and causing unpredictable query speeds. I predicted that at least another 5 servers were needed to keep performance in an acceptable, healthy range.

We had been bitten by the “scaling horizontally is always better” bug and hadn’t even considered the performance benefits or cost of better hardware. We had always just ordered the same hardware we were already using for our other MySQL servers.

A different approach

Faced with purchasing yet another 5 servers, I knew I’d be going down this road again in just a few more months, buying even more servers. It was unsustainable.

I decided, instead, to take a step back and evaluate the hardware decisions we had made and why we were using this specific build.

Scrap the RAID-10

My first thought: we can scrap the RAID-10. It requires double the hard-drive capacity, and we don’t need that level of redundancy on our MySQL slaves. Since we take backups and spread reads across the slaves with our HAProxy load balancer, we can simply rebuild a slave when a drive fails.
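To make that concrete, here’s a minimal sketch of the kind of HAProxy section that can sit in front of a pool of MySQL slaves. This isn’t our exact configuration, and the server names, addresses, and health-check user below are placeholders, but the idea is the same: HAProxy speaks raw TCP to the slaves and health-checks them, so a dead or rebuilding slave simply drops out of rotation.

    # Hypothetical haproxy.cfg excerpt: round-robin MySQL reads across slaves.
    # Addresses and the mysql-check user are illustrative placeholders.
    listen mysql-slaves
        bind 0.0.0.0:3306
        mode tcp
        balance roundrobin
        option mysql-check user haproxy_check
        server db-slave1 10.0.0.11:3306 check
        server db-slave2 10.0.0.12:3306 check
        server db-slave3 10.0.0.13:3306 check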

In fact, we could get even better performance by using RAID-0. Normally that’s unsafe, but since our master server is already running a RAID-10 and we take frequent backups, it’s a safe choice in our scenario.
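As a rough sketch (the text doesn’t say whether the array was built on the RAID card or in software), striping two drives into a RAID-0 with Linux software RAID looks something like this; the device names and mount point are hypothetical:

    # Hypothetical: stripe two drives into RAID-0 with mdadm and mount it
    # as the MySQL data directory. Losing one drive loses the array, which
    # is acceptable for a slave that can be rebuilt from the master/backups.
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
    mkfs.ext4 /dev/md0
    mount /dev/md0 /var/lib/mysql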

More RAM!

Our MySQL server build was configured with a tiny amount of memory (16GB), a relic from the days when we were orders of magnitude smaller. The fact that we had gotten this far without ever considering adding more RAM was crazy. Nowadays, memory is so cheap that putting 64GB, 128GB, or even 512GB into a single server is completely normal on regular server hardware.
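The extra RAM only pays off if MySQL is told to use it. As a sketch (the exact sizes here are illustrative, not our production values), on a slave with 128GB of memory the bulk of it would go to InnoDB’s buffer pool so the working set is served from memory instead of disk:

    # Hypothetical my.cnf excerpt for a 128GB slave; sizes are illustrative.
    [mysqld]
    innodb_buffer_pool_size      = 100G
    innodb_buffer_pool_instances = 8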

SSDs, here we come

The last thing I evaluated was SSDs. I hadn’t expected the difference in performance between an expensive 4-drive RAID-10 setup and a cheap Intel SSD to be so staggering. In fact, further testing and benchmarking showed about a 30x performance difference between a ~$2000 RAID-10 setup and ~$2000 worth of enterprise SSDs. That was in 2011; the gap is even more staggering and ridiculous today.

The old RAID-10 setup maxed out at around 5,000 IOPS, while the SSDs dominated, pushing over 150,000 IOPS. Mind = Blown.
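If you want to run this kind of comparison on your own hardware, a random-read test with fio is one way to do it. This isn’t the exact benchmark we ran; it’s just a sketch of a 4KB random-read workload, which is roughly what a read-heavy InnoDB box looks like to the disks. The test file path is a placeholder.

    # Hypothetical fio run: 4KB random reads with direct I/O.
    # Compare the reported IOPS between the RAID array and the SSD.
    fio --name=randread --filename=/data/fio.test --size=4G \
        --rw=randread --bs=4k --direct=1 --ioengine=libaio \
        --iodepth=32 --numjobs=4 --runtime=60 --group_reporting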

The end result

After evaluating our hardware and deciding to scale both vertically AND horizontally, we not only saved a significant amount of money on servers, but we also sped up our infrastructure and reduced overall complexity by running a lower total number of servers.

In fact, we were able to scale down from 10 servers to just 3 with the new hardware configuration, reducing our overall infrastructure costs. On top of that, those 3 servers run at only 50% of their overall capacity, whereas the 10 older servers were struggling at 100% of theirs. Put another way, each new server comfortably handles roughly six to seven times the load of one of the old ones.

Fewer servers to monitor, administer, and host means less complexity in your infrastructure, and that’s a big win for everyone.