Scaling PHP Applications (2014)

Coding and Debugging at Scale

We’ve gone through every layer of the stack and we’ve finally made it down to the least interesting layer: the PHP code! Everyone wants to know tricks and cheat codes for optimizing PHP. Does this code run faster than that? Well, let’s settle some debates.

Scaling your code

Everyone wants to know “scaling” tactics. They think that big sites are doing things like using for loops instead of foreach because they’re marginally faster, or using single quotes instead of double quotes for strings. It’s just not the case. Microoptimizations are the pick-up lines of scaling: they don’t work and they make you look like a fool. Scaling your code is the easiest part of scaling the stack because you just add more application servers. It’s completely horizontal. As tempting as it seems, you’d gain far bigger payoffs by optimizing your database or cache instead of trying to squeak out a 1 or 2% performance gain in your code. If raw speed is that important, just throw money at the problem: more application servers and more GHz. (Remember, PHP handles each request on a single thread, so a faster clock, not more cores, is what improves the response time of an individual request. More cores only let you serve more requests at once.)

Scale algorithms, not syntax (but write clean syntax anyways)

Here’s the bottom line: none of the code stuff really matters. The very marginal speedup that you’d gain is totally flushed down the toilet as soon as you hit the database or cause the garbage collector to run. I guarantee you’ll find 15 low-hanging infrastructure optimizations for every single code-path optimization you find.

I’m not saying to write bad code. Write sane code and optimize the blatantly poor code, but don’t go chasing the 2% stuff. The most obvious offenders (as with any programming language) are:

Nested Loops (Big O and Algorithm Analysis)

If you need more than 3 levels of indentation, you’re screwed anyway. - Linus Torvalds

If you can code around it, try to avoid a loop inside of a loop. Simple enough: more loops, more code being run. If you know Big O/Algorithm Analysis, you want to avoid O(n²) and O(n³).

An Example of O(n²):

    $foo = range(1, 100000);
    $bar = range(1, 100000);

    foreach ($foo as $f) {
        foreach ($bar as $b) {
            // This code will run 100000 * 100000 times.
            // You'd want to focus your attention on optimizing
            // this code, ideally removing the loop.
        }
    }
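One common way to remove an inner loop is sketched below. It assumes the inner loop only checks whether a value exists in the second list, in which case you can index that list once and swap the inner scan for a constant-time isset() lookup. The function names and sample data are made up for illustration, and the type declarations assume PHP 7 or newer.

```php
<?php

// Quadratic version: for every element of $foo, scan all of $bar.
function common_values_nested(array $foo, array $bar): array
{
    $common = [];
    foreach ($foo as $f) {
        foreach ($bar as $b) {
            if ($f === $b) {
                $common[] = $f;
                break;
            }
        }
    }
    return $common;
}

// Linear version: build a value => position index of $bar once,
// then each membership test is a single hash lookup.
function common_values_indexed(array $foo, array $bar): array
{
    // array_flip() only works when $bar holds ints or strings (assumption).
    $index = array_flip($bar);
    $common = [];
    foreach ($foo as $f) {
        if (isset($index[$f])) {
            $common[] = $f;
        }
    }
    return $common;
}

$foo = range(1, 2000);
$bar = range(1000, 3000);

$slow = common_values_nested($foo, $bar);
$fast = common_values_indexed($foo, $bar);
```

Both functions return the same list, but the indexed version touches each array once instead of walking $bar again for every element of $foo.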

Objects or arrays?

Common sense says that an array should use less memory than an object. There is just more “stuff” that tags along for the ride when you use an object versus a list of values in an array... right?

Not so in PHP 5.4: objects with declared properties can optimize the way they store data and end up with a smaller memory footprint than arrays. Convenient for both performance AND readability. Gone are the days of using arrays over objects for “performance reasons.” An in-depth post on the subject can be found here.
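If you want to check this on your own PHP version, a rough measurement along these lines can help. This is only a sketch: the UserRecord class, the field names, and the row count are invented for the example, and the numbers it prints will vary between PHP versions and builds.

```php
<?php

// Hypothetical record type with declared properties.
class UserRecord
{
    public $id;
    public $name;
    public $email;
}

// Returns the number of bytes still in use after $build has produced its data.
function retained_bytes(callable $build): int
{
    $before = memory_get_usage();
    $data = $build();
    $after = memory_get_usage();
    unset($data);
    return $after - $before;
}

$count = 20000;

// The same three fields stored as associative arrays...
$arrayBytes = retained_bytes(function () use ($count) {
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        $rows[] = ['id' => $i, 'name' => "user{$i}", 'email' => "user{$i}@example.com"];
    }
    return $rows;
});

// ...and as objects with declared properties.
$objectBytes = retained_bytes(function () use ($count) {
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        $row = new UserRecord();
        $row->id = $i;
        $row->name = "user{$i}";
        $row->email = "user{$i}@example.com";
        $rows[] = $row;
    }
    return $rows;
});

printf("arrays:  %d bytes\nobjects: %d bytes\n", $arrayBytes, $objectBytes);
```

Treat the output as a sanity check on your own setup, not as a benchmark.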

Use the standard library, Luke

Quickly worth mentioning: when possible, prefer built-in standard library functions over your own. The PHP standard library is written in C, so it is much faster than equivalent PHP code. This especially applies to the array_ functions, which can often be 3-10x faster than the same algorithm coded in PHP.
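To see the difference on your own machine, you could time a hand-written loop against the built-in that does the same job. The sketch below compares a PHP foreach sum with array_sum(). The helper names and the input size are arbitrary choices for the example, and the short list syntax assumes PHP 7.1 or newer.

```php
<?php

// Hand-written PHP version: every iteration is executed by the PHP VM.
function sum_in_php(array $numbers): int
{
    $total = 0;
    foreach ($numbers as $n) {
        $total += $n;
    }
    return $total;
}

// Calls $fn($input) $runs times and returns [last result, elapsed seconds].
function time_it(callable $fn, array $input, int $runs = 20): array
{
    $result = null;
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $result = $fn($input);
    }
    return [$result, microtime(true) - $start];
}

$numbers = range(1, 200000);

[$phpSum, $phpSeconds] = time_it('sum_in_php', $numbers);
[$builtinSum, $builtinSeconds] = time_it('array_sum', $numbers);

printf("PHP loop:  %d in %.4fs\n", $phpSum, $phpSeconds);
printf("array_sum: %d in %.4fs\n", $builtinSum, $builtinSeconds);
```

Both calls produce the same total. How large the gap is depends on your PHP version and on whether opcache or the JIT is enabled.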

The True Evil of Premature optimization

Premature optimization is a true evil. I hope that if you’ve learnt one thing from this book, it’s to make smart scaling decisions and shoot for scaling out the big pieces that are going to improve performance the most. There’s a lot of bad advice on the internet from armchair experts who give scaling advice like “never use temporary variables! change double quotes to single quotes, they’re faster!” As a smart reader, I know you’re not falling for traps like that, but it can be hard to wade through all of the BS.

Except in rare circumstances where you’re doing something really inefficient, your PHP code will likely never be the cause of scaling issues. PHP’s virtual machine tends to be pretty poor performance-wise, but because the PHP standard library is written in C, your code still tends to run fast.

Note: I said your CODE, not your architecture. Swapping out double quotes for single quotes won’t make any difference.

To Framework or not to framework?

Choosing whether or not to use a framework is a no-brainer. I’ve never heard of a site failing because they chose to use a framework. If you look at any other language for the web (Python, Ruby, Java), all of these guys are using MVC frameworks like Django, Rails, and Play.

PHP seems to have a different sentiment in its community, though. People say frameworks are slow. Or that they’re bloated. Or too complex. While some of this may be true, it doesn’t represent modern PHP frameworks. I think the hate is just a defense mechanism for guys that don’t understand or “get” MVC.

You can use one of the “traditional” PHP frameworks that have been around forever like Zend, Kohana, Symfony, or CakePHP, but alternatively check out some of these newer frameworks that tend to be more modern and “lightweight”.

· Laravel

· FuelPHP

· Fluf PHP

· Slim

How about DIY frameworks?

So you want to build a DIY framework for your product? Don’t do it- learn from my mistakes!

Managing your own framework is like building two products instead of one. Not only do you have to worry about coding your application, but you now also need to deal with writing all of the underlying framework code for it too. This is likely the difference between failing miserably and succeeding.

Truthfully, I fell into this trap with the second version of Twitpic. We built out our own framework. It was difficult, not because writing an MVC framework is hard, but because handling all of the edge cases, gracefully failing, and covering all of the features we needed takes a lot of code.

It made development painful. Want to add a new feature to the app? Let me just code it up… oh wait, we’re missing a function from the framework to do that, so before I can add that new feature to the app, I need to fill in the missing framework code. It slows you down.

If you want to build a framework to get a better grasp on how they work- that’s great. But sure as hell don’t use it in production. Learn from my mistakes- you’ll kick yourself in the butt later.

ORMs aren’t evil

Related to frameworks, let’s talk about ORMs for a second- usually the “Model” part of your application. Plain and simple, they are very useful- ORMs simplify your code and make it easier because you can just call methods instead of having to write SQL.

But.. but.. if I can’t write SQL, then it’s going to be inefficient! In my experience, most mature ORMs generate almost the exact same SQL that I would have written. Sure, there are some edge cases around JOINs that might cause problems, but most ORMs let you pass in your own SQL to handle those situations.

At Twitpic, we use an ORM library and I wouldn’t ever think about not using one. The benefits outweigh the slight performance overhead, and to be honest, most of the “controversy” surrounding ORMs comes from premature optimizers and is often incorrect. Our ORM allows us to pass in raw SQL for complex-to-generate queries, but if I had to do it again I would leave this feature out. As we’ve scaled our architecture, we’ve optimized most production JOINs out, so any off-the-shelf ORM would have done.

I’m not saying never write SQL. For example, ORMs usually don’t handle updating or deleting multiple objects very well, because they tend to issue one query per object (the same shape as the N+1 query problem). Consider something like this:

    <?php

    // I want to delete all of the "notes" this user has added.

    $notes = Note::find_by_user_id($user_id);

    // Loop over each note and delete it, effectively generating
    // DELETE FROM notes WHERE id = $note_id
    foreach ($notes as $note) {
        $note->delete();
    }

The example above will generate and execute a SQL statement for each note that exists. If you were writing your own SQL, you could have executed DELETE FROM notes WHERE user_id = $user_id instead and handled the whole delete in a single query. This isn’t usually a big deal, but if you’re deleting (or updating), say, 5000 items, it can have a meaningful impact. Luckily, it’s not a common design in most web applications, but this is a case where I’d just write my own SQL instead of relying on an ORM.

Although most frameworks package their own ORM, if you aren’t using a framework or have your own in-house framework, there are a few open-source standalone ORM packages that you can use. Similar to recommending against DIY frameworks, I also don’t recommend DIY ORMs, as they are especially complex and have many edge cases to consider. Here are some modern options-
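Here is a minimal sketch of the single-query version using plain PDO. It runs against an in-memory SQLite database purely so the example is self-contained (this assumes the pdo_sqlite extension is available). The table layout and the delete_notes_for_user() helper are invented for illustration and are not part of any particular ORM.

```php
<?php

// In-memory SQLite database standing in for the real one (assumption).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE notes (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)');

// Seed a few notes: three for user 1, one for user 2.
$insert = $db->prepare('INSERT INTO notes (user_id, body) VALUES (?, ?)');
foreach ([[1, 'a'], [1, 'b'], [1, 'c'], [2, 'd']] as $row) {
    $insert->execute($row);
}

// Deletes every note for one user with a single statement and
// returns the number of rows removed.
function delete_notes_for_user(PDO $db, int $userId): int
{
    $stmt = $db->prepare('DELETE FROM notes WHERE user_id = :user_id');
    $stmt->execute([':user_id' => $userId]);
    return $stmt->rowCount();
}

$deleted = delete_notes_for_user($db, 1);
$remaining = (int) $db->query('SELECT COUNT(*) FROM notes')->fetchColumn();

printf("deleted %d notes, %d left\n", $deleted, $remaining);
```

The user id is bound as a parameter instead of being interpolated into the SQL string, which avoids injection problems when the id comes from user input.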

· PHP ActiveRecord

· Propel

· Doctrine

Capistrano for code deployment

Hopefully you’re not still stuck deploying code by dragging and dropping over FTP, or even worse- editing files on your production machine! If you are, you need to get yourself some Capistrano. Besides being horribly inefficient and prone to failure, manual deployment doesn’t scale well past one server. Capistrano fixes that. You run a command on your computer (cap deploy) and it deals with pushing new code to all of your servers.

And it’s easy to set up. You can be rolling in the next 10 minutes.

You need to have Ruby installed on your computer. Capistrano is language-agnostic, but itself is written in Ruby.

You’re installing Capistrano on the computer you’re deploying from, i.e., your development machine, not your application server. The Capfile and deploy.rb shown in this section use Capistrano 2 syntax.

    > gem install capistrano

Now, go into your project and make a new file named Capfile.

    load 'deploy' if respond_to?(:namespace)
    load 'app/config/deploy' # The path to your deploy.rb file

Your Capfile is read by Capistrano and tells it where to find deploy.rb, the configuration file for your deployment options. In my case, my deploy.rb file is located at ./app/config/deploy.rb, but you can change the second line of the Capfile to wherever you want to put your deploy.rb file.

Next, we need to create the deploy.rb file.

You can do A LOT with Capistrano. The most basic use case is this:

1. Run cap deploy on your computer.

2. Capistrano logs into your application server(s), usually over SSH.

3. Capistrano clones the latest version of your trunk/master branch (both SVN and Git are supported).

4. Capistrano reloads PHP-FPM to pick up the latest code changes.

That’s the basic use case, and that’s all our basic deploy.rb file will do (we’ll strip out some of the default Rails options). But here are some other ideas that you can use it for-

· Minify/Compile your JavaScript, LESS, CoffeeScript, SASS

· Run database changes, e.g., adding new columns or new indexes on deployment

· Tail the log files of all your servers

· Deploy to multiple environments (production, development, staging)

Example deploy.rb file

    set :application, "My Application"
    set :repository, "git@github.com:MyApp/app.git"
    set :branch, "master"

    set :deploy_to, "/u/apps/myapp/"

    set :deploy_via, :remote_cache
    set :scm, :git

    # SSH User to Deploy As
    set :user, 'deployer'
    set :password, '123456'

    set :use_sudo, false
    set :keep_releases, 5

    ssh_options[:forward_agent] = true

    # Hostnames of your App Servers (one argument per host)
    role :app, 'app01', 'app02', 'app03'

    namespace :deploy do

      task :finalize_update do
        # This strips out some of the default rails
        # tasks that we don't need for PHP.
      end

      # Optional: reload PHP-FPM after the code is deployed
      task :restart, :roles => [:app] do
        run "#{sudo} service php5-fpm reload"
      end

    end

Atomicity and Rollbacks

If you’re not sold on the ease-of-use that Capistrano adds to your development flow- consider this.

Capistrano deployments are atomic. That means that while it’s deploying code, it doesn’t impact any currently running code.

Think about it: if you deploy to a website manually, by uploading files over FTP or running git pull, old and new files are mixed together on the remote server while the transfer is in progress. If you changed both foo.php and bar.php, for example, the files are transferred sequentially, so for a split moment the old foo.php and the new bar.php will be running on the server at the same time.

This can cause all sorts of hairy situations (crashes, exceptions, and data loss) because you’re running a mix of old and new code, and who knows what the outcome will be. The more files you have, the worse it gets.

Capistrano works differently, though.

1. PHP-FPM is told to read code from a symbolic link, let’s call it /u/apps/my_app/current.

2. When you deploy new code with Capistrano, it creates a new directory in /u/apps/my_app/releases (e.g., /u/apps/my_app/releases/749847589) and downloads your code into that folder.

3. Once all of the code has been transferred, it moves the symlink of /u/apps/my_app/current to the new directory it created, /u/apps/my_app/releases/749847589.

4. Jump up and down! Capistrano deployed your code atomically. All of it changed at once, all of the new code works together.
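The symlink switch in step 3 is what makes the deploy atomic. The sketch below illustrates the idea in PHP. It is not Capistrano's actual implementation. It assumes a POSIX filesystem, where rename() replaces an existing link in a single step, and the switch_current_release() helper and the temporary directory layout are invented for the demo.

```php
<?php

// Points $currentLink at $releaseDir without a moment where the link is
// missing: build a temporary link, then rename() it over the old one.
function switch_current_release(string $currentLink, string $releaseDir): void
{
    $tmpLink = $currentLink . '.tmp.' . getmypid();
    if (is_link($tmpLink)) {
        unlink($tmpLink);
    }
    if (!symlink($releaseDir, $tmpLink)) {
        throw new RuntimeException("Could not create $tmpLink");
    }
    if (!rename($tmpLink, $currentLink)) {
        unlink($tmpLink);
        throw new RuntimeException("Could not switch $currentLink");
    }
}

// Demo in a throwaway directory (hypothetical layout mirroring the steps above).
$base = sys_get_temp_dir() . '/my_app_' . uniqid();
mkdir("$base/releases/1", 0777, true);
mkdir("$base/releases/2", 0777, true);

symlink("$base/releases/1", "$base/current");                 // first deploy
switch_current_release("$base/current", "$base/releases/2");  // new deploy

$liveRelease = readlink("$base/current");
echo "current -> $liveRelease\n";
```

Rolling back is the same operation pointed at the previous release directory.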

This method also makes it easy to roll back bad deployments. Pretend you deploy some code with a hidden bug that breaks your site (this never happens in real life… right!?). If you were using traditional manual deployment, you’d have to find the old files and transfer them back, adding to the amount of downtime.

Instead, since Capistrano keeps each deployment in its own directory, it can just roll back to the previous version by symlinking /u/apps/my_app/current back to the old release. And with just one command!

    > cap deploy:rollback

Easy peasy. I could write a whole book on Capistrano and this only scratches the surface, but DO IT! It’s a really great tool and a “best practice”. Check out more documentation on the Capistrano wiki.

Live Debugging PHP with strace

I have a confession to make. strace is probably my all-time favorite tool for debugging when shit hits the fan. It’s saved my ass more than once and if you learn how to use it, it’ll save yours too.

strace is a Unix tool that attaches to a currently running process and monitors the system calls that are made by that process. It gives you visibility into how the program is interacting with the OS. strace can be used for a bunch of different things, but it’s a life saver for debugging weird issues in production.

Let’s pretend that you have some PHP code that uses Memcached and your Memcached server goes down. Ideally, you’d have some monitoring or logging in place that would tip you off to this issue, but if you didn’t, you’d be going in blind. You wouldn’t have any visibility into what was broken; your site would just be broken, likely timing out if you don’t have proper Memcached timeouts set.

What to do? Well, you could check each Memcached server manually, but this gets less convenient if you have more than one Memcached server: logging into each one, trying to figure out which one is broken and timing out.

First, we need to find a running php-fpm process to attach to.

    > ps ax | grep php-fpm

    11662 ?        S      0:03 php-fpm: pool www
    11663 ?        Sl     0:06 php-fpm: pool www
    11698 ?        S      0:04 php-fpm: pool www
    11721 ?        S      0:04 php-fpm: pool www
    11745 ?        Sl     0:11 php-fpm: pool www

Ok, we have several php-fpm processes running, so let’s go ahead and trace 11662.

    > strace -r -p 11662

     0.000000 restart_syscall(<... resuming interrupted call ...>) = 0
     4.720248 socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
     0.000110 close(3) = 0
     0.000145 gettimeofday({1361146314, 442683}, NULL) = 0
     0.000115 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
     0.000086 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
     0.000097 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    30.000079 connect(3, {sa_family=AF_INET, sin_port=htons(11211), sin_addr=inet_addr("192.150.0.10")}, 16) = -1 EINPROGRESS (Operation now in progress)

Very quickly, we can see on the last line that it’s stalling for 30 seconds making a connection to 192.150.0.10 on port 11211, which happens to be one of our 10 Memcached servers. Now we can jump right into that server and fix the problem, since we know it’s the culprit.
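Once you know it's a connection problem, you can also confirm which servers are reachable without logging into each one. The sketch below tries a plain TCP connection to every server with a short timeout. The can_connect() helper and the server list are assumptions made for the example, and it only tests that the port accepts connections, not that Memcached itself is healthy.

```php
<?php

// Returns true if a TCP connection to $host:$port opens within $timeout seconds.
function can_connect(string $host, int $port, float $timeout = 0.5): bool
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning fsockopen() raises on failure; only the result matters here.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Hypothetical server list; replace with your own Memcached hosts.
$servers = [
    ['host' => '127.0.0.1', 'port' => 11211],
];

foreach ($servers as $server) {
    $status = can_connect($server['host'], $server['port']) ? 'up' : 'DOWN';
    printf("%s:%d %s\n", $server['host'], $server['port'], $status);
}
```

The short timeout is the important part: a dead host fails in half a second instead of stalling for 30.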

Another use case for this type of live debugging is a 3rd party API that’s slowing down your code: you can use strace to see how long the connections are taking. This is probably my go-to method for jumping into an “oh-shit-the-site-is-down” situation.

In the CakePHP Cache Files Case Study, I demonstrate how strace was invaluable in figuring out exactly where the bug in my code was.

A real life strace scenario

Last week, I was sitting on my couch, binging on some new episodes on Netflix.

DingDing

A text message? This late? Can’t be good.

Grudgingly, I pause Netflix after a few seconds (“How It’s Made” is super addicting) to check out the messages…

“From PagerDuty: Your API is Down!”

Fuck! An alert telling me that my site is down. EXCUSE ME! I’m over here trying to learn how Nail Clippers are made. Damnit. Off to the computer.

I sit down and try to load up api.myjob.com and… nothing. Chrome just sits there, the condescending little blue circle thing spinning into oblivion.

My site is DOWN, and it’s down hard. Requests are timing out.

What tactics would you use? Some good answers might be…

· Check out automated alerts, like ones sent from Nagios

· Look at your alerting/graphing systems like CopperEgg or Graphite

· Verify the “usual suspects” like Memcache or MySQL are healthy

But that’s NOT what I did. Any guesses?

I logged into one of my PHP servers, grabbed one of the PIDs for a PHP-FPM worker, and ran strace -p.

If you’re not using strace on a regular basis, you’re doing yourself a MAJOR disservice. It’s perfect for quick and dirty debugging. Sure, in an IDEAL world, you’d get a notification telling you exactly which service was down, but I’m a realist and I know that doesn’t always happen.

When your site is down, wishing you had better alerting isn’t the answer. You need to get your API back up ASAP. Everything else comes second.

So, I log into my webserver over SSH and run ps to grab a random PHP-FPM process id, like this:

    $ ps aux | grep php-fpm
    myapp 128131 2.9 0.0 324132 57396 ? R 10:54 11:50 php-fpm: pool www

See 128131? That’s a process id I can get a sneak peek into using strace.

Next, I attach strace to the process and see what’s up…

    $ sudo strace -p 128131
    gettimeofday({1399247077, 563898}, NULL) = 0
    sendto(17, "\372\0\0\0\3SELECT * FROM users WHERE id ="..., 254, MSG_DONTWAIT, NULL, 0) = 254

I see it waiting on the last line for several seconds. Just like that, I can immediately see that it’s sending a SQL query to the database server and is waiting for the response... looks like my database server is fubared again! And just like magic, I’m immediately able to diagnose the downtime. strace gives me the power to address the database server faster, fix the problem, and get back to watching “How It’s Made”.

A good friend of mine is a scaling expert…

He’s the kind of guy that can not only program well, but can also talk for hours about hardcore Linux internals. I asked him, “If you could teach everyone just ONE thing about scaling… what would it be?”

His response? Strace. 1000 times strace. Nothing else can give you so much insight and information about your stack in a single command. You can have all of the monitoring IN THE WORLD, and strace will STILL give you new insights into your application.