Static Code Analysis - The Grumpy Programmer's Guide To Building Testable PHP Applications (2015)

The Grumpy Programmer's Guide To Building Testable PHP Applications (2015)

4. Static Code Analysis

Tools for doing static code analysis have come to PHP very late in its lifecycle. Tools like these have been far more common in languages like Java or C++: languages where it’s easy to create complex code…and difficult to tame if it gets out of control.

Obviously it is easier to do static analysis of code that is statically typed and compiled. Proper static code analysis of dynamic languages is difficult and has been the subject of a few research papers. So treat them as tools that compliment what you are doing, not critical parts of the testing toolchain.

In this chapter we will examine several tools that I use every day when working with fairly complex code bases.

For a project where more than one person has to work on it (or integrate it into their own project) it’s important to pick a coding standard. This covers things like tabs vs. spaces, brace location, variable naming conventions, things like that. Sure, there have been plenty of nerd fights over what set of coding standards are the “best”, but most important is to set a standard and FOLLOW it.

Have you ever worked with a programmer who coded in such a distinct style that you could instantly tell which code was his? Don’t be that person. Follow the standard and save being an individual for your choice of code editor.

PHP Code Sniffer is a CLI utility that will take your code and compare it to a particular standard. I have found it to be a useful tool for making sure that your code stays organized and consistent.

Let’s run it on a file I created for this project, a Data Mapper for talking to our Postgres database to get information about franchises in our simulation baseball league.

1 chartjes@yggdrasil [09:39:25] [~/Sites/local.ibl/lib/ibl] [master *]

2 -> % phpcs --standard=Zend FranchiseMapper.php

3

4 FILE: /Users/chartjes/Sites/local.ibl/lib/ibl/FranchiseMapper.php

5 FOUND 4 ERROR(S) AND 5 WARNING(S) AFFECTING 9 LINE(S)

6 6 | ERROR | Expected 0 spaces before opening brace; 1 found

7 7 | ERROR | Protected member variable "conn" must contain a leading

8 | | underscore

9 8 | ERROR | Protected member variable "map" must contain a leading

10 | | underscore

11 15 | WARNING | Line exceeds 80 characters; contains 86 characters

12 77 | WARNING | Line exceeds 80 characters; contains 84 characters

13 139 | WARNING | Line exceeds 80 characters; contains 90 characters

14 142 | ERROR | Line exceeds maximum limit of 120 characters; contains 181

15 | | characters

16 152 | WARNING | Line exceeds 80 characters; contains 117 characters

17 154 | WARNING | Line exceeds 80 characters; contains 86 characters

18

19 Time: 1 second, Memory: 5.50Mb

Ah, look at all the problems it found. Tsk, tsk, Mister Hartjes. That’s some laziness in there. Complaints about line lengths (makes things less readable) and not following proper naming conventions for protected variables.

I will go and fix those right now but let’s see what happens if we go with the default standard it uses.

1 chartjes@yggdrasil [09:39:37] [~/Sites/local.ibl/lib/ibl] [master *]

2 -> % phpcs FranchiseMapper.php

3

4 FILE: /Users/chartjes/Sites/local.ibl/lib/ibl/FranchiseMapper.php

5 FOUND 13 ERROR(S) AND 5 WARNING(S) AFFECTING 18 LINE(S)

6 2 | ERROR | Missing file doc comment

7 5 | ERROR | Missing class doc comment

8 6 | ERROR | Expected 0 spaces before opening brace; 1 found

9 10 | ERROR | Missing function doc comment

10 15 | WARNING | Line exceeds 85 characters; contains 86 characters

11 20 | ERROR | You must use "/**" style comments for a function comment

12 36 | ERROR | Missing function doc comment

13 53 | ERROR | Missing function doc comment

14 74 | ERROR | Missing function doc comment

15 95 | ERROR | Missing function doc comment

16 109 | ERROR | Missing function doc comment

17 127 | ERROR | Missing function doc comment

18 136 | ERROR | Missing function doc comment

19 139 | WARNING | Line exceeds 85 characters; contains 90 characters

20 142 | WARNING | Line exceeds 85 characters; contains 181 characters

21 149 | ERROR | Missing function doc comment

22 152 | WARNING | Line exceeds 85 characters; contains 117 characters

23 154 | WARNING | Line exceeds 85 characters; contains 86 characters

24

25 Time: 0 seconds, Memory: 6.75Mb

That’s a completely different set of problems. In this case, it expects that I would put DocBlock comments around all my methods along with some warnings about line length.

It is an interesting exercise to analyze your code using all of the standards that it includes by default, just to see some other ideas on how code should look. The PHP Code Sniffer website also shows you how to create your own coding standards should you want to undertake such a task.

However, ensuring 100% compliance with what PHP Code Sniffer reports isn’t always a practical goal to aim for. For example, it complains about my use of non-camel-cased, protected variable names in our model classes. They are protected to prevent direct access by developers and are non-camel-cased to match the values that are in the database tables themselves.

Altering the code to allow the model variables to be camelCased and prefixed with an underscore (as is the Zend convention) might also not be practical. The code that dynamically determines what model variable needs to be returned would also need to be altered.

Should I do the necessary work to make it happen? That depends. Do I have tests in place that I can run while I tweak things under the hood to make it happen? Of course I do! Showing you boring patches for the changes I made to the code will not suffice. Instead, let me show you the code itself and we’ll discuss what had to be done.

I really only had to change code in one place: the BaseModel I am extending all my other models off of:

1 <?php

2

3 // Base data model that all our existing models will extend off of

4 namespace IBL;

5

6 class BaseModel

7 {

8 protected $_id;

9

10 public function setId($id)

11 {

12 // Never allow the ID for a model to be overwritten

13 if (!$this->_id) {

14 $this->_id = $id;

15 }

16 }

17

18 public function __call($name, $args)

19 {

20 if (preg_match('/^(get|set)(\w+)/', $name, $match)) {

21 $attribute = $this->validateAttribute($match[2]);

22

23 if ($match[1] == 'get') {

24 return $this->{$attribute};

25 } else {

26 $this->{$attribute} = $args[0];

27 }

28 } else {

29 throw new \Exception(

30 'Call to undefined ' .

31 __CLASS__ .

32 '::' .

33 $name .

34 '()'

35 );

36 }

37 }

38

39 protected function validateAttribute($name)

40 {

41 // Convert first character of the string to lowercase

42 $field = '_' . $name;

43 $field[1] = strtolower($field[1]);

44

45 // We don't use __CLASS__ here because there are scope problems

46 $currentClass = get_class($this);

47

48 if (in_array($field, array_keys(get_class_vars($currentClass)))) {

49 return $field;

50 }

51

52 return false;

53 }

54 }

Before all this, I used to have an inflect() module that was using PHP 5.3 closures to take something like HomeTeamId and turn it into home_team_id, and then match it against the available class variables for the model.

In order to make all the class variables for the class models conform to the standard, I no longer needed the inflector. Sure, it was a neat bit of code that used a closure as a callback, but the cleverness wasn’t needed.

Also note how I commented on the section of code where I am using a shortcut to set the first alphanumeric character of my attribute lower case. I needed this because the code takes something like setHomeTeamId and chops off the set part. In order to match it, I need to change it into _homeTeamId to match the coding standard for protected variables.

So now all my code meets the Zend coding standard. Onto other things like seeing how messy this pretty-looking code is.

Having updated my code until I got rid of all warnings having to do with the FranchiseMapper code being up to the Zend standard, the next tool to use is the PHP Mess Detector.

The job of this tool is to examine the complexity of your code and make suggestions on how to fix it. People create complex methods inside their code for the most innocent of reasons, but (beating a dead horse here) complexity is the enemy of maintainability and extendability. So, let’s take a look at what it says about FranchiseMapper.

1 chartjes@yggdrasil [10:30:20] [~/Sites/local.ibl/lib/ibl] [master *]

2 -> % phpmd FranchiseMapper.php text codesize,design,naming,unusedcode

3

4 FranchiseMapper.php:5 This class has too many methods, consider

5 refactoring it.

6 FranchiseMapper.php:137 Avoid variables with short names like $id

Aw, that’s no fun at all. No real news to report there. But what if we use it on something a lot messier? Here’s the same command run on a complex library for a work-related project I did.

1 > % phpmd lib/vendor/Moontoast/RpcApi.php text

2 codesize,design,naming,unusedcode

3 RpcApi.php:12 The class RpcApi has an overall complexity of 77 which

4 is very high. The configured complexity threshold is 50

5 RpcApi.php:12 The class RpcApi has a coupling between objects value

6 of 14. Consider to reduce to number of dependencies under 13

7 RpcApi.php:27 Avoid unused parameters such as '$params'.

8 RpcApi.php:114 The method getProduct() has a Cyclomatic Complexity of 14.

9 RpcApi.php:114 The method getProduct() has an NPath complexity of 655.

10 RpcApi.php:114 Avoid really long methods.

11 RpcApi.php:114 Avoid variables with short names like $id

12 RpcApi.php:234 Avoid unused parameters such as '$params'.

13 RpcApi.php:282 The method getStore() has a Cyclomatic Complexity of 12.

14 RpcApi.php:282 Avoid really long methods.

15 RpcApi.php:302 Avoid unused local variables such as '$price'.

16 RpcApi.php:303 Avoid unused local variables such as '$previousPrice'.

17 RpcApi.php:398 Avoid unused parameters such as '$csc'.

18 RpcApi.php:477 The method saveCart() has a Cyclomatic Complexity of 26.

19 RpcApi.php:477 The method saveCart() has an NPath complexity of 5644802.

20 RpcApi.php:477 Avoid really long methods.

21 RpcApi.php:477 Avoid variables with short names like $id

22 RpcApi.php:588 Avoid excessively long variable names like

23 $useBillingForShipping

24 RpcApi.php:608 Avoid excessively long variable names like

25 $shippingAddressObject

26 RpcApi.php:671 Avoid unused local variables such as '$gateway'.

27 RpcApi.php:702 Avoid variables with short names like $q

28 RpcApi.php:707 Avoid variables with short names like $mm

You can see all kinds of places where you could make some improvements to your code. The PHP Mess Detector web site has good documentation on what all these messages mean, although some of them should be pretty self-explanatory.

My personal experience with PHP Mess Detector is that it does an awesome job of identifying places in my code where I’m trying to do too much in one method or have come up with an algorithm that might be too complicated for the task I’m asking it to do. It surprised me to see those complaints about unused parameters and local variables, as I am usually very diligent about not doing stuff like that.

Reducing the “mess” in your code will also go a long way to figuring out the best way to actually test it.

One of the biggest temptations when trying to “just get this done” is to slack off on your efforts to reuse code or extract functionality into its own methods (or even own classes) and just cut-and-paste code everywhere. PHP Copy Paste Detector will statically examine your code and identify lines in your code where you appear to be repeating yourself, making it a candidate for refactoring that code into a method of it’s own. Let’s see what it says about our trusty FranchiseMapper

1 -> % phpcpd ./FranchiseMapper.php

2 phpcpd 1.3.2 by Sebastian Bergmann.

3

4 0.00% duplicated lines out of 197 total lines of code.

FLAWLESS VICTORY. Again, that’s pretty boring. Let’s take a look at another bit of code from a work project, code that deals with handling AMF (Action Message Format) calls from a Flash file used to generate store fronts for an ecommerce site:

1 -> % phpcpd lib/services/AMF_Checkout.php

2 phpcpd 1.3.3 by Sebastian Bergmann.

3

4 Found 2 exact clones with 52 duplicated lines in 1 files:

5

6 - AMF_Checkout.php:60-68

7 AMF_Checkout.php:576-584

8

9 - AMF_Checkout.php:88-132

10 AMF_Checkout.php:602-646

11

12 7.21% duplicated lines out of 721 total lines of code.

13

14 Time: 0 seconds, Memory: 6.50Mb

That’s a little better. The first bit of analysis worried me, but 9 lines of duplicate code isn’t that bad. Closer inspection revealed that it was the same set of ‘if’ statements being executed. I’ll think about whether or not to extract that into a method of its own.

That second problem is worth examining. Almost 50 lines of code have been duplicated. Definitely worth extracting into it’s own method, to cut that down to just 50 lines of code. Less code means less bugs…usually.

I have found that the best way to use these tools is to compliment your attempts to have well-organized code. If you find you are having problems in generating code that is easily understood by others, these tools can be used to identify places where friction is occurring in the code base.