Writing Idiomatic Python (2013)

7. General Advice

7.1 Avoid Reinventing the Wheel

7.1.1 Get to know PyPI (the Python Package Index)

If Python’s standard library doesn’t have a package relevant to your particular problem, the chances are good that PyPI does. As of this writing, there are over 27,000 packages maintained in the index. If you’re looking to accomplish a particular task and can’t find a relevant package in PyPI, chances are it doesn’t exist.

The index is fully searchable and contains both Python 2 and Python 3 based packages. Of course, not all packages are created equal (or equally maintained), so be sure to check when the package was last updated. A package with documentation hosted externally on a site like ReadTheDocs is a good sign, as is one for which the source is available on a site like GitHub or Bitbucket.

Now that you’ve found a promising-looking package, how do you install it? By far the most popular tool to manage third-party packages is pip. A simple pip install <package name> will download the latest version of the package and install it in your site-packages directory. If you need the bleeding-edge version of a package, pip is also capable of installing directly from a DVCS like git or mercurial.

If you create a package that seems generally useful, strongly consider giving back to the Python community by publishing it to PyPI. Doing so is a straightforward process, and future developers will (hopefully) thank you.

7.1.2 Learn the Contents of the Python Standard Library

Part of writing idiomatic code is making liberal use of the standard library. Code that unknowingly reimplements functionality in the standard library is perhaps the clearest signal of a novice Python programmer. Python is commonly said to come with “batteries included” for a good reason. The standard library contains packages covering a wide range of domains.

Making use of the standard library has two primary benefits. Most obviously, you save yourself a good deal of time when you don’t have to implement a piece of functionality from scratch. Just as important is the fact that those who read or maintain your code will have a much easier time doing so if you use packages familiar to them.

Remember, the purpose of learning and writing idiomatic Python is to write clear, maintainable, and bug-free code. Nothing ensures those qualities in your code more easily than reusing code written and maintained by core Python developers. As bugs are found and fixed in the standard library, your code improves with each Python release without you lifting a finger.

7.2 Modules of Note

7.2.1 Learn the contents of the itertools module

If you frequent sites like StackOverflow, you may notice that the answer to questions of the form “Why doesn’t Python have the following obviously useful library function?” almost always references the itertools module. The functional programming stalwarts that itertools provides should be seen as fundamental building blocks. What’s more, the documentation for itertools has a ‘Recipes’ section that provides idiomatic implementations of common functional programming constructs, all created using the itertools module. For some reason, a vanishingly small number of Python developers seem to be aware of the ‘Recipes’ section and, indeed, the itertools module in general (hidden gems in the Python documentation are actually a recurring theme). Part of writing idiomatic code is knowing when you’re reinventing the wheel.
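As a small illustration of these building blocks, the sketch below combines a few itertools functions. It assumes Python 3 (where zip_longest replaces Python 2’s izip_longest); the grouper helper is adapted from the ‘Recipes’ section, and the sample data is invented for this example.

```python
from itertools import chain, groupby, islice, zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks (adapted from the itertools recipes)."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Flatten several lists without writing nested loops by hand.
flattened = list(chain([1, 2], [3, 4], [5]))

# Take the first three items of any iterable, even an unbounded generator.
first_three = list(islice(flattened, 3))

# Group consecutive items sharing a key; the input must already be sorted by it.
words = sorted(['apple', 'avocado', 'banana', 'blueberry', 'cherry'])
by_letter = {
    letter: list(group)
    for letter, group in groupby(words, key=lambda word: word[0])
}

# Split a sequence into chunks of three, padding the last chunk.
chunks = list(grouper('ABCDEFG', 3, fillvalue='x'))
```

Each of these replaces a hand-written loop that would otherwise need its own index bookkeeping and edge-case handling.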

7.2.2 Use functions in the os.path module when working with directory paths

When writing simple command-line scripts, new Python programmers often perform herculean feats of string manipulation to deal with file paths. Python has an entire module dedicated to functions on path names: os.path. Using os.path reduces the risk of common errors, makes your code portable, and makes your code much easier to understand.

7.2.2.1 Harmful

from datetime import date
import os

filename_to_archive = 'test.txt'
new_filename = 'test.bak'
target_directory = './archives'
today = date.today()
os.mkdir('./archives/' + str(today))
os.rename(
    filename_to_archive,
    target_directory + '/' + str(today) + '/' + new_filename)


7.2.2.2 Idiomatic

from datetime import date
import os

current_directory = os.getcwd()
filename_to_archive = 'test.txt'
# Derive the backup name from the original instead of hard-coding it.
new_filename = os.path.splitext(filename_to_archive)[0] + '.bak'
target_directory = os.path.join(current_directory, 'archives')
today = date.today()
new_path = os.path.join(target_directory, str(today))

# Only archive if the target directory exists; create the dated
# subdirectory the first time it is needed.
if os.path.isdir(target_directory):
    if not os.path.exists(new_path):
        os.mkdir(new_path)
    os.rename(
        os.path.join(current_directory, filename_to_archive),
        os.path.join(new_path, new_filename))
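The same steps can also be wrapped in a function so they are easier to reuse and to test. The following is only a sketch: the name archive_file and its parameters are hypothetical, and unlike the listing above it creates any missing directories with os.makedirs rather than skipping the move when the archive directory is absent.

```python
from datetime import date
import os


def archive_file(path, archive_root, today=None):
    """Move `path` to archive_root/<date>/<name>.bak and return the new path.

    Hypothetical helper for illustration; `today` can be passed in so the
    result is predictable in tests.
    """
    today = today or date.today()
    filename = os.path.basename(path)
    new_filename = os.path.splitext(filename)[0] + '.bak'
    new_directory = os.path.join(archive_root, str(today))
    # Assumption: missing directories should be created rather than skipped.
    if not os.path.isdir(new_directory):
        os.makedirs(new_directory)
    new_path = os.path.join(new_directory, new_filename)
    os.rename(path, new_path)
    return new_path
```

A call such as archive_file('test.txt', 'archives') would then behave much like the script, while leaving all separator handling to os.path.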

7.3 Testing

7.3.1 Use an automated testing tool; it doesn’t matter which one

Having automated tests is important for a variety of reasons (some of which are discussed in detail later in this chapter). Developers who haven’t used automated testing tools often spend a great deal of time worrying about which tool to use. This is understandable, but usually not an issue. What is important is that you actually use an automated testing tool (any of them) and learn its functionality.

For most, the standard library’s unittest module will be sufficient. It’s a fully featured and reasonably user-friendly testing framework modeled after JUnit. Most of Python’s standard library is tested using unittest, so it is quite capable of testing reasonably large and complex projects. Among other things, it includes:

· Automated test discovery

· Object-oriented resources for creating test cases and test suites

· An easy-to-use command line interface

· Ability to selectively enable/disable a subset of tests
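To give a feel for these features, here is a minimal sketch of a unittest test case. The mean function and the test names are hypothetical, invented only for this example.

```python
import unittest


def mean(values):
    """Hypothetical function under test: arithmetic mean of a sequence."""
    if not values:
        raise ValueError('mean() requires at least one value')
    return sum(values) / float(len(values))


class MeanTest(unittest.TestCase):
    def test_mean_of_integers(self):
        self.assertEqual(mean([1, 2, 3]), 2.0)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            mean([])

    @unittest.skip('demonstrates selectively disabling a test')
    def test_not_ready_yet(self):
        self.fail('this test is skipped, so this line never runs')
```

If this were saved in a file named test_mean.py (an assumed name), running python -m unittest discover from the same directory should find and run it, reporting the skipped test separately.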

If you find the unittest module lacking in functionality or writing test code not as intuitive as you’d like, there are a number of third-party tools available. The two most popular are nose and py.test, both freely available on PyPI. Both are actively maintained and extend the functionality offered by unittest.

If you decide to use one of them, choosing which is largely a matter of support for the functionality required by your project. Otherwise, it’s mostly a matter of taste regarding the style of test code each package supports. This book, for example, has used both tools at various points of its development, switching based on changing test requirements.

When you do make a decision, even if it’s to use the unittest module, familiarize yourself with all of the capabilities of the tool you chose. Each has a long list of useful features. The more you take advantage of these features, the less time you’ll spend inadvertently implementing a feature that the tool you use already supports.

7.3.2 Separate your test code from your application code

When writing test code, some developers are tempted to include it in the same module as the code it’s meant to test. This is typically done by including test classes or functions in the same file as the code to be tested and relying on test discovery tools to run them. A (thankfully) less common alternative is to use the if __name__ == '__main__' idiom to run test code when the module is invoked directly.

There’s no good reason to shoehorn test code and application code into the same file, but there are a number of reasons not to. The documentation on Python’s unittest module succinctly enumerates these, so I’ll simply list its reasons here:

· The test module can be run standalone from the command line.

· The test code can more easily be separated from shipped code.

· There is less temptation to change test code to fit the code it tests without a good reason.

· Test code should be modified much less frequently than the code it tests.

· Tested code can be refactored more easily.

· Tests for modules written in C must be in separate modules anyway, so why not be consistent?

· If the testing strategy changes, there is no need to change the source code.

As a general rule, if the official Python documentation strongly suggests something, it can safely be considered idiomatic Python.

7.3.3 Use unit tests to aid in refactoring

Idiomatic Python code is terse and easy to read. Python developers are accustomed to getting results quickly without the need to give much thought to the organization of code. This is great for rapid prototyping and short scripts, but any non-trivial piece of software will likely be refactored many times throughout its lifetime.

To refactor code is to restructure it without changing its observable behavior. Imagine we have a function that calculates various statistics about students’ test scores and outputs the results in nicely-formatted HTML. This single function might be broken into two smaller functions: one to perform the calculations and the other to print the results as HTML. The resulting HTML output will be the same as before, but the structure of the code itself has changed to increase readability, among other things. Refactoring is a deep topic and a full discussion is outside the scope of this book, but it’s likely something you’ll find yourself doing often.
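The test-score example might look something like the following sketch. All function names and the HTML layout are hypothetical, chosen only to show one function being split into two while the output stays the same.

```python
def report_before(scores):
    """Original version: calculation and HTML formatting in one function."""
    average = sum(scores) / float(len(scores))
    highest = max(scores)
    lowest = min(scores)
    return ('<ul><li>Average: {:.1f}</li><li>Highest: {}</li>'
            '<li>Lowest: {}</li></ul>').format(average, highest, lowest)


def calculate_statistics(scores):
    """Refactored step 1: compute the statistics only."""
    return {
        'average': sum(scores) / float(len(scores)),
        'highest': max(scores),
        'lowest': min(scores),
    }


def format_as_html(stats):
    """Refactored step 2: render already-computed statistics as HTML."""
    return ('<ul><li>Average: {average:.1f}</li><li>Highest: {highest}</li>'
            '<li>Lowest: {lowest}</li></ul>').format(**stats)


def report_after(scores):
    """Refactored version: same output, composed from the two helpers."""
    return format_as_html(calculate_statistics(scores))


scores = [70, 85, 100]
# Observable behavior is unchanged by the refactoring.
assert report_before(scores) == report_after(scores)
```

After the split, the calculation can be reused or tested without parsing HTML, and the formatting can change without touching the arithmetic.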

As you’re making changes, though, how do you know if you’ve inadvertently broken something? And how do you know which portion of code is responsible for the bug? Automated unit testing is your canary in the mine shaft of refactoring. It’s an early warning system that lets you know something has gone wrong. Catching bugs quickly is important; the sooner you catch a bug, the easier it is to fix.

Unit tests are the specific type of tests helpful for this purpose. A unit test is different from other types of tests in that it tests small portions of code in isolation. Unit tests may be written against functions, classes, or entire modules, but they test the behavior of the code in question and no more. Thus, if the code makes database queries or network connections, these are simulated in a controlled way by mocking those resources. The goal is to be absolutely sure that, if a test fails, it’s because of the code being tested and not due to some unrelated code or resource.
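As a rough sketch of what mocking a resource looks like, the example below replaces a database connection with a stand-in object. The fetch_display_name function and the get_user method are hypothetical. The example assumes Python 3.3 or later, where unittest.mock is part of the standard library; on earlier versions, the separate mock package on PyPI offers the same interface.

```python
import unittest
from unittest import mock


def fetch_display_name(user_id, database):
    """Hypothetical code under test: looks up a user and formats the name."""
    record = database.get_user(user_id)
    return '{} {}'.format(record['first'], record['last']).strip()


class FetchDisplayNameTest(unittest.TestCase):
    def test_formats_first_and_last_name(self):
        # Stand-in for a real database connection; no network or disk access.
        database = mock.Mock()
        database.get_user.return_value = {'first': 'Ada', 'last': 'Lovelace'}

        self.assertEqual(fetch_display_name(42, database), 'Ada Lovelace')
        # Verify the code asked the "database" for the right user.
        database.get_user.assert_called_once_with(42)
```

If this test fails, the cause has to lie in fetch_display_name itself, since the database behavior is fixed by the test.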

It should be clear, then, how unit tests are helpful while refactoring. Since only the internal structure of the code is changing, tests that examine the code’s output should still pass. If such a test fails, it means that the refactoring introduced unintended behavior. Sometimes you’ll need to make changes to your tests themselves, of course, depending on the scope of the code changes you’re making. In general, though, passing unit tests is a good litmus test for determining if your refactoring broke anything.

Lastly, your unit tests should be automated so that running them and interpreting the results requires no thought on your part. Just kick off the tests and make sure all the tests still pass. If you need to manually run specific tests based on what part of the system you’re changing, you run the risk of running the wrong tests or missing a bug introduced in a portion of the code you didn’t intend to affect (and thus didn’t test). Like most developer productivity tools, the purpose is to reduce the cognitive burden on the developer and increase reliability and repeatability.