MySQL Stored Procedure Programming (2009)

Part IV. Optimizing Stored Programs

Chapter 23. Best Practices in MySQL Stored Program Development

The objective of this chapter is to provide concrete, immediately applicable, quickly located advice that will assist you in writing code that is readable, maintainable, and efficient.

It might seem odd that we have written a "best practices" chapter for a language that is still in its first major release. Aren't "best practices" supposed to be determined and documented after years of trial and error, sweat, and heartache? Absolutely. Those are, in fact, precisely the kinds of best practices you will find in this chapter.

We spent more than a year between the first alpha release of MySQL 5.0 in late 2004 and the most recent production release in early 2006, learning the hard way about what works and does not work in MySQL stored programs. Beyond that, while stored programs might be new to MySQL, they have been around in other databases for years—and both of us have plenty of experience (altogether over two decades' worth) to draw from—with MySQL, Oracle, and SQL Server. Most of the lessons learned in developing stored programs in other languages apply directly to MySQL.

We will start off with some general-purpose guidance that is intended to assist with software development in any language, then move on to guidelines specifically crafted for the MySQL stored program language. If you find yourself reading these and saying "Well, sure, of course that is what you are supposed to do!" then we congratulate you and hope that you not only know about these best practices, but also apply them as you write your code!

The Development Process

To do your job well, you need to be aware of, and to follow, both "little" best practices—tips focused on particular coding techniques—and "big" best practices. This section offers some suggestions on the big picture: how to write your code as part of a high-quality development process .

In other words, if you (or your methodology) don't follow some form of the best practices in this section, you are less likely to produce high-quality, successful software.

Name

DEV-01: Set standards and guidelines before writing any code

Synopsis

These standards and guidelines might include many or all of the best practices described in this book. Of course, you need to make your own decisions about what is most important and practical in your own particular environment.

Key areas of development for which you should proactively set standards are:

§ Selection of development tools : You should avoid relying on the MySQL command-line client to compile, execute, and test code, and avoid relying on a basic editor like Notepad or vi to write the code. MySQL AB and other software companies offer a multitude of tools (with a wide range of functionality and price) that will help you to dramatically improve your development environment. Decide on the tools to be used by all members of the development group.

§ How SQL is written in stored programs: The SQL in your application can be the Achilles' heel of your code base. If you aren't careful about how you place SQL statements in your stored program code, you'll end up with applications that are difficult to optimize, debug, and manage over time.

§ An exception-handling architecture : Users have a hard time understanding how to use an application correctly, and developers have an even harder time debugging and fixing an application if errors are handled inconsistently (or not at all). Use a consistent approach to handling runtime errors using exceptions.

§ Processes for code review and testing : There are some basic tenets of programming that must not be ignored. You should never put code into production without first having it reviewed by one or more other developers, and performing tests on both the individual programs in your application and the overall application.

Benefits

By setting clear standards and guidelines for at least the areas we listed above (tools, SQL, error handling, and code review and testing), you ensure a foundation that will allow you to be productive and to produce code of reasonable quality. We offer detailed advice on most of these areas later in the chapter.

Challenges

The deadline pressures of most applications mitigate against taking the time up front to establish standards, even though we all know that such standards are likely to save time down the line.

Name

DEV-02: Ask for help after 30 minutes on a problem

Synopsis

Following this simple piece of advice might have more impact on the quality of your code (and your productivity) than anything else in this book!

How many times have you stared at the screen for hours, trying this and that in a vain attempt to fix a problem in your code? Finally, exhausted and desperate, you call over your cubicle wall: "Hey, Melinda (or Jose or Farik or Lakshmi), could you come over here and look at this?" When Melinda reaches your cube she sees in an instant what you, after hours, still could not see (and she doesn't even know MySQL all that well!). Gosh, it's like magic!

Except it's not magic and it's not mysterious at all. Remember: humans write software, so an understanding of human psychology is crucial to setting up processes that encourage quality software. We humans like to get things right, like to solve our own problems, and do not like to admit that we don't know what is going on. Consequently, we tend to want to hide our ignorance and difficulties. This tendency leads to many wasted hours, high levels of frustration, and, usually, nasty, spaghetti code.

Team leaders and development managers need to cultivate an environment in which we are encouraged to admit what we do not know, and ask for help earlier rather than later. Ignorance isn't a problem unless it is hidden from view. And by asking for help, you validate the knowledge and experience of others, building the overall self-esteem and confidence of the team.

There is a good chance that if you have already spent 30 minutes fruitlessly analyzing your code, two more hours will not get you any further along to a solution. So get in the habit of sharing your difficulty with a coworker (preferably an assigned "buddy," so the line of communication between the two of you is officially acknowledged and doesn't represent in any way an acknowledgement of some sort of failure).

Example

Programmers are a proud and noble people. We don't like to ask for help; we like to bury our nose in our screen and create. So the biggest challenge to getting people to ask for help is to change behaviors. Here are some suggestions:

§ The team leader must set the example. When we have the privilege to manage a team of developers, we go out of our way to ask each and every person on that team for help on one issue or another. If you are a coach to other teams of developers, identify the programmer who is respected by all others for her expertise. Then convince her to seek out the advice of others. Once the leader (formal or informal) shows that it is OK to admit ignorance, everyone else will gladly join in.

§ Post reminders in work areas, perhaps even individual cubicles, such as "STUCK? ASK FOR HELP" and "IT'S OK NOT TO KNOW EVERYTHING." We need to be reminded about things that don't come naturally to us.

Benefits

Problems in code are identified and solved more rapidly. Fewer hours are wasted in a futile hunt for bugs.

Knowledge about the application and about the underlying software technology is shared more evenly across the development team.

Challenges

The main challenge to successful implementation of this best practice is psychological: don't be afraid to admit you don't know something or are having trouble figuring something out.

Resources

Peopleware: Productive Projects and Teams, by Tom DeMarco and Timothy Lister (Dorset House). This is a fantastic book that combines deep experience in project management with humor and common sense.

Name

DEV-03: Walk through each other's code

Synopsis

Software is written to be executed by a machine. These machines are very, very fast, but they aren't terribly smart. They simply do what they are told, following the instructions of the software we write, as well as the many other layers of software that control the CPU, storage, memory, etc.

It is extremely important, therefore, that we make sure the code we write does the right thing. Our computers can't tell us if we missed the mark ("garbage in, garbage out" or, unfortunately, "garbage in, gospel out"). The usual way we validate code is by running that code and checking the outcomes (well, actually, in most cases we have our users run the code and let us know about failures). Such tests are, of course, crucial and must be made. But they aren't enough.

It is certainly possible that our tests aren't comprehensive and leave errors undetected. It is also conceivable that the way in which our code was written produces the correct results in very undesirable ways. For instance, the code might work "by accident" (two errors cancel themselves out).

A crucial complement to formal testing of code is a formalized process of code review or walk-through. Code review involves having other developers actually read and review your source code. This review process can take many different forms, including:

§ The buddy system: Each programmer is assigned another programmer to be ready at any time to look at his buddy's code and to offer feedback.

§ Formal code walk-throughs: On a regular basis (and certainly as a "gate" before any program moves to production status), a developer presents or "walks through" her code before a group of programmers.

§ Pair programming: No one codes alone! Whenever you write software, you do it in pairs, where one person handles the tactical work (thinks about the specific code to be written and does the typing), while the second person takes the strategic role (keeps an eye on the overall architecture, looks out for possible bugs, and generally critiques—always constructively). Pair programming is an integral part of Extreme Programming. However, note that reports from the field are mixed with regard to pair programming —there are some indications that it relies too heavily on an intimate relationship between members of a pair that is rarely achieved.

Benefits

Overall quality of code increases dramatically. The architecture of the application tends to be sounder, and the number of bugs in production code goes way down. A further advantage is that of staff education—not just awareness of the project, but also an increase in technological proficiency due to the synergistic effect of working together.

Challenges

The development manager or team leader must take the initiative to set up the code review process and must give developers the time (and training) to do it right. Also, code review seems to be the first casualty of deadline crunch. Further, a new project involving MySQL stored programs might not have the language expertise available on the team to do complete, meaningful walk-throughs.

Resources

§ Handbook of Walkthroughs, Inspections, and Technical Reviews, by Daniel Freedman and Gerald M. Weinberg (Dorset House). Now in its third edition, this book uses a question-and-answer format to show you exactly how to implement reviews for all sorts of product and software development.

§ Extreme Programming Explained, by Kent Beck (Addison Wesley). The first book on Extreme Programming offers many insights into pair programming.

§ Extreme Programming Refactored, by Matt Stephens and Doug Rosenberg (APress). An often funny critical examination of Extreme Programming that argues against (in particular) pair programming.

Name

DEV-04: Use independent testers for functional sign-off

Synopsis

Individual developers should and must be responsible for defining and executing unit tests on the programs they write. Developers should not, on the other hand, be responsible for overall functional testing of their applications. There are several reasons for this:

§ We don't own the requirements. We don't decide when and if the system works properly. Our users or customers have this responsibility. They need to be intimately connected with, and drive, the functional tests.

§ Whenever we test our code, we follow the "pathways to success" without ever knowing it. In other words, the mindset we had when we wrote the code is the same mindset we have when testing the code. Other people, other eyes, need to run the software in complete ignorance of those pathways. It is no wonder that unit testing was so successful and yet integration testing has such problems.

To improve the quality of code that is handed over to customers for testing, your team leader or development manager should:

§ Work with the customer to define the set of tests that must be run successfully before an application is considered to be ready for production.

§ Establish a distinct testing group—either a devoted Quality Assurance organization or simply a bunch of developers who haven't written any of the software to be tested.

This extra layer of testing, based on the customer's own requirements and performed before the handoff to customers for their "sign off" test, will greatly improve code quality and customer confidence in the development team.

Example

We spend several days building a really slick application in PHP (or VB.NET or Java or . . . ). It allows users to manage data in a few different tables, request reports, and so on. We then devote most of a day to running the application through its paces. We click here, click there, enter good data, enter bad data, find a bunch of bugs, fix them, and finally hand it over to our main customer, Johanna. We feel confident in our application. We can no longer break it.

Imagine how crushed we feel (and we bet you can imagine it, because undoubtedly the same thing has happened to you) when Johanna sits down in front of the computer, starts up the application, and in no more than three clicks of the mouse causes an error window to pop up on the screen. The look she sends our way ("Why are you wasting my time?") is not rewarding.

There is no way for us to convince Johanna that we really, truly did spend hours testing the application. Why should she believe such a thing?

Benefits

Quality of code handed to users for testing is higher, which means the end result moved to production is of correspondingly higher quality.

Customer confidence in the development organization remains high. This confidence—and the respect that comes with it—makes it easier for developers to negotiate with customers over the time-versus-quality dilemma so many of us face in software development.

Challenges

Many small development groups can't afford (i.e., can't convince management to spend the money) to staff a separate QA organization. At a minimum, you must make sure that customers have defined a clear set of tests. Then distribute the functional testing load to the developers so that they do not test their own code.

Resources

http://www.well.com/~vision/sqa.html: A gathering place for references related to the theory and practice of Software Quality Assurance. This site is growing to include information on Standards and Development Procedures, Product Evaluation and Process Monitoring, Configuration Management Monitoring, the role of SQA in the Product Development Cycle, and Automated Testing Tools.

Name

DEV-05: Use source controlled files to maintain the "reference" copy of your stored routines

Synopsis

Source code control systems (SCCSs) allow us to keep copies of major revisions of our program source code, allowing us to roll back an application's source code to an earlier point in time or to examine the source code in use with an earlier version of the application (which might still be in use somewhere). Virtually all professional software developers could—or at least should—employ an SCCS to store their application code.

Unfortunately, developers often fail to source control the DDL code to create database objects and often neglect to include stored program code in the SCCS. To some extent, the ability to extract the source code for a stored program from the database encourages us to edit a stored program "in place"—even when we would never dream of editing PHP code "in place" (e.g., directly editing the .php files in the Apache document directory).

If your stored programs are part of an application, then the source program code is just as much a part of the application source code as code written in other languages such as PHP or Java. You should therefore keep the "reference" copy of your stored program code in your version control system (such as CVS, ClearCase, BitKeeper, etc.). This means saving your stored program code as a text file and performing explicit check-in and check-out from your version control system.

Think of that text file as the original source code for your procedure. Applying the source code to the MySQL server is analogous to compiling that source as a binary. Extracting it from a server for editing is equivalent to decompiling a binary and is usually not how you obtain a copy of the source for editing. Instead, you should perform an explicit check-out of the source code from the SCCS, edit it in the MySQL Query Browser or other tool, and then apply it to a test database for unit testing. Later you can deploy the source code for the stored program to a production database by running a script that executes it inside of the MySQL command-line client.

Coding Style and Conventions

Software developers are a very privileged bunch. We don't have to work in dangerous environments, and our jobs aren't physically taxing (though carpal tunnel syndrome is always a threat). We are paid to think about things, and then to write down our thoughts in the form of code. This code is then used and maintained by others, sometimes for decades. Now just think of your code as a form of poetry and rejoice in your fortunate circumstances!

Given this situation, we all have a responsibility to write code that can be easily understood and maintained (and, c'mon, let's admit our secret desires, admired) by developers who follow in our footsteps.

Tip

Steve McConnell's http://www.construx.com site, along with his book, Code Complete (Microsoft Press), offers checklists on coding style, naming conventions and rules, and module definitions.

Name

STYL-01: Adopt a consistent, readable format that is easy to maintain

Synopsis

Your code should have a "signature," a style that is consistent (all your programs look the same), readable (anyone can pick up your code and make sense of it), and maintainable (a minor change in the code shouldn't require 15 minutes of reformatting).

Ideally, everyone in your organization would adopt a similar style, so that everyone can easily understand everyone else's code. This can be tricky, as programmers sometimes take a dogmatic approach to such issues as size of indentation and use of whitespace. However, research and experience confirm that the benefit of adopting a similar style is not so much that any one standard confers a significant benefit over another, but rather that the use of a consistent standard throughout an organization improves efficiency and reduces maintenance costs.

Name

STYL-02: Adopt logical, consistent naming conventions for modules and data structures

Synopsis

Adopt and promote standard ways to define names of program elements. Choose a level of "formality" of naming conventions based on your needs. If, for example, you have a team of two developers working on a small code base, you can probably get away with naming conventions that don't go far beyond "use meaningful names." If you are building a massive application involving dozens of developers, you probably need to define more comprehensive rules.

Here are some general recommendations for conventions:

§ Identify the scope of a variable in its name: A global variable can be prefaced with v_, for example.

§ Use a prefix or suffix to identify the types of structures being defined: Consider, for example, declarations of cursors. A standard approach to declaring such a structure is <name> _csr. Cursors are quite different from variables; you should be able to identify the difference with a glance.

§ Use a readable format for your names: Since the stored program language isn't case sensitive, the "camel notation" (as in minBalanceRequired), for example, is probably not a good choice for constructing names. Instead, use separators such as _ (underscore) to improve readability (as in min_balance_required). While MySQL allows names to be extremely long (compared with other databases and/or languages), keep them short, as well as readable.

§ Consider portability: If you ever want to port your code to an alternate RDBMS (perish the thought!) you should consider adopting a naming convention that will work across RDBMS types. You can find a summary of the conventions for the "other" databases athttp://www.dbazine.com/db2/db2-disarticles/gulutzan5.

It isn't possible to provide a comprehensive list of naming conventions in this book. The particular conventions you choose, furthermore, aren't nearly as important as the fact that you set some standard for naming conventions.

Name

STYL-03: Self-document using block and loop labels

Synopsis

While block and loop labels are often necessary to allow for variable scoping or as targets for LEAVE or ITERATE statements, they can also be a big help in improving the readability of code.

Use a label directly in front of loops and nested blocks:

§ To name that portion of code and thereby self-document what it's doing

§ So that you can repeat that name with the END statement of that block or loop

§ To provide a target for a LEAVE or ITERATE statement

This recommendation is especially important when you have multiple nestings of loops (and possibly inconsistent indentation), as in the following:

WHILE condition DO

some code

WHILE condition DO

some code

END WHILE;

some code

END WHILE;

Example

In this example we use labels for a block and two nested loops, and then apply them in the appropriate END statements. We can now easily see which loop and block are ending, no matter how badly the code is indented!

CREATE PROCEDURE display_book_usage( )

READS SQL DATA

BEGIN

DECLARE v_month INT;

DECLARE v_x INT;

DECLARE yearly_analysis_csr CURSOR FOR SELECT ...;

DECLARE monthly_analysis_csr CURSOR FOR SELECT ...;

OPEN yearly_analysis_csr;

yearly_analysis:

LOOP

FETCH yearly_analysis_csr INTO v_month;

OPEN monthly_analysis_csr;

monthly_analysis:

LOOP

FETCH monthly_analysis_csr INTO v_x;

... Lots of monthly analysis code ...

END LOOP monthly_analysis;

...Lots of yearly analysis code

END LOOP yearly_analysis;

Benefits

If you use labels, it's much easier to read your code, especially if it contains loops and nested blocks that have long bodies (i.e., the loop starts on page 2 and ends on page 7, with three other loops inside that outer loop—not that we recommend this!).

Name

STYL-04: Express complex expressions unambiguously using parentheses

Synopsis

The rules of operator precedence in the MySQL stored program language follow the commonly accepted precedence of algebraic operators. The rules of precedence often make many parentheses unnecessary. When an uncommon combination of operators occurs, however, it may be helpful to add parentheses even when the precedence rules apply.

The rules of evaluation do specify left-to-right evaluation for operators that have the same precedence level. However, this is the most commonly overlooked rule of evaluation when checking expressions for correctness.

Many developers apply a consistent rule for improved readability in this area: always use parentheses around every Boolean expression, including IF, ELSEIF, and WHILE statements, as well as variable assignments, regardless of the simplicity of the expressions. So, rather than:

IF min_balance < 1000 THEN ...

you instead write:

IF ( min_balance < 1000 ) THEN ...

Example

You might not want a standard that requires you to always use parentheses, but in some situations, parentheses are all but required for readability. Consider the following expression:

5 + Y**3 MOD 10

MySQL will not be the least bit confused by this statement; it will apply its unambiguous rules and come up with an answer. Developers, however, may not have such an easy time of it. You are better off writing that same line of code as follows:

5 + ((Y ** 3) MOD 10)

Benefits

Everyone, including the author of the code, can more easily understand the logic and intent (which is crucial for maintenance) of complex expressions.

Name

STYL-05: Use vertical code alignment to emphasize vertical relationships

Synopsis

A common code formatting technique is vertical alignment. Here is an example in a SQL WHERE clause:

WHERE COM.company_id = SAL.company_id

AND COM.company_type_cd = TYP.company_type_cd

AND TYP.company_type_cd = CFG.company_type_cd

AND COM.region_cd = REG.region_cd

AND REG.status = RST.status;

You should use vertical alignment only when the elements that are lined up vertically have a relationship with each other that you want to express. In the WHERE clause shown here, however, there is no relationship between the right sides of the various expressions. The relationship is between the left and right sides of each individual expression. This is, therefore, a misuse of vertical alignment.

Example

Developers often (and justifiably) use vertical alignment with program parameter lists, as in:

CREATE PROCEDURE maximize_profits

(

IN advertising_budget NUMERIC(12,2),

INOUT bribery_budget NUMERIC(12,2),

IN merge_and_purge_on DATE ,

OUT obscene_bonus NUMERIC(12,2))

Vertical alignment allows you to easily see the different parameter modes and data types.

Vertical alignment is also handy when declaring many variables, as in:

CREATE PROCEDURE genAPI( )

DETERMINISTIC

BEGIN

DECLARE c_table CHAR(5) DEFAULT 'TABLE';

DECLARE c_column CHAR(6) DEFAULT 'COLUMN';

DECLARE c_genpky CHAR(6) DEFAULT 'GENPKY';

DECLARE c_genpkyonly CHAR(10) DEFAULT 'GENPKYONLY';

DECLARE c_sequence CHAR(7) DEFAULT 'SEQNAME';

DECLARE c_pkygenproc CHAR(10) DEFAULT 'PKYGENPROC';

DECLARE c_pkygenfunc CHAR(10) DEFAULT 'PKYGENFUNC';

DECLARE c_usingxmn CHAR(8) DEFAULT 'USINGXMN';

DECLARE c_fromod2k CHAR(8) DEFAULT 'FROMOD2K';

In this case, we want to be able to scan the list of values to make sure they are unique. I can also easily compare lengths of strings with the CHAR declarations, avoiding nuisance truncation exceptions on initialization.

Benefits

Careful and appropriate use of vertical alignment enhances readability. Used inappropriately, however, vertical alignment actually makes it harder to see what is really going on in your code.

Challenges

Vertical alignment is a "high maintenance" format. Add a new, long variable name, and you find yourself reformatting 20 other lines of code to match.

Name

STYL-06: Comment tersely with value-added information

Synopsis

The best way to explain what your code is doing is to let that code speak for itself. You can take advantage of many self-documentation techniques, including:

§ Use meaningful variable, procedure, and function names.

§ Use the language construct that best reflects the code you are writing (choose the right kind of loop for your logic, label loops and BEGIN-END blocks, etc.).

Whenever you find yourself adding a comment to your code, first consider whether it is possible to modify the code itself to express your comment. Good reasons to add comments include:

§ Program headers, explanations of workarounds, patches, operating-system dependencies, and other "exceptional" circumstances

§ Complex or opaque logic

Example

Let's follow a trail of unnecessarily commented code to self-documenting code. We start with:

-- If the first properties element is N...

IF properties1 = 'N'

Yikes! Our line of code was incomprehensible and our comment simply repeated the code using the English language, rather than the stored program language. No added value, no real assistance, yet not at all uncommon. The least we can do is use the comment to "translate" from computer-talk to business requirement:

-- If the customer is not eligible for a discount...

IF properties1 = 'N'

That's better, but we have created a redundancy: if our requirement ever changes, We have to change the comment and the code. Why not change the names of our variables and literals so that the code explains itself?

IF customer_discount_flag = const_ineligible

Much better! Now we no longer need a comment. Our remaining concern with this line of code is that it "exposes" a business rule; it shows how (at this moment in time) we determine whether a customer is eligible for a discount. Business rules are notorious for changing over time—and for being referenced in multiple places throughout our application. So our best bet is to hide the rule behind a self-documenting function call:

IF NOT eligible_for_discount (customer_id)

Variables

The MySQL stored program language is technically a strongly typed language in the sense that before you can work with a variable, you must first declare it. And when you declare it, you specify its type and, optionally, an initial or default value. Be aware, however, when not in "strict" mode (sql_mode contains neither STRICT_TRANS_TABLES nor STRICT_ALL_TABLES), MySQL will generate warnings only when you violate a variable's type or storage limits.

We strongly urge you, therefore, to take special care with declaring your variables.

Name

DAT-01: Use a consistent and meaningful variable naming style

Synopsis

All of us have a distinct variable naming style, often based on the conventions of our first programming language. In the very early days of programming, programmers were required to keep variable names short so as to reduce memory overhead. Various programming languages impose additional restrictions on the programmer: case-sensitivity, maximum lengths, and allowable characters, for instance.

However programmers might differ as regards the "one true style" that is optimal for a particular language, almost every programmer would agree that, above all, variable names should be meaningful, and whatever style might be employed, it should be employed consistently throughout your programs.

Meaningful variable names are those that clearly articulate the data that the variable holds. It's as simple as that. Avoid variable names that contain confusing or ambiguous abbreviations, and certainly avoid meaningless variable names such as v1, v2, etc.

Beyond being meaningful, conventions can help us understand the scope, data type, or some other property of our variables . In MySQL stored programs, we could use a convention that allows us to:

§ Determine the data type of a variable from its name.

§ Distinguish table column names from local variables or parameters.

§ Identify the type of data held in the variables: data from a cursor, intermediate data, bits of SQL for a dynamic SQL, etc.

We believe that most of the above items are matters of personal preference and that, while arguments can be made for and against any or all of these styles, you can write high-quality code regardless of the style you adopt. We feel that the following recommendations, however, should be followed:

§ You should generally identify local variables with a prefix or a suffix, especially if they are used to receive values from cursors. Creating local variables with the same name as a column returned by a cursor is dangerous.

§ Because variable names are case insensitive, "camel" notation—in which capitalization is used to separate "words" within a variable name—is possibly inappropriate since isNull and isnull will reference the same variable.

Example

If you're not sold on the value of meaningful variable names, try to work out what this assignment statement is doing:

SET ns=gs-tx+bn-fd;

Now try one with meaningful variable names:

SET net_salary=gross_salary-tax+bonus-fund401k;

Name

DAT-02: Avoid overriding variable declarations within "inner" blocks

Synopsis

It is possible to declare a variable inside an inner block that has the same name as a variable in the enclosing block. Nevertheless—though legal—this practice can be extremely confusing.

For instance, in the following example the v_counter variable is declared both within the inner block and within the outer block:

DECLARE v_counter INT DEFAULT 1;

. . . Lots of code . . .

inr_blk: BEGIN

DECLARE v_counter INT DEFAULT 2;

. . . Lots of code . . .

SET v_counter=v_counter+1;

END inr_blk;

There are two undesirable consequences to this practice:

§ Someone reading the code might be confused as to which variable is being updated.

§ It is not possible in the inner block to modify the value of a variable in the outer block. The SQL:2003 specification allows us to prefix a variable name with its block label, but this isn't supported in MySQL yet.

It's much better to ensure that every variable declared in a stored program has a unique name, regardless of its block scope.

Name

DAT-03: Replace complex expressions with functions

Synopsis

A Boolean function evaluates to one of three values: TRUE (1), FALSE (0), or NULL. You can use Boolean functions to hide complex expressions; the result is code that is virtually as readable as "straight" English—or whatever language you use to communicate with other human beings.

Example

Consider this code:

SELECT salary, status, hire_date

INTO v_total_sal, v_emp_status, v_hire_date

FROM employees

WHERE employee_id=in_employee_id;

IF (v_total_sal BETWEEN 10000 AND 50000)

AND v_emp_status = 'N'

AND DATEDIFF(NOW( ), v_hire_date)> 365

THEN

CALL give_raise (in_employee_id);

END IF;

Wow, that's hard to understand! It would be much easier if the code looked like this:

IF eligible_for_raise(in_employee_id) THEN

CALL give_raise(in_employee_id);

END IF;

Benefits

It will be much easier for anyone to read your code; you can literally read it. If you then need to understand how the Boolean expression is computed, you can look "under the covers."

This is a technique that can be applied (with care) to existing "spaghetti code." As you go into a program to fix or enhance it, look for opportunities to simplify and shorten executable sections by shifting complexity to other functions or procedures.

Challenges

Before you modify existing code, make sure you have solid unit test scripts in place so you can quickly verify that your changes haven't introduced bugs into the program.

Name

DAT-04: Remove unused variables and code

Synopsis

You should go through your programs and remove any part of your code that is no longer used. This is a relatively straightforward process for variables and named constants. Simply execute searches for a variable's name in that variable's scope. If you find that the only place it appears is in its declaration, delete the declaration and, by doing so, delete one more potential question mark from your code.

There is never a better time to review all the steps you took, and to understand the reasons you took them, than immediately upon completion of your program. If you wait, you will find it particularly difficult to remember those parts of the program that were needed at one point but were rendered unnecessary in the end. "Dead zones" in your code become sources of deep insecurity for maintenance programmers.

Example

The following block of code has several dead zones that could cause a variety of problems. Can you find them all?

CREATE PROCEDURE weekly_check (

in_isbn VARCHAR(20),

in_author VARCHAR(60)

)

BEGIN

DECLARE v_count INT;

DECLARE v_counter INT;

DECLARE v_available INT;

DECLARE v_new_location INT DEFAULT 1056;

DECLARE v_published_date DATE DEFAULT NOW( );

SET v_published_date=book_published_date(in_isbn);

IF DATE_SUB(NOW( ), INTERVAL 60 DAY) > v_published_date THEN

CALL review_usage( );

ELSEIF DATE_SUB(NOW( ), INTERVAL 24 DAY) > v_published_date

THEN

CALL check_availability (in_isbn, v_available, v_count);

IF v_available

AND /* Turn off due to Req A12.6 */ FALSE

THEN

CALL transfer_book (in_isbn, v_count - 1, v_new_location);

END IF;

-- Check for reserves

-- CALL analyze_requests (isbn_in);

END IF;

END$$

Here are a few potential dead spots:

§ The in_author parameter is declared but never used. It doesn't even have a default value, so you have to pass in an ignored value.

§ v_counter is declared but not used.

§ v_published_date is assigned a default value of NOW( ), which is immediately overridden by the call to book_published_date.

§ The call to transfer_book has been turned off with the addition of AND FALSE.

§ The call to analyze_requests has been commented out.

Benefits

It's much easier to maintain, debug, and enhance code that doesn't have "dead zones."

Challenges

There are sometimes valid reasons for keeping dead code in place. You may want to turn off code temporarily. Also, you may need to comment out some logic but still show that this action was done and explain why. In such cases, make sure that you include the necessary documentation in the code. Even better, use problem-tracking or bug-reporting software to keep a comprehensive history of any changes made to code.

Name

DAT-05: Don't assume that the result of an expression is TRUE or FALSE; it could be NULL

Synopsis

Three-valued logic—the logic that includes NULLs—is an essential part of the relational database model. However, the tendency of humans to think in terms of two-valued logic—an expression is either TRUE or FALSE—can lead to serious logic bugs.

For instance, consider the following logic, which is intended to retire employees older than 65 years and older, and keep those younger than 65 years:

IF v_date_of_birth > DATE_SUB(NOW( ), INTERVAL 65 YEAR) THEN

CALL keep_employee( v_employee_id);

ELSE

CALL retire_employee( v_employee_id);

END IF;

This logic seems valid from a two-valued logic perspective, but what if v_date_of_birth is NULL? If the date of birth is NULL, then the date comparison will return NULL, rather than TRUE or FALSE. Consequently, the ELSE condition will be executed and we will retire an employee, although in fact we have no idea how old the employee is.

NULL values can be handled in a couple of ways:

§ Explicitly check that a value is NOT NULL before attempting a comparison.

§ Explicitly check each condition: don't assume that an expression that is not TRUE, is necessarily FALSE.

If we are worried about the date of birth being NULL in the above example, we might recode it as follows:

SET v_age_in_years=DATEDIFF(NOW( ), v_date_of_birth)/365.25;

IF v_age_in_years > 65 THEN

CALL retire_employee( v_employee_id);

ELSEIF v_age_in_years <= 65 THEN

CALL keep_employee( v_employee_id);

ELSE

CALL invalid_dob_error( v_employee_id);

END IF;

Name

DAT-06: Employ "user" variables for global data sparingly

Synopsis

A global variable is a data structure that can be referenced outside the scope or block in which it's declared. In MySQL, we can use "user" variables—which can be recognized by being prefixed with @—to set values that are available to any program within the current session.

In the following procedure, for example, we store the number of customers into the user variable @customer_count:

CREATE PROCEDURE sp_customer_count( )

SELECT COUNT(*)

INTO @customer_count

FROM customers;

Other procedures can examine the @customer_count and make decisions without having to recalculate the value. For instance, in this procedure we use the session variable in our setup logic:

CREATE PROCEDURE sp_crm_setup ( )

BEGIN

IF @customer_count IS NULL THEN

CALL sp_customer_count( );

END IF;

IF @customer_count > 1000 THEN

. . . Logic for larger enterprises . . ..

There is no doubt that the use of global variables can create easy solutions for difficult problems. However, the modern consensus is that global variables create their own problems and that these problems generally overwhelm any of the advantages they might confer.

Global variables defeat modularity and hinder code reuse, because any module that uses a global variable becomes dependent on some other module that creates or initializes the global variable. In the case of MySQL user variables—which don't require a formal declaration—there is also the chance that two programmers might create identical "global" variables of the same name, thus causing subtle bugs that might occur only when modules are called in a certain order.

References

Code Complete, by Steve McConnell (Microsoft Press) contains an excellent discussion on the pros and cons of global variables.

Name

DAT-07: Create stored programs in strict mode to avoid invalid data assignments

Synopsis

Stored program type checking is very dependent on the setting of the sql_mode configuration variable. If a program is created when the sql_mode variable includes one of the "strict" settings (STRICT_TRANS_TABLES or STRICT_ALL_TABLES), then the program will reject invalid variable assignments with an error. If neither of the strict modes is in effect, then the stored program will generate a warning when invalid data assignments occur, but will continue execution.

For instance, in the following program, we accidentally declared a variable as CHAR(1) instead of INT:

CREATE PROCEDURE TenPlusTen( )

BEGIN

DECLARE a INTEGER DEFAULT 10;

DECLARE b CHAR(1) DEFAULT 10;

DECLARE c INTEGER;

SET c=a+b;

SELECT c ;

END;

If created in "non-strict" mode, this program generates a warning, but continues execution and returns the wrong result (10+10=11?):

mysql> CALL TenPlusTen( );

+------+

| C |

+------+

| 11 |

+------+

1 row in set (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> SHOW WARNINGS;

+---------+------+----------------------------------------+

| Level | Code | Message |

+---------+------+----------------------------------------+

| Warning | 1265 | Data truncated for column 'B' at row 1 |

+---------+------+----------------------------------------+

1 row in set (0.00 sec)

If created in strict mode, the program generates an error during execution, which is clearly better than returning the wrong result:

mysql> CALL TenPlusTen( );

ERROR 1406 (22001): Data too long for column 'b' at row 1

Non-strict stored program behavior can lead to unexpected and subtle bugs, and we recommend that you use strict mode when creating your stored programs. To enable strict mode, assign one of STRICT_TRANS_TABLES or STRICT_ALL_TABLES to your sql_mode variable:

SET sql_mode='STRICT_TRANS_TABLES';

Remember, it is the sql_mode that was in effect when the program is created that determines program behavior.

Conditional Logic

Follow the best practices in this section when you are using IF or CASE statements in stored programs.

Name

IF-01: Use ELSEIF with mutually exclusive clauses

Synopsis

When you need to write conditional logic that has several mutually exclusive clauses (in other words, if one clause is TRUE, no other clause evaluates to TRUE), use the ELSEIF construct:

IF condA THEN

...

ELSEIF condB THEN

...

ELSEIF condN THEN

...

ELSE

...

END IF;

Example

At first glance, the following statement makes sense, but on closer examination, it's a mess:

CREATE PROCEDURE process_lineitem(line_in INT)

BEGIN

IF line_in = 1 THEN

CALL process_line1( );

END IF;

IF line_in = 2 THEN

CALL process_line2( );

END IF;

...

IF line_in = 2045 THEN

CALL process_line2045( );

END IF;

END$$

Every IF statement is executed and each condition is evaluated. You should rewrite such logic as follows:

CREATE PROCEDURE process_lineitem(line_in INT)

BEGIN

IF line_in = 1 THEN

CALL process_line1( );

ELSEIF line_in = 2 THEN

CALL process_line2( );

/*... */

ELSEIF line_in = 2045 THEN

CALL process_line2045( );

END IF;

END$$

Benefits

This structure clearly expresses the underlying "reality" of your business logic: if one condition is TRUE, no others can be TRUE.

ELSEIF offers the most efficient implementation for processing mutually exclusive clauses. When one clause evaluates to TRUE, all subsequent clauses are ignored.

Name

IF-02: Use IF...ELSEIF only to test a single, simple condition

Synopsis

The real world is very complicated; the software we write is supposed to map those complexities into applications. The result is that we often end up needing to deal with convoluted logical expressions.

You should write your IF statements in such a way as to keep them as straightforward and understandable as possible. For example, expressions are often more readable and understandable when they are stated in a positive form. Consequently, you are probably better off avoiding the NOToperator in conditional expressions.

Example

It's not at all uncommon to write or maintain code that is structured like this:

IF condA AND NOT (condB OR condC) THEN

CALL proc1;

ELSEIF condA AND (condB OR condC) THEN

CALL proc2;

ELSEIF NOT condA AND condD THEN

CALL proc3;

END IF;

It's also fairly common to get a headache trying to make sense of all of that. You can often reduce the trauma by trading off the simplicity of the IF statement itself (one level of IF and ELSEIF conditions) for the simplicity of clauses within multiple levels:

IF condA THEN

IF (condB OR condC) THEN

CALL proc2;

ELSE

CALL proc1;

END IF;

ELSEIF condD THEN

CALL proc3

END IF;

Don't forget, by the way, to take into account the possibility of your expressions evaluating to NULL. This can throw a monkey wrench into your conditional processing.

Benefits

Following this best practice will make your code easier to read and maintain.

Breaking an expression into smaller pieces can aid maintainability; if and when the logic changes, you can change one IF clause without affecting the logic of others.

Challenges

Multiple levels of nested IF statements can also decrease readability. You need to strive for a workable balance.

Name

IF-03: Make sure that a CASE statement is inclusive, or construct a handler to catch any unmatched cases

Synopsis

If none of the CASE statements match as the input condition, CASE will raise MySQL error 1339 (Case not found for CASE statement). You should either construct an error handler to ignore this error, or ensure that the exception never occurs by including an ELSE clause in everyCASE statement (the easier solution).

Example

In the following example, the CASE statement will fail if the customer status is not one of 'PLATINUM', 'GOLD', 'SILVER', or 'BRONZE':

CASE customer_status

WHEN 'PLATINUM' THEN

CALL apply_discount(sale_id,20); /* 20% discount */

WHEN 'GOLD' THEN

CALL apply_discount(sale_id,15); /* 15% discount */

WHEN 'SILVER' THEN

CALL apply_discount(sale_id,10); /* 10% discount */

WHEN 'BRONZE' THEN

CALL apply_discount(sale_id,5); /* 5% discount*/

END CASE;

Here we add an ELSE clause to avoid the error. Since we don't have anything for the ELSE clause to do, we use a dummy SET statement.

CASE customer_status

WHEN 'PLATINUM' THEN

CALL apply_discount(sale_id,20); /* 20% discount */

WHEN 'GOLD' THEN

CALL apply_discount(sale_id,15); /* 15% discount */

WHEN 'SILVER' THEN

CALL apply_discount(sale_id,10); /* 10% discount */

WHEN 'BRONZE' THEN

CALL apply_discount(sale_id,5); /* 5% discount */

ELSE

SET dummy=dummy;

END CASE;

In this alternative solution, we construct a handler to allow the error to be ignored:

DECLARE not_found INT DEFAULT 0;

DECLARE no_matching_case CONDITION FOR 1339;

DECLARE CONTINUE HANDLER FOR no_matching_case SET not_found=1

CASE

WHEN (sale_value>200) THEN

CALL free_shipping(sale_id);

CASE customer_status

WHEN 'PLATINUM' THEN

CALL apply_discount(sale_id,20);

WHEN 'GOLD' THEN

CALL apply_discount(sale_id,15);

WHEN 'SILVER' THEN

CALL apply_discount(sale_id,10);

WHEN 'BRONZE' THEN

CALL apply_discount(sale_id,5);

END CASE;

See Chapter 5 for more details.

Name

IF-04: Use CASE and IF consistently

Synopsis

Any conditional statement that can be expressed as an IF statement can also be expressed as a CASE statement—and vice versa. While you might heatedly debate the relative benefits of each over a few beers after work, it's fairly clear that you can write high-quality code no matter which statement you employ.

However, randomly alternating between the two statements does not lead to high-quality code. It's harder to compare the logic of two routines if—for instance—one expresses its branching logic with the CASE statement while the other uses IF. So try not to mix IF and CASE arbitrarily within your programs.

Loop Processing

Follow the best practices in this section when you are performing iterative processing in stored programs using the various looping controls: LOOP, WHILE, and REPEAT.

Name

LOOP-01: Make sure the loop will terminate

Synopsis

One of the most annoying and potentially disruptive bugs that can be created in any language is the inadvertent infinite loop.

Making sure that a loop will terminate requires that you simulate all possible paths through the loop and assure yourself that the loop will always encounter an exit condition. If the loop does not terminate, it will likely consume excessive CPU and/or memory resources until it is manually terminated by the system administrator. In a worst-case scenario, the MySQL server itself may be terminated.

Example

The following stored procedure calculates the number of prime numbers less than the supplied input parameter. It's part of a larger routine that we plan to put in action when we're next contacted by extraterrestrial intelligences that announce their presence by broadcasting prime numbers at planet Earth.

CREATE PROCEDURE check_for_primes(in_limit INT)

BEGIN

DECLARE i INT DEFAULT 2;

DECLARE j INT DEFAULT 1;

DECLARE n_primes INT DEFAULT 0;

DECLARE is_prime INT DEFAULT 0;

REPEAT

-- See if i is a prime number

SET j=2;

SET is_prime=1;

divisors: WHILE(j< i) DO

IF MOD(i,j)=0 THEN

SET is_prime=0;

LEAVE divisors;

END IF;

SET j=j+1;

END WHILE;

IF is_prime THEN

SET n_primes=n_primes+1;

END IF;

-- Move onto the next number

IF (MOD(i,2)=0) THEN

SET i=i+1;

ELSE

-- Next number is even, no need

-- to check for it as a prime

SET i=i+2;

END IF;

UNTIL (i=in_limit) END REPEAT;

SELECT CONCAT(n_primes,' prime numbers <= ',in_limit);

END$$

Unfortunately, this routine has a bug that will lead to an infinite loop if the input number is even. A clever programmer altered the loop increment value so that even numbers—which can never be prime—were skipped as the loop incremented. Unfortunately, the UNTIL loop contains an equality check, i=in_limit, that will never be satisfied if the input parameter is even, and hence the loop will never terminate.

This bug could have been detected or averted in a number of ways:

§ Walk-through of the program's algorithm

§ Testing of the routine with a variety of inputs (including, of course, even numbers)

§ Adoption of a defensive programming philosophy that could have led to the inclusion of a more robust i>in_limit condition in the UNTIL clause

Name

LOOP-02: Make the termination conditions of a loop obvious

Synopsis

Loop logic is easier to determine if all the control logic is in one place, either in the WHILE or UNTIL clauses or in a LEAVE statement within the loop. It's particularly confusing to include a RETURN statement within a loop.

To that end, we suggest that you avoid LEAVE or RETURN statements within WHILE or REPEAT UNTIL loops.

Example

In the following example, borrowed from the prime number routine in the preceding section, a WHILE loop contains a LEAVE clause—there are two ways for the loop to terminate, and this makes the code harder to analyze and trace:

SET j=2;

SET is_prime=1;

divisors: WHILE(j< i) DO

IF MOD(i,j)=0 THEN

SET is_prime=0;

LEAVE divisors;

END IF;

SET j=j+1;

END WHILE;

One way to improve the readability of the loop would be to move all of the termination logic into the WHILE clause:

SET j=2;

SET is_prime=1;

divisors: WHILE(j< i AND is_prime=1) DO

IF MOD(i,j)=0 THEN

SET is_prime=0;

END IF;

SET j=j+1;

END WHILE;

Alternatively, we could employ a simple loop and place all termination logic within the loop.

Name

LOOP-03: Use a single LEAVE in simple loops

Synopsis

This best practice is another variation on "one way in, one way out." It suggests that, whenever possible, you consolidate all exit logic in your simple loop to a single LEAVE statement.

Example

Here is another variant on our prime counting loop. It contains some new logic to handle the special cases of 1 and 2 (1 is not prime; 2 is prime).

SET j=2;

SET is_prime=1;

divisors: LOOP

IF (j=1) THEN

SET is_prime=0;

LEAVE divisors;

END IF;

IF (j=2) THEN

SET is_prime=1;

LEAVE divisors;

END IF;

IF MOD(i,j)=0 THEN

SET is_prime=0;

END IF;

SET j=j+1;

IF (is_prime=0 OR j>=i ) THEN

LEAVE divisors;

END IF;

END LOOP divisors;

The multiple LEAVE statements make it difficult for us to work out which segments of the code are actually executed for any given number. A rewrite that relies on a single LEAVE looks like this:

SET j=2;

SET is_prime=1;

divisors: LOOP

IF (i=1) THEN

SET is_prime=0;

ELSEIF (i=2) THEN

SET is_prime=1;

ELSEIF MOD(i,j)=0 THEN

SET is_prime=0;

SELECT i,'is divisible by',j;

END IF;

IF (i=2 OR is_prime=0 OR j+1>=i ) THEN

LEAVE divisors;

END IF;

SET j=j+1;

END LOOP divisors;

Now we have a single place in the code where we make the decision to leave the loop, and, consequently, our code is more readable and robust.

Name

LOOP-04: Use a simple loop to avoid redundant code required by a WHILE or REPEAT UNTIL loop

Synopsis

This guideline is particularly relevant when you are writing cursor loops.

The structure of MySQL cursors, and the necessity of setting an indicator variable to detect the end of the cursor, means that you usually want to execute the cursor loop at least once. You will then continue executing the loop until the indicator variable changes.

This sounds like a perfect opportunity to apply the REPEAT UNTIL loop. So as you start to create the program, you create a structure that looks like this:

DECLARE dept_csr CURSOR FOR

SELECT department_name

FROM departments;

DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_more_departments=1;

OPEN dept_csr;

REPEAT

FETCH dept_csr INTO v_department_name;

UNTIL (no_more_departments) END REPEAT;

CLOSE dept_csr;

SET no_more_departments=0;

Of course, you always want to do something with the data fetched from a cursor, but you need to make sure that you don't try to process data after the last row has been returned. So in order to keep the REPEAT loop, you create an IF structure to enclose your processing:

DECLARE dept_csr CURSOR FOR

SELECT department_name

FROM departments;

DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_more_departments=1;

OPEN dept_csr;

REPEAT

FETCH dept_csr INTO v_department_name;

IF (no_more_departments=0) THEN

SET v_count= v_count+1;

END IF;

UNTIL (no_more_departments) END REPEAT;

CLOSE dept_csr;

SET no_more_departments=0;

The problem with this solution is that you now have redundant tests to determine if you have reached the end of the cursor. If you change the CONTINUE handler, you will have to change your code in two places.

The code would be simpler and more maintainable if the test were conducted only once:

DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_more_departments=1;

OPEN dept_csr;

dept_loop: LOOP

FETCH dept_csr INTO v_department_name;

IF (no_more_departments) THEN

LEAVE dept_loop;

END IF;

SET v_count= v_count+1;

END LOOP;

CLOSE dept_csr;

SET no_more_departments=0;

Exception Handling

Even if you write such amazing code that it contains no errors and never acts inappropriately, your users might still use your program incorrectly. The result? Situations that cause programs to fail. MySQL provides exceptions to help you catch and handle error conditions.

Name

EXC-01: Handle exceptions that cannot be avoided but can be anticipated

Synopsis

If you are writing a program in which you can predict that a certain error will occur, you should include a handler in your code for that error, allowing for a graceful and informative failure.

Example

This recommendation is easily demonstrated with a simple, single-row lookup cursor. An error that often occurs is No data to FETCH, which indicates that the cursor didn't identify any rows. Consider the following function that returns the name of a department for its ID:

CREATE FUNCTION department_name(in_dept_id INT) RETURNS VARCHAR(30)

READS SQL DATA

BEGIN

DECLARE v_dept_name VARCHAR(30);

DECLARE dept_csr CURSOR FOR

SELECT department_name

FROM departments

WHERE department_id=in_dept_id;

OPEN dept_csr;

FETCH dept_csr INTO v_dept_name;

CLOSE dept_csr;

RETURN v_dept_name;

END;

As currently coded, this function will raise the No data to FETCH error if an invalid department ID is passed in.

mysql> SELECT department_name(1);

+--------------------+

| department_name(1) |

+--------------------+

| DUPLIN |

+--------------------+

1 row in set (0.00 sec)

mysql> SELECT department_name(60);

ERROR 1329 (02000): No data to FETCH

That may be fine for some scenarios, but in this particular case, we simply want to return a special string (No such Department). The program that calls department_name can then decide for itself if it wants or needs to raise an error or simply proceed. In this case, the solution is to add a simple CONTINUE handler:

CREATE FUNCTION department_name(in_dept_id INT) RETURNS VARCHAR(30)

READS SQL DATA

BEGIN

DECLARE v_dept_name VARCHAR(30);

DECLARE dept_csr CURSOR FOR

SELECT department_name

FROM departments

WHERE department_id=in_dept_id;

DECLARE CONTINUE HANDLER FOR NOT FOUND

SET v_dept_name='No such Department';

OPEN dept_csr;

FETCH dept_csr INTO v_dept_name;

CLOSE dept_csr;

RETURN v_dept_name;

END;

Name

EXC-02: Use named conditions to improve code readability

Synopsis

Any MySQL programmer worth her salt knows all the MySQL error codes by heart, right? Wrong!

Exception handlers defined against MySQL error codes might work, but they will almost never be easy to read.

The best way to improve the readability of your exception handling routines is to define a named condition for every MySQL error code that you might be anticipating. So instead of the following declaration:

DECLARE CONTINUE HANDLER FOR 1216 mysql_statements;

you should use the following, more readable pair of declarations:

DECLARE foreign_key_error CONDITION FOR 1216;

DECLARE CONTINUE HANDLER FOR foreign_key_error mysql_statements;

Name

EXC-03: Be consistent in your use of SQLSTATE and MySQL error codes in exception handlers

Synopsis

You often have the choice between a MySQL error code and an ANSI-standard SQLSTATE code when creating your exception handler. Be as consistent as possible in your choice between the two. In some cases, an explicit SQLSTATE code might not be available for the error you are trying to catch, and you will want to use a MySQL error code. Unless portability is your primary concern—and in reality, this will rarely be the case—we recommend that you use MySQL error codes exclusively in your stored programs.

Name

EXC-04: Avoid global SQLEXCEPTION handlers until MySQL implements SIGNAL and SQLCODE features

Synopsis

In the initial 5.0 release of MySQL, it is not possible to access the MySQL error code or SQLSTATE code that caused a handler to be invoked. You also can't raise your own exceptions (the SIGNAL/RESIGNAL statements are not yet supported). What this means is that unless your handler is very specific, you won't know exactly why it was raised. Furthermore, you won't have a reliable mechanism for propagating the exception to the calling program.

Under normal circumstances, it can be very helpful to implement a general-purpose exception handler. This handler would acquire all kinds of handy information about the current state. If, however, you are unable to determine the error that was raised, this kind of general-purpose handler is of little use, and it can even cause a loss of useful information. For instance, in the following example, a general-purpose hander is invoked but cannot report accurately the reason it fired:

DECLARE CONTINUE HANDLER FOR SQLEXCEPTION

BEGIN

SET v_status=-1;

SET v_message='Some sort of error detected somewhere in the application';

END;

Given these restrictions, it is best not to create general SQLEXCEPTION handlers. Rather, you should handle only specific, foreseeable errors, and let the calling program handle any unexpected errors.

SQL in Stored Programs

One area in which the MySQL stored program language really shines is the ease with which you can include SQL inside of stored program code (this was, after all, one of the key motivations for the introduction of this functionality in MySQL). While you might occasionally write stored programs without any SQL, it would be almost completely pointless to use stored programs if it weren't for their ability to issue SQL.

Best practices related to SQL inside of MySQL stored programs are, therefore, among the most important in this chapter.

Name

SQL-01: Start a transaction explicitly with the START TRANSACTION statement

Synopsis

Although MySQL will automatically initiate a transaction on your behalf when you issue DML statements, you should issue an explicit START TRANSACTION statement in your program to mark the beginning of your transaction.

It's possible that your stored program might be run within a server in which autocommit is set to TRUE, and by issuing an explicit START TRANSACTION statement you ensure that autocommit does not remain enabled during your transaction. START TRANSACTION also aids readability by clearly delineating the scope of your transactional code.

Name

SQL-02: Don't leave transactions "dangling"

Synopsis

Once you start a transaction, you should take responsibility for completing the transaction. Since transactions lock rows and potentially block other transactions, you need to ensure that transactions do not persist indefinitely. Generally, you should place the START TRANSACTION and COMMITor ROLLBACK statements in the same stored program. This program may also call other programs, and you need to make sure that these called programs do not contain transactional code.

There are some exceptions to this recommendation. In particular, modular design might prompt you to break down a transaction into separate modules and control the overall transaction state from a master procedure.

Name

SQL-03: Avoid use of savepoints—they can obscure program logic and reduce program efficiency

Synopsis

Savepoints allow you to define a point within a transaction to which you can roll back without losing all of the changes made by the transaction. In essence, a savepoint facilitates the "partial rollback" of a transaction.

Indiscriminate use of savepoints can lead to inefficient and hard-to-maintain code. This is because when you roll back to a savepoint, your program flow is harder to follow, and you have almost by definition wasted system resources by issuing DML that you later aborted.

Quite often, you will find that instead of rolling back to a savepoint, you can simply issue a SELECT statement to validate an operation prior to actually issuing the DML. This technique was demonstrated in Chapter 8.

A valid use of a savepoint is within a stored program that you are using to execute a "nested" transaction without affecting the status of a transaction that may be in progress in the calling program. The "nested" program creates a savepoint and rolls back to that savepoint if any errors occur. In this way the procedure could be safely called by a program that has an open transaction, since any rollback issued in the nested program would affect only statements issued in that program.

Name

SQL-04: Use an appropriate locking strategy

Synopsis

There are two major patterns in transaction management: the optimistic locking strategy and the pessimistic locking strategy.

The pessimistic locking strategy assumes that concurrent updates are quite likely. To prevent this, the transaction locks rows as they are read. Other transactions that want to update the row must wait until the pessimistic transaction ends.

The optimistic locking strategy assumes that in the period of time between a user reading and then updating a row, it is unlikely that another user will attempt to update that same row. Of course, optimism in and of itself is not sufficient; when following this locking strategy, the program should check to ensure that the row has not been updated, immediately prior to the update. If the row has been updated, then the transaction is aborted.

Each locking strategy is based on assumptions regarding the behavior of other transactions or application users. Each has different implications for the duration of any locks acquired during the transaction and the possibility that a transaction will be aborted. Make sure that you weigh carefully the implications of the two strategies and pick the approach that best suits your application.

Name

SQL-05: Keep transactions small

Synopsis

The larger the transaction, the more likely it is that the transaction will lock rows needed by another transaction, and the greater the chance that a deadlock might occur. Transactions should therefore usually be no larger than is absolutely necessary.

Name

SQL-06: Always reset the NOT FOUND variable after completing a cursor loop

Synopsis

You should usually terminate a cursor loop when a CONTINUE handler for the NOT FOUND condition fires and modifies the value of a status variable. For instance, in the following fragment, the CONTINUE handler sets the v_last_row_fetched variable to 1, and we test this value after eachFETCH call:

DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_last_row_fetched=1;

OPEN cursor1;

cursor_loop:LOOP

FETCH cursor1 INTO v_customer_name, v_contact_surname, v_contact_firstname;

IF v_last_row_fetched=1 THEN

LEAVE cursor_loop;

END IF;

-- Do something with the row fetched.

END LOOP cursor_loop;

CLOSE cursor1;

SET v_last_row_fetched=0;

It is important to reset this status value to 0 after the cursor loop terminates; otherwise, subsequent or nested cursor loops may terminate prematurely.

The following code incorrectly fetches employees for only a single department, because after the first cursor loop, the status variable continues to indicate that the last row has been fetched:

DECLARE CONTINUE HANDLER FOR NOT FOUND

SET v_not_found=1;

SET v_dept_id=1;

WHILE( v_dept_id<=10) DO

OPEN dept_emp_csr;

emp_loop:LOOP

FETCH dept_emp_csr INTO v_employee_id;

IF v_not_found THEN

LEAVE emp_loop;

END IF;

CALL process_employee( v_employee_id);

END LOOP;

CLOSE dept_emp_csr;

SET v_dept_id= v_dept_id+1;

END WHILE;

Name

SQL-07: Use SELECT FOR UPDATE when retrieving rows for later update

Synopsis

Use the SELECT FOR UPDATE statement to request that locks be placed on all rows identified by the query. You should do this whenever you expect to change some or all of those rows, and you don't want another session to change them out from under you. Any other session trying to update the rows, or lock the rows (perhaps using FOR UPDATE), will have to wait.

Example

Here we are processing a special bonus payment for needy employees. We issue the FOR UPDATE clause so that the rows concerned are locked until our transaction completes:

CREATE PROCEDURE needy_bonus( )

BEGIN

DECLARE v_employee_id INT;

DECLARE v_salary NUMERIC(8,2);

DECLARE v_last_emp INT DEFAULT 0;

DECLARE emp_csr CURSOR FOR

SELECT employee_id,salary

FROM employees

WHERE salary <45000

FOR UPDATE;

DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_last_emp=1;

START TRANSACTION;

OPEN emp_csr;

emp_loop:LOOP

FETCH emp_csr INTO v_employee_id, v_salary;

IF v_last_emp THEN

LEAVE emp_loop;

END IF;

CALL grant_raise( v_employee_id, v_salary);

END LOOP emp_loop;

CLOSE emp_csr;

SET v_last_emp=0;

COMMIT;

END;

You can also use the LOCK IN SHARE MODE clause to lock the rows against update but continue to allow reads.

Name

SQL-08: Avoid including SQL in functions that may be used in SQL

Synopsis

You are free to include SQL statements within stored functions (with the exception of SQL statements that return result sets to the calling program). You should, however, be very wary of doing so if you think that your stored function might itself be called inside a SQL statement.

When you use a function that contains SQL in a SQL statement, you are effectively "nesting" two SQL statements. For every row returned by the "outer" SQL, you will have to execute the "inner" SQL. Such nested SQL statements can exhibit extremely unpredictable or undesirable performance.

For instance, consider the simple stored function below:

CREATE FUNCTION cust_contact_name (in_customer_id INT)

RETURNS VARCHAR(100)

READS SQL DATA

BEGIN

DECLARE v_contact_name VARCHAR(100);

SELECT CONCAT(contact_firstname,' ',contact_surname)

INTO v_contact_name

FROM customers

WHERE customer_id=in_customer_id ;

RETURN( v_contact_name);

END$

It contains an efficient query, but nevertheless, if we include it in a query against the customers table as follows:

SELECT cust_contact_name(customer_id) FROM customers

our execution time is about five times greater than if we performed the same operation within the SQL itself:

SELECT CONCAT(contact_firstname,' ', contact_surname) FROM customers

The situation becomes even worse if the SQL inside the function is not completely optimized. In Chapter 10 we provide an example in which the use of a stored function inside a SQL statement lengthens execution time by a factor of 300!

Dynamic SQL

"Dynamic" means that the SQL statement that you execute is constructed, parsed, and compiled at runtime, not at the time the code is compiled. Dynamic SQL offers a tremendous amount of flexibility—but also complexity and more than a little risk.

In the MySQL stored program language, you can process dynamic SQL by using the MySQL prepared statement feature. You can create a prepared statement with the PREPARE statement, supplying the SQL text in a session variable. The SQL can then be executed with the EXECUTEstatement.

Name

DYN-01: Bind, do not concatenate, variable values into dynamic SQL strings

Synopsis

When you bind a variable value into a dynamic SQL string, you can insert a "placeholder" into the string. This allows MySQL to parse a "generic" version of that SQL statement, which can be used over and over again, regardless of the actual value of the variable, without repeated parsing.

This technique also makes your code more resistant to SQL injection attacks (see Chapter 18), since the value supplied to placeholders cannot include SQL fragments.

Example

Here's an example of binding with the PREPARE and EXECUTE statements. This program updates any numeric column in the specified table, based on the supplied name:

CREATE PROCEDURE update_anything

(in_table VARCHAR(60),

in_where_col VARCHAR(60),

in_set_col VARCHAR(60),

in_where_val VARCHAR(60),

in_set_val VARCHAR(60))

BEGIN

SET @dyn_sql=CONCAT(

'UPDATE ' , in_table ,

' SET ' , in_set_col, ' = ?

WHERE ' , in_where_col, ' = ?');

PREPARE s1 FROM @dyn_sql;

SET @where_val=in_where_val;

SET @set_val=in_set_val;

EXECUTE s1 USING @where_val,@set_val;

DEALLOCATE PREPARE s1;

END$$

If you want to update the salary of employee #1 to $100,000, you might call this stored procedure as follows:

CALL update_anything_g('employees','employee_id','salary', 1,100000)

The dynamic SQL generated will look like this:

'UPDATE employees SET salary = ? WHERE employee_id = ?'

The ? characters indicate placeholders that will be replaced with the values for salary and employee_id. Those values are provided in the USING clause of the EXECUTE statement. Attempts to "inject" SQL into these values will fail (although injection into the table or column name parameters is still possible—we'll address that in the next best practice).

Name

DYN-02: Carefully validate any parameter values that might be used to construct dynamic SQL

Synopsis

Whenever you create a dynamic SQL statement based on parameters to a procedure or user inputs, you should always guard carefully against SQL injection (see Chapter 18). SQL injection allows the user to provide fragments of SQL as parameters to your stored programs, potentially subverting the resulting dynamic SQL.

Therefore, you should always carefully validate the inputs to your stored programs if they contribute to your dynamic SQL.

In the previous example, we prevented SQL injection through the careful use of placeholders. Variable binding could not, however, address the potential vulnerability of concatenating in the names of tables and columns.

In the modified version below, we perform a SQL query to confirm that the parameter inputs do, in fact, represent valid table and column names. Once we validate the inputs, we then construct and execute the dynamic SQL:

CREATE PROCEDURE update_anything_2

(in_table VARCHAR(60),

in_where_col VARCHAR(60),

in_set_col VARCHAR(60),

in_where_val VARCHAR(60),

in_set_val VARCHAR(60))

BEGIN

DECLARE v_count INT;

SELECT COUNT(*)

INTO v_count

FROM information_schema.columns

WHERE table_name=in_table

AND column_name IN (in_set_col,in_where_col);

IF ( v_count <2 ) THEN

SELECT 'Invalid table or column names provided';

ELSE

SET @dyn_sql=CONCAT(

'UPDATE ' , in_table ,

' SET ' ,in_set_col, ' = ?

WHERE ' , in_where_col, ' = ?');

SELECT @dyn_sql;

PREPARE s1 FROM @dyn_sql;

SET @where_val=in_where_val;

SET @set_val=in_set_val;

EXECUTE s1 USING @where_val,@set_val;

DEALLOCATE PREPARE s1;

END IF;

END;

Name

DYN-03: Consider the invoker rights method for stored code that executes dynamic SQL

Synopsis

The definer rights model—in which stored programs execute with the permissions of the creator rather than the invoker—generally confers significant security advantages, since you can allow access to database objects only under the controlled conditions implemented in your stored programs.

However, in the case of stored programs that contain dynamic SQL, the definer rights model can create security concerns, since these programs can conceivably be vulnerable to SQL injection, as described in Chapter 18. Since the creator of the stored program is almost always a highly privileged user, the implications of SQL injection into a definer rights procedure is potentially very serious indeed.

Whenever you create a stored program that processes a dynamic SQL statement, you should consider defining the program with the invoker rights model. Do this by adding the following clause to the program header:

SQL SECURITY INVOKER

This clause ensures that the dynamic SQL string is parsed under the authority of the account currently running the program.

Without the SQL SECURITY INVOKER clause, the stored program will execute with the privileges of the user that created the stored program. Since—by definition—you don't know exactly the full text of the dynamic SQL to be executed, you almost always want the SQL to be rejected if the user does not have sufficient privileges.

Using the alternative definer rights model also magnifies the potential vulnerabilities created should your stored program be susceptible to SQL injection.

Example

In the previous examples, we created a stored program that would update the value of any column in any table. Since we omitted the SQL SECURITY clause, a user can use the stored program to update tables to which she wouldn't normally have access. We didn't intend that!

So we should have defined the stored program with invoker rights, as follows:

CREATE PROCEDURE update_anything_2

(in_table VARCHAR(60),

in_where_col VARCHAR(60),

in_set_col VARCHAR(60),

in_where_val VARCHAR(60),

in_set_val VARCHAR(60))

SQL SECURITY INVOKER

BEGIN

. . . .

Program Construction

There are as many ways to write and structure a program as there are programmers—or so it sometimes seems. We offer suggestions on how to structure your programs and how best to design parameter lists that we have found effective.

Name

PRG-01: Encapsulate business rules and formulas behind accurately named functions

Synopsis

This might be one of the most important best practices you will ever read—and, we hope, follow. There is only one aspect of every software project that never changes: the fact that everything always changes. Business requirements, data structures, user interfaces: all of these things change and change frequently. Your job as a programmer is to write code that adapts easily to these changes.

So whenever you need to express a business rule (such as, "Is this string a valid ISBN?"), put it inside a subroutine that hides the individual steps (which might change) and returns the results (if any).

And whenever you need a formula (such as, "the total fine for an overdue book is the number of days overdue times $.50"), express that formula inside its own function.

Example

Suppose that you must be at least 10 years old to borrow books from the library. This is a simple formula and very unlikely to change. We set about building the application by creating the following logic:

IF v_dob > DATE_SUB(now( ), INTERVAL 10 YEAR) THEN

SELECT 'Borrower must be at least 10 yrs old';

ELSE

INSERT INTO borrower

(firstname,surname,date_of_birth)

VALUES( v_firstname, v_surname, v_dob);

END IF;

Later, while building a batch-processing script that checks and loads over 10,000 borrower applications, we include the following check in the program:

load_data:BEGIN

IF DATEDIFF(now( ), v_dob)/365 < 10 THEN

select ('Borrower is not ten years old.');

ELSE

. . . load data . . .

END IF;

END load_data;

And so on from there. We are left, unfortunately, with a real job on our hands when we get a memo that says: "In order to support a new city-wide initiative to increase literacy, the minimum age for a library card has been changed from 10 to 8." And then, of course and by the way, there is the minor bug we introduced into our second construction of the rule (we forgot about leap years).

If only we had created a simple function the first time we needed to calculate minimum valid age! It would be something like this:

CREATE FUNCTION borrower_old_enough (in_dob DATE)

RETURNS INT

NO SQL

BEGIN

DECLARE v_retval INT DEFAULT 0;

IF (in_dob < DATE_SUB(NOW( ), INTERVAL 10 YEAR)) THEN

SET v_retval=1;

ELSE

SET v_retval=0;

END IF;

RETURN( v_retval);

END;

And this function copes correctly with a NULL input, for which we forgot to check in those other programs. We can correct the age calculation logic in one place and easily change the minimum age from 10 to 8:

Benefits

You can update business rules and formulas in your code about as quickly and as often as users change that which was supposedly "cast in stone." Developers apply those rules consistently throughout the application base, since they are simply calling a program.

Your code is much easier to understand, since developers don't have to wade through complex logic to understand which business rule is being implemented.

Challenges

It's mostly a matter of discipline and advance planning. Before you start building your application, create a set of programs to hold business rules and formulas for distinct areas of functionality. Make sure that the names of the programs clearly identify their purpose. Then promote and use them rigorously throughout the development organization.

Name

PRG-02: Standardize module structure using function and procedure templates

Synopsis

Once you adopt a set of guidelines for how developers should write procedures and functions, you need to help those developers follow their best practices. The bottom line is that guidelines will be followed if you make it easier to follow them than to ignore them.

For module standards, you can use either of the following approaches:

§ Create a static template file that contains the generic logical structure for a procedure and/or function. Developers then copy that file to their own file, "de-genericize" the template by performing search-and-replace operations on placeholder strings with their own specific values (such as table names), and modify it from there.

§ Use a program (one that you've written or a commercially available tool) that generates the code you want. This approach can be more flexible and can save you time, depending on how sophisticated a generator you use/create.

Example

Here's a simple function template that reinforces the single RETURN recommendation and encourages a standard header.

CREATE FUNCTION f_<name>

(IN in_<parm> <datatype>)

RETURNS <datatype>

DETERMINISTIC

BEGIN

|| STANDARD COPYRIGHT STATEMENT HERE

|| Author:

|| File:

|| Modification history:

DECLARE retval <datatype> DEFAULT <value>

-- Put your code here

RETURN retval;

END

Some third-party products (Toad for MySQL, for instance) allow you to define such a template and have it automatically applied to new stored programs.

Benefits

The quality of each individual program is higher, since it's more likely to conform to best practices.

Programs are more consistent across the team and therefore easier to maintain and enhance.

Name

PRG-03: Limit execution section sizes to a single page (50-60 lines) using modularization

Synopsis

Sure, you're laughing out loud. You write code for the real world. It's really complicated. Only 50 or 60 lines? You're lucky if your programs are less than 500 lines! Well, it's not a matter of complexity; it's more an issue of how you handle that complexity.

If your executable sections go on for hundreds of lines, with a loop starting on page 2 and ending on page 6, and so on, you will have a hard time "grasping the whole" and following the logic of the program.

An alternative is to use step-wise refinement (a.k.a. "top down decomposition"): don't dive into all the details immediately. Instead, start with a general description (written in actual code, mind you) of what your program is supposed to do. Then implement all subprogram calls in that description following the same method.

The result is that at any given level of refinement, you can take in and easily comprehend the full underlying logic at that level. This technique is also referred to as "divide and conquer."

Example

Consider the following procedure. The entire program might be hundreds of lines long, but the main body of assign_workload (starting with BEGIN /*main*/) is only 24 lines long. Not only that, you can read it pretty much as an exciting novel: "For every telesales rep, if that person's case load is less than his department's average, assign the next open case to that person and schedule the next appointment for that case" (well, maybe not that exciting).

CREATE PROCEDURE assign_workload( )

BEGIN /*main*/

DECLARE v_last_row INT DEFAULT 0;

DECLARE v_case_id, v_telesales_id, v_department_id INT;

DECLARE telesales_cur CURSOR FOR

SELECT telesales_id,department_id FROM telesales;

DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_last_row=1;

OPEN telesales_cur;

ts_loop:LOOP

FETCH telesales_cur INTO v_telesales_id, v_department_id;

IF v_last_row THEN LEAVE ts_loop; END IF;

IF analysis_caseload( v_telesales_id)<

analysis_avg_cases( v_department_id) THEN

SET v_case_id=assign_next_open_case( v_telesales_id);

CALL schedule_case( v_case_id);

END IF;

END LOOP;

CLOSE telesales_cur;

SET v_last_row=0;

END$$

Benefits

You can implement complicated functionality with a minimum number of bugs by using step-wise refinement. A developer can understand and maintain a program with confidence if he can read and grasp the logic of the code.

Challenges

You have to be disciplined about holding off writing the low-level implementation of functionality. Instead, come up with accurate, descriptive names for procedures and functions that contain the implementations themselves.

Resources

http://www.construx.com: Contains lots of good advice on writing modular code.

Name

PRG-04: Avoid side-effects in your programs

Synopsis

Build lots of individual programs. Design each program so that it has a single, clearly defined purpose. That purpose should, of course, be expressed in the program's name, as well as in the program header.

Avoid throwing extraneous functionality inside a program. Such statements are called side-effects and can cause lots of problems for people using your code—which means your code won't get used, except perhaps as source for a cut-and-paste session (or—in hardcopy form—for kindling).

Example

Here's a program that by name and "core" functionality displays information about all books published within a certain date range:

CREATE PROCEDURE book_details (

in_start_date DATE,

in_end_date DATE)

BEGIN

DECLARE v_title, v_author VARCHAR(60);

DECLARE v_last_book, v_book_id INT DEFAULT 0;

DECLARE book_cur CURSOR FOR

SELECT book_id,title,author

FROM books

WHERE date_published BETWEEN in_start_date

AND in_end_date;

OPEN book_cur;

book_loop:LOOP

FETCH book_cur INTO v_book_id, v_title, v_author;

IF v_last_book THEN LEAVE book_loop; END IF;

CALL details_show( v_title, v_author);

CALL update_borrow_history ( v_book_id);

END LOOP;

END$$

Notice, however, that it also updates the borrowing history for that book. Now, it might well be that at this point in time the display_book_info procedure is called only when the borrowing history also needs to be updated, justifying to some extent the way this program is written.

However, regardless of current requirements, the name of the program is clearly misleading; there is no way to know that display_book_info also updates borrowing history. This is a hidden side-effect, and one that can cause serious problems over time.

Benefits

Your code can be used with greater confidence, since it does only what it says (via its name, for the most part) it's going to do. Developers will call and combine single-purpose programs as needed to get their jobs done.

Name

PRG-05: Avoid deep nesting of conditionals and loops

Synopsis

Many studies have confirmed that excessive nesting of IF, CASE, or LOOP structures leads to code that is difficult to understand. More than two or three levels of nesting is probably undesirable.

Consider the following logic:

IF v_state='CA' THEN

IF v_quantity > 100 THEN

IF v_customer_status='A' THEN

IF v_product_code='X' THEN

SET v_discount=.04;

ELSEIF v_product_code='Y' THEN

SET v_discount=.04;

ELSE

SET v_discount=.01;

END IF;

ELSE

SET v_discount=0;

END IF;

ELSEIF v_quantity > 50 THEN

SET v_discount=.1;

. . . More logic . . .

END IF;

It's fairly difficult to determine which set of conditions is applied to any particular discount. For instance, consider the highlighted line above—it takes a bit of puzzling to work out which states, quantities, and so on are associated with this discount: and that is with the vast majority of the logic removed. There are a few possible solutions to this deep nesting:

§ Including multiple conditions in each IF or ELSEIF clause: For instance, we might test for a specific combination of state, quantity, and status on the one line.

§ Removing parts of the logic to separate subroutines: For instance, we might create separate subroutines that calculate discounts for each state.

§ Creating a data-driven solution: For instance, in the above example it would probably be preferable to create a table that includes the discount for each combination of values.

Name

PRG-06: Limit functions to a single RETURN statement in the executable section

Synopsis

A good general rule to follow as you write your stored programs is: "one way in and one way out." In other words, there should be just one way to enter or call a program (there is; you don't have any choice in this matter). And there should be one way out, one exit path from a program (or loop) on successful termination. By following this rule, you end up with code that is much easier to trace, debug, and maintain.

For a function, this means you should think of the executable section as a funnel; all the lines of code narrow down to the last executable statement:

RETURN return value;

Example

Here's a simple function that relies on multiple RETURNs:

CREATE FUNCTION status_desc (in_cd CHAR(1))

RETURNS VARCHAR(20)

DETERMINISTIC

BEGIN

IF in_cd = 'C' THEN

RETURN 'CLOSED';

ELSEIF in_cd = 'O' THEN

RETURN 'OPEN';

ELSEIF in_cd = 'I' THEN

RETURN 'INACTIVE';

END IF;

END;

At first glance, this function looks very reasonable. Yet this function has a deep flaw, due to the reliance upon separate RETURNs: if you don't pass in "C", "O", or "I" for the cd_in argument, the function raises:

mysql> SELECT status_desc('A');

ERROR 1321 (2F005): FUNCTION status_desc ended without RETURN

Here's a rewrite that relies upon a single RETURN at the end of the function:

CREATE FUNCTION status_desc (in_cd CHAR(1))

RETURNS VARCHAR(20)

DETERMINISTIC

BEGIN

DECLARE v_status VARCHAR(20) ;

IF in_cd = 'C' THEN

SET v_status='CLOSED';

ELSEIF in_cd = 'O' THEN

SET v_status='OPEN';

ELSEIF in_cd = 'I' THEN

SET v_status='INACTIVE';

END IF;

RETURN v_status;

END$$

This program also safely and correctly returns NULL if the program doesn't receive a value of "C", "O", or "I", unlike the first implementation.

Benefits

You're less likely to write a function that raises the exception ERROR 1321 (2F005): FUNCTION %s ended without RETURN—a nasty and embarrassing error.

A single RETURN function is easier to trace and debug, since you don't have to worry about multiple exit pathways from the function.

Name

PRG-07: Use stored programs to implement code common to multiple triggers

Synopsis

Because you often need to create both an UPDATE and an INSERT trigger to maintain a derived or denormalized column, you might find yourself replicating the same logic in each trigger. For instance, in a previous example we created BEFORE UPDATE and BEFORE INSERT triggers to calculate free shipping and discount rate. If the logic is nontrivial, you should implement the logic in a stored procedure or function and call that routine from your trigger.

Example

Imagine that we are trying to automate the maintenance of a superannuation (18K plan) for our employees. We might create a trigger as follows to automate this processing upon insertion of a new employee row:

CREATE TRIGGER employees_bu

BEFORE UPDATE

ON employees

FOR EACH ROW

BEGIN

DECLARE v_18k_contrib NUMERIC(4,2);

IF NEW.salary <20000 THEN

SET NEW.contrib_18k=0;

ELSEIF NEW.salary <40000 THEN

SET NEW.contrib_18k=NEW.salary*.015;

ELSEIF NEW.salary<55000 THEN

SET NEW.contrib_18k=NEW.salary*.02;

ELSE

SET NEW.contrib_18k=NEW.salary*.025;

END IF;

END$$

But we need to ensure that this column is maintained when we create a new employee row. Instead of performing a copy-and-paste into a BEFORE INSERT trigger, we should locate this logic in a stored function as follows:

CREATE FUNCTION emp18k_contrib(in_salary NUMERIC(10,2))

RETURNS INT

DETERMINISTIC

BEGIN

DECLARE v_contrib NUMERIC(10,2);

IF in_salary <20000 THEN

SET v_contrib=0;

ELSEIF in_salary <40000 THEN

SET v_contrib=in_salary*.015;

ELSEIF in_salary<55000 THEN

SET v_contrib=in_salary*.02;

ELSE

SET v_contrib=in_salary*.025;

END IF;

RETURN( v_contrib);

END;

Now we can use that function in both the INSERT and the UPDATE triggers. If the logic changes, we can modify the logic in one place and can therefore eliminate the risk of any inconsistency between inserted and updated rows.

DROP TRIGGER employees_bu$$

CREATE TRIGGER employees_bu

BEFORE UPDATE

ON employees

FOR EACH ROW

BEGIN

SET NEW.contrib_18k=emp18k_contrib(NEW.salary);

END;

Performance

Most of the best practices outlined so far concentrate on the maintainability and correctness of our stored programs. The following practices concentrate on the performance of stored programs.

Name

PER-01: Concentrate on tuning SQL to improve stored program performance

Synopsis

There are many ways to improve stored program performance, but none of these are likely to have much effect if the SQL within the stored program is inefficient.

Most stored programs contain SQL, and for almost all of those stored programs, the SQL makes up the vast majority of stored program elapsed time. Attempts to tune the stored program by other means (loop tuning, for instance) should only be attempted once the SQL in the stored program has been tuned.

Name

PER-02: Carefully create the best set of indexes for your application

Synopsis

The primary purpose of indexes is to allow MySQL to rapidly retrieve the information you need. Just as the index in this book allows you to find some information without having to read the entire book, an index allows MySQL to get rows from the table without reading the entire table.

Determining the optimal set of indexes for your application is, therefore, probably the single most important step you can take to optimize MySQL stored program performance. In general, you should create indexes that support WHERE clause conditions and joins. You should also create multicolumn (concatenated) indexes, so that a single index can support all of the columns in the WHERE clause or all of the columns required to join two tables.

You should create indexes to support joins, since without an appropriate index, joins will degrade rapidly as the row counts in the involved tables increase.

Name

PER-03: Avoid accidental table scans

Synopsis

One of the most common causes of poor application performance is the "accidental" full table scan. An accidental table scan occurs when the nature of the query, or the expectations of the programmer, suggests that the query will be satisfied using an index, but instead a full table scan is performed.

Accidental table scans can occur under the following circumstances:

§ The index that you believe supports the query does not exist.

§ You have an index that includes the columns in the query, but you don't include the foremost, "leading" columns in your query.

§ You suppress an index by enclosing the column concerned with a function or an expression.

§ You specify a nonleading substring as the search condition. For instance, you try to find all employees whose name ends in "STONE" (WHERE name LIKE '%STONE').

Most accidental table scans can be resolved by creating a new index or rewording the SQL so that the index is not suppressed. See Chapters 20 and 21 for more details.

Name

PER-04: Optimize necessary table scans

Synopsis

Using an index to retrieve rows from a table is worthwhile only when you are retrieving a relatively small subset of rows from the table. Over a certain proportion of the table (say 5-20%), it is more efficient to read every row from the table. However, it is still possible to optimize these "necessary" table scans. For instance:

§ You can move long, infrequently accessed columns to a secondary table.

§ You can create an index on all of the columns required for the query. MySQL can then scan the entire index to resolve the query. Since the index will normally be smaller than the table, it ought to be quicker to scan the index.

These techniques are discussed in detail in Chapter 21.

Name

PER-05: Avoid using stored programs for computationally expensive routines

Synopsis

Like most stored program implementations, MySQL stored programs are optimized for database access, not computational speed. If you have a choice, place your most computationally expensive routines in client or middle-tier code. For instance, you might want to implement your most expensive calculations in PHP or Java rather than in stored programs.

Name

PER-06: Move loop invariant expressions outside of loops

Synopsis

Whenever you set out to tune your stored programs (having completed your SQL optimization), you should first take a look at your loops. Any inefficiency inside a loop's body will be magnified by the multiple executions of that code.

A common mistake is to put execute code within the body of a loop that has the same result with each iteration of the loop. When you identify such a scenario, extract the static code, assign the outcomes of that code to one or more variables in advance of the loop, and then reference those variables inside the loop.

Example

At first glance, this loop block seems sensible enough, but in reality it is quite inefficient:

WHILE (i<=1000) do

SET j=1;

WHILE (j<=1000) do

SET counter=counter+1;

SET sumroot=sumroot+sqrt(i)+sqrt(j);

SET j=j+1;

END WHILE;

SET i=i+1;

END WHILE;

This code contains two loops: we calculate the square root of i inside of the inner loop, even though it only changes for each iteration of the outer loop. Consequently, we calculate the square root 1,000,000 times, even though we have only 1,000 distinct values.

Here's the optimized version of that same code:

WHILE (i<=@i) do

SET rooti=sqrt(i);

SET counter=counter+1;

SET j=1;

WHILE (j<=@j) do

SET sumroot=sumroot+rooti+sqrt(j);

SET j=j+1;

END WHILE;

SET i=i+1;

END WHILE;

A small change, but one that will have a massive effect on performance.

Name

PER-07: Optimize conditional structures

Synopsis

The performance of IF and CASE statements is highly dependent on the number of comparisons that the statement must execute. The number of comparisons can be optimized in two ways:

§ By placing the comparisons that are most frequently true earliest in the set of comparisons, you reduce the number of comparisons that must be executed.

§ If any comparison is repeated in multiple expressions within the CASE or IF statement, you can extract that comparison and "nest" multiple CASE or IF statements. The inner comparisons need only be executed when the outer comparison evaluates to TRUE.

Name

PER-08: Structure IF and CASE statements so more likely expressions appear earliest in the list

Synopsis

When MySQL processes a CASE or an IF statement, it works through every ELSEIF or WHEN condition in the statement until if finds a condition that returns TRUE. If you place the condition that is most likely to evaluate to TRUE at the beginning of your conditional statement, you will improve the overall efficiency of your program.

Your primary concern, however, should be the readability of your IF and CASE statement. Don't worry about reorganizing the clauses of your IF and CASE statements unless you have identified them as a bottleneck in application performance.

Example

In this example the most likely condition is tested last:

IF (percentage>95) THEN

SET Above95=Above95+1;

ELSEIF (percentage >=90) THEN

SET Range90to95=Range90to95+1;

ELSEIF (percentage >=75) THEN

SET Range75to89=Range75to89+1;

ELSE

SET LessThan75=LessThan75+1;

END IF;

To optimize the statement, we can reword it so that in most cases, only one comparison is necessary:

IF (percentage<75) THEN

SET LessThan75=LessThan75+1;

ELSEIF (percentage >=75 AND percentage<90) THEN

SET Range75to89=Range75to89+1;

ELSEIF (percentage >=90 and percentage <=95) THEN

SET Range90to95=Range90to95+1;

ELSE

SET Above95=Above95+1;

END IF;

Conclusion

In this final chapter, we've attempted to enumerate coding practices that will result in efficient, robust, and easily maintainable stored programs. These practices are based on lessons learned in various development environments—including Oracle and SQL Server stored procedures—as well as from our experiences with the MySQL stored program language. We hope that you find these practices worthy of consideration. We do not, however, hope or expect that you will automatically adopt every recommendation. As always, you should exercise your judgment, tempered by your unique understanding of your own specific requirements, before adopting any standard, recommendation, or practice.