MySQL Stored Procedure Programming (2009)

Part III. Using MySQL Stored Programs in Applications

Stored programs can be used for a variety of purposes, including the implementation of utility routines for MySQL DBAs and developers. However, the most important use of stored programs is within applications, as we describe in this part of the book. Stored programs allow us to move some of our application code into the database server itself; if we do this wisely, we may benefit from applications that are more secure, efficient, and maintainable. In Chapter 12 we consider the merits of, and best practices for, using stored programs inside modern—typically web-based—applications. In the subsequent chapters, Chapters 13 through 17, we show how to use stored procedures and functions from within the development languages most commonly used in conjunction with MySQL: PHP, Java, Perl, Python, and .NET languages such as C# and VB.NET.

Chapter 12, Using MySQL Stored Programs in Applications

Chapter 13, Using MySQL Stored Programs with PHP

Chapter 14, Using MySQL Stored Programs with Java

Chapter 15, Using MySQL Stored Programs with Perl

Chapter 16, Using MySQL Stored Programs with Python

Chapter 17, Using MySQL Stored Programs with .NET

Chapter 12. Using MySQL Stored Programs in Applications

In the next few chapters we are going to show you how to use stored programs in a variety of external programming environments: PHP, Java, Perl, Python, and .NET. In those chapters we'll describe how to use the MySQL drivers provided with these languages to execute stored programs, retrieve the output of stored programs, and handle any error conditions that may arise during execution. Before we delve into those specific environments, we'll start with a general discussion of using MySQL stored programs in applications.

The purpose of this preliminary chapter is twofold:

§ To present the overall benefits of using stored programs in your applications.

§ To outline the general principles and program flow considerations that apply when using stored programs from any programming environment. Chapters 13 through 17 will describe the details for specific programming environments.

The Pros and Cons of Stored Programs in Modern Applications

There is a persistent—and often lively—debate in the programming community about the benefits and appropriateness of using stored programs in applications.

Database stored programs first came to prominence in the late 1980s and early 1990s during what might be called the client/server revolution. In the client/server environment of that time, stored programs had some obvious advantages (aspects of which persist in N-tier and Internet-based architectures):

§ Client/server applications typically had to carefully balance processing load between the client PC and the (relatively) more powerful server machine. Using stored programs was one way to reduce the load on the client, which might otherwise be overloaded.

§ Network bandwidth was often a serious constraint on client/server applications; execution of multiple server-side operations in a single stored program could reduce network traffic.

§ Maintaining correct versions of client software in a client/server environment was often problematic. Centralizing at least some of the processing on the server allowed a greater measure of control over core logic.

§ Stored programs offered clear security advantages, because in the client/server paradigm, end users typically connected directly to the database to run the application. By restricting access to stored programs only, users would not be able to perform ad hoc operations against tables and other database structures.

The use of stored programs in client/server applications was, and is, most prevalent in applications that use Microsoft SQL Server (and its technological predecessor, Sybase) and Oracle. The Microsoft SQL Server and Oracle stored program languages (Transact-SQL and PL/SQL, respectively) have substantially different characteristics—especially regarding the ability of a stored program to return a result set. The differences between the two languages have resulted in somewhat different usage patterns:

SQL Server-based applications

For these applications, the dominant pattern is to encapsulate all database interaction between client and server—including queries—into stored programs. This is cited as providing better security and reduced network traffic.

Oracle-based applications

For these applications, it was initially impossible to return a result set from a stored program and, although this became possible in later releases, it was never particularly convenient or easy to do so. As a result, Oracle-based applications tended to use stored programs to implement transaction processing, but would use native SQL to retrieve result sets.

With the emergence of three-tier architectures and web applications, many of the incentives to use stored programs from within applications disappeared. Application clients are now often browser-based; security is predominantly handled by a middle tier; and the middle tier possesses the ability to encapsulate business logic. Most of the functions for which stored programs were used in client/server applications can now be implemented in middle-tier code (e.g., in PHP, Java, C#, etc.). Transferring processing to the middle tier can also enhance load balancing and scalability.

Even so, many of the original advantages of stored programs (such as enhanced security and reduction in network traffic) still apply, if to a reduced degree. The use of stored programs is still regarded as a "best practice" by many application developers and architects.

Today, there are three schools of thought regarding the use of stored programs in applications:

All stored programs, all the time

This segment of the development community continues to believe that stored programs should be used for all interaction between the client (now the middle tier) and the database. They argue that this pattern provides more security to the database, and also provides a level of abstraction between the underlying data model and the business logic in the middle tier.

Stored programs only when absolutely necessary

This segment believes that stored programs should play only a minor role in modern application development. They argue that stored programs add unnecessary complexity to the application design; that they fragment the logic between the middle tier and the database; and that they get in the way of object-relational mapping schemes such as J2EE CMP and Hibernate.

Use what works

This segment (probably the quiet majority) is fairly pragmatic—they use stored programs selectively when the use of a stored program seems warranted, but they tend to use native SQL when it is easier and more convenient to do so.

It's up to you to decide which model works best for you and your application. In the next few sections we will try to provide you with as much information as we can to help you make an informed decision. To sum up our personal feelings on the matter, we do think that an application that encapsulates all database interaction within stored programs is employing a valid and effective pattern. In particular, this kind of application can be made virtually immune to SQL injection attacks, and will be much less vulnerable to exploits based on compromised passwords. We also believe in separating data access logic from business logic, and the use of stored programs is a good way to do this. However, stored programs are not a natural choice for all applications; for instance, using stored programs exclusively tends to interfere with object-relational mapping schemes such as J2EE CMP and Hibernate.

In the next few sections, we'll look in some detail at the advantages stored programs offer an application and compare those to possible disadvantages. To summarize here, stored programs offer these advantages:

§ Stored programs can improve the security of your database server.

§ Stored programs offer a mechanism to abstract data access routines, hiding your implementation behind a procedural interface and making it easier to evolve your data structures over time.

§ Stored programs can reduce network traffic.

§ Stored programs can be used to implement functionality that is needed—and can be called—from multiple applications, and from multiple places within a single application. This can be handy when applications written in frameworks that don't interoperate very well (.NET and Java for instance) access the same database.

§ Stored programs allow for a convenient division of duties between those whose skills are database-centric and those whose skills are programming-centric.

§ You can often improve the portability of your application code by moving logic into stored programs.

Against these possible advantages, consider the following disadvantages:

§ Stored programs might be slower—especially for computationally expensive operations—than equivalent middle-tier code.

§ The use of stored programs can lead to fragmentation of your application logic—logic may be split between the database and the application server tier, making it difficult to track down design flaws or implementation bugs.

§ Stored programs are usually written in a different language from your application server tier, requiring a wider range of skills in your development team.

§ Stored programs can be more difficult to debug (depending on the implementation: MySQL does not yet offer an integrated stored program debugger).

§ Most object-relational mapping systems (e.g., J2EE CMP and Hibernate) cannot seamlessly exploit stored programs.

§ While stored program calls may sometimes be more portable than native SQL, in practice this is not true for all implementations. Of the "big four," only DB2 and MySQL implement the ANSI standard for stored programs. As a result, MySQL stored program calls often look and act substantially different from calls made in Oracle or SQL Server.

Advantages of Stored Programs

Let's look at each of the advantages of stored programs in turn.

They Enhance Database Security

We'll see in Chapter 18 how the default security mode of stored programs (SQL SECURITY DEFINER) permits a stored program to execute SQL statements even if the calling database account lacks the security privileges to execute these statements as native SQL. By granting a database account access to stored programs only—without granting direct permissions on underlying tables—we can ensure that access to the database occurs only in the manner defined by our stored programs. We can also ensure that these SQL statements are surrounded by whatever business rule validation or logging is required. This concept is explained in more detail in Chapter 18.

In the event that an application account is compromised (for instance, the password is "cracked"), the attacker will still only be able to execute our stored programs, as opposed to being able to run any ad hoc SQL. While such a situation constitutes a severe security breach, at least we are assured that the hacker will be subject to the same checks and logging as a normal application user. The hacker will also be denied the opportunity to retrieve information about the underlying database schema, which will hinder attempts to perform further malicious activities.

The security advantages of stored programs are a powerful motivation to include stored programs in our applications, especially with today's increasing focus on securing the underlying database. However, the security advantages of stored programs can only be realized if stored programs are used exclusively within an application. This is because, to be fully effective, this strategy requires that the database connection account have no direct access to the underlying database tables; hence, this account must perform operations only through stored programs. One alternative to this approach is to grant read-only access to the underlying tables, and then use stored programs exclusively for update operations. At least then, a malicious user will not be able to make arbitrary changes to the data.
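As a minimal sketch of this setup, the following Python helper builds the GRANT statements for an account that may execute stored procedures but holds no privileges on the tables. The account 'app_user'@'%', the schema bank, and the procedure get_balance are hypothetical names chosen for illustration (txfer_funds is the procedure from Chapter 8). The helper only produces statement text for an administrator to review; it does not connect to a server.

```python
def execute_only_grants(account, schema, procedures):
    """Build GRANT statements giving `account` EXECUTE on the named procedures only.

    The names are assumed to come from a trusted source (a deployment script,
    not user input), so they are placed in the statement text as-is.
    """
    statements = []
    for procedure in procedures:
        statements.append(
            f"GRANT EXECUTE ON PROCEDURE {schema}.{procedure} TO {account}"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical account, schema, and procedure names.
    for statement in execute_only_grants(
        "'app_user'@'%'", "bank", ["txfer_funds", "get_balance"]
    ):
        print(statement)
```

Because no SELECT, INSERT, UPDATE, or DELETE privilege is granted on the tables themselves, such an account can reach the data only through the listed procedures, provided those procedures run with SQL SECURITY DEFINER.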

Another security advantage inherent in stored programs is their resistance to SQL injection attacks. As we will see in Chapter 18, a SQL injection attack can occur when a malicious user manages to "inject" SQL code into the SQL code being constructed by the application. Stored programs do not offer the only protection against SQL injection attacks, but applications that rely exclusively on stored programs to interact with the database are virtually immune to this type of attack (provided that those stored programs do not themselves build dynamic SQL strings without fully validating their inputs).
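To make the difference concrete, here is a small Python sketch contrasting a statement assembled by string concatenation with a stored program call whose argument travels as a bound parameter. The table account_balance and the procedure get_balance are hypothetical, and the %s placeholder assumes a driver that accepts that placeholder style.

```python
def unsafe_query(customer_id_text):
    """Vulnerable: the user's text becomes part of the SQL statement itself."""
    return (
        "SELECT balance FROM account_balance WHERE customer_id = "
        + customer_id_text
    )


def stored_program_call(customer_id_text):
    """Safer: fixed statement text; the value is sent separately as a parameter."""
    return "CALL get_balance(%s)", (customer_id_text,)


if __name__ == "__main__":
    malicious = "1 OR 1=1"
    # The injected text changes the meaning of the WHERE clause.
    print(unsafe_query(malicious))
    # The statement text is unchanged; the input is only ever treated as a value.
    print(stored_program_call(malicious))
```

In the first function the attacker's text alters the statement; in the second it can only ever be the value of one parameter.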

They Provide a Mechanism for Data Abstraction

It is generally a good practice to separate your data access code from your business logic and presentation logic. Data access routines are often used by multiple program modules, and are likely to be maintained by a separate group of developers. A very common scenario requires changes to the underlying data structures, while minimizing the impact on higher-level logic. Data abstraction makes this much easier to accomplish.

The use of stored programs provides a convenient way of implementing a data access layer. By creating a set of stored programs that implement all of the data access routines required by the application, we are effectively building an API for the application to use for all database interactions.
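From the application's side, such an API might look like the following Python sketch, in which each method of a small class maps to exactly one stored procedure. The class name and the parameter list of txfer_funds are assumptions made for illustration, and the connection is assumed to come from a driver whose cursors implement the optional callproc() method of the Python DB-API.

```python
class AccountDataAccess:
    """Thin data access layer: each method maps to one stored procedure."""

    def __init__(self, connection):
        # Any DB-API connection whose cursors support callproc().
        self._connection = connection

    def transfer_funds(self, from_account, to_account, amount):
        """Call the txfer_funds procedure (its parameter list is assumed here)."""
        cursor = self._connection.cursor()
        try:
            cursor.callproc("txfer_funds", (from_account, to_account, amount))
        finally:
            # Release the cursor even if the call raises an error.
            cursor.close()
```

The rest of the application calls transfer_funds() and never sees table or column names, so the underlying data structures can change without touching the callers.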

They Reduce Network Traffic

Stored programs can radically improve application performance by reducing network traffic in certain situations. Several such situations are described in this section.

One scenario involves an application that may need to accept input from the end user, read some data in the database, decide what statement to execute next, retrieve a result, make a decision, execute some SQL, and so on. If the application code is written entirely outside of the database, each of these steps would require a network round trip between the database and the application. The time taken to perform these network trips can easily dominate overall user response time.

Consider a typical interaction between a bank customer and an ATM. The user requests a transfer of funds between two accounts. The application must retrieve the balance of each account from the database, check withdrawal limits and possibly other policy information, issue the relevant UPDATE statements, and finally issue a COMMIT, all before advising the customer that the transaction has succeeded. Even for this relatively simple interaction, at least six separate statements must be issued, each with its own network round trip between the application server and the database. Figure 12-1 shows the sequence of interactions that would be required without a stored program.

Figure 12-1. Network round trips without a stored program

On the other hand, if a stored program is used to implement the funds transfer logic, only a single database interaction is required. The stored program takes responsibility for checking balances, withdrawal limits, and so on. Figure 12-2 illustrates the reduction in network round trips that occurs as a result.

Figure 12-2. Network round trips involving a stored program
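The comparison in the two figures can be put in rough numbers with a short Python sketch. The table and column names below are hypothetical, the parameter list of txfer_funds is assumed, and the model is deliberately crude: it charges one round trip per statement and ignores execution time on the server.

```python
# Statements issued one at a time by the application (hypothetical schema).
WITHOUT_STORED_PROGRAM = [
    "SELECT balance FROM account_balance WHERE account_id = %s",  # source account
    "SELECT balance FROM account_balance WHERE account_id = %s",  # target account
    "SELECT withdrawal_limit FROM account_policy WHERE account_id = %s",
    "UPDATE account_balance SET balance = balance - %s WHERE account_id = %s",
    "UPDATE account_balance SET balance = balance + %s WHERE account_id = %s",
    "COMMIT",
]

# The same work requested through a single stored program call.
WITH_STORED_PROGRAM = ["CALL txfer_funds(%s, %s, %s)"]


def network_time_ms(statements, round_trip_ms):
    """Estimate time spent on the network, assuming one round trip per statement."""
    return len(statements) * round_trip_ms


if __name__ == "__main__":
    latency_ms = 5  # assumed round-trip latency, for illustration only
    print(network_time_ms(WITHOUT_STORED_PROGRAM, latency_ms))
    print(network_time_ms(WITH_STORED_PROGRAM, latency_ms))
```

Under these assumptions the network cost scales with the number of statements, so the single CALL spends one sixth of the time on the wire.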

Network round trips can also become significant when an application is required to perform some kind of aggregate processing on very large record sets in the database. If the application needs to retrieve millions of rows in order to calculate a business metric that cannot easily be computed using native SQL (average time to complete an order, for instance), then a very large number of round trips can result. In such a case, the network delay may again become the dominant factor in application response time. Performing the calculations in a stored program will reduce network overhead, which might reduce overall response time, but make sure you take into account the considerations outlined in the section "They Can Be Computationally Inferior" later in this chapter. We provide an example of a stored program reducing network traffic in Chapter 22.

They Allow for Common Routines Across Multiple Application Types

While it is commonplace for a MySQL database to be at the service of a single application, it is not at all uncommon for multiple applications to share a single database. These applications may run on different machines and be written in different languages; it may be hard—or impossible—for these applications to share code. Implementing common code in stored programs may allow these applications to share critical common routines.

For instance, in Chapter 8 we created a procedure called txfer_funds that performed a transaction-safe, logged transfer of funds between two accounts. Some versions of the stored procedure contained code for handling deadlocks and an optimistic locking strategy. Now, in a banking application, a funds transfer transaction might originate from multiple sources, including a bank teller's console, an Internet browser, an ATM, or a phone banking application. Each of these applications could conceivably have its own database access code written in largely incompatible languages, and, without stored programs, we might have to replicate the transaction logic (including logging, deadlock handling, and optimistic locking strategies) in multiple places in multiple languages.

They Facilitate Division of Duties

It is reasonably commonplace for the responsibility for coding application logic to be held by one set of developers and the responsibility for database design and access routines to be held by a different set of developers. These two groups may have different skill sets, and application development efficiency may be enhanced if the database developers are able to implement the data access routines directly in MySQL using the stored program language.

They May Provide Portability

While all relational databases implement a common set of SQL syntax—typically SQL99 entry-level or similar—each RDBMS offers proprietary extensions to this standard SQL. If you are attempting to write an application that is designed to be independent of the underlying RDBMS vendor, or if you want to avoid RDBMS vendor lock-in, you will probably want to avoid these extensions in your application. However, using these extensions is highly desirable if you want to optimize your use of the underlying database. For instance, in MySQL, you will often want to employ MySQL hints, execute non-ANSI statements such as LOCK TABLES, or use the REPLACE statement.

Using stored programs can help you avoid RDBMS-dependent code in your application layer while allowing you to continue to take advantage of RDBMS-specific optimizations. In theory—but only sometimes in practice—stored program calls against different databases can be made to look and behave identically from the application's perspective. Of course, the underlying stored program code will need to be rewritten for each RDBMS, but at least your application code will be relatively portable.

Unfortunately, not all RDBMSs implement stored programs in a consistent manner. This limits the portability that stored programs can offer. We discuss this in more detail in the section "They Do Not Provide Portability" later in this chapter.

Disadvantages of Stored Programs

So far, we've seen that stored programs can offer some significant advantages. Now let's look at the downside of using stored programs.

They Can Be Computationally Inferior

In Chapter 22 we compare the performance of MySQL stored programs and other languages when performing computationally intensive routines. Our conclusion is that stored programs, in general, and MySQL stored programs, in particular, are slower than languages such as PHP, Java, and Perl when executing "number crunching" algorithms, complex string manipulation, and the like.

Most of the time, stored programs are dominated by database access time—where stored programs have a natural performance advantage over other programming languages because of their lower network overhead. However, if you are writing a number-crunching routine—and you have a choice between implementing it in the stored program language or in another language such as Java—you may wisely decide against using the stored program solution.

They Can Lead to Logic Fragmentation

While it is generally useful to encapsulate data access logic inside stored programs, it is usually inadvisable to "fragment" business and application logic by implementing some of it in stored programs and the rest of it in the middle tier or the application client.

Debugging application errors that involve interactions between stored program code and other application code may be many times more difficult than debugging code that is completely encapsulated in the application layer. For instance, there is currently no debugger that can trace program flow from the application code into the MySQL stored program code.

They Do Not Provide Portability

We said earlier that stored programs could be used to build RDBMS-independent applications by encapsulating RDBMS-dependent SQL in stored program calls. Unfortunately, this is only possible for RDBMS types that support similar semantics for processing parameters and returning result sets.

The stored programs implemented by MySQL, DB2, and Microsoft SQL Server all behave in a very similar way—all can return multiple result sets, and for most languages, the calls for accessing these result sets are compatible.

Unfortunately, Oracle is an exception in this regard; Oracle stored programs can return result sets, but they are returned as references in output parameters, rather than as result sets in their own right. In order to retrieve these result sets, you have to write application code that is highly Oracle specific.

So while applications that use only stored programs are reasonably portable between MySQL and either DB2 or SQL Server, if portability between MySQL and Oracle is your objective, you are probably better advised to use ANSI-standard SQL calls, rather than stored program calls, at least when implementing calls that will return result sets.

Calling Stored Programs from Application Code

Most languages used to build applications that interact with MySQL are able to fully exploit stored programs, although in some languages, support for advanced features such as multiple result sets is a recent addition. In the following chapters we will explain in detail how to use stored programs from within PHP, Java, Perl, Python, and the .NET languages VB.NET and C#. In this section we want to give you an introduction to the general process of calling a stored program from an external programming language.

In general, the techniques for using stored programs differ from those for standard SQL statements in two significant respects:

§ While SQL statement calls may take parameters, stored programs can also have OUT or INOUT parameters. This means that you need to understand how to access the value of an OUT or INOUT parameter once the stored program execution completes.

§ A SELECT statement can return only one result set, while a stored program can return any number of result sets, and you might not be able to anticipate the number or structure of these result sets.

So, calling a stored program requires a slightly different program flow from standard SQL processing. The overall sequence of events is shown in the UML "retro" diagram (i.e., a flowchart) in Figure 12-3.

Here's a brief description of each of these steps. Remember that in the next five chapters, we will be showing you how to follow these steps in various languages.

Preparing a Stored Program Call for Execution

We'll normally want to call a stored program more than once in our application. Typically, we first create a statement handle for the stored program. We then iteratively execute the program, perhaps providing different values for the program's parameters with each execution.

It's usually possible to bypass the preparation stage and execute a stored program directly—at least if the stored program returns no result sets. However, if the stored program takes parameters and you execute the stored program more than once in your program, we recommend that you go to the extra effort of preparing the statement that includes your stored program call.

Registering Parameters

For stored programs that require parameters, we can pass the values as literals (i.e., by concatenating the text of the parameter values into the stored program CALL statement).

Figure 12-3. General processing flow when calling a stored program from an external language

However, in all of the languages we discuss in subsequent chapters, there are specific parameter-handling methods that allow us to re-execute a stored program with new parameters without having to re-prepare the stored program call. As we said previously, it's best to use these explicit methods if you are going to execute the stored program more than once—both because it is slightly more efficient and because, in some cases, only the prepared statement methods offer full support for bidirectional parameters and multiple result sets.

The methods for passing parameters to stored programs are usually the same as the methods used to pass parameters (or "bind variables") to normal SQL statements.
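In Python, for instance, the DB-API defines no separate prepare call, so the closest equivalent is to keep the statement text constant and let the driver bind new values on each execution. The sketch below assumes a cursor from a driver that accepts %s placeholders, a txfer_funds procedure taking three IN parameters, and a procedure that returns no result sets; whether the statement is actually prepared on the server is driver specific.

```python
# The statement text never changes; only the bound values do.
CALL_TXFER_FUNDS = "CALL txfer_funds(%s, %s, %s)"  # parameter list is assumed


def transfer_batch(cursor, transfers):
    """Execute the same CALL once per (from_account, to_account, amount) tuple.

    Returns the number of calls executed.
    """
    executed = 0
    for from_account, to_account, amount in transfers:
        cursor.execute(CALL_TXFER_FUNDS, (from_account, to_account, amount))
        executed += 1
    return executed
```

Because the values are never concatenated into the statement text, the same string can be reused for every execution.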

Setting Output Parameters

Some languages allow us to specifically define and process output parameters. In other languages, we can only access the values of OUT or INOUT parameters by employing "user variables" (variables prefixed with @) to set and retrieve the parameter values.

Both techniques (the direct API calls provided by .NET and JDBC, and the user variable solution required by other languages) are documented in the relevant language-specific chapters that follow.
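The user variable technique can be sketched in Python as follows. The helper name, the example procedure, and the convention that OUT parameters follow the IN parameters are all assumptions made for illustration; the sketch also assumes that the procedure returns no result sets and that the cursor accepts %s placeholders.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def call_with_out_params(cursor, procedure, in_args, out_names):
    """Call `procedure`, receiving OUT values through MySQL user variables.

    IN values are bound as parameters. The procedure name and the OUT names
    are placed in the statement text, so they are checked against a strict
    identifier pattern first.
    """
    if not out_names:
        raise ValueError("at least one OUT parameter name is required")
    for name in [procedure, *out_names]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")

    placeholders = ["%s"] * len(in_args)
    user_variables = [f"@{name}" for name in out_names]

    # Step 1: run the procedure; OUT values land in the user variables.
    arguments = ", ".join(placeholders + user_variables)
    cursor.execute(f"CALL {procedure}({arguments})", tuple(in_args))

    # Step 2: read the user variables back with an ordinary SELECT.
    cursor.execute("SELECT " + ", ".join(user_variables))
    row = cursor.fetchone()
    return dict(zip(out_names, row))
```

A call such as call_with_out_params(cursor, "check_balance", [7], ["balance", "status"]) would issue CALL check_balance(%s, @balance, @status) followed by SELECT @balance, @status, where check_balance is a hypothetical procedure.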

Executing the Stored Program

Once the input parameters are set and—in the case of .NET and Java—once the output parameters are registered, we can execute the stored program. The method for executing a stored program is usually the same as the method for executing a standard SQL statement.

If the stored program returns no result sets, output parameters can immediately be accessed. If the stored program returns one or more result sets, all of those result sets must be processed before the output parameter values can be retrieved.

Retrieving Result Sets

The process of retrieving a single result set from a stored program is identical to the process of retrieving a result set from other SQL statements—such as SELECT or SHOW—that return result sets.

However, unlike SELECT and SHOW statements, a stored program may return multiple result sets, and this requires a different flow of control in our application. To correctly process all of the result sets that may be returned from a stored program, the programming language API must include a method to switch to the "next" result set and possibly a separate method for determining if there are any more result sets to return.

JDBC and ADO.NET have included these methods since their earliest incarnations (for use with SQL Server and other RDBMSs that support multiple result sets), and these interfaces have been fully implemented for use with MySQL stored programs. Methods exist to retrieve multiple result sets in PHP, Perl, and Python, but these methods are relatively immature; in some cases, they were implemented only in response to the need to support stored programs in MySQL 5.0.
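As an illustration of this flow of control, the following Python sketch drains every result set from a cursor on which a stored program call has already been executed. It assumes a driver that implements the optional nextset() method of the Python DB-API; drivers differ in how completely they support it, so treat this as a sketch rather than a recipe for any particular driver.

```python
def fetch_all_result_sets(cursor):
    """Collect every result set produced by the statement last executed.

    Returns a list with one entry (a list of rows) per result set.
    """
    result_sets = []
    while True:
        # description is None when the current result carries no rows,
        # such as the final status that follows a CALL.
        if cursor.description is not None:
            result_sets.append(cursor.fetchall())
        # nextset() returns a false value when nothing more is pending.
        if not cursor.nextset():
            break
    return result_sets
```

The loop does not need to know in advance how many result sets the stored program returns, which matters because the number may vary from one execution to the next.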

Retrieving Output Parameters

Once all result sets have been retrieved, we are able to retrieve any stored program output parameters. Not all languages provide methods for directly retrieving the values of output parameters—see the "Setting Output Parameters" section earlier for a description of a language-independent method of retrieving output parameters indirectly through user variables.

JDBC and ADO.NET provide specific calls that allow you to directly retrieve the value of an output parameter.

Closing or Re-Executing the Stored Program

Now that we have retrieved the output parameters, the current stored program execution is complete. If we are sure that we are not going to re-execute the stored program, we should close it using language-specific methods to release all resources associated with the stored program execution. This usually means closing the prepared statement object associated with the stored program call. If we want to re-execute the stored program, we can modify the input parameters and use the language-specific execute method to run the stored program as many times as needed. Once we are finished, we should close the prepared statement and release its resources.
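Putting the steps of Figure 12-3 together, a single execution might look like the Python sketch below: execute the call, process every result set, then read the output parameters, and finally release the cursor. The function name is hypothetical, the cursor is assumed to support %s placeholders and the optional DB-API nextset() method, and the OUT parameters are assumed to be passed as user variables named in the CALL text.

```python
def run_stored_program(connection, call_sql, in_args, out_variables):
    """Execute one stored program call following the general processing flow.

    `call_sql` is a CALL statement such as "CALL p(%s, @total)";
    `out_variables` lists the user variables it names, such as ["@total"].
    Both are assumed to be trusted, application-defined text.
    Returns (result_sets, out_values).
    """
    cursor = connection.cursor()
    try:
        # Execute the stored program with its input parameters bound.
        cursor.execute(call_sql, tuple(in_args))

        # Retrieve all result sets before touching the output parameters.
        result_sets = []
        while True:
            if cursor.description is not None:
                result_sets.append(cursor.fetchall())
            if not cursor.nextset():
                break

        # Retrieve output parameters through their user variables.
        out_values = ()
        if out_variables:
            cursor.execute("SELECT " + ", ".join(out_variables))
            out_values = cursor.fetchone()
        return result_sets, out_values
    finally:
        # Close the cursor to release resources, even after an error.
        cursor.close()
```

To re-execute the stored program with new input values, the application would simply call the function again with a different in_args list.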

Calling Stored Functions

In some environments (JDBC and .NET, in particular) stored functions can be invoked directly, and there are interface-specific techniques for obtaining the stored function return value. In other languages, however, you would normally need to embed the stored function call in a statement that accepts an expression, such as a single-row SELECT statement.
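The SELECT technique can be sketched in Python as follows. The helper name and the example function are hypothetical, and the cursor is assumed to come from a driver that accepts %s placeholders.

```python
import re


def call_stored_function(cursor, function_name, args):
    """Return the value of a stored function by embedding it in a SELECT.

    The function name goes into the statement text, so it is checked against
    a strict identifier pattern; the arguments are bound as parameters.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", function_name):
        raise ValueError(f"not a plain identifier: {function_name!r}")
    placeholders = ", ".join(["%s"] * len(args))
    cursor.execute(f"SELECT {function_name}({placeholders})", tuple(args))
    row = cursor.fetchone()
    # The SELECT returns a single row with a single column.
    return row[0]
```

For a hypothetical function account_status, call_stored_function(cursor, "account_status", [42]) would execute SELECT account_status(%s) with 42 bound as the argument.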

Conclusion

There is no "one-size-fits-all" answer to the question "Should I use stored programs in my application?" There are those who believe that virtually all of an application's database interactions should be made through stored program calls, and those who believe that stored programs should be used only in very special circumstances. You will need to make your own determination as to the value of using MySQL stored programs in your application.

As we've discussed in this chapter, the use of stored programs can provide significant advantages:

§ Stored programs can substantially improve the security of your application.

§ Stored programs can be used to provide an abstract data access layer that can improve the separation between business logic and data access logic (of course, stored programs are not required to do this—they are just one means to do so).

§ Stored programs can reduce network traffic.

§ Stored programs can be used to implement common routines accessible from multiple applications.

§ Stored programs allow for a convenient division of duties between those whose skills are database-centric and those whose skills are programming-centric.

§ The use of stored programs can (sometimes) improve application portability.

But you also need to consider the potential disadvantages of using stored programs:

§ Stored programs are often slower—especially for computationally expensive operations—than equivalent middle-tier code.

§ The use of stored programs can lead to fragmentation of your application logic—logic may be split between the database and application server tier, making it difficult to track down design flaws or implementation bugs.

§ The use of stored programs usually results in your application's leveraging more than one programming language, requiring additional skills in your development team.

§ Most object-relational mapping systems (e.g., J2EE CMP and Hibernate) do not know how to work with stored programs.

§ Although stored program calls may sometimes be more portable than native SQL, in practice this is not true for all implementations. In particular, Oracle stored program calls often look and act substantially different from calls made in MySQL, DB2, or Microsoft SQL Server.

In this chapter we briefly reviewed the general programming logic involved in calling stored programs from external programming languages. In subsequent chapters we will explain the detailed techniques for handling stored program calls in PHP, Java, Perl, Python, C#, and VB.NET.