The Basic Building Block: The Subroutine - Tutorial - Programming the 65816 Including the 6502, 65C02, and 65802 (1986)

Programming the 65816 Including the 6502, 65C02, and 65802 (1986)

Part III. Tutorial

12. The Basic Building Block: The Subroutine

The feature essential to any processor to support efficient, compact code, as well as modular or top-down programming methods, is a means of defining a subroutine. A subroutine is a block of code that can be entered (called) repeatedly from various parts of a main program, and that can automatically return control to the instruction following the calling instruction, wherever it may be. The 65x jump-to-subroutine instruction provides just such a capability.

When a jump-to-subroutine, or JSR, instruction is encountered, the processor first pushes its current location onto the stack for purposes of returning, then jumps to the beginning of the subroutine code. At the end of the subroutine code, a return-from-subroutine (RTS) instruction tells the processor to return from the subroutine to the instruction after the subroutine call, which it locates by pulling the previously saved return location from the stack.

Because subroutines let you write a recurring section of program code just once and call it from each place that it's needed, they are the basis of top-down, structured programming. Common subroutines are often collected together by programmers to form a library, from which they can be selected and reused as needed.

Chapter 8, Flow of Control, introduced the 65x jump instructions—those flow-of-control instructions which do not use the stack for return purposes. But discussion of the jump-to-subroutine instructions was put off to this chapter.

Table 12.1 lists the instructions to be explained in this chapter. In addition, this chapter will use the simple example of a negation routine to illustrate how library routines (and routines in general) are written and documented, and it examines the question of when to code a subroutine and when to use in-line code. Finally, methods of passing information (or parameters) to and from subroutines are compared and illustrated.

Table 12.1. Subroutine Instructions.

The Jump-to-Subroutine Instruction

There is just one addressing mode available to the JSR instruction on the 6502 and 65C02—absolute addressing. This mode lets you code a subroutine call to a known location. When used on the 65816, that location must be within the current program bank. It uses the absolute addressing syntax introduced earlier:

In the second case, the assembler determines the address of subroutine SUBR1.

The processor, upon encountering a jump-to-subroutine instruction, first saves a return address. The address saved is the address of the last byte of the JSR instruction (the address of the last byte of the operand), not the address of the next instruction as is the case with some other processors. The address is pushed onto the stack in standard 65x order—the low byte in the lower address, the high byte in the higher address—and done in standard 65x fashion—the first byte is stored at the location pointed to by the stack pointer, the stack pointer is decremented, the second byte is stored, and the stack pointer is decremented again. Once the return address has been saved onto the stack, the processor loads the program counter with the operand value, thus jumping to the operand location, as shown in Figure 12.1. Jumping to a subroutine has no effect on the status register flags.

The Return-from-Subroutine Instruction

At the end of each subroutine you write, the one-byte RTS, or return-from-subroutine, instruction must be coded. When the return-from-subroutine instruction is executed, the processor pulls the stored address from the stack, incrementing the stack pointer by one before retrieving each of the two bytes to which it points. But the return address that was stored on the stack was the address of the third byte of the JSR instruction. When the processor pulls the return address off the stack, it automatically increments the address by one so that it points to the instruction following the JSR instruction which should be executed when the subroutine is done. The processor loads this incremented return address into the program counter and continues execution from the instruction following the original JSR instruction, as Figure 12.2 shows.

Figure 12.1. JSR.

The processor assumes that the two bytes at the top of the stack are a return address stored by a JSR instruction and that these bytes got there as the result of a previous JSR. But as a result, if the subroutine used the stack and left it pointing to data other than the return address, the RTS instruction will pull two irrelevant data bytes as the address to return to. Cleaning up the stack after using it within a subroutine is therefore imperative.

The useful side of the processor's inability to discern whether the address at the top of the stack was pushed there by a JSR instruction is that you can write a reentrant indirect jump using the RTS instruction. First formulate the address to be jumped to, then decrement it by one (or better, start with an already-decremented address), push it onto the stack (pushing first high byte, then low byte, so that it is in correct 65x order on the stack) and, finally, code an RTS instruction. The return-from-subroutine pulls the address back off the stack, increments it, and loads the result into the program counter to cause a jump to the location, as Fragment 12.1 illustrates.

Fragment 12.1.

Reentrancy is the ability of a section of code to be interrupted, then executed by the interrupting routine, and still execute properly both for the interrupting routine and for the original routine when control is returned to it. The interruption may be a result of a hardware interrupt (as described in the next chapter), or the result of the routine calling itself, in which case the routine is said to be recursive. The keys to reentrancy are, first, to be sure you save all important registers before reentering and, second, to use no fixed memory locations in the reentrant code. (There will be more on interrupts and reentrancy in the next chapters.)

Figure 12.2. RTS.

The indirect jump using RTS qualifies for reentrancy: While normally you would code an indirect jump by forming the address to jump to and storing it to an absolute address, then jumping indirect through the address, this jump by use of RTS uses only registers and stack.

A subroutine can have more than one RTS instruction. It's common for subroutines to return from internal loops upon certain error conditions, in addition to returning normally from one or more locations. Some structured programming purists would object to this practice, but the efficiency of having multiple exit points is unquestionable.

Returning from a subroutine does not affect the status flags.

JSR Using Absolute Indexed Indirect Addressing

The 65802/65816 gives JSR another addressing mode—absolute indexed indirect (covered in the last chapter) which lets your program select, on the basis of the index in the X register, a subroutine location from a table of such locations and call it:

The array TABLE must be located in the program bank. The addressing mode assumes that a table of locations of routines would be part of the program itself and would be loaded, right along with the routines, into the bank holding the program. The indirect address (the address with which the program counter will be loaded), a sixteen-bit value, is concatenated with the program bank register, resulting in a transfer within the current program bank. If the addition of X causes a result greater than $FFFF, the effective address will wrap, remaining in the current program bank, unlike the indexing across banks that occurs for data accesses.

This addressing mode also lets you do an indirect jump-to-subroutine through a single double-byte cell by first loading the X register with zero. You must remember in coding this use for the 65816, however, that the cell holding the indirect address is in the program bank, notbank zero as with absolute indirect jumps.

The indexed indirect jump-to-subroutine is executed in virtually the same manner as the absolute jump-to-subroutine: the processor pushes the address of the final byte of the instruction onto the stack as a return address; then the address in the double-byte cell pointed to by the sum of the operand and the X index register is loaded into the program counter.

There is no difference between returning from a subroutine called by this instruction and returning from a subroutine called by an absolute JSR. You code an RTS instruction which, when executed, causes the address on the top of the stack to be pulled and incremented to point to the instruction following the JSR, then to be loaded into the program counter to give control to that instruction.

The Long Jump to Subroutine

A third jump-to-subroutine addressing mode is provided for programming in the 16-megabyte address space of the 65816—absolute long addressing. Jump-to-subroutine absolute long is a four-byte instruction, the operand a 24-bit address in standard 65x order (the low byte of the 24-bit address is in the lowest memory location immediately following the opcode and the high byte is next, followed by the bank byte):

This time a three-byte (long) return address is pushed onto the stack. Again it is not the address of the next instruction but rather the address of the last byte of the JSR instruction which is pushed onto the stack (the address of the fourth byte of the JSR instruction in this case). As Figure 12.3 shows, the address is pushed onto the stack in standard 65x order: low byte in the lower address, high byte in the higher address, bank byte in the highest address (which also means the bank byte is the first of the three pushed, the low byte last).

Jumping long to a bank zero subroutine requires the greater-than (>) sign, as explained in the last chapter:

The greater-than sign forces long addressing to bank zero, voiding the assembler's normal assumption to use absolute addressing to jump to a subroutine at $3456 in the current program bank.

To avoid this confusion altogether, there is an equivalent standard mnemonic for jump-to-subroutine long—JSL:

Using an alternate mnemonic is particularly appropriate for jump-to-subroutine long, since this instruction requires you to use an entirely different return-from-subroutine instruction—RTL, or return-from-subroutine long.

Figure 12.3. JSL.

Return from Subroutine Long

The return from subroutine instruction pops two bytes off the stack as an absolute address, increments it, and jumps there. But the jump to subroutine long instruction pushes a three-byte address onto the stack—a long return address that points to the original code, and is typically in a bank different from the subroutine bank.

So the 65816 provides a return from subroutine long instruction, RTL. This return instruction first pulls, increments, and loads the program counter, just as RTS does; then it pulls and loads a third byte, the program bank register, to jump long to the return address. This is illustrated in Figure 12.4.

Branch to Subroutine

One of the glaring deficiencies of the 6502 was its lack of support for writing relocatable code; the 65802 and 65816 address this deficiency, but still lack the branch-to-subroutine instruction some other processors provide. There is no instruction that lets you call a subroutine with an operand that is program counter relative, not an absolute address. Yet, to write relocatable code easily, a BSR instruction is required: suppose a relocatable program assembled at $0 has an often-called multiply subroutine at $07FE; if the program is later loaded at $7000, that subroutine is at $77FE; obviously, a JSR to $07FE will fail.

The 65802 and 65816 can synthesize the BSR function using their PER instruction. You use PER to compute and push the current run-time return address; since its operand is the return address' relative offset (from the current address of the PER instruction), PER provides relocatability. As Fragment 12.2 shows, once the correct return address is on the stack, a BRA or BRL completes the synthesized BSR operation.

Figure 12.4. RTL.

Fragment 12.2.

In this case, you specify as the assembler operand the symbolic location of the routine you want to return to minus one. Remember that the return address on the stack is pulled, then incremented, before control is passed to it. The assembler transforms the source code operand,RETURN-1, into the instruction's object code operand, a relative displacement from the next instruction to RETURN-1. In this case, the displacement is $0002, the difference between the first byte of the BRL instruction and its last byte. (Remember, PER works the same as the BRLinstruction; in both cases, the assembler turns the location you specify into a relative displacement from the program counter.) When the instruction is executed, the processor adds the displacement ($0002, in this case) to the current program counter address (the address of the BRLinstruction); the resulting sum is the current absolute address of RETURN-1, which is what is pushed onto the stack.

If at run-time the PER instruction is at $1000, then the BRL instruction will be at $1003, and RETURN at $1006. Execution of PER pushes $1005 onto the stack, and the program branches to SUBR1. The RTS at the end of the subroutine causes the $1005 to be pulled from the stack, incremented to $1006 (the address of RETURN), and loaded into the program counter.

If, on the other hand, the instructions are at $2000, $2003, and $2006, then $2005 is pushed onto the stack by execution of PER, then pulled off again when RTS is encountered, incremented to $2006 (the current runtime address of RETURN), and loaded into the program counter.

If a macro assembler is available, synthetic instructions such as this are best dealt with by burying this code in a single macro call.

Coding a Subroutine: How and When

The uses of subroutines are many. At the simplest level, they let you compact in a single location instructions that would otherwise be repeated if coded in-line. Programmers often build up libraries of general subroutines from which they can pluck the routine they want for use in a particular program; even if the routine is only called once, this allows quick coding of commonly used functions.

The next few pages will look at a simple logic function for the 65x processors—forming the negation (two's complement) of eight- and sixteen-bit numbers—and how such a routine is written. Also covered is how subroutines in general (and library routines in particular) should be documented.

The 65x processors have no negate instruction, so the two's complement is formed by complementing the number (one's complement) and adding one.

6502 Eight-Bit Negation—A Library Example

If the value to be negated is an eight-bit value, the routine in Listing 12.1 will yield the desired result.

Listing 12.1.

It is extremely important to clearly document library routines. Perhaps the best approach is to begin with a block comment at the head of the routine, describing its name, what the routine does, what it expects as input, what direct page locations it uses during execution, if the contents of any registers or any memory special locations are modified during execution, and how and where results are returned.

By documenting the entry and exit conditions as part of the header, as in the example, when the routine is used from a library you won't have to reread the code to get this information. Although this example is quite simple, when applied to larger, more complex subroutines, the principle is the same: Document the entry and exit conditions, the function performed, and any side effects.

As a subroutine, this code to negate the accumulator takes six bytes. Each JSR instruction takes three. So calling it twice from a single program requires 12 bytes of code; if called three times, 15 bytes; if four, 18 bytes.

On the other hand, if this code were in-line once, it would take only five bytes, but each additional time it is needed would require another five bytes, so using it twice takes 10 bytes, three times takes 15, and four times takes 20. You can see that only if you need to negate the accumulator four or more times does calling this code as a subroutine make sense in view of object byte economy.

65C02, 65802, and 65816 Eight-Bit Negation

The addition of the accumulator addressing mode for the INC increment instruction on the 65C02, 65802, and 65816 means no subroutine is required for negating an eight-bit value in the accumulator on these processors: the in-line code in Fragment 12.3 takes only three bytes.

Fragment 12.3.

Since the in-line code takes the same number of bytes as the JSR instruction, you would lose four bytes (the number in the subroutine itself) by calling it as a subroutine.

6502 Sixteen-Bit Negation

Negating sixteen-bit values makes even more sense as a subroutine on the 6502. One method, given the previously-coded routine NEGACC, is shown in Listing 12.2.

Listing 12.2.

Here, one subroutine (NEGXA) calls another (the subroutine described previously that negates eight bits).

65802 and 65816 Sixteen-Bit Negation

Fragment 12.4 shows that on the 65802 and 65816, the sixteen-bit accumulator can be negated in-line in only four bytes. As a result, a subroutine to negate the sixteen-bit accumulator would be inefficient, requiring five calls to catch up with the one-byte difference; in addition, you should note that there is a speed penalty associated with calling a subroutine—the time required to execute the JSR and RTS instructions.

Fragment 12.4.

Parameter Passing

When dealing with subroutines, which by definition are generalized pieces of code used over and over again, the question of how to give the subroutine the information needed to perform its function must be considered. Values passed to or from subroutines are referred to as theparameters of the subroutine. Parameters can include values to be acted upon, such as two numbers to be multiplied, or may be information that defines the context or range of activity of the subroutine. For example, a subroutine parameter could be the address of a region of memory to work on or in, rather than the actual data itself.

The preceding examples demonstrated one of the simplest methods of parameter-passing, by using the registers. Since many of the operations that are coded as subroutines in assembly language are primitives that operate on a single element, like "print a character on the output device" or "convert this character from binary to hexadecimal," passing parameters in registers is probably the approach most commonly found.

A natural extension of this approach, which is particularly appropriate for the 65802 and 65816, but also possible on the 6502 and 65C02, is to pass the address of a parameter list in a register (or, on the 6502 and 65C02, in two registers). Listing 12.3 gives an example.

Listing 12.3.

By loading the X register with the address of a string constant, the subroutine PRSTRNG has all the information it needs to print the string at that address each time it is called. The data at the address passed in a register could also be a more complex data structure than a string constant.

On the 6502 and 65C02, a sixteen-bit address has to be passed in two registers. Because of this, parameters are often passed in fixed memory locations. Typically, these might be direct page addresses. Listing 12.4 gives an example of this method.

Listing 12.4.

Unfortunately, it takes eight bytes to set up PARAM each time PRSTRNG is called. As a result, a frequently used method of passing parameters to a subroutine is to code the data in-line, immediately following the subroutine call. This technique (see Fragment 12.5) uses no registers and no data memory, only program memory.

Fragment 12.5.

This method looks, at first glance, bizarre. Normally, when a subroutine returns to the calling section of code, the instruction immediately following the JSR is executed. Obviously, in this example, the data stored at that location is not executable code, but string data. Execution should resume instead at the label RETURN, which is exactly what happens using the PRSTRNG coded in Listing 12.5. The return address pushed onto the stack by the JSR is not a return address at all; it is, rather, the parameter to PRSTRNG.

Listing 12.5.

The parameter address on the stack need only be pulled and incremented once, and the data can then be accessed in the same manner as in the foregoing example. Since the loop terminates when the zero end-of-string marker is reached, pushing its address in the X register onto the stack gives RTS a correct return value—RETURN-1—the byte before the location where execution should resume. Note that the data bank is assumed to equal the program bank.

The advantage of this method is in bytes used: there is no need for any explicit parameter-passing by the calling code, and the JSR mechanism makes the required information available to the subroutine automatically. In fact, for most applications on all four 65x microprocessors, this method uses fewer bytes for passing a single parameter than any other.

One slight disadvantage of this method is that if the string is to be output more than once, it and its preceding JSR must be made into a subroutine that is called to output the string.

A second disadvantage to this method comes in calling routines to which more than one parameter must be passed. This last example demonstrated how a parameter (the address of the string) can be implicitly passed on the stack. But there is no way to extend the principle so two parameters could be implicitly passed, for instance, to a routine that compares two strings. On the other hand, parameters can also be explicitly passed on the stack. The push effective address instructions and stack-relative addressing modes make this all the easier, as Fragment 12.6 andListing 12.6 show.

Fragment 12.6.

Listing 12.6.

This example, which compares two strings to see if they are equal up to the length of the shorter of the two strings, uses parameters that have been explicitly passed on the stack. This approach, since it explicitly passes the addresses of the strings, lets them be located anywhere and referred to any number of times. Its problem is that when the subroutine returns, the parameters are left on the stack. Clearly, the subroutine should clean up the stack before returning; however, it can't simply pull the parameters off, because the return address is sitting on top of the stack (which explains why stack offsets of three and five, rather than one and three, are used).

Perhaps the cleanest way to pass parameters on the stack prior to a subroutine call is to decompose the JSR instruction into two: one to push the return address, the other to transfer to the subroutine. The push effective address instructions again come in handy. Fragment 12.7shows how the parameters to the routine in Listing 12.7 are passed.

Fragment 12.7.

Listing 12.7.

Since the return address was pushed first, the parameter addresses on the stack are accessed via offsets of one and three. Before returning, two pull instructions pop the parameters off the stack, then the RTS is executed, which returns control to the main program with the stack in order.

Passing parameters on the stack is particularly well-suited for both recursive routines (routines that call themselves) and reentrant routines (routines that can be interrupted and used successfully both by the interrupting code and the original call) because new memory is automatically allocated for parameters for each invocation of the subroutine. This is the method generally used by most high-level languages that support recursion.

Fragment 12.8 sets up multiple parameters implicitly passed on the stack by coding after the JSR, not data, but pointers to data. The routine called is in Listing 12.8.

Fragment 12.8.

While this subroutine, unlike the previous one, uses a dozen bytes just getting itself ready to start, each call requires only seven bytes (three for the JSR, and two each for the parameters), while each call to the previous routine required twelve bytes (three PERs at three bytes each plus three for the JMP).

Apple Computer's ProDOS operating system takes this method a step further: all operating system routines are called via a JSR to a single ProDOS entry point. One of the parameters that follows the JSR specifies the routine to be called, the second parameter specifies the address of the routine's parameter block. This method allows the entry points of the internal ProDOS routines to “float" from one version of ProDOS to the next; user programs don't need to know where any given routine is located.

Listing 12.8.