
Programming the 65816 Including the 6502, 65C02, and 65802 (1986)

Part III. Tutorial

6. First Examples: Moving Data

Most people associate what a computer does with arithmetic calculations and computations. That is only part of the story. A great deal of compute time in any application is devoted to simply moving data around the system: from here to there in memory, from memory into the processor to perform some operation, and from the processor to memory to store a result or to temporarily save an intermediate value. Data movement is one of the easiest computer operations to grasp and is ideal for learning the various addressing modes (there are more addressing modes available to the data movement operations than to any other class of instructions). It, therefore, presents a natural point of entry for learning to program the 65x instruction set.

On the 65x series of processors—the eight-bit 6502 and 65C02 and their sixteen-bit successors, the 65802 and 65816—you move data almost entirely using the microprocessor registers.

This chapter discusses how to load the registers with data and store data from the registers to memory (using one of the simple addressing modes as an example), how to transfer and exchange data between registers, how to move information onto and off of the stack, and how to move blocks (or strings) of data from one memory location to another (see Table 6.1).

Table 6.1. Data Movement Instructions.

When programming the 6502, whether you're storing a constant value to memory or moving data from one memory location to another, one of the registers is always intermediate. The same is generally true for the other 65x processors, with a few exceptions: the 65816's two block move instructions, three of its push instructions, and an instruction first introduced on the 65C02 to store zero to memory.

As a result, two instructions are required for most data movement: one to load a register either with a constant value from program memory or with a variable value from data memory; the second to store the value to a new memory location.

Most data is moved via the accumulator. This is true for several reasons. First, the accumulator can access memory using more addressing modes than any of the other registers. Second, with a few exceptions, it's only in the accumulator that you can arithmetically or logically operate on data (although the index registers, in keeping with their role as loop counters and array pointers, can be incremented, decremented, and compared). Third, data movement often takes place inside of loops, program structures in which the index registers are often dedicated to serving as counters and pointers.

Loading and Storing Registers

To provide examples of the six basic data-movement instructions—LDA, LDX, LDY (load accumulator or index registers) and STA, STX, and STY (store accumulator or index registers)—requires introducing at least one of the 65x addressing modes. Except for certain instructions—such as push and pull, which use forms of stack addressing—the absolute addressing mode will generally be used in this chapter. Absolute addressing, available on all four 65x processors, is one of the simplest modes to understand. It accesses data at a known, fixed memory location.

For example, to move a byte from one absolute memory location to another, load a register from the first location, then store that register to the other location. In Listing 6.1, the eight-bit value $77 stored at the absolute location identified by the label SOURCE is first loaded into the accumulator, then saved to the absolute location labeled DEST. Note the inclusion of the mode-switching code described in the previous chapter.

The code generated by the assembler, when linked, will begin at the default origin location, $2000. The example generates 13 ($0D) bytes of actual code (the RTS instruction is at memory location $200C). The assembler then automatically assigns the next available memory location, $200D, to the label on the following line, SOURCE. This line contains a DC (define constant) assembler directive, which causes the hexadecimal value $77 to be stored at that location in the code file ($200D). Since only one byte of storage is used, the data storage location reserved for the label DEST on the next line is $200E.

The syntax for absolute addressing lets you code, as an instruction's operand, either a symbolic label or an actual value. The assembler converts a symbolic operand to its correct absolute value, determines from its context that absolute addressing is intended, and generates the correct opcode for the instruction using absolute addressing. The assembler-generated hexadecimal object code listed to the left of the source code shows that the assembler filled in addresses $000D and $000E as the operands for the LDA and STA instructions, respectively (they are, of course, in the 65x's standard low-high order and relative to the $0000 start address the assembler assigns to its relocatable modules; the linker will modify these addresses to $200D and $200E when creating the final loadable object).

As Chapter 4 explained, the 65816's accumulator can be toggled to deal with either eight-bit or sixteen-bit quantities, as can its index registers, by setting or resetting the m (memory/accumulator select) or x (index register select) flag bits of the status register. You don't need to execute a SEP or REP instruction before every instruction or every memory move, provided you know the register you intend to use is already set correctly. But always be careful to avoid making invalid assumptions about the modes currently in force, particularly when transferring control from code in one location to code in another.

Listing 6.1.

The load and store instructions in Listing 6.1 will as easily move a double byte as they did a byte, if the register you use is in sixteen-bit mode, as in Listing 6.2.

Note that the source data in the define constant statement is now two bytes long, as is storage reserved by the define storage statement that follows. If you look at the interlisted hexadecimal code generated by the assembler, you will see that the address of the label DEST is now $200F. The assembler has automatically adjusted for the increase in the size of the data at SOURCE, which is the great advantage of using symbolic labels rather than fixed addresses in writing assembler programs.

The load and store instructions are paired here to demonstrate that, when using identical addressing modes, the load and store operations are symmetrical. In many cases, though, a value loaded into a register will be stored many instructions later, or never at all, or stored using an addressing mode different from that of the load instruction.

Listing 6.2.

Effect of Load and Store Operations on Status Flags

One of the results of the register load operations—LDA, LDY, and LDX —is their effect on certain status flags in the status register. When a register is loaded, the n and z flags are changed to reflect two conditions: whether the value loaded has its high bit set (is negative when considered as a signed, two's-complement number); and whether the number is equal to zero. The n flag is set when the value loaded is negative and cleared otherwise. The z flag is set when the value loaded is zero and cleared otherwise. How you use these status flags will be covered in detail in Chapter 8, Flow of Control.
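The flag rule can be modeled in a few lines. What follows is a minimal Python sketch of the behavior described above, not 65x code; the function name `load_flags` and its `sixteen_bit` parameter are invented for illustration.

```python
def load_flags(value, sixteen_bit=False):
    """Return (n, z) as a register load would leave them for `value`."""
    width = 16 if sixteen_bit else 8
    value &= (1 << width) - 1          # keep only the bits the register holds
    n = (value >> (width - 1)) & 1     # n mirrors the high bit of the loaded value
    z = 1 if value == 0 else 0         # z is set only when every bit is zero
    return n, z


print(load_flags(0x77))                      # (0, 0)
print(load_flags(0x80))                      # (1, 0)
print(load_flags(0x00))                      # (0, 1)
print(load_flags(0x0080, sixteen_bit=True))  # (0, 0)
```

Note that the same bit pattern, $80, is negative in an eight-bit register but positive in a sixteen-bit one, because the high bit is then bit fifteen.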

The store operation does not change any flags, unlike the Motorola 68xx store instructions. On the other hand, Intel 808x programmers will discover the 65x processors use load and store instructions instead of the 808x's all-encompassing MOV instruction. The 808x move instruction changes no flags whatsoever, unlike the 65x load instruction, which does.

Moving Data Using the Stack

All of the 65x processors have a single stack pointer. (This is a typical processor design, although there are designs that feature other stack implementations, such as providing separate stack pointers for the system supervisor and the user.) This single stack is therefore used both by the system for automatic storage of address information during subroutine calls and of address and register information during interrupts, and by user programs for temporary storage of data. Stack use by the system will be covered in later chapters.

As the architecture chapters in Part II discussed, the S register (stack pointer) holds the address of the next available stack location. Instructions using stack addressing locate their data storage either at or relative to that location.

The stack pointers of the 6502 and 65C02 are only eight bits wide; the eight-bit value in the stack pointer is added to an implied base of $100, giving the actual stack memory of $100 to $1FF; the stack is confined to page one. The 65816's native mode stack pointer, on the other hand, is sixteen bits wide, and may point to any location in bank zero (the first 64K of memory). The difference is illustrated in Figure 6.1.
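The difference between the two stack pointer widths comes down to how the address is formed. A small Python sketch, with the hypothetical helper name `stack_address`, assuming only what the paragraph above states:

```python
def stack_address(s, native_mode):
    """Return the bank-zero address that a stack pointer value refers to."""
    if native_mode:
        return s & 0xFFFF            # 65816 native mode: all sixteen bits are used
    return 0x0100 + (s & 0xFF)       # 6502/65C02: eight-bit offset into page one


print(hex(stack_address(0xFF, native_mode=False)))    # 0x1ff
print(hex(stack_address(0x3FFF, native_mode=True)))   # 0x3fff
```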

Push

Push instructions store data, generally located in a register, onto the stack. Regardless of a register's size, the instruction that pushes it takes only a single byte.

When a byte is pushed onto the stack, it is stored to the location pointed to by the stack pointer, after which the stack pointer is automatically decremented to point to the next available location.

When double-byte data or a sixteen-bit address is pushed onto the stack, first its high-order byte is stored to the location pointed to by the stack pointer, the stack pointer is decremented, the low byte is stored to the new location pointed to by the stack pointer, and finally the stack pointer is decremented once again, pointing past both bytes of pushed data. The sixteen-bit value ends up on the stack in the usual 65x memory order: low byte in the lower address, high byte in the higher address.

In both cases, the stack grows downward, and the stack pointer points to the next available (unused) location at the end of the operation.
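The push sequence can be traced with a toy model. This is a Python sketch under the assumptions stated in the text (store, then decrement; high byte first), not an emulator; the class name `StackModel` and the starting stack pointer of $01FF are illustrative choices.

```python
class StackModel:
    """Toy model of a descending stack in bank zero (names are illustrative)."""

    def __init__(self, sp=0x01FF):
        self.mem = bytearray(0x10000)   # 64K of bank-zero memory
        self.sp = sp                    # next available (unused) location

    def push8(self, value):
        self.mem[self.sp] = value & 0xFF        # store at the location S points to
        self.sp = (self.sp - 1) & 0xFFFF        # then decrement S

    def push16(self, value):
        self.push8(value >> 8)                  # high byte first, at the higher address
        self.push8(value)                       # low byte second, at the lower address


stack = StackModel()
stack.push16(0x1234)
print(hex(stack.mem[0x01FE]), hex(stack.mem[0x01FF]), hex(stack.sp))
# 0x34 0x12 0x1fd
```

The low byte lands at $01FE and the high byte at $01FF, which is the usual low-high memory order, and S ends up pointing past both.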

Figure 6.1. Stack Memory.

Pushing the Basic 65x Registers

On the 6502, only the contents of the accumulator and the status register can be pushed directly onto the stack in a single operation, using the PHA and PHP instructions, respectively. The 65C02 adds instructions to push the index registers onto the stack: PHX and PHY.

The 65816 and 65802 let double-byte data as well as single bytes be pushed onto the stack. Figure 6.2 shows the results of both. In the case of the accumulator and index registers, the size of the data pushed onto the stack depends on the settings of the m memory/accumulator select and x index register select flags. Since the accumulator and index registers are of variable size (eight bits or sixteen), the PHA, PHX, and PHY instructions have correspondingly variable effects.

Pull

Pull instructions reverse the effects of the push instructions, but there are fewer pull instructions, all of them single-byte instructions that pull a value off the stack into a register. Unlike the Motorola and Intel processors (68xx and 808x), the 65x pull instructions set the n and z flags. So programmers used to using pull instructions between a test and a branch on the other processors should exercise caution with the 65x pull instructions.

Pulling the Basic 65x Registers

The 6502 pull instructions completely complement its push instructions. PLP increments the stack pointer, then loads the processor status register (the flags) from the page one address pointed to by the offset in the stack pointer (of course, this destroys the previous contents of the status register). PLA pulls a byte from the stack into the accumulator, which affects the n and z flags in the status register just as a load accumulator instruction does.

As instructions for pushing the index registers were added to the 65C02, complementary pull instructions were added, too—that is, PLX and PLY. The pull index register instructions also affect the n and z flags.

On the 65802 and 65816, the push and pull instructions for the primary user registers—A, X, and Y—have been augmented to handle sixteen-bit data when the appropriate select flag (memory/accumulator or index register) is clear. Code these three pull instructions carefully since the stack pointer will be incremented one or two bytes per pull depending on the current settings of the m and x flags.

Pushing and Pulling the 65816’s Additional Registers

The 65816 adds one-byte push instructions for all its new registers, and pull instructions for all but one of them. In fact, the bank registers can only be accessed using the stack.

PHB pushes the contents of the data bank register, an eight-bit register, onto the stack. PLB pulls an eight-bit value from the stack into the data bank register. The two most common uses for PHB are, first, to let a program determine the currently active data bank, and second, to save the current data bank prior to switching to another bank.

Figure 6.2. Push.

Fragment 6.1 is a 65816 code fragment which switches between two data banks. While OTHBNK is declared just once, it represents two different memory cells, both with the same sixteen-bit address of $FFF3, but in two different 64K banks: one is in the data bank that is current when the code fragment is entered; the second is in the data bank switched to by the code fragment. The code fragment could be executed a second time and the data bank would be switched back to the original bank.

Fragment 6.1.

Similar to PHB, the PHK instruction pushes the value in the eight-bit program counter bank register onto the stack. Again, the instruction can be used to let you locate the current bank; this is useful in writing bank-independent code, which can be executed out of any arbitrarily assigned bank.

You're less likely to use PHK to preserve the current bank prior to changing banks (as in the case of PHB above) because the jump to subroutine long instruction automatically pushes the program counter bank as it changes it, and because there is no complementary pull instruction. The only way to change the value in the program counter bank register is to execute a long jump instruction, an interrupt, or a return from subroutine or interrupt. However, you can use PHK to synthesize more complex call and return sequences, or to set the data bank equal to the program bank.

Finally, the PHD instruction pushes the sixteen-bit direct page register onto the stack, and PLD pulls a sixteen-bit value from the stack into the direct page register. PHD is useful primarily for preserving the direct page location before changing it, while PLD is an easy way to change or restore it. Note that PLB and PLD also affect the n and z flags.

Pushing Effective Addresses

The 65816 also provides three instructions which can push data onto the stack without altering any registers. These three push effective address instructions—PEA, PEI, and PER —push absolute, indirect, and relative sixteen-bit addresses or data directly onto the stack from memory. Their use will be explained when their addressing modes are presented in detail in Chapter 11 (Complex Addressing Modes).

Other Attributes of Push and Pull

The types of data that can be pushed but not pulled are effective addresses and the K (or more commonly PBR) program bank register.

PLD and PLB are typically used to restore values from a previous state.

Finally, you should note that even though the push and pull operations are largely symmetrical, data that is pushed onto the stack from one register does not need to be pulled off the stack into the same register. As far as the processor is concerned, data pulled off the stack does not have to be the same size as was pushed onto it. But needless to say, the stack can quickly become garbled if you are not extremely careful.
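To see how a size mismatch garbles the stack, consider pushing sixteen bits and then pulling only eight. The following Python sketch models that case; `StackModel` and its method names are hypothetical, and the pull is assumed to increment S before reading, mirroring the push described earlier.

```python
class StackModel:
    """Toy model of a descending stack in bank zero (names are illustrative)."""

    def __init__(self, sp=0x01FF):
        self.mem = bytearray(0x10000)
        self.sp = sp

    def push8(self, value):
        self.mem[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFFFF

    def push16(self, value):
        self.push8(value >> 8)      # high byte first
        self.push8(value)           # then low byte

    def pull8(self):
        self.sp = (self.sp + 1) & 0xFFFF    # increment S first
        return self.mem[self.sp]            # then read the byte it points to


stack = StackModel()
stack.push16(0xABCD)                 # a sixteen-bit register is pushed
value = stack.pull8()                # but pulled with an eight-bit register
print(hex(value), hex(stack.sp))     # 0xcd 0x1fe
```

Only the low byte comes back, and S is left one byte short of its starting value of $01FF; the high byte is still on the stack, waiting to be mistaken for something else.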

Moving Data Between Registers

Transfers

The accumulator is the most powerful of the user registers, both in the addressing modes available to accumulator operations and in its arithmetic and logic capabilities. As a result, addresses and indexes that must be used in one of the index registers must often be calculated in the accumulator. A typical problem on the 6502 and 65C02, since their registers are only eight bits wide, is that sixteen-bit values such as addresses must be added or otherwise manipulated eight bits at a time. The other half of the value, the high or low byte, must meanwhile be stored away for easy retrieval, so quick temporary storage of register contents in a currently unused register is desirable.

For these reasons as well as to transfer a value to a register where a different operation or addressing mode is available, all 65x processors implement a set of one-byte implied operand instructions which transfer data from one register to another:

TAX

transfers the contents of the accumulator to the X index register

TAY

transfers the contents of the accumulator to the Y index register

TSX

transfers the contents of the stack pointer to the X index register

TXS

transfers the contents of the X index register to the stack pointer

TXA

transfers the contents of the X index register to the accumulator

TYA

transfers the contents of the Y index register to the accumulator

Like the load instructions, all of these transfer operations except TXS affect both the n and z flags. (TXS does not affect the flags because setting the stack is considered an operation in which the data transferred is fully known and will not be further manipulated.)

The availability of these instructions on the 65802/65816, with its dual-word-size architecture, naturally leads to some questions when you consider transfer of data between registers of different sizes. For example, you may have set the accumulator word size to sixteen bits, and the index register size to eight. What happens when you execute a TAY (transfer A to Y) instruction?

The first rule to remember is that the nature of the transfer is determined by the destination register. In this case, only the low-order eight bits of the accumulator will be transferred to the eight-bit Y register. A second rule also applies here: when the index registers are eight bits (because the index register select flag is set), the high byte of each index register is always forced to zero upon return to sixteen-bit size, and the low-order value of each sixteen-bit index register contains its previous eight-bit value.

Listing 6.3 illustrates these rules with TAY. In this example, the value stored at the location DATA2 is $0033; only the low order byte has been transferred from the accumulator, while the high byte has been zeroed.

The accumulator, on the other hand, operates differently. When the accumulator word size is switched from sixteen bits to eight, the high-order byte is preserved in a "hidden" accumulator, B. It can even be accessed without changing modes back to the sixteen-bit accumulator size by executing the XBA (exchange B with A) instruction, described in the following section. Listing 6.4 illustrates this persistence of the accumulator's high byte. After running it, the contents of locations RESULT and RESULT + 1 will be $7F33, or 33 7F, in low-high memory order. In other words, the value in the high byte of the sixteen-bit accumulator, $7F, was preserved across the mode switch to eight-bit word size.

Listing 6.3.

Now consider the case where the sixteen-bit Y register is transferred to an eight-bit accumulator, as shown in Listing 6.5. The result in this case is $33FF, making it clear that the high byte of the Y register has not been transferred into the inactive high-order byte of the accumulator. The rule is that operations on the eight-bit A accumulator affect only the low-order byte in A, not the hidden high byte in B. Transfers into the A accumulator fall within the rule.

Figure 6.3 summarizes the effects of transfers between registers of different sizes.

There are also rules for transfers from an eight-bit to a sixteen-bit register. Transfers out of the eight-bit accumulator into a sixteen-bit index register transfer both eight-bit accumulators.

In Listing 6.6, the value saved to RESULT is $7FFF, showing that not only is the eight-bit A accumulator transferred to become the low byte of the sixteen-bit index register, but the hidden B accumulator is transferred to become the high byte of the index register. This means you can form a sixteen-bit index in the eight-bit accumulator one byte at a time, then transfer the whole thing to the index register without having to switch the accumulator to sixteen bits first. However, take care not to inadvertently transfer an unknown hidden value when doing transfers from the eight-bit accumulator to a sixteen-bit index register.

Listing 6.4.

Transfers from an eight-bit index register to the sixteen-bit accumulator result in the index register being transferred into the accumulator's low byte while the accumulator's high byte is zeroed. This is consistent with the zeroing of the high byte when eight-bit index registers are switched to sixteen bits.

In Listing 6.7, the result is $0033, demonstrating that when an eight-bit index register is transferred to the sixteen-bit accumulator, a zero is concatenated as the high byte of the new accumulator value.
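The four mixed-size cases can be collected into two small functions. This is a Python sketch of the rules as stated in the text, not a reproduction of the listings; the function names are invented, and the input values are chosen only so that the results match the ones quoted above ($0033, $7FFF, $33FF, $0033).

```python
def transfer_a_to_index(c, x_flag):
    """TAX/TAY: `c` is the full sixteen-bit accumulator (B:A)."""
    if x_flag:                 # eight-bit index registers
        return c & 0x00FF      # only the low byte arrives; the high byte is zero
    return c & 0xFFFF          # sixteen-bit index: B and A both transfer, whatever m is


def transfer_index_to_a(index, c, m_flag):
    """TXA/TYA: return the new sixteen-bit accumulator value (B:A).

    An eight-bit index register is passed with its high byte already zero.
    """
    if m_flag:                                  # eight-bit accumulator
        return (c & 0xFF00) | (index & 0x00FF)  # A is replaced, hidden B is untouched
    return index & 0xFFFF                       # sixteen-bit accumulator takes it all


print(f"{transfer_a_to_index(0x1233, x_flag=1):04X}")            # 0033
print(f"{transfer_a_to_index(0x7FFF, x_flag=0):04X}")            # 7FFF
print(f"{transfer_index_to_a(0x12FF, 0x3300, m_flag=1):04X}")    # 33FF
print(f"{transfer_index_to_a(0x0033, 0xFFFF, m_flag=0):04X}")    # 0033
```

In every case the size of the destination register decides how much arrives; the size of the source decides nothing.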

Listing 6.5.

In the 65816, transfers between index registers and the stack also depend on the setting of the destination register. For example, transferring the sixteen-bit stack to an eight-bit index register, as in Fragment 6.2, results in the transfer of just the low byte. Obviously, though, you'll find few reasons to transfer only the low byte of the sixteen-bit stack pointer. As always, you need to be watchful of the current modes in force in each of your routines.

The 65816 also adds new transfer operations to accommodate direct transfer of data to and from the new 65816 environment-setting registers (the direct page register and the sixteen-bit stack register), and also to complete the set of possible register transfer instructions for the basic 65x user register set:

Figure 6.3. Register Transfers Between Different-Sized Registers.

TCD

transfers the contents of the sixteen-bit accumulator C to the D direct page register. The use of the letter C in this instruction's mnemonic to refer to the accumulator indicates that this operation is always a sixteen-bit transfer, regardless of the setting of the memory select flag. For such a transfer to be meaningful, of course, the high-order byte of the accumulator must contain a valid value.

TDC

transfers the contents of the D direct page register to the sixteen-bit accumulator. Again, the use of the letter C in the mnemonic to name the accumulator indicates that the sixteen-bit accumulator is always used, regardless of the setting of the memory select flag. Thus, sixteen bits are always transferred, even if the accumulator size is eight bits, in which case the high byte is stored to the hidden B accumulator.

TCS

transfers the contents of the sixteen-bit C accumulator to the S stack pointer register, thereby relocating the stack. Since sixteen bits will be transferred regardless of the accumulator word size, the high byte of the accumulator must contain valid data.

TSC

transfers the contents of the sixteen-bit S stack pointer register to the sixteen-bit accumulator, C, regardless of the accumulator word size.

Listing 6.6.

Listing 6.7.

Fragment 6.2.

TXY

transfers the contents of the X index register to the Y index register. Since X and Y will always have the same register size, there is no ambiguity.

TYX

transfers the contents of the Y index register to the X index register. Both will always be the same size.

Transfer instructions take only one byte, with the source and destination both specified in the opcode itself. In all transfers, the data remains intact in the original register as well as being copied into the new register.

Using TCS and TCD can be dangerous when the accumulator is in eight-bit mode, unless the accumulator was recently loaded in sixteen-bit mode so that the high byte, hidden when the switch was made to eight-bit mode, is still known. Transferring an indeterminate hidden high byte of the accumulator along with its known low byte into a sixteen-bit environment register such as the stack pointer will generally result in disaster.

As always, you need to be watchful of the modes currently in force in each of your routines.

Exchanges

The 65802 and 65816 also implement two exchange instructions, neither available on the 6502 or 65C02. An exchange differs from a transfer in that two values are swapped, rather than one value being copied to a new location.

The first of the two exchange instructions, XBA, swaps the high and low bytes of the sixteen-bit accumulator (the C accumulator).

The terminology used to describe the various components of the eight-or-sixteen-bit accumulator is as follows. A names the accumulator as a register that may be either eight or sixteen bits wide (depending on the m memory/accumulator select flag). C names the accumulator when it is considered to be sixteen bits wide regardless of the setting of the m flag. B names the hidden high byte of the sixteen-bit accumulator when A is used in eight-bit mode to describe the low byte only. In this last case, when the accumulator size is set to eight bits, only the XBA instruction can directly access B, the high byte of the sixteen-bit "double accumulator". This exchange of A with B can be used to simulate two eight-bit accumulators, each of which, by swapping, "shares" the actual A accumulator. It can also be used in sixteen-bit mode to reverse the byte order of a double-byte value. The XBA instruction is exceptional in that the n flag is always set on the basis of bit seven of the resulting accumulator A, even if the accumulator is sixteen bits.
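The byte swap and its unusual flag behavior can be sketched in Python. The function name `xba` and its return convention are illustrative; the sketch models only the n flag, as discussed in the text.

```python
def xba(c):
    """Swap the two bytes of the sixteen-bit accumulator; return (new_c, n)."""
    new_c = ((c & 0x00FF) << 8) | ((c >> 8) & 0x00FF)
    n = (new_c >> 7) & 1       # n comes from bit seven of the new low byte (A)
    return new_c, n


for c in (0x7F33, 0x0080, 0x8012):
    new_c, n = xba(c)
    print(f"{c:04X} -> {new_c:04X}, n={n}")
# 7F33 -> 337F, n=0
# 0080 -> 8000, n=0
# 8012 -> 1280, n=1
```

The second and third lines show the exception: $8000 has bit fifteen set yet n is clear, and $1280 has bit fifteen clear yet n is set, because only bit seven of the result is examined.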

The second exchange instruction, XCE, is the 65816's only method for toggling between 6502 emulation mode and 65816 native mode. Rather than exchanging register values, it exchanges two bits—the carry flag, which is bit zero of the status register, and the e bit, which should be considered a kind of appendage to the status register and which determines the use of several of the other flags.

Fragment 6.3 sets the processor to 6502 emulation mode. Conversely, native mode can be set by replacing the SEC with a CLC clear carry instruction.

Fragment 6.3.

Because the exchange stores the previous emulation flag setting into the carry, it can be saved and restored later. It can also be evaluated with the branch-on-condition instructions to be discussed in Chapter 8 (Flow of Control) to determine which mode the processor was just in. A device driver routine that needs to set the emulation bit, for example, can save its previous value for restoration before returning.
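The save-and-restore idea can be traced with a two-bit model. This Python sketch treats the carry flag and the e bit as plain integers; the function name `xce` and the variable `saved_e` are hypothetical.

```python
def xce(carry, e):
    """Exchange the carry flag with the emulation bit; return (carry, e)."""
    return e, carry


carry, e = 1, 0                  # carry set (as by SEC) while in native mode
carry, e = xce(carry, e)         # now in emulation mode; carry holds the old e
saved_e = carry                  # remember which mode the caller was in
print(carry, e)                  # 0 1

carry, e = xce(saved_e, e)       # put the saved bit in carry and exchange again
print(carry, e)                  # 1 0
```

After the second exchange the processor model is back in native mode, which is the mode it started in.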

The selection of the carry flag for the e bit exchange instruction is in no way connected to the normal use of the carry flag in arithmetic operations. It was selected because it is easy to set and reset, it is less frequently used than the sign and zero flags, and there are branch-on-condition instructions which test it. The primary use of the SEC and CLC instructions for arithmetic will be covered in upcoming chapters.

Storing Zero to Memory

The STZ instruction, introduced on the 65C02, lets you clear either a single or double byte memory word to zero, depending, as usual, on the current memory/accumulator select flag word size. Zero has long been recognized as one of the most commonly stored values, so a "dedicated" instruction to store zero to memory can improve the efficiency of many 65x programs. Furthermore, the STZ instruction lets you clear memory without having to first load one of the registers with zero. Using STZ results in fewer bytes of code, faster execution, and undisturbed registers.

Block Moves

The two block move instructions, available only on the 65802 and the 65816, let entire blocks (or strings) of memory be moved at once.

Before using either instruction, all three user registers (C, X, and Y) must be set up with values which serve as parameters.

The C accumulator holds the count of the number of bytes to be moved, minus one. It may take some getting used to, but this "count" is numbered from zero rather than one. The C accumulator is always sixteen bits: if the m mode flag is set to eight bits, the count is still the sixteen-bit value in C, the concatenation of B and A.

X and Y specify either the top or the bottom addresses of the two blocks, depending on which of the two versions of the instruction you choose. In Listing 6.8, $2000 bytes of data are moved from location $2000 to $4000.

Listing 6.8.

The MVN instruction uses X and Y to specify the bottom (or beginning) addresses of the two blocks of memory. The first byte is moved from the address in X to the address in Y; then X and Y are incremented, C is decremented, and the next byte is moved, and so on, until the number of bytes specified by the count (the value in C plus one) has been moved (that is, until C reaches $FFFF). If C is zero, a single first byte is moved, X and Y are each incremented once, and C is decremented to $FFFF.

The MVP instruction assumes X and Y specify the top (or ending) addresses of the two blocks of memory. The first byte is moved from the address in X to the address in Y; then X, Y and C are decremented, the next byte is moved, and so on, until the number of bytes specified by the count (the value in C plus one) has been moved (until C reaches $FFFF).
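The two loops can be written out as a Python sketch. It models a single 64K bank and ignores the bank operands; the function names `mvn` and `mvp` simply borrow the mnemonics, and the addresses in the demonstration are arbitrary.

```python
def mvn(mem, c, x, y):
    """Model MVN: copy upward from bottom addresses x (source) and y (destination)."""
    while True:
        mem[y] = mem[x]
        x, y, c = (x + 1) & 0xFFFF, (y + 1) & 0xFFFF, (c - 1) & 0xFFFF
        if c == 0xFFFF:              # count has wrapped: c + 1 bytes were moved
            return c, x, y


def mvp(mem, c, x, y):
    """Model MVP: copy downward from top addresses x (source) and y (destination)."""
    while True:
        mem[y] = mem[x]
        x, y, c = (x - 1) & 0xFFFF, (y - 1) & 0xFFFF, (c - 1) & 0xFFFF
        if c == 0xFFFF:
            return c, x, y


mem = bytearray(0x10000)
mem[0x2000:0x2004] = b"\x11\x22\x33\x44"

# Four bytes means a count of 3 in C.
print([hex(v) for v in mvn(mem, 3, 0x2000, 0x4000)])   # ['0xffff', '0x2004', '0x4004']
print([hex(v) for v in mvp(mem, 3, 0x2003, 0x5003)])   # ['0xffff', '0x1fff', '0x4fff']
print(mem[0x4000:0x4004] == mem[0x5000:0x5004])        # True
```

Both calls copy the same four bytes; they differ only in which end they start from and in where X and Y are left afterward.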

The need for two distinct block move instructions becomes apparent when the problem of memory overlap is considered. Typically, when a block of memory starting at location X is to be moved to location Y, the intention is to replace the memory locations from Y to Y + C with the identical contents of the range X through X + C. However, if these two ranges overlap, it is possible that as the processor blindly transfers memory one byte at a time, it may overwrite a value in the source range before that value has been transferred.

The rule of thumb is, when the destination range is a lower memory address than the source range, the MVN instruction should be used (thus "Move Next") to avoid overwriting source bytes before they have been copied to the destination. When the destination range is a higher memory location than the source range, the MVP instruction should be used ("Move Previous").
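A small overlapping move shows why the direction matters. The Python sketch below is self-contained and uses an eight-byte toy memory; the function names borrow the mnemonics, and the data is made up for the demonstration. Four bytes ("CDEF") are moved down by two positions, so the destination is lower than the source.

```python
def mvn(mem, c, x, y):
    """Copy c + 1 bytes upward, starting at the bottom addresses x and y."""
    while True:
        mem[y] = mem[x]
        x, y, c = (x + 1) & 0xFFFF, (y + 1) & 0xFFFF, (c - 1) & 0xFFFF
        if c == 0xFFFF:
            return


def mvp(mem, c, x, y):
    """Copy c + 1 bytes downward, starting at the top addresses x and y."""
    while True:
        mem[y] = mem[x]
        x, y, c = (x - 1) & 0xFFFF, (y - 1) & 0xFFFF, (c - 1) & 0xFFFF
        if c == 0xFFFF:
            return


good = bytearray(b"ABCDEFGH")
mvn(good, 3, 2, 0)            # bottom addresses: source 2, destination 0
print(good.decode())          # CDEFEFGH

bad = bytearray(b"ABCDEFGH")
mvp(bad, 3, 5, 3)             # top addresses: source 5, destination 3
print(bad.decode())           # EFEFEFGH
```

With the wrong direction, the bytes "C" and "D" are overwritten before they are copied, so the destination receives "EFEF" instead of "CDEF".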

While you could conceivably move blocks with the index registers set to eight bits (your only option in emulation mode), you could only move blocks in page zero to other page zero locations. For all practical purposes, you must reset the x mode flag to sixteen bits before setting up and executing a block move.

Notice that assembling an MVN or MVP instruction generates not only an opcode, but also two bytes of operand. The operand bytes specify the 64K bank from which and to which data is moved. When operating in the 65816's sixteen-megabyte memory space, this supports the transfer of up to 64K of memory from one bank to another. In the object code, the first byte following the opcode is the bank address of the destination and the second byte is the bank address of the source.

But while this order provides microprocessor efficiency, assembler syntax has always been the more logical left to right, source to destination (TAY, for example, transfers the accumulator to the Y index register). As a result, the recommended assembler syntax is to follow the mnemonic first with a 24-bit source address then with a 24-bit destination address—or more commonly with labels representing code or data addresses. The assembler strips the bank byte from each address (ignoring the rest) and inserts them in the correct object code sequence. (Destination bank, source bank.) For example:

The bank byte of the label SOURCE is 02 while the bank byte of the label DEST is 01. As always, the assembler does the work of converting the more human-friendly assembly code to the correct object code format for the processor.
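The reordering the assembler performs can be sketched in Python. The helper name `encode_block_move` and the 24-bit addresses are hypothetical; only the bank bytes (02 for the source, 01 for the destination) come from the text. The opcode values $54 for MVN and $44 for MVP are the standard 65816 encodings.

```python
OPCODES = {"MVN": 0x54, "MVP": 0x44}


def encode_block_move(mnemonic, source, dest):
    """Return the three object-code bytes for a block move.

    `source` and `dest` are 24-bit addresses written in source-then-destination
    order, as in the assembler syntax; only their bank bytes are used.
    """
    src_bank = (source >> 16) & 0xFF
    dst_bank = (dest >> 16) & 0xFF
    # Object code order is the reverse of the syntax: destination bank first.
    return bytes([OPCODES[mnemonic], dst_bank, src_bank])


print(encode_block_move("MVP", 0x022000, 0x014000).hex())   # 440102
print(encode_block_move("MVN", 0x022000, 0x014000).hex())   # 540102
```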

If the source and destination banks are not specified, some assemblers will provide a user-specified default bank value.

The assembler will translate the opcode to object code, then supply its bank value for both of the operand bytes:

440000      MVP

If either bank is different from the default value, both must be specified.