Thumb Instruction Set - ARM Systems and Development - Professional Embedded ARM Development (2014)

Professional Embedded ARM Development (2014)

Part I. ARM Systems and Development

Chapter 6. Thumb Instruction Set

WHAT’S IN THIS CHAPTER?

Presenting Thumb

What is Thumb used for?

What cores run Thumb?

What are the advantages of Thumb?

How to switch between ARM and Thumb

Writing for Thumb

Many of the most popular 32-bit processors for mobile devices use Reduced Instruction Set Computer (RISC) technology. Unlike Complete Instruction Set Computer processors (CISC), Reduced Instruction Set Computer engines generally execute each instruction in a single cycle, often resulting in faster program execution using the same clock speed.

Increased performance, however, comes at a price: a RISC processor typically needs more memory than a CISC does to store the same program. To achieve the same results as a single CISC instruction, RISC engines often require two, three, or more simpler instructions. For most embedded devices, memory constraints are more important than execution speed, so reducing code size is important.

In 1995, ARM released the Thumb instruction set, used for the first time on the ARM7TDMI core. Thumb instructions are denser than their ARM counterparts, being 16-bits long in the original Thumb extension. All Thumb instructions map directly to ARM instructions, but to save space, the instructions were simplified.

Thumb was introduced not only for the denser code, but also for devices that did not have full 32-bit wide memory access. One of the first devices to use Thumb, the Game Boy Advance, had little memory that was accessible in 32-bits; most of the memory was 16-bits wide. Accessing a 32-bit instruction meant accessing the first 16-bits and then waiting for the next 16-bits, which was very slow. By using Thumb, the Game Boy Advance could keep the game instructions to 16-bits long and avoid the slowdown of accessing 32-bits.

Although using Thumb codes is often slightly slower to the ARM counterpart due to simplification, accessing 16-bit memory meant that the ARM7TDMI outperformed full 32-bit processors with 16-bit wide data channels.

Performance, energy efficiency, and memory footprint are the most important considerations for designers of embedded systems, and Thumb effectively addresses each requirement, making it perfect for mobile applications.

Thumb-2 technology was introduced in the ARM1156T2-S core in 2003. It extended the original Thumb instruction set by including 32-bit instructions, therefore making the instruction set variable-length. This addition made Thumb-2 on average 26 percent denser than the ARM instruction set, while providing a 25 percent performance boost over original Thumb code.

ThumbEE was introduced in 2005 and was marketed as Jazelle Runtime Compilation Target (RCT). By making small changes to the Thumb-2 instruction set, it was designed to target languages such as Python, Java, and Perl. ARM deprecated the use of ThumbEE with revision C of its ARMv7 architecture. ThumbEE was an addition to, not a replacement of, Thumb-2. New cores continue to support Thumb-2.

Of course, reducing a 32-bit instruction to 16-bits means making sacrifices, and 16-bit Thumb opcodes have less functionality. The condition codes were removed for everything except branching. Also, most of the opcodes can no longer access all the registers, but only the lower half. However, because Thumb instructions are mapped to ARM instructions in hardware, the result is an execution speed that is identical to ARM execution speed. However, only accessing one-half of the registers implies some slowdowns.

Because Thumb is designed as a target for C compilers, it is not designed to be used directly; rather, developers should use a higher language such as C. You must understand the principles behind the instruction set to write optimized code, but unlike the ARM ISA, almost all optimization should be done directly in C.

THUMB

Thumb was released in 1995 in the ARM7TDMI, one of the most versatile ARM cores produced. Most application code written in C can be compiled directly in Thumb; however, some driver code may need to be written in ARM code. For the Game Boy Advance, lower memory was 32-bit, in which ARM instructions could be run without degrading performance, and all the driver code was located in this portion of memory. The upper memory was 16-bits wide, and this is where Thumb code was located: the game code. The problem was how to switch between the two.

Originally created for the ARM7TDMI, Thumb is a “state,” and developers can switch between ARM state and Thumb state. In ARM state, executing a Thumb instruction results in an undefined instruction exception. The processor needs to be told that the following instructions are Thumb, and the best place for that is the Branch instruction.

The original Thumb ISA by itself isn’t enough for all core components of a bare-bones system. Although perfectly suited for applications, it does not have all the required instructions to handle a complete system. (For example, it has no way of interacting with a coprocessor.)

THUMB-2 TECHNOLOGY

Thumb-2 technology is a major enhancement to the already popular Thumb extension. It provides 32-bit instructions that can be intermixed with the existing 16-bit instructions, making the Thumb extension variable length. The ARM core automatically detects the length of the next instruction and fetches the entire instruction.

By adding 32-bit instructions, almost all the ARM ISA functionality is now covered and adds DSP functionality. Because Cortex-M processors support only the Thumb ISA and cannot use the ARM ISA, Thumb-2 added major functionality, including the possibility of performing advanced calculation on a microcontroller architecture.

Thumb-2 didn’t only introduce 32-bit instructions; there are also new 16-bit instructions, adding enhanced energy efficiency and making the instruction set closer to C. Thumb-2 also makes it possible to create an entire system, not just an application, using only the Thumb ISA.

HOW THUMB IS EXECUTED

The question that most people ask is, “Why does the ARM core have two separate instruction sets?” In reality, it has only one; Thumb can be considered shorthand for the ARM instruction set. Thumb was originally developed by looking at the most-used ARM instructions and creating 16-bit counterparts. Therefore, the Thumb instruction ADD r0, #1 is automatically “translated” in hardware to the ARM instruction ADDS r0, r0, #1, incurring no penalty to execution time. This is illustrated in Figure 6-1.

FIGURE 6-1: Thumb Decompressor

image

Originally, the ARM7TDMI had separate decompressor logic, but with the ARM9TDMI onwards, this was integrated into the pipeline’s decoder logic. This is illustrated in Figure 6-2. Its function remains the same — all Thumb instructions are mapped to an ARM instruction, only the logic has been moved to the pipeline to simplify the processor’s design and to increase efficiency.

FIGURE 6-2: 3-stage pipeline with Thumb decoder

image

ADVANTAGES OF USING THUMB

There are two major advantages to using Thumb ISA over ARM ISA. Thumb instructions are 16-bits wide, and Thumb-2 adds a mixture of 16 bit and 32 bits. Because more instructions can be written per word of memory, Thumb is denser. This is useful for systems with limited memory; although writing in Thumb sometimes means a few more instructions, the end result is often 40 percent denser than code written for the ARM ISA, meaning less memory requirements.

Another major advantage of using Thumb is also due to the instruction size. Since Thumb instructions are denser, the cache can contain more Thumb instructions than ARM instructions, so Thumb applications often cache better than their ARM equivalent.

Take, for instance, a simple division program. This program takes a number in r0 and divides it by the number provided in r1. It returns the quotient in r0 and the remainder in r1. In this unoptimized code, it simply subtracts r1 from r0 as many times as possible and returns the amount of times it can do so. In ARM, such a routine could be written like this:

MOV r3, #0

loop

SUBS r0, r0, r1

ADDGE r3, r3, #1

BGE loop

ADD r2, r0, r1

MOV r0, r3

MOV r1, r2

This code is 7 instructions long, and each instruction is 4 bytes long, so the code is 28 bytes long. This is a short subroutine, but it is a perfect example to show how Thumb can take up less space. In Thumb, taking into account the various subtleties of the instruction set, this could be written:

MOV r3, #0

loop

ADD r3, #1

SUB r0, r1

BGE loop

SUB r3, #1

ADD r2, r0, r1

MOV r0, r3

MOV r1, r2

There are more instructions in Thumb, and developers that are used to the ARM instruction set might be surprised at the way in which this code was written. The changes between Thumb and ARM are explained later in this chapter, in the section “Introduction to Thumb.”

This routine in Thumb is 8 instructions long, but because each instruction is only 2 bytes long, that means that the entire code is only 16 bytes long, whereas the ARM code was 28 bytes long.

One of the worries about using Thumb ISA is that Thumb is slower than ARM. Although it is sometimes slightly slower than ARM code, it isn’t for the reasons that you might first think. Thumb code is often slightly longer than ARM code, so more cycles might be necessary to obtain the same result. Also, fewer registers are readily available, which might impact some routines. However, the benefits outweigh the inconvenience. Since the code is denser, it uses less memory, meaning an embedded system can contain less memory, making them more cost efficient and power efficient. More instructions can fit into the same cache size, meaning that Thumb applications can out-perform ARM equivalent programs. To prove Thumb’s power, an entire family of ARM cores, the Cortex-M, relies solely on Thumb code to obtain powerful microcontrollers with incredibly low power consumption.

Thumb was also designed with rapid development in mind. Thumb was developed as a compiler target, meaning that all development can be done in a higher language such as C. Although Cortex-A and Cortex-R cores need at least some development in assembler for boot code, Cortex-M cores that use only the Thumb ISA can be developed entirely in C.

CORES USING THUMB

The Thumb ISA was introduced in the ARM7TDMI design. All ARM9 and later chips support the Thumb instruction set.

Thumb-2 technology was introduced in the ARM1156T2-S and extends the Thumb ISA. All ARMv7 cores and later include Thumb-2.

The Cortex-A and Cortex-R families, supporting the ARMv7 and later architecture, both support Thumb and Thumb-2. These processors use the ARM ISA but can switch to the Thumb ISA when needed.

Cortex-M processors are a little different. They do not support the ARM ISA and support only the Thumb/Thumb-2 ISA. The Cortex-M3 and Cortex-M4 cores belong to the ARMV7-M and ARMV7E-M architectures, respectively, and support the full subset of Thumb and Thumb-2.

The Cortex-M0, M0+, and M1 belong to the ARMv6-Marchitecture and have more restrictions. They support the Thumb ISA but do not implement three instructions: CBZ, CBNZ, or IT. They do however support Thumb-2, but only a limited subset: BL, DMB, DSB, ISB, MRS, and MSR. Table 6-1shows the different instructions supported by the different versions of Cortex-M cores. Instructions are listed according to their availability in Cortex-M cores, as well as the instruction length. All instructions available in one Cortex-M core are always available in later cores. For example, all of the instructions available in the Cortex-M0 core are available in the Cortex-M3.

TABLE 6-1: Cortex-M Core Instructions

image

Some Cortex-M4 cores have an optional Floating Point Unit, allowing for more advanced calculations to be done with hardware optimization.

Due to the differences between the ARMv6-M and ARMv7-M architectures, you must know both Thumb and the Thumb-2 extension. Although you might be tempted to start a new project directly with a Cortex-M4 to have access to the entire Thumb and Thumb-2 ISA, the Cortex-M0+ is still an excellent microcontroller core, in active production, and has advantages over the more recent Cortex-M4 core.

ARM-THUMB INTERWORKING

On processors supporting both the ARM and Thumb ISA, you can switch from one state to another, which is known as interworking. Changing state is a simple process that you can do without any penalty compared to a basic branch instruction.

When a Cortex-A/R or Classic ARM core is turned on (when it enters the RESET state), it automatically starts in ARM state. A specific instruction must be issued for the core to switch to Thumb state.

Cortex-M cores are different. Because these cores do not support the ARM ISA, the core is automatically in Thumb state.

With the exception of the Cortex-M, the core must be told to switch to Thumb state; it does not do it automatically. Attempting to execute Thumb instructions while in ARM state most often results in an illegal instruction.

To enter Thumb state from ARM state (or vice versa), use the BX/BLX instruction: Branch and Exchange Instruction (with possible link). ARM processors are natively aligned; they must have instructions on a certain boundary. Typically, the processor looks for instructions aligned on 32 bits, or in the case of Thumb, 16 bits. In either case, the last address bit is not used and should always be 0. Thumb-capable cores make use of this by using the least significant address bit of the branch destination to indicate the instruction set state. By setting the LSB to 1, instead of looking for an impossible address, this tells the core that the next instruction will be a Thumb instruction and that the core should change to Thumb state.

Note that when a non-Cortex-M core handles an exception, the processor is automatically returned to ARM mode, or on select processor, whatever state was configured in the SCTRL register. When returning to the previous code, the CPSR is restored and the state of the processor is contained in the CPSR, returning the processor to its original state.

INTRODUCTION TO THUMB-1

Thumb was created after an analysis of the most-used ARM instructions in compiled C code. To have a complete 16-bit instruction set, several sacrifices were made. Thumb instructions cannot access all registers; they cannot access status or coprocessor registers; and some functionality that might have been done in one instruction in ARM state can take two or three in Thumb state. However, Thumb is not an entirely new language; an ARM core does not have two distinct languages embedded into silicon. Thumb instructions are mapped directly to ARM-state counterparts, expanding 16-bit instructions to 32-bit equivalents.

Writing efficient code for Thumb is a little more challenging, and for those who already know the ARM ISA, Thumb is easy to learn. It isn’t a question of what can be done, but rather what can’t be done in Thumb. That doesn’t mean that Thumb is limited, it means that Thumb is designed differently.

Register Availability

The biggest change is in the register access. When operating in Thumb state, the application encounters a slightly different set of registers. As stated previously, most Thumb instructions can access only r0 to r7 (known as the low registers), not the entire set of registers. However, the SP, LR, and PC registers are still available, as shown in Figure 6-3.

FIGURE 6-3: Comparison between ARM and Thumb register availability

image

Three instructions enable access to the high registers: MOV, ADD, and CMP. The reason for this is the shortening of instructions from 32 bits. One of 16 registers can be expressed as a 4-bit number, and one register in 8 can be expressed as a 3-bit number; therefore, saving 1 bit. For instructions that require multiple registers as arguments, this saves lots of space.

Removed Instructions

Some instructions had to be removed to create Thumb ISA. Because Thumb was originally designed for application-level code and not system code, no instructions exist to access coprocessors. Swap instructions were removed because they were not used often enough to merit a specific opcode. Reverse subtractions were also removed (RSB and RSC) and of all the different multiplications that exist, only MUL remains.

For those developing full applications on a Cortex-M0+, or other cores belonging to the ARMv6-M specification, these cores support a small subset of the Thumb-2 ISA, including MUL/MLA instructions.

No Conditionals

Conditional execution has been removed from almost all instructions; only branch instructions still have condition flags. This makes code more in line with C code; different sections of code are now separated, and the C instruction if (a == b) now becomes a compare instruction and a conditional branch.

Set Flags

With Thumb, almost every operation automatically executes an Update Status Register, effectively adding the S suffix to every instruction. ADD now becomes ADDS, without having to add the S to the end of the opcode. This is a major change from the ARM ISA where the status flags were updated only on request, and multiple conditional instructions could be executed. In Thumb, this is no longer possible because sacrifices had to be made to keep instructions 16-bits long, but it does make the assembly code more in line with C.

No Barrel Shifter

Barrel shift operations can no longer be used directly inside instructions. The barrel shifter still exists, of course, and can still be used, but in Thumb, specific instructions now exist to perform shifts. Thumb therefore introduced some new instructions: ASR, LSL, LSR, and ROR now become instructions instead of operators on operands. What used to be done in a single instruction now takes two; a single shift instruction followed by another instruction.

Some decompilers will still show these instructions even in ARM, since they are Unified Assembly language instructions. However, in its true ARM form, these instructions are MOV instructions with a barrel shift used as a second operand. Unified Assembly Language will be discussed in the next chapter.

Reduced Immediates

With the ARM ISA, most instructions allowed for a second operand, which was either an immediate value or a (shifted) register. In Thumb, this is no longer the case because the instructions have been simplified and now resemble C assignment operators (for example, += and |=).

As with most rules, there are exceptions. ADD, SUB, MOV, and CMP can have both immediate values and second operands, making their usage close to the ARM ISA.

Stack Operations

Stack access has been greatly simplified; where ARM instructions could specify pre- or post-indexing and specify if the stack is ascending or descending, Thumb simplifies this with two simple instructions: PUSH and POP. These instructions, just like all Thumb instructions, are mapped to ARM instructions, and in this case they are mapped to STMDB and LDMIA.They do not specify a register; they assume that the stack pointer is r13 and update the SP automatically.

Stack operations work on the lower registers (r0 to r7) but also on the link register and program counter. The PUSH register list can also include the link register (LR), and POP can include the Program Counter (PC). Using this, you can return from a subroutine with the following sequence:

PUSH {r1, lr}

; subroutine

POP {r1, pc}

INTRODUCTION TO THUMB-2

Thumb-2 does not replace the original Thumb ISA; it enhances it. Code written in Thumb remains compatible in a Thumb-2 environment.

Thumb-2 adds both 16-bit and 32-bit instructions to the instruction set, making Thumb-2 a variable width instruction set. Some instructions have 16-bit and 32-bit versions, depending on the way they are implemented.

32-bit instructions were added to the Thumb ISA to provide support for exception handling in the Thumb state, as well as access to coprocessors and the addition of DSP instructions. This is a huge advantage to Cortex-M cores, adding processing power to an energy-efficient design. Cortex-M processors belonging to the ARMv7-M architecture support the entire Thumb-2 subset, and the Cortex-M4F also supports Thumb DSP instructions.

Thumb-2 also supports conditional execution, albeit in the form of a new instruction, IT, short for If-Then. Thumb-2 offers the best of both worlds: the performance of ARM and the memory footprint of Thumb.

New Instructions

Thumb-2 introduced a large amount of instructions to make Thumb-2 almost as complete as the ARM ISA and to enable Cortex-M cores to have full access to most of the ARM functionality. The following is a non-exhaustive list of instructions added to Thumb ISA in Thumb-2.

If Then

One of the new instructions is the IT instruction, short for If-Then. Just like the C equivalent, IT conditionally executes one portion of code or another but avoids the need for jumping. The benefit is that this avoids the penalty for taking a branch due to the nature of the pipeline.

The instruction is defined as follows:

IT<x><y><z> <cond>

The cond variable is any of the condition codes available in the ARM ISA: EQ, GT, LT, NE, and so on.

The variables x, y, and z are optional and are either T (for Then) or E (for Else). Depending on the amount of Ts and Es (including the first T in IT), the processor conditionally executes code. This can sound confusing, so here’s an example.

The easiest form is the IT instruction itself:

CMP r0, r1 ; r0 == r1?

IT EQ ; was the result EQ?

MOVEQ r0, r4 ; If r0 == r1, execute this instruction

This is the most basic form and can execute the MOV instruction if r0 equals r1. Note that inside an IT block, all instructions must have condition codes.

Up to four instructions can be linked to the IT command, in any order. For example, a typical C If/Then section could look like this:

CMP r0, r1 ; r0 == r1?

ITE EQ ; was the result EQ?

MOVEQ r0, r4 ; If r0 == r1, execute this instruction

MOVNE r0, r5 ; Else execute this instruction

Or for a more complete example, use this:

CMP r0, r1 ; r0 == r1?

ITEET EQ ; was the result EQ? Then / Else / Else / Then

MOVEQ r0, r4 ; If r0 == r1, execute this instruction

MOVNE r0, r5 ; Else execute this instruction

SUBNE r0, #1 ; Else execute this instruction too

BEQ label ; If r0 == r1, branch

People are often surprised at the instruction layout; you can specify a Then instruction, followed by an Else instruction, and then another Then instruction, something that isn’t possible in C. This is actually a clever design from ARM. IT code blocks have restrictions; each instruction inside the IT block must have condition codes. Also, it is not allowed to branch into an IT block unless returning from an exception. Branching from inside an IT block is allowed only if it is the last instruction inside the block, which is the case in this example.

Compare and Branch

Thumb-2 introduced two new branch methods: CBZ and CBNZ. CBZ, short for Compare and Branch if Zero, compares a register with zero, and branches to a label if the comparison EQ is true. CBNZ is the opposite and branches if the comparison NE is true.

CBZ and CBNZ have the advantage of not updating the condition code flags, potentially reducing code size. In ARM, or in Thumb, this portion of code compares a register and breaks if it equals zero:

CMP r0, #0

BEQ label

In Thumb-2, this is reduced to a single 2-byte instruction (with the difference that it does not update the condition flags).

CBZ r0, label

In C, this is the equivalent of: if x == 0 { function }.

Bitfield Operations

Bitfield instructions were added to Thumb-2, enabling the copy of portions of a register to another, or clearing and inserting portions of a register. The BFC and BFI instructions enable clearing and inserting data at a variable offset, whereas SBFX and UBFX enable signed and unsigned bitfield extraction.

Coprocessor

System coprocessor instructions can be executed in Thumb, using the MCR and MRC instructions. On Cortex-A/R cores, CP15 instructions can also be executed directly from Thumb-2.

DSP

Thumb-2 also introduced 32-bit Digital signal processing instructions, giving Thumb the calculation power that until now was reserved for ARM code, either enabling Thumb applications to access advanced calculation routines, or enabling more powerful Cortex-M cores. The Thumb-2 DSP instructions were added to the Cortex-M4.

FPU

Floating-point instructions were added to Thumb-2 and are supported by the optional Floating Point Unit in the Cortex-M4. Floating-point operations can now be done directly in hardware by the ARM core, in Thumb state. FPU extensions enable Thumb-2 to be used in intensive calculations such as audio compression and decompression, and automotive applications.

WRITING FOR THUMB

By default, a compiler automatically compiles for the ARM ISA. It needs to be told about the target CPU and what instruction set it will use.

Start with a basic program:

int myfunc(int a)

{

a = a + 7;

return a / 2;

}

This is a basic program that doesn’t access any system registers and could be written in both ARM and Thumb. Now, compile this for a Cortex-A8 processor, using default settings:

arm-none-eabi-gcc -c -mcpu=cortex-a8 test.c -o testarm.o

What exactly has the compiler made of this?

arm-none-eabi-objdump -S testarm.o

This is just a few lines of output:

28: e1a030c3 asr r3, r3, #1

2c: e1a00003 mov r0, r3

30: e28bd000 add sp, fp, #0

34: e8bd0800 ldmfd sp!, {fp}

38: e12fff1e bx lr

Those five instructions are all 32-bits long, and what is that LDMFD instruction? LDMFD can be used on the Cortex-M3 and Cortex-M4; however, fp is a high register, so this can’t be Thumb code. This code has been compiled for the ARM ISA. So how do you compile for Thumb? By telling the compiler:

arm-none-eabi-gcc -c -mcpu=cortex-a8 -mthumb test.c -o testthumb.o

By specifying the -mthumb option in the CodeSourcery compiler, the compiler now knows that it needs to compile in Thumb. Time to have a look at that file:

arm-none-eabi-objdump -S testthumb.o

Again, these are just a few lines of output:

1c: 4618 mov r0, r3

1e: f107 070c add.w r7, r7, #12

22: 46bd mov sp, r7

24: bc80 pop {r7}

26: 4770 bx lr

The first thing that can be seen is the instruction length: both 16-bits and 32-bits. After all, the Cortex-A8 supports both Thumb and the Thumb-2 extension, so why not make the most of it?

As for Thumb Interworking, the compiler and linker handle this transparently. Any ARM code calling the myfunc subroutine automatically results in a BLX instruction, telling the core to switch to Thumb mode. If called from Thumb code, the compiler makes no changes.

SUMMARY

The Thumb instruction set is designed to be a compiler target, and as such, it is advised to use a higher language, such as C or C++. Some of the changes made in Thumb, compared to ARM systems, make the code different. Porting code from ARM to Thumb at assembly level is often difficult because ARM code relies heavily on conditional execution.

Instructions in the original Thumb extension are 16-bits wide and can be used to reduce the memory footprint of an application and can potentially allow the data bus path to be reduced to 16-bits without a major performance penalty. Thumb-2 is an extension to Thumb, adding both 32-bit instructions and 16-bit instructions, and adding instructions for both DSP and floating-point calculations.

With this background, the chapter showed how to compile source code for the Thumb ISA by telling the compiler to use a specific instruction set.

In the next chapter, I will talk more about ARM Assembly Language, including the Unified Assembly Language that allows developers to write programs for both Thumb and ARM modes. I will talk more about each category of instruction, and give examples for some of the most common instructions.