ARM Embedded Systems - ARM Systems and Development - Professional Embedded ARM Development (2014)

Professional Embedded ARM Development (2014)

Part I. ARM Systems and Development

Chapter 2. ARM Embedded Systems


Understanding the concept of an embedded system

Understanding why ARM processors are used in embedded systems

Choosing the right processor for the job

Getting the necessary tools

Knowing the different products used for embedded development

Imagine you’re scheduled for a big project, one of the company’s strategic sales. You’ve just finished a project, and you take the weekend off to relax. On your way home, you go to the store to pick up one of the latest computer games you’ve heard about. On the box, you see the system requirements — a guide to what is needed to play the game in optimal conditions. Looks like your home computer is up to it, so you buy the game and go home. Sure enough, the game does run, but on the box it says that the game requires 4 gigabytes of memory, but 8 is recommended. You have only 4, and you can feel that it is a little slow from time to time. Never mind, the shop is only 5 minutes away! You return and buy another 4 gigabytes of memory. When you return, you open up the computer, install the memory, and turn the computer back on. The system beeps, the screen flickers, and within a few seconds, the screen tells you that you now have 8 gigabytes of memory. The operating system fires up, and after a few seconds, you can run your game. And yes, it is much faster; you have no excuse not getting past the first level!

A lot has happened here. A few months ago, a team of developers were creating this game. Someone, somewhere, had to make a choice. Out of all the possible combinations, how did you decide what the minimum requirements should be? The company probably has statistics from previous games to know what their users use. It might have done a survey to know what most people use. It might even have boldly said, “Everyone should have x amount of memory.” It has all happened before. On most systems, this isn’t a problem because memory can be swapped out, and more can be added. This is also mostly true for the CPU, the graphics card, or the hard disk, some of the other things that are normally noted on a system requirement sheet. In this case, you decided to add some memory, effectively doubling what you had.

Monday morning, and with a good dose of caffeine, it is time to go to the office to learn more about this new project. It is an aeronautic project; a major company wants to outsource one of its sensors on a new drone. Your bosses believe that your team is up for the job. This is a system that will be attached to an existing drone to monitor air quality coming from different sensors and to record it. You need input from air sensors, temperature sensors, a GPS unit to record where samples came from, and possibly a few other sources that will be added during the project. (This is the sort of phrase that can send shivers down your spine.) Your company hasn’t yet won the contract; this is only a research and development product. You are in competition with at least two other companies, so how do you win the contract? You will be judged on several factors:

Cost of the system — Naturally, it wants the cheapest solution possible. Using expensive components can result in beautifully fast products, but it all comes at a price. The question here is, “What is really needed?”

Size of the finished product — This product will go onto a drone that has only limited space.

Weight — Again, a drone can hold only a certain amount of weight.

Power Consumption — The less it uses, the more is available for other systems. The drone is also probably electric; although you don’t know that for sure. Making a system that uses the last amount of power possible could give you the winning edge.

Speed — You will be taking measurements, and the more measurements you can take, the better the system can perform.

With that in mind, the company has made some strategic decisions. There was probably a great deal of talk in meetings as to what was actually needed. What factor do you concentrate on? What is needed, and what factor can give you the edge, if you concentrate on it? Remember, most of the hardware will be decided at the start of the project. As a developer, you should know what to work on from the start. A few changes will probably be made (it is rare that hardware isn’t changed slightly during development), but you already know what you will use.

The specifications are simple: one ARM processor, 2 MB of RAM, and 2 MB of ROM, 16 digital input lines, 8 digital output lines. You will also use an SD card for data storage, but the CPU you chose doesn’t have a native SD controller; you must make your own. This isn’t actually a problem. It does mean more work for the developers, but the chips that came with an SD controller were more expensive and much larger. You can use some of your digital outputs and inputs for this task.

That’s it! Your team is ready to go. Now be careful how you program; the system has 2 MB of RAM and 2 MB of ROM, and that cannot change. Adding more memory would mean adding more chips, increasing size, weight, cost, and electrical use. Don’t worry; on this sort of system, 2 MB is more than enough and can even enable you to put extended features in. The same goes for the amount of input and output lines that were made available. In theory you need only half, but it is a good thing that you have more available because the client asked for a new feature halfway through the project. The data was to be sent by Wi-Fi to a receiving station when the drone flew over it. One of your competitors didn’t have enough input and output lines planned ahead and couldn’t continue the project without changing its hardware, and preferred to cancel the project.

The ideal processor does not exist; if it did, everyone would use it. The ideal processor would have an astronomical amount of calculation power, running on battery power for months, with little or no heat being produced. Unfortunately, that will not be happening any time soon. The embedded world is all about trade-offs. By sacrificing one characteristic, you can augment another.


There are various definitions of what an embedded system is. Some people talk about small factor systems; others talk about a system stripped of all the unwanted options. An intelligent water meter is an embedded system; it is lightweight and has only what is needed. It controls a larger system. More specifically, it does only one thing, and it does it well. This brings the question, “Is a mobile telephone an embedded system?” Engineers will argue over this. Some would say “yes” because the device is custom designed and made (no two mobile telephones have the same factor mainboard), and it has a specific task in mind: to place and receive telephone calls. Some would argue “no” because these devices have become so powerful, are like personal computers, and have complete operating systems, and end users can install software that places them in a new category: mobile devices.

Luckily for developers, the definition is much simpler. Unlike working on standard computer applications, the developer knows exactly what the application will run on.

An ARM-embedded system is an electronic system with an ARM-powered core, with fixed hardware specifications. The processor could be a standalone ARM-powered processor or possibly a system on chip. In both cases, the system is designed with a single task in mind, regardless of the electronic components used.

What Is a System on Chip?

Some embedded systems are as small as possible, containing only the absolutely essential components for the application. The advantage of such a system is often cost and often power conservation. For other designs, you can use a system on a chip (SoC); a single chip contains a processor and almost all the components needed for an entire system — and often much more than is required. Both are available, and both have their advantages. A few years ago, SoC systems were prohibitively expensive, but with today’s market and the amount of processors made, some SoC chips can be made relatively cheaply, and in some cases, cheaper than having a simple processor and adding hardware to meet your needs. The amount of research and development needed to create a printed circuit board with all the necessary components often outweighs the advantage of having a single chip (coupled with some memory). However, SoC chips mean more transistors, so greater power requirements. They also mean little R&D cost because large semi-conductor companies invest heavily in these chips; therefore, they also invest heavily in making software. Most SoC systems have complete support for at least one full operating system, often several. Installing an operating system onto a working board can often be done in little more than minutes.

ARM’s first attempt at SoC was the ARM250, based on the ARMv2a architecture. The ARM250 was used in budget versions of the Archimedes A3000 and A4000 computers. It did not have any cache, but it did contain the ARM core, a memory controller, a video controller, and an I/O controller integrated directly into one piece of silicon. The use of ARM250 meant that the mainboard was less complex, but initial supply problems meant that early machines had a mezzanine board above the CPU, essentially simulating an SoC.

Today, things are different. ARM still licenses cores, but the licensees often create amazing SoC systems, including literally everything needed for a single-board computer. Freescale’s iMX 6 series processor contains a DDR controller, four USB2.0 ports, gigabit Ethernet, PCI Express, a GPU, and much more, alongside a quad-core Cortex A9, all on one single chip. The Chinese device-maker Hiapad has created the Hi-802, a tiny complete system only just larger than a USB key, connecting directly to the HDMI port of a television or monitor, which has USB to plug in a keyboard and mouse and integrates Bluetooth, an SD slot, and Wi-Fi. Of course, there are also cheaper versions; a device called the U2 has been selling for as little as $20US, containing a 1.5 GHz Cortex A8, containing again a wireless network card, HDMI interface, USB, ports, and SD-card support.

Manufacturers often have different philosophies concerning SoC chips, and care must be taken when deciding which system to use. Freescale has always been known to make energy-efficient systems, at the cost of slightly degraded processing power. Nvidia, with its Tegra series, have always made multimedia its priority. Samsung makes some of the fastest SoC chips with its Exynos series. Each product range has its advantage, and time must be taken to analyze the best choice possible.

Another term that is sometimes employed is SiP, short for System in Package. SiP often includes several chips in one; combining the processor, random access memory, and flash memory.

If your project requires specific hardware and you cannot find a suitable solution on the market, there is always another solution. FPGA SoCs are chips that have an integrated ARM core and enough logic cells to complete your design. The advantage of this sort of platform is the ability to have as much logic as possible on a single chip and entirely adapted to your solution.

What’s the Difference between Embedded Systems and System Programming?

There is a big difference between the two previous systems. When creating a PC application, it is rarely known on what system it will be used. Perhaps this is a server application, and the client has given all the systems details of its servers, but nothing guarantees that this won’t change in time. Perhaps it is a desktop application for cataloging a film collection. It could be installed on anything, from an entry-level netbook to a high-end system. Or perhaps it is a game, requiring a fast system, but some clients won’t have the required configuration for it to run optimally. What should be considered to be the minimum?

Embedded systems are often different. They are defined beforehand and cannot be changed. Developers usually know exactly what processor will be used, the amount of memory that will be available, and all the external systems that will be connected. There will not be a memory upgrade; there will not be a processor change. Your computer system will probably change over time; a new graphics card might be added, or the processor might be upgraded. On the other hand, your mobile telephone will stay the same until the day you decide to change it. The only option you may have is to put in a bigger flash card, but that will change only the external storage amount and will not change the system itself. Your intelligent water meter will probably never change; it will have to do its job for decades.

An embedded system is designed with a particular use in mind, whereas a system is designed to be flexible and to meet a wide range of end-user needs. When designing a personal computer, it is impossible to know exactly what it will be used for, and therefore expansion possibilities must be designed. System programming is often less rigid, with fewer constraints. Embedded systems are different, since all the constraints are known right from the start.

Why Is Optimization So Important?

One of the main design criterion for an embedded system is its price. You can spend hundreds, if not thousands, of dollars for a computer system that will enable you to do everything you need today, and part of what you need tomorrow, but the tiny embedded computer inside your microwave will often be designed to be as cheap as possible, to the cent. To achieve this, studies are done to estimate the minimum amount of system resources necessary. Typically, you will not need the fastest processor, you will not need the fastest memory, and you will not need the largest amount of memory available.

During a job interview, I asked candidates some trick questions. Imagine that I wanted to design a space vehicle that could get me into orbit and land me safely again. What would you suggest? Most suggested medium- to high-end processors for their power and speed; with a decent amount of system memory. 500 MHz and 512 Mb were a common answer. Then I told them that the only computer system I had available was a 1 mega-hertz processor, with 512KB of memory. Any chance of getting into orbit? Most candidates shifted uneasily in their chairs, one or two laughed. No, it isn’t possible. The system specifications are far too low. The system would be horribly slow, and there just isn’t enough memory to keep all the calculations.

Ironically, the specifications that I gave are higher than the IBM AP-101, the flight computer used by the B-52 bomber and NASA’s space shuttle program. When you hear the word computer today, you immediately imagine a large system, with numerous expansion cards and subsystems. On the contrary, embedded systems should be as simple as possible, including only the hardware required to complete its task, and nothing more. Having a smaller program also means there is less to go wrong. We’ve all had to reboot our work computer because of a problem, but with a flight control computer, this isn’t an option. Put simply, it must work. To do more with less, you have to be careful and optimize.

A processor is all about crunching numbers — anything a processor does, such as reading input, decoding audio, encoding video, and copying memory. Everything is just a lot of numbers. Contrary to what you were told in school, some numbers are faster than others.

Modern CPUs can deal with a lot of different numbers. The most common is the unsigned integer. Integers can also be signed, but that changes the maximum and minimum values.

Other formats exist, like floating point numbers, but while some processors provide acceleration for floating-point numbers, some implement this entirely in software; you must either create your own libraries, or use existing libraries.

In embedded systems, it is vitally important to know exactly what type of number to use and what range you need. When dealing with a system designed to handle monetary transactions, you might be tempted to use floating-point numbers to deal with the decimal point, but this is overkill just to print $12.46. Also, surprisingly, floating-point numbers don’t necessarily have the precision required for reliable monetary transactions. In this case, you might prefer to use integer numbers for their speed and precision, and instead of counting pounds/euros/dollars, you can prefer to count pennies/cents. $12.46 will become 1246, and the name of the variable should reflect that.

This happens often; some sensors will return an integer: a digital representation of the loudest sound recorded from a microphone, or a pressure sensor that will return atmospheric pressure. In both cases, these devices will return data within a certain range, using integers instead of floating-point numbers. If the device has to output a floating-point number (for example, the atmospheric pressure in millibars), the programmer will have to specifically convert the sensor output.

Also, think about the size you need. An unsigned 32-bit integer’s maximum size is 4,294,967,295, but if you are making a vending machine, then $40 million is quite a lot of money for a chocolate bar and a soda. You could put the value into an unsigned 16-bit integer, with a maximum value of 65535, or maybe even an unsigned 8-bit integer uint8 for micro-transactions, with a maximum value of 255. However, this presents another problem: access. ARMs are good at reading in 32-bit values since the ALU is also 32-bits wide; reading 8-bits of data, involves masking and shifting to deal with overflow and maintain the correct sign, which can slow down the calculation. Although perfectly feasible, this comes at a small price. It is up to you what you need and what the technical constraints are.

Doing calculations on integers is extremely fast; in most cases, all operations are done in a single cycle. Integers cannot be used for everything, so for more precise mathematics, floating point numbers were also introduced.


In the 1950s, computers were used widely for banking and statistics. Banking was simply creating a list of transactions: a date and the amount. Memory was extremely expensive at the time, and no one could have imagined a single SD card that could hold more data than any single bank possessed. It might even have contained all the data of a country and still leave some room. Terabytes wasn’t even in the realm of dreams; most companies were using standard IBM punch cards, containing a staggering 80 characters. Having more memory on a card wasn’t an option; the IBM punch card became an industry standard, and at the time there was no need for anything bigger. Memory itself was expensive, and one of the machines used to read them, the IBM 1401, shipped in 1959 with a standard 2 K of RAM. 16 K versions were available, and few systems were upgraded to 32 K. Those that were upgraded were done so by special request only. The low-end 1401 shipped with 1 k of memory.

To maximize memory efficiency, repetitive numbers were omitted. One of the first numbers to disappear was the “19” in every date. Instead of writing 1960, operators would just write 60. Who would have thought that by doing this, they were creating a major international problem 40 years later?

In 1958, Bob Bemer, from IBM, tried to alert some major companies about this programming error and spent 20 years attempting to change the situation. No one listened. In the 1970s, people started talking about a future problem, but we had to wait until the mid-1990s to actually hear about it. Suddenly people started realizing that in a few years’ time, we would be in 2000, not in 1900. Computers, still presuming that the first two figures were 19 would switch back to 1900, or possibly 19100? A general panic ensued, with some people thinking that airplanes would fall out of the sky, that electrical generators would shut down (possibly exploding just before), and that life as we knew it would stop. In the end, nothing happened, apart from some humorous messages on the Internet, with clocks showing “Welcome! We are the 1st January, 1900.” Operating systems managed to cheat a little; computer vendors sold more computers than ever before; and today, in a world in which memory costs a fraction of the cost that it used to, we calculate dates using a different system. That doesn’t mean that we are safe. In 2038 we will be confronted by a different problem with more or less the same origins, but we won’t make the same mistake twice. Systems and programs will be changed long before.

What Is the Advantage of a RISC Architecture?

This is one of the most common debates, and one that has forced major companies in separate directions. What should you use? Reduced Instruction Set Computing (RISC) or Complex Instruction Set Computing (CISC)?

In the 1960s, computers weren’t what they are today. Academics and students rarely approached a computer; at the time it took armies of technicians to keep a computer the size of a room up and running. Academics would hand punch cards to computer operators and wait for the results, sometimes hours or days later. Punch cards could contain up to 80 characters, the equivalent of one line of code. For multiple lines of code, the academic would hand over multiple cards. The system operator would then feed these cards into the computer, wait for the result, and then return the results to the academic. The processor speed wasn’t an important factor; compared to the time it took to collect the cards, feed them into the system, get the results and return them to the programmer, execution time was a mere fraction.

When punch cards were replaced with other means such as floppy disks, more memory, or more hard drives, then the computer spent more time calculating, and suddenly the easiest way to increase the speed of a computer was to increase the speed of the processor or its capacity to execute instructions. Two philosophies were competing: one was to increase the number of instructions by adding more specialized instructions, and the other was to decrease processor complexity, therefore allowing it to run faster.

A common misunderstanding of the term “reduced instruction set computer” is the mistaken idea that the processor has a smaller set of instructions. Today, some RISC processors have larger instruction sets than their CISC counterparts. The term “reduced” was intended to describe the fact that the amount of work any single instruction accomplishes is reduced (typically one data element per cycle), compared to the “complex instructions” of CISC CPUs where instructions may take dozens of data memory cycles to complete a single instruction.

RISC processors will typically have fewer transistors dedicated to core logic, allowing designers more space to increase the size of the register set, and to increase internal parallelism. In 1982, the Berkeley RISC project delivered their first RISC-I processor. At 44,420 transistors (compared to over 100,000 for CISC counterparts) and 32 instructions, it outperformed any single-chip design at the time.

The trend continued, with more and more specialized instructions being added to processors. A single instruction could now handle extremely complex calculations and also specific calculations. In the 1990s, personal computers were used for just about anything. Complete systems were sold with TV acquisition cards, complex audio creation systems, 3-D graphics, and power calculations. A specific system may have targeted either the consumer market, the business market, or the server market, but the CPU inside remained mostly the same. Processors had to adapt to just about any situation, so more and more complex instructions were added. This created a rather interesting case; some systems never used some of the instructions. Adding instructions to a processor means adding transistors, making the processor more complicated, and therefore more expensive. The original Intel 8086 processor was the beginning of the modern PC era; it was released in 1978 and contained 29,000 transistors. Just more than 10 years later, in 1989, Intel released the 80486, with a total of 1,180,000 transistors. In 2000, Intel released the Pentium 4, containing 42,000,000 transistors. In 2011, Intel’s six-core Core i7 processor packed a whopping 2,270,000,000 transistors. However, these figures can be eclipsed by other systems. In 2011, AMD’s Tahiti graphics processor was comp0sed of 4,300,000,000 transistors, and in the same year, Xilinx’s Virtex-7 FPGA contained 6,800,000,000 transistors. Although the cost of fabricating processors has gone down drastically, the time needed to create such chips, the amount of waste created by defective silicon, and the time needed to rigorously test all the processor’s functions means that prices are still high. At the same time, a single Core i7 processor today has more calculation power than most countries in the 1960s.

Enter ARM, with its Reduced Instruction Set Computing (RISC) technology and philosophy. RISC may be seen as a step backward from the days when CISC helped, but the same criteria do not exist today. In the 1960s, DEC sold 12-kilobyte memory modules for $4,600, roughly $35,000 in 2012. With that amount of money, today you could have close to 7 terabytes of memory, if you could find a system that could support that much RAM.

ARM’s philosophy is radically different. ARM believes that having fewer instructions is better. Just like Lego, you can make some amazing things by using the simplest of building blocks. So, instead of highly specialized instructions, RISC systems have few instructions. Reducing the silicon on the chip means lower costs, but especially lower power consumption. So, if the constraints that were present in 1960 are no longer present, why is CISC still used? Well, one of the reasons is backward compatibility. No one expected the PC architecture to expand the way it did, and on today’s high-end Core i7 CPUs, you still have a heritage from the original 8086. It would take far too much engineering to suddenly re-create all the software available for PC computers and suddenly change them to a new architecture. There are, of course, exceptions. Apple Computer Inc. had machines running under the PowerPC architecture and switched to x86. Linux has packages for just about any MMU-enabled chip. The rest of the world will stick to the x86 architecture because it has served it well. The x86 architecture doesn’t face the same challenges as ARM does. For power consumption, you aren’t faced with the same challenges. Today, ultrabooks can boast 8 hours of battery life, which is normally more than enough for most uses. On a 14-hour flight, you probably won’t use a laptop for more than 4 hours and prefer to watch a film and try to catch up on some sleep. On that same flight, there is an Emergency Locator Transmitter (ELT). If something happens to the flight, the ELT can broadcast a distress message containing the coordinates, for several weeks, and it has to work right the first time. Power consumption is critical for this sort of application, and the code goes through rigorous testing.

Today, both RISC and CISC exist, and continue to grow. RISC dominates the embedded field (especially where power consumption is a major factor). CISC continues to dominate the desktop field. However, that trend seems to be changing. Intel is working on an x86 chip for the mobile phone sector, and several OEMs have expressed interest in an ARM-based server.


On embedded systems, it is critically important to know what your processing needs are. On mobile systems, it is just as important but sometimes even more difficult to establish.

For embedded systems, too much processing power is often as bad as too little. If your processor isn’t powerful enough, you can have a hard time getting your software to run. In the best case, you can spend a long time optimizing. In the worst case, it won’t run. Using a faster processor means more power consumption, more heat, and most likely, a more expensive solution.

Choosing a processor for a mobile device is often much harder. Some users are still locked into the “gigahertz syndrome,” wanting the fastest processors, but only judging them on their clock speed. Most consumers probably prefer a 1.6 GHz device over a 1.4 GHz device, even if some of them will never use a program that takes full advantage of the speed difference.

In today’s world, mobile devices are more and more present. How many people can spend a day without a mobile phone, or spend a long-haul flight without a tablet? The advances in CPUs over the last 40 years have been incredible. In 1971, the Intel 4004, the world’s first general-purpose microprocessor, ran at 108 kHz and was estimated at 0.06 MIPS. In 2011, Intel’s i7 3960X had a total of 177,730 MIPS, almost 3 million times that of the 4004. Of course, MIPS alone cannot accurately judge a microprocessor, but it shows just how much the technology has advanced. Unfortunately, battery technology has advanced, but not in line with processors.

The first line of mobiles phones was bulky. The first use of a mobile phone was in 1973, using a prototype that weighed more than a kilogram, which had only 30 minutes of talk time, after which a recharge of 10 hours was required. Although sufficient for that time (where most “mobile” phones were in fact car phones), it didn’t take long for users to need much better battery life and lighter batteries. Today, a consumer judges a mobile device by lots of criteria, including battery life. Few people want to buy a high-end tablet with a high-end CPU, lots of RAM, and a terabyte of storage if it lasts only 1 hour. CPUs, being the heart of modern systems, have also made great progress in power consumption. An Intel Pentium at 75 MHz consumes 8 Watts of power, about the same amount as an Intel Atom N550, a dual-core processor clocked at 2GHz. Of course, the Atom has far less processor power than an i7, but the processor was designed specifically for low-powered devices and has made it into an entire generation of netbooks. To achieve low power consumption, Intel invested heavily, reducing the thickness of the silicon wafers for all of its microprocessors, and also by changing the core design. An Atom is still compatible with previous x86 processors, so there is no software change needed. However, the core was heavily changed. Atom processors, like many other x86 processors, actually convert x86 instructions into micro-ops, effectively RISC-style instructions.

ARM processors, from the start, were designed to be simple. They were made to be simple, and the number of transistors in a single CPU has always been significantly lower than other comparable CPUs on the market. Fewer transistors means less power. They were also designed with mobility in mind, and that paid off because ARM6 was used on the Apple Newton Messagepad. Over the years, ARM has made improvements to maximize MIPS per watt and also to lower heat production.

There are several ways to reduce power on a system. One of the most used is frequency scaling; when a processor is not used at 100 percent, it can scale back its frequency and therefore use less energy. On the same lines, some device makers underclock their processors, setting their frequency lower to what they are supposed to run on. ARM also has a rather unique solution, using its big.LITTLE technology. This solution contains two separate processors, both binary-compatible. One core is power-efficient but slower than the second processor, which is designed to handle more complex and demanding programs at the cost of increased power usage.


To begin development on an ARM system, you need relatively few things, all of which are readily available.

From a hardware point of view, there are a few questions that need to be asked. Will you need specific hardware, or does a system already exist? There are several all-in-one ARM systems on the market, ranging from tiny systems running at just a few dozen megahertz to large systems that can run a full OS running at more than a gigahertz. If none of these fit your requirements, or if your production is large enough, you can create your own board with a processor or a SoC of your choice.

Evaluation boards are a great way to start a project, and there are hundreds available for just about any size or requirement. ARM provides several boards suited for several types of applications. Starter boards are a great way to get to know a system and to prototype your project. They offer great debugging features, and a huge amount of documentation is available. When you are ready, you can make your own system or look for an existing system.

ARM provides several evaluation boards. The ARM Versatile Express boards provide excellent training for the Cortex A cores or soft-core Cortex-M. The Versatile range was previously used for Classic processors — from the ARM7TDMI processor up to the ARM1176. The KEIL series are more oriented toward the micro-controller domain and also includes some classic cores.

As well as ARM-based evaluation boards, almost all chip makers have their own evaluation boards. Freescale has some excellent evaluation boards for its iMX line of SoCs, complete with just about every type of connector you can think of. Infineon has a range of clever modular systems, and Texas Instruments has some small-factor systems.

Evaluation systems are useful, but they will not be used for long. They have numerous outputs that will not be used later and tend to take up more space than is required. When you finish evaluating a board, you now need to decide if you want to create your own board, or if you will use a pre-existing system. There are multiple ARM-based systems that are not evaluation boards. Moxa creates an impressive amount of industrial systems, and one of my previous clients had a stock of Moxa 7420 systems and builds its software around those systems. The Moxa 7420 is an XScale board running at 533 MHz and has eight serial ports, two Ethernet ports, USB connectivity and CompactFlash storage, along with 128 Mb of RAM and 32 Mb of ROM. With this impressive system, the client reacted quickly to market needs by developing software and a hardware solution for industrial systems on a platform that it knew well.

As explained previously, you need to think carefully about your project and know in advance what is required. Do you need a video controller? How about a SATA controller? An industrial system might not need either but would have a more specific requirement, for example, a CAN bus.

What Boards Are Available?

Probably the most important part of a project is the board. There are several ways to go depending on your project or requirements. Evaluation boards are an excellent way to start if you decide to use one particular processor, but there are some good general purpose boards available. Following is a list of just a few boards available.

Although it is impossible to list them all, there are a number of places to look for such information. ARM has an impressive line of evaluation boards, and more information can be found on their website here:

Keil, ARM’s tool company, also makes numerous development boards that fit in well with its line of development tools, found here:

Arduino Due

Arduino boards are mostly known for their 8-bit single-board computers and are an excellent way to get into the fields of electronics and embedded development. The Arduino family comes with a complete range of shields, ranging from I/O ports to SD-card storage. Arduinos have been used for a huge amount of projects, from aquarium control to robotics to automated lawnmowers.

Although an entire generation of Arduinos has been using 8-bit PIC microcontrollers, the Arduino Due uses a Cortex-M microcontroller. These boards are not designed for processing power, even if the ARM CPU is clocked at 84 MHz, but they are designed for electronic projects based on I/O. They have 54 digital I/O ports, which can each be programmed as input or output. They have 12 analog inputs and 2 analog outputs.

You can find Arduinos in a lot of projects based on robotics or sensors, simply because the processor is so heavily based on I/O. It is also hugely popular because it is based on a Cortex-M and is therefore energy-efficient. It is not uncommon to see an Arduino Due run on battery power, and it has been used on mobile robotics, or even autopilot systems for remote controlled planes.

Raspberry Pi

In 2012, the Raspberry Pi Foundation released the Raspberry Pi, a credit card-sized single-board computer designed for teaching computer science. Two versions currently exist; Model-A and Model-B, which could possibly be a cultural reference to the BBC Micro, the computer ARM originally designed for school computer science. Thirty years later, ARM-based systems were back in British schools, and indeed schools around the world, teaching children the basics of computer science.

The revolution didn’t stop there. Raspberry Pis were such a success that it was difficult to get hold of them in the beginning. People were buying them as a general-purpose computer, for tinkering, for use as DIY NAS boxes, or just about anything. They have been used for home automation by using an I2C bus, for home media players using the hugely popular XBMC, or for play. Mojang ported the hugely popular game Minecraft to the Raspberry Pi.

Raspberry Pi is a basic computer with all the functionality required for basic systems. It has either 256 Mb or 512 Mb of RAM, one or two USB ports, video output via RCA, HDMI or DSI, audio output, 10/100 Ethernet for the Model B, and an SD slot for the filesystem. There is no on-board storage; the operating system has to be placed onto an SD card. The system is based on an ARM1176JZF-S running at 700MHz with overclocking possibilities.

Raspberry Pi lacks the I/O possibilities of Arduino, but that was never its intention. Raspberry Pi has enough hardware to boot a Linux system and teach computer science. However, that does not mean that it has no I/O; it does have two I/O ports with GPIO lines. However, of all the GPIO lines available on the processor, one-third are not connected, and some are reserved for the SD card reader and an SPI port. There are numerous projects using the IO lines, but most of these are educational boards mainly focused on turning LEDs on and off.


The Beagleboard is a large system compared to the Raspberry Pi and the Arduino. It is based on a Cortex-A8 and has more system functionality than the Raspberry Pi. It has complete video out capabilities, four USB ports, Ethernet, Micro-SD, a camera port, and an expansion port. Although mainly designed for software projects, it does have some I/O capability. The Beagleboard is the computer of the ARM-based development boards. If you are looking for raw processing power above all, this is the system to use.


The Beaglebone is a light version of the Beagleboard. It still has the crunching power and speed of a Beagleboard but is slightly more I/O-oriented. It has more output pins and lacks an HDMI connector.

Just like the Arduino, the Beaglebone has capes — add-on cards that extend I/O capabilities or add system functionality. There are LCD touch-screen capes, battery capes, and wireless and extended I/O capes. There are also breakout boards, which enable you to create your own circuits.

What Operating Systems Exist?

If a CPU is the heart of a hardware embedded system, an operating system is the brain. An operating system enables you to concentrate on your program by abstracting the hardware details. You can concentrate on building your application while letting the operating system handle the hardware technicalities. Memory configuration, networking, and peripheral I/O can be handled directly by the operating system. Multiple choices exist, each with their strong points.


Linux has been ported to just about any MMU-enabled processor that exists and has been used on ARM systems for decades. Linux has a huge user base, and the possibility of compiling a home-kernel is a major advantage for embedded systems. You can leave out large sections of the kernel for hardware that you do not need and leave in only the strict minimum. When adding new hardware, there are lots of resources necessary for adding drivers, and it is possible that in the open-source community someone has already developed such a driver.


VxWorks is a real-time operating system designed and developed by Wind River. It has a multitasking kernel and has multiprocessor support. It is used in a lot of mission-critical systems, where reliability is premium. It powers the on-board computers for the Airbus A400 and powers the radar warning system for the F/A-18 Hornet. VxWorks power numerous space projects. One of the most famous uses is in the Mars Curiosity rover, where VxWorks was considered to be the only system reliable enough to be placed onto a rover that was sent 350 million miles away, in an environment where nobody can ever perform a hardware update.


Android is a Linux-based operating system, designed initially by Android Inc. and later bought by Google. Most Android development is done for a Java runtime environment, but the Android operating system is open source and freely available. However, before the JVM is launched, there are multiple applications written in C, and of course the Linux kernel, with lots of work needed on low-level systems.


Apple fans will be slightly disappointed. iOS does run on Apple-specific, ARM-powered processors with Apple extensions; however, the operating system is proprietary, meaning that you cannot have access to boot-time code. Apple iOS applications are written in Objective-C; however, you can write in assembly in iOS applications, either by writing inline code, or even by adding an assembly S file. This allows for some highly optimized applications.

Which Compiler Is Best Suited to My Purpose?

Again, several solutions exist. The GNU compiler does an excellent job and is readily available on just about every platform. ARM also has a compiler. Although not free, it has the advantage of having all ARM’s knowledge in a single executable file and can heavily optimize your project.

GNU Compiler Collection

The GNU compiler is the entry level C/C++ compiler. GCC was originally a C compiler, named the GNU C Compiler and was released in 1987. Since then, it has been renamed the GNU Compiler Collection because it now supports many more languages, including different forms of C (C++, Objective-C, and Objective-C++) as well as other languages (Fortran, Ada, Go, and so on). Today, GCC has been released for Linux, Windows, MacOS, and even RiscOS. Some companies consider GCC to be essential to the success of their platform.

GCC naturally supports ARM architectures and has support for different ARM processors and architectures. Instead of compiling for generic ARM, it can be fine-tuned for different architectures, either using or omitting technology as needed. It has full support for the entire ARM Classic collection, the Cortex series, and even the more recent 64-bit Cortex-A53. If you are curious, you can write a small program and see how it would be compiled for an ARM2.

Sourcery CodeBench

Sourcery CodeBench is a complete environment with an IDE, debugger, libraries, and support. The Lite Edition, however, has only the command-line tools, namely a custom version of the GCC. Changes are first made in Sourcery CodeBench before being returned to GCC.

Sourcery CodeBench Lite is a complete toolchain, including the compiler, linker, debugger, and just about any tool required to compile C, C++, and assembly.

ARM Compiler

Nobody knows ARM processors better than ARM and it has also released the ARM Compiler, the result of more than 20 years of knowledge. This compiler is designed specifically for ARM processors and includes some of the most advanced optimization techniques available. Although this is a professional solution that requires a license, it has exceptional optimization techniques that far surpass GCC.

Getting Ready for Debugging

Debug solutions exist for ARM cores or for ARM-based chips. One of the references in the ARM world is Lauterbach, with its Trace32 solution. Different modules exist enabling assembly-level debugging and displaying internal and external peripherals, hardware breakpoints, and trace solutions. It is possible to debug with only a serial line, and in some cases that is exactly what you have to do. However, more professional solutions can leverage the work load considerably by increasing bandwidth and adding advanced trace functionality.

Lauterbach Trace32

Lauterbach produces some extremely good hardware debuggers, notably the PowerDebug and PowerTrace series. These devices can effectively debug in assembly or higher languages such as C and C++. The interface has excellent support for watching variables and displaying memory contents and traces. It has full MMU support, enabling the full display of entries and registers. All the CPU registers and attached peripherals can be controlled directly. Some of the biggest names in the embedded field use Lauterbach devices in their engineering departments.

Are There Any Complete Development Environments?

Yes, there are complete solutions that exist, including an IDE, compiler toolchain, and debug tools, all in one package.


ARM has its own complete development environment: the ARM Development Studio. At the time of writing, the current version is DS-5.

The DS-5 environment is a professional solution with the Eclipse IDE as a central point and excellent debugging tools, all coupled with the ARM compiler. When coupled with a DSTREAM hardware debugger, it becomes a capable solution with a huge trace buffer that enables long-time traces to be run, even on fast targets. If hardware is not yet available, software emulation is possible with a specialized emulator.

The ARM DS-5 solution also has an option for power monitoring with a device that can read voltage and current, and link capture data with other captures so that you can know when and why a device changes its power settings, and know what portion of code uses the most battery.

The DS-5 solution is a professional solution aimed at engineering teams that work on bleeding-edge applications that must have the upper-hand in code efficiency. Solutions of this quality are not cheap, but ARM has managed to make the price extremely reasonable, and everything you need is contained in a single package.

You can find more information at

ARM DS-5 Community Edition

Compared to the professional DS-5 solution, this version is also managed by ARM but has some limitations and does not come with the ARM compiler. It does, however, come with function profiling, process tracing, and a limited set of performance counters. Debugging is done with software because there is no hardware debugger included, but GDB is a powerful tool, and the DS-5 CE completes that beautifully.

DS-5 CE is maintained by ARM and has the same quality as the professional build, but with open source tools and limited functionality. It is a great platform to begin working on to get used to the DS-5 environment The ARM forums are the place to look for information and to ask questions.

Is There Anything Else I Need to Know?

Some systems, such as the Raspberry Pi, come with everything needed to get a working ARM system up and running in seconds. Other systems, such as evaluation boards, may require more specialized electronics but are normally acquired by laboratories containing specialized equipment.

There are two devices that are more or less considered to be essential: a serial port and a digital voltmeter.

Modern systems can communicate either by Ethernet or USB, but embedded systems almost always have a serial port. Using a serial port is much easier than using any other device such as I2C or CAN. With a serial device, you put bytes into a specific register, and that is just about it. For this reason, and because there is little software needed to make it function, almost all embedded devices have a serial terminal.

A digital voltmeter is often useful for verification reasons to verify if an output is set to the right level or if an input is set to logic level 1. It is useful for debugging but not for analysis; other devices exist for that.

Depending on your budget, an oscilloscope can be extremely valuable to a project. Looking at a signal and not just reading the output from a voltmeter enables some advanced debugging. A power supply can also be useful, but because most modern boards use a USB power supply, it is only useful when adding electronics on a breadboard or for homemade designs.


In this chapter, I have presented embedded systems and shown how ARM processors can be suited for a wide range of projects, from the smallest project to the most powerful project. In an example project, I demonstrated how important it is to plan ahead and to select the right processor. I presented some of the tools required for an embedded project, such as the operating system, compiler, and debugger, and also showed just some of the ARM-powered boards that are available on the market.

In the next chapter, I will talk more about an ARM processor — the internal systems that must be understood, the different operation modes, memory management, and the start-up sequence of an ARM processor.