Introduction - Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation (2014)

Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation (2014)

Introduction

The reverse engineering learning process is similar to that of foreign language acquisition for adults. The first phase of learning a foreign language begins with an introduction to letters in the alphabet, which are used to construct words with well-defined semantics. The next phase involves understanding the grammatical rules governing how words are glued together to produce a proper sentence. After being accustomed to these rules, one then learns how to stitch multiple sentences together to articulate complex thoughts. Eventually it reaches the point where the learner can read large books written in different styles and still understand the thoughts therein. At this point, one can read reference books on the more esoteric aspects of the language—historical syntax, phonology, and so on.

In reverse engineering, the language is the architecture and assembly language. A word is an assembly instruction. Paragraphs are sequences of assembly instructions. A book is a program. However, to fully understand a book, the reader needs to know more than just vocabulary and grammar. These additional elements include structure and style of prose, unwritten rules of writing, and others. Understanding computer programs also requires a mastery of concepts beyond assembly instructions.

It can be somewhat intimidating to start learning an entirely new technical subject from a book. However, we would be misleading you if we were to claim that reverse engineering is a simple learning endeavor and that it can be completely mastered by reading this book. The learning process is quite involved because it requires knowledge from several disparate domains of knowledge. For example, an effective reverse engineer needs to be knowledgeable in computer architecture, systems programming, operating systems, compilers, and so on; for certain areas, a strong mathematical background is necessary. So how do you know where to start? The answer depends on your experience and skills. Because we cannot accommodate everyone's background, this introduction outlines the learning and reading methods for those without any programming background. You should find your “position” in the spectrum and start from there.

For the sake of discussion, we loosely define reverse engineering as the process of understanding a system. It is a problem-solving process. A system can be a hardware device, a software program, a physical or chemical process, and so on. For the purposes of the book, the system is a software program. To understand a program, you must first understand how software is written. Hence, the first requirement is knowing how to program a computer through a language such as C, C++, Java, and others. We suggest first learning C due to its simplicity, effectiveness, and ubiquity. Some excellent references to consider are The C Programming Language, by Brian Kernighan and Dennis Ritchie (Prentice Hall, 1988) and C: A Reference Manual, by Samuel Harbison (Prentice Hall, 2002). After becoming comfortable with writing, compiling, and debugging basic programs, consider reading Expert C Programming: Deep C Secrets, by Peter van der Linden (Prentice Hall, 1994). At this point, you should be familiar with high-level concepts such as variables, scopes, functions, pointers, conditionals, loops, call stacks, and libraries. Knowledge of data structures such as stacks, queues, linked lists, and trees might be useful, but they are not entirely necessary for now. To top it off, you might skim through Compilers: Principles, Techniques, and Tools, by Alfred Aho, Ravi Sethi, and Jeffrey Ullman, (Prentice Hall, 1994) and Linkers and Loaders, by John Levine (Morgan Kaufmann, 1999), to get a better understanding of how a program is really put together. The key purpose of reading these books is to gain exposure to basic concepts; you do not have to understand every page for now (there will be time for that later). Overachievers should consider Advanced Compiler Design and Implementation, by Steven Muchnick (Morgan Kaufmann, 1997).

Once you have a good understanding of how programs are generally written, executed, and debugged, you should begin to explore the program's execution environment, which includes the processor and operating system. We suggest first learning about the Intel processor by skimming throughIntel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture by Intel, with special attention to Chapters 2–7. These chapters explain the basic elements of a modern computer. Readers interested in ARM should consider Cortex-A Series Programmer's Guide andARM Architecture Reference Manual ARMv7-A and ARMv7-R Edition by ARM. While our book covers x86/x64/ARM, we do not discuss every architectural detail. (We assume that the reader will refer to these manuals, as necessary.) In skimming through these manuals, you should have a basic appreciation of the technical building blocks of a computing system. For a more conceptual understanding, consider Structured Computer Organization by Andrew Tanenbaum (Prentice Hall, 1998). All readers should also consult the Microsoft PE and COFF Specification. At this point, you will have all the necessary background to read and understand Chapter 1, “x86 and x64,” and Chapter 2, “ARM.”

Next, you should explore the operating system. There are many different operating systems, but they share many common concepts including processes, threads, virtual memory, privilege separation, multi-tasking, and so on. The best way to understand these concepts is to read Modern Operating Systems, by Andrew Tanenbaum (Prentice Hall, 2005). Although Tanenbaum's text is excellent for concepts, it does not discuss important technical details for real-life operating systems. For Windows, you should consider skimming through Windows NT Device Driver Development, by Peter Viscarola and Anthony Mason (New Riders Press, 1998); although it is a book on driver development, the background chapters provide an excellent and concrete introduction to Windows. (It is also excellent supplementary material for the Windows kernel chapter in this book.) For additional inspiration (and an excellent treatment of the Windows memory manager), you should also read What Makes It Page? The Windows 7 (x64) Virtual Memory Manager, by Enrico Martignetti (CreateSpace Independent Publishing Platform, 2012).

At this point, you would have all the necessary background to read and understand Chapter 3 “The Windows Kernel.” You should also consider learning Win32 programming. Windows System Programming, by Johnson Hart (Addison-Wesley Professional, 2010), and Windows via C/C++, by Jeffrey Richter and Christophe Nasarre (Microsoft Press, 2007), are excellent references.

For Chapter 4, “Debugging and Automation,” consider Inside Windows Debugging: A Practical Guide to Debugging and Tracing Strategies in Windows, by Tarik Soulami (Microsoft Press, 2012), and Advanced Windows Debugging, by Mario Hewardt and Daniel Pravat (Addison-Wesley Professional, 2007).

Chapter 5, “Obfuscation,” requires a good understanding of assembly language and should be read after the x86/x64/ARM chapters. For background knowledge, consider Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection, by Christian Collberg and Jasvir Nagra (Addison-Wesley Professional, 2009).

Note

This book includes exercises and walk-throughs with real, malicious viruses and rootkits. We intentionally did this to ensure that readers can immediately apply their newly learned skills. The malware samples are referenced in alphabetical order (Sample A, B, C, …), and you can find the corresponding SHA1 hashes in the Appendix. Because there may be legal concerns about distributing such samples with the book, we decided not to do so; however, you can download these samples by searching various malware repositories, such as www.malware.lu, or request them from the forums at http://kernelmode.info. Many of the samples are from famous hacking incidents that made worldwide news, so they should be interesting. Perhaps some enthusiastic readers will gather all the samples in a package and share them on BitTorrent. If none of those options work for you, please feel free to email the authors. Make sure that you analyze these in a safe environment to prevent accidental self-infection.

In addition, to familiarize you with Metasm, we've prepared two exercise scripts: symbolic-execution-lvl1.rb and symbolic-execution-lvl2.rb. Answering the questions will lead you to a journey in Metasm internals. You can find the scripts atwww.wiley.com/go/practicalreverseengineering.

It is important to realize that the exercises are a vital component of the book. The book was intentionally written in this way. If you simply read the book without doing the exercises, you will not understand or retain much. You should feel free to blog or write about your answers so that others can learn from them; you can post them on the Reverse Engineering reddit (www.reddit.com/r/ReverseEngineering) and get feedback from the community (and maybe the authors). If you successfully complete all of the exercises, pat yourself on the back and then send Bruce your resume.

The journey of becoming an effective reverse engineer is long and time consuming, requiring patience and endurance. You may fail many times along the way (by not understanding concepts or by failing to complete exercises in this book), but don't give up. Remember: Failing is part of success. With this guidance and the subsequent chapters, you should be well prepared for the learning journey.