Praise for Gray Hat Hacking: The Ethical Hacker’s Handbook, Fourth Edition (2015)
PART III. Advanced Malware Analysis
Chapter 20 Dissecting Android Malware
Chapter 21 Dissecting Ransomware
Chapter 22 Analyzing 64-bit Malware
Chapter 23 Next-Generation Reverse Engineering
CHAPTER 20. Dissecting Android Malware
Android is one of today’s most prevalent smartphone platforms. Smartphones have replaced traditional “mobile phones,” serving as a pocket-sized personal computer and multimedia device all in one. These personal devices provide a window into the owner’s life. A calendar containing the user’s daily schedule, a phonebook with a list of contacts, social media accounts, and banking applications are only a small subset of all the information that can be found on a typical smartphone. Malware authors have already tapped into this rich platform and are exploiting it in various ways. Understanding the Android architecture and application analysis techniques empowers users to determine whether applications accessing their personal data are doing it in a nonmalicious way.
This chapter provides analysis techniques and tools that can be used to determine the functionality and potential maliciousness of Android applications.
In this chapter, we cover the following topics:
• How the Android platform works
• Static and dynamic analysis with a focus on malicious software analysis
The Android Platform
Before we start with malware analysis, it is necessary to get familiar with the Android platform. Probably the most interesting information from an analysis point of view involves how applications work and are executed. The following sections explain the Android application package (APK), important configuration files such as AndroidManifest, and the executable file format DEX running on a Dalvik virtual machine.
Android Application Package
The Android application package (APK) is an archive format used to distribute applications for the Android operating system. The APK archive contains all the files needed by the application and is a convenient way to handle and transfer applications as a single file. The archiving file format is the widely popular ZIP file format. This makes it very similar to the Java archive (JAR), which also uses ZIP.
Because APK files are just ZIP archives with a different file extension, their file signature cannot be used to differentiate them from other ZIP archives. Magic bytes is a name for a sequence of bytes (usually at the beginning of a file) that can be used to identify a specific file format. The Linux file command can be used to determine the file type. Following is the output of the file command for an APK:
As expected, the file type is reported as a ZIP archive. The following output shows the magic bytes of the ZIP file format:
The first two bytes are the printable characters PK, which represent the initials of the ZIP file format’s inventor, Phil Katz, followed by an additional two bytes: 03 04. To examine the content of an APK archive, simply unzip it with any of the tools supporting the format. Following is an example of unzipping the content of an APK archive:
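The magic-byte check and the archive listing described above can also be sketched in a few lines of Python. This is a minimal illustration, not the output of the file or unzip tools: the helper names are hypothetical, and the sample archive is built in memory with placeholder content rather than taken from a real APK.

```python
import io
import zipfile
from typing import List

# ZIP local file header magic: "PK" followed by the bytes 03 04.
ZIP_LOCAL_HEADER_MAGIC = b"PK\x03\x04"


def looks_like_zip(data: bytes) -> bool:
    """Return True when the buffer starts with the ZIP magic bytes."""
    return data[:4] == ZIP_LOCAL_HEADER_MAGIC


def list_archive(data: bytes) -> List[str]:
    """Return the names of all entries stored in a ZIP (or APK) buffer."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


def build_sample_apk() -> bytes:
    """Build a tiny in-memory ZIP with APK-like entry names (placeholder content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"placeholder")
        archive.writestr("classes.dex", b"placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    sample = build_sample_apk()
    print(looks_like_zip(sample))
    print(list_archive(sample))
```

Because the check looks only at the first four bytes, it confirms that a file is a ZIP archive but cannot tell an APK from a JAR or any other ZIP-based format; the entry names give a better hint.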
Here, a generic structure of a somewhat minimalistic APK archive is shown. Depending on the APK type and content, it can contain various files and resources, but a single APK can be at most 50MB in size.
NOTE An APK archive can have a maximum size of 50MB, but it can have up to two additional expansion files, with each of them up to 2GB in size. These additional files can also be hosted on the Android Market. The size of expansion files is added to the size of the APK, so the size of application on the market will be the total of the APK and the expansion files.
Following is an overview of the APK directory structure and common files:
• AndroidManifest.xml This file is present in the root directory of every APK. It contains the necessary application information for it to run on the Android system. More information about this file is provided in the upcoming section.
• META-INF This directory contains several files that are related to the APK metadata such as certificates or manifest files.
• CERT.RSA The certificate file of the application. In this case, this is an RSA certificate, but it can be any of the supported certificate algorithms (for example, DSA or EC).
• CERT.SF Contains a list of the entries in the MANIFEST.MF file, along with hashes of the respective lines in it. CERT.SF is then signed and can be used to validate all entries in the MANIFEST.MF file through a transitive relation. The following command can be used to check the entries in the manifest file:
jarsigner -verbose -verify -certs apk_name.apk
• MANIFEST.MF Contains a list of filenames for all the files that should be signed, along with hashes of their content. All entries in this file should be hashed in CERT.SF, which can then be used to determine the validity of the files in the APK.
• classes.dex This Dalvik executable (DEX) file contains the program code to be executed by the Dalvik virtual machine on the Android operating system.
• res This folder contains raw or compiled resource files such as images, layouts, strings, and more.
• resources.arsc This file contains only precompiled resources such as XML files.
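The relationship between MANIFEST.MF and the files it lists can be illustrated with a short Python sketch. It assumes the manifest uses SHA1-Digest attributes (other digest algorithms are possible) and it only compares content hashes; it does not validate CERT.SF or the certificate, so it is not a replacement for jarsigner. All function names are hypothetical.

```python
import base64
import hashlib
import io
import zipfile
from typing import Dict, List


def manifest_entry(name: str, content: bytes) -> str:
    """Format one MANIFEST.MF section for a file (illustrative helper)."""
    digest = base64.b64encode(hashlib.sha1(content).digest()).decode("ascii")
    return "Name: {}\r\nSHA1-Digest: {}\r\n\r\n".format(name, digest)


def parse_manifest_digests(manifest_text: str) -> Dict[str, str]:
    """Map each 'Name:' entry to its base64 'SHA1-Digest:' value."""
    # Long lines are wrapped; a continuation line starts with a single space.
    unfolded = manifest_text.replace("\r\n", "\n").replace("\n ", "")
    digests: Dict[str, str] = {}
    current_name = None
    for line in unfolded.split("\n"):
        if line.startswith("Name: "):
            current_name = line[len("Name: "):]
        elif line.startswith("SHA1-Digest: ") and current_name is not None:
            digests[current_name] = line[len("SHA1-Digest: "):]
    return digests


def find_mismatches(apk_bytes: bytes) -> List[str]:
    """Return listed entries that are missing or whose hash differs."""
    mismatches: List[str] = []
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        names = set(apk.namelist())
        manifest = apk.read("META-INF/MANIFEST.MF").decode("utf-8")
        for name, expected in parse_manifest_digests(manifest).items():
            if name not in names:
                mismatches.append(name)
                continue
            actual = base64.b64encode(hashlib.sha1(apk.read(name)).digest())
            if actual.decode("ascii") != expected:
                mismatches.append(name)
    return mismatches
```

An empty result means every file listed in the manifest is present and matches its recorded hash. A file that was modified after signing shows up in the returned list.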
The Android application manifest file AndroidManifest.xml is located in the root directory of every Android application. This file contains essential information about the application and its components, required permissions, used libraries, Java packages, and more. The AndroidManifest.xml file is stored in a binary XML format in the APK and therefore has to be converted to a textual representation before it can be analyzed. Many tools are available that can convert from the binary XML format, and in this section we will use apktool. This is a collection of tools and libraries that can be used to decode manifest files and resources, decompile DEX files to smali, and so on. To decode the APK, execute apktool with the d option, as shown here:
After apktool extracts and decodes all the files, the manifest can be examined in any text editor. An example of the AndroidManifest.xml file is shown here:
Here are the important fields in the manifest file when reverse engineering Android malware:
• The manifest element defines the package attribute, which is a Java package name for the application. The package name is used as a unique identifier and should be based on an Internet domain owned by the author. The domain is written in reverse order in the package name (org.me.androidapplication1 in this example), which when flipped resolves to androidapplication1.me.org.
• The application element contains the declaration of the application, while its subelements declare the application’s components.
• The activity element defines the visual representation of the application that will be shown to the users. The label “Movie Player” under the android:label attribute defines the string that is displayed to the user when the activity is triggered (for example, the UI shown to the users). Another important attribute is android:name, which defines the name of the class implementing the activity.
• The intent-filter element, along with the elements action and category, describes the intent. The action element defines the main entry to the application using the following action name: android.intent.action.MAIN. A category element classifies this intent and indicates that it should be listed in the application launcher using the following name: android.intent.category.LAUNCHER. A single activity element can have one or more intent-filters that describe its functionality.
• The uses-permission element is relevant when looking for suspicious applications. One or more of these elements define all the permissions that the application needs to function correctly. When you install the application and grant it these rights, it can use them as it pleases. The android:name attribute defines the specific permission the application is requesting. In this case, the application (which describes itself as a movie player) requires android.permission.SEND_SMS, which would allow it to send SMS messages with the desired content to arbitrary numbers. This clearly raises suspicion as to the legitimacy of this application and requires further investigation.
NOTE This example contains just a small subset of the possible manifest elements and attributes. When analyzing a complex manifest file, consult the Android Developer Reference to fully understand the different elements and attributes.
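Once the manifest has been decoded to text, the fields discussed above can be pulled out programmatically. The following Python sketch uses only the standard library. The embedded manifest is a hypothetical reconstruction based on the attributes mentioned in this section (the android:name value ".MoviePlayer" in particular is an assumption), not the original listing, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace prefix ElementTree uses for android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical decoded manifest modeled on the fields discussed above.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="org.me.androidapplication1">
  <application>
    <activity android:label="Movie Player" android:name=".MoviePlayer">
      <intent-filter>
        <action android:name="android.intent.action.MAIN" />
        <category android:name="android.intent.category.LAUNCHER" />
      </intent-filter>
    </activity>
  </application>
  <uses-permission android:name="android.permission.SEND_SMS" />
</manifest>
"""


def package_to_domain(manifest_xml: str) -> str:
    """Flip the reversed-domain package name back into domain order."""
    package = ET.fromstring(manifest_xml).get("package", "")
    return ".".join(reversed(package.split(".")))


def requested_permissions(manifest_xml: str) -> List[str]:
    """Return the android:name of every uses-permission element."""
    root = ET.fromstring(manifest_xml)
    return [el.get(ANDROID_NS + "name") for el in root.iter("uses-permission")]


def launcher_activities(manifest_xml: str) -> List[str]:
    """Return activities whose intent-filter declares MAIN and LAUNCHER."""
    found = []
    for activity in ET.fromstring(manifest_xml).iter("activity"):
        for intent_filter in activity.findall("intent-filter"):
            actions = {a.get(ANDROID_NS + "name")
                       for a in intent_filter.findall("action")}
            categories = {c.get(ANDROID_NS + "name")
                          for c in intent_filter.findall("category")}
            if ("android.intent.action.MAIN" in actions
                    and "android.intent.category.LAUNCHER" in categories):
                found.append(activity.get(ANDROID_NS + "name"))
    return found


if __name__ == "__main__":
    print(package_to_domain(SAMPLE_MANIFEST))
    print(requested_permissions(SAMPLE_MANIFEST))
    print(launcher_activities(SAMPLE_MANIFEST))
```

The launcher activity is a natural starting point for code analysis, and the permission list gives a quick first impression of what the application is able to do.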
The Dalvik executable (DEX) format contains the byte code that is executed by the Android Dalvik virtual machine. DEX byte code is a close relative of the Java byte code that makes up class files. The Dalvik VM has a register-based architecture, whereas the Java VM has a stack-based one. The instructions used in disassembly are fairly similar, and someone familiar with Java instructions wouldn’t need much time to get used to Dalvik. One evident difference between Dalvik and Java disassembly is Dalvik’s dominant usage of registers instead of a stack. Dalvik VM instructions operate on 32-bit registers, which means that registers provide data to an instruction that operates on them. Each method has to define the number of registers it uses. That number also includes registers that are allocated for argument passing and return values. In a Java VM, instructions take their arguments from the stack and push the results back to the stack. To illustrate this difference, the following listing shows a Dalvik disassembly of the start of a function in IDA:
The opening lines of the listing are part of the function definition, which shows the number of registers used by the method and their allocation between input arguments and output return values. The instructions that follow use two registers: v2 and v3. Registers in Dalvik use the character prefix “v,” followed by a register number. The prefix is used to denote these registers as “virtual” and distinguish them from the physical hardware CPU registers. Now, here’s the same function disassembly using Java byte code:
As you can see, there are no referenced registers; instead, all operations are done over the stack. Several instructions in the listing operate on the stack. For example, the dup instruction will duplicate the value on top of the stack so that there are two such values at the top of the stack.
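The difference between the two execution models can be made concrete with a toy example. The following Python sketch implements two miniature interpreters that evaluate the same expression, 5 * 5 + 3. The instruction names are simplified inventions for illustration and are not real Dalvik or JVM opcodes.

```python
from typing import List, Tuple


def run_stack_machine(program: List[Tuple]) -> int:
    """Evaluate a program whose operands live on an implicit stack."""
    stack: List[int] = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "dup":
            # Duplicate the top value, leaving two copies on the stack.
            stack.append(stack[-1])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError("unknown instruction: {}".format(op))
    return stack.pop()


def run_register_machine(program: List[Tuple], register_count: int,
                         result_register: int) -> int:
    """Evaluate a program whose operands are named virtual registers."""
    registers = [0] * register_count  # the method declares its register count
    for op, *args in program:
        if op == "const":
            dest, value = args
            registers[dest] = value
        elif op == "add":
            dest, a, b = args
            registers[dest] = registers[a] + registers[b]
        elif op == "mul":
            dest, a, b = args
            registers[dest] = registers[a] * registers[b]
        else:
            raise ValueError("unknown instruction: {}".format(op))
    return registers[result_register]


# Stack style: operands are implicit, so dup is needed to reuse a value.
STACK_PROGRAM = [("push", 5), ("dup",), ("mul",), ("push", 3), ("add",)]

# Register style: every instruction names its destination and sources.
REGISTER_PROGRAM = [
    ("const", 0, 5),   # v0 = 5
    ("mul", 1, 0, 0),  # v1 = v0 * v0
    ("const", 2, 3),   # v2 = 3
    ("add", 1, 1, 2),  # v1 = v1 + v2
]

if __name__ == "__main__":
    print(run_stack_machine(STACK_PROGRAM))
    print(run_register_machine(REGISTER_PROGRAM, 3, 1))
```

The stack program needs five instructions and an explicit dup to reuse the value 5, whereas the register program reads v0 twice in a single instruction. This is the same pattern that makes register names such as v2 and v3 so visible in Dalvik disassembly.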
Because DEX and Java class files are related, it is possible to go from one format to the other. Because Java has a longer history and a lot of tools have been developed for analysis, disassembling, and especially decompilation, it is useful to know how to translate from DEX to JAR. The Dex2jar project is a collection of several programs that work with DEX files. The most interesting of them is dex2jar, which can convert DEX files to Java byte code. The following listing shows how to run the dex2jar command and convert from DEX to JAR, which was used in the previous example when comparing the two disassembler outputs with IDA:
Most people find it much easier to read high-level code like Java instead of JVM disassembly. Because the JVM is fairly simple, the decompilation process is doable and can recover Java source code from class files. Dex2jar brings all the Java decompiler tools to the Android world and allows for easy decompilation of Android applications written in Java.
Many Java decompilers are available online, but most of them are outdated and no longer maintained. JD decompiler is probably the most popular and well-known decompiler. It also supports three different GUI applications for viewing source code: JD-GUI, JD-Eclipse, and JD-IntelliJ. JD-GUI is a custom GUI for quick analysis of source code without installing big Java editors. JD-GUI is available for the Windows, OS X, and Linux operating systems.
To decompile a DEX file, you first have to convert it to a JAR file using dex2jar and then open it with JD-GUI. The following shows how to use dex2jar:
To see the source code in JD-GUI, open the file classes-dex2jar.jar. Figure 20-1 shows JD-GUI with decompiled Java source code. It is possible to export all decompiled class files from JD-GUI using the File | Save All Sources option.
Figure 20-1 JD-GUI decompiled Java source code
One problem with decompilers is that they are very sensitive to byte code modification, which can prevent them from recovering any sensible source code. Another problem with decompilers is that they don’t offer a side-by-side comparison with disassembly, and wrong decompilation can cause functionality to be missing from the output. When dealing with malicious code, it is always recommended that you double-check the disassembly for any suspicious code and functionality that might have been hidden from the decompiler. In cases when JD cannot decompile a piece of code, it will output the disassembly of the class file instead. The following is JD output for a non-decompiled function:
The problem with the previously mentioned DEX decompilation is that the file first has to be converted to JAR format and then decompiled using Java tools. In such a scenario, there are two locations for failure: the conversion of DEX and the decompilation of JAR. The JEB decompiler aims to solve this problem by performing decompilation directly on DEX files. It comes with a handy GUI that’s very similar to IDA, making it a familiar user experience. Unlike the JD decompiler, JEB is a commercial product, and a single license costs US$1,000. Following is some of the functionality offered by JEB:
• Direct decompilation of Dalvik byte code.
• Interactive analysis GUI with capabilities for cross-referencing and renaming methods, fields, classes, and packages.
• Exploring full APK, including manifest file, resources, certificates, strings, and so on.
• Supports saving the modifications made during analysis to disk and sharing the file for collaboration.
• Support for Windows, Linux, and Mac OS.
Figure 20-2 shows a decompiled DEX file using JEB. The same DEX file was used to generate decompiled Java code with the JD in the previous section.
Figure 20-2 DEX decompilation with JEB
Overall, JEB is the only commercial software aimed at reverse engineers that provides capabilities for analyzing DEX files directly. With the look and feel of IDA, it will certainly appeal to those familiar with it.
Another native DEX decompiler is DAD, which is part of the open source Androguard project. This project contains everything needed to analyze Android applications and also has many interesting scripts aimed at malware analysis. You can use the DAD decompiler by simply invoking the androdd.py script, as shown here:
DAD doesn’t come with a GUI for reading decompiled source, but any text or Java editor such as IntelliJ or NetBeans is probably better for analyzing source code anyway. Decompiled code is stored in the specified directory dad_java, and can be opened with any text editor. The following shows a part of the decompiled MoviePlayer.java:
When everything else fails, there is always a disassembler waiting. Reading disassembly output might not be the most appealing task, but it is a very useful skill to acquire. When you’re analyzing complex or obfuscated malware, disassembling the code is the only reliable way to understand the functionality and devise a scheme for de-obfuscation. Baksmali and smali are the disassembler and assembler, respectively, for the Dalvik byte code. The assembling functionality is a very interesting benefit because it allows for modifications and code transformations on the assembly level without patching and fiddling with the bytes. The syntax for disassembling a DEX file with baksmali is very straightforward and can be seen in the following listing:
As shown, the output of the baksmali command is a set of files named after their respective Java class names with the .smali file extension. Smali files can be examined with any text editor. The following listing shows a snippet of the MoviePlayer.smali file:
To make reading smali files more enjoyable, there are many syntax highlighters for various editors such as VIM, Sublime, and Notepad++. Links to plug-ins for various editors can be found in the “For Further Reading” section.
Another way to generate baksmali disassembly directly from APK involves using apktool. It is a convenient wrapper for decoding all binary XML files, including Android manifests and resources, but also disassembling the DEX file with baksmali. Just by running apktool, you can decompose the APK file and make it ready for inspection, as shown in the following listing:
Example 20-1: Running APK in Emulator
NOTE This exercise is provided as an example rather than as a lab due to the fact that in order to perform the exercise, malicious code is needed.
When you’re analyzing applications, it is valuable to see them running on the phone as well as to check how they behave and what functionality they implement. A safe way to run untrusted applications on an Android phone is to use an emulator. The Android SDK includes an emulator and various versions of operating systems that run on many different device types and sizes. Virtual machines are managed using the Android Virtual Device (AVD) Manager. The AVD Manager is used to create and configure various options and settings for the virtual devices. The AVD Manager GUI can be started using the android command and passing it avd as a parameter:
$ ~/android/adt-bundle-linux-x86_64-20140321/sdk/tools/android avd
After the Android Virtual Device Manager starts, click the New button on the right side of the menu and create the new device, as shown in Figure 20-3.
Figure 20-3 New AVD configuration
The next step is to start the previously created AVD by running the following command:
APK packages can be installed on the running emulator using the adb command, as shown in the following listing:
After installation, the application can be found in the application listing on the device running in the emulator. Figure 20-4 shows the application listing and the installed application Movie Player among the applications. Information about the installed application, its permissions, memory usage, and other details are available in the application menu under Settings | Apps | org.me.androidapplication1.
Figure 20-4 Installed application listing
Dynamic analysis is a very important reverse-engineering technique. The ability to run and observe the application in action can give important hints about functionality and potential malicious activities. The Android emulator comes with a variety of Android operating system versions and can be used to test vulnerability and malware impact across the Android ecosystem.
This section outlines an Android malware analysis workflow and introduces the tools needed for the analysis. Reverse engineering and malware analysis on Android follow the same principles and techniques as analysis on Windows, Linux, or Mac. There are still some Android architecture-specific details that can give important hints when looking at malicious samples.
For malware analysis, there are usually two different tasks:
1. Determine whether the sample is malicious.
2. Determine the malicious functionality of the sample.
It is usually much easier to determine whether or not something is malicious (or suspicious) than to understand the malicious functionality. To answer the maliciousness question, you can use the following checklist:
• Is the application popular and used by many people or installed on a large number of machines? The more popular the application, the less likely it contains something very bad. This, of course, doesn’t mean that there is nothing bad, but the risk is usually lower because a big user group means that bugs and problems with the application are easier to surface. Therefore, if there are many user complaints, it is still worth investigating.
• Has the application been present in Google Play for a long time without any bad history? This check is related to the first one and can be used to strengthen the decision. Very popular applications with a long history without problems are less obvious candidates for shipping something bad as that would damage their reputation.
• Does the author have other applications published with good ratings?
• Does the application request sensitive permissions? In the Android world, applications are as dangerous as the permissions they are granted. Some of the sensitive permissions that should be allowed with care, especially if many are requested, are phone calls, personal information, accounts, storage, system tools, SMS and MMS, and network communication.
• Does the application contain obfuscation or crash known analysis tools? Malware authors are known to exploit various vulnerabilities and weaknesses in the analysis software to thwart the analysis process. Some commercial applications also employ various obfuscations to prevent crackers from pirating them, but it is not a very common occurrence among free or simple applications.
• Does the application contact any suspicious domains? Malware authors like to reuse domains, so it is common to find the same bad domain in different malware samples.
• When examining the strings table, can you identify any suspicious-looking strings? Similar to malware analysis of Windows executables, looking at the strings list of the application can provide a hint about malicious applications.
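The last two checks in the list can be partly automated. The following Python sketch extracts runs of printable ASCII characters from a raw file, similar in spirit to the Unix strings utility, and then looks for URLs and watchlisted terms. It is a rough approximation: it does not parse the DEX string table and will miss non-ASCII strings. The watchlist terms and the sample buffer are hypothetical.

```python
import re
from typing import Iterable, List

# Runs of at least four printable ASCII bytes.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
URL_PATTERN = re.compile(r"https?://[^\s\"']+")


def extract_strings(data: bytes) -> List[str]:
    """Return all printable ASCII runs of four or more characters."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def find_urls(strings: Iterable[str]) -> List[str]:
    """Return every URL found inside the extracted strings."""
    urls: List[str] = []
    for text in strings:
        urls.extend(URL_PATTERN.findall(text))
    return urls


def flag_strings(strings: Iterable[str], watchlist: Iterable[str]) -> List[str]:
    """Return strings containing any watchlist term (case-insensitive)."""
    terms = [term.lower() for term in watchlist]
    return [s for s in strings if any(t in s.lower() for t in terms)]


if __name__ == "__main__":
    # Hypothetical buffer standing in for the bytes of a classes.dex file.
    sample = b"\x00\x01sendTextMessage\x00\x02http://example.com/gate.php\x00ab\x00"
    found = extract_strings(sample)
    print(found)
    print(find_urls(found))
    print(flag_strings(found, ["sendtextmessage"]))
```

Hits from a scan like this are only leads for manual review. A string such as sendTextMessage is also present in many legitimate applications.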
Malware Analysis Primer
This section takes a look at a sample Android application and tries to determine whether there is anything malicious in it. Because the application doesn’t come from the Google Play market, the first three checks from the previous section will be skipped and analysis will continue from the question Does the application request sensitive permissions?
The answer to this question lies in the AndroidManifest.xml. Because we already discussed how to convert the manifest file and read its content, we can speed up the process using some handy Androguard scripts. Androperm is a simple script that just outputs the APK permissions. An example of the script output is given here:
SEND_SMS is definitely a suspicious-looking permission. It is typically associated with premium SMS scams that inflict monetary damages onto infected users. The androapkinfo script can be used next to get a summary overview of the application with various malware-oriented details. Following is the abbreviated output of androapkinfo:
Once again, we have the list of permissions the application requires, along with a handy message about the potential malicious use of it. Two of the checks in the output are indicators for suspicious code-obfuscation techniques. Also, we have a list of activities that can be used as an entry point to start code analysis. Finally, we have a list of class files that use the SMS functionality and should be investigated to confirm that SMS permissions are not misused.
To check the code of the classes MoviePlayer and HelloWorld, we decompile the application and locate the two interesting classes:
The main activity is implemented in MoviePlayer.java, which makes it a good candidate for analysis. The file can be examined in any text editor, but preferably one with Java syntax highlighting. The full code listing of the function onCreate, which uses SMS functionality, is given next:
The first suspicious thing about this function is the Unicode text buffer. This is nothing more than a safe way for a decompiler to output Unicode strings that a textual editor might not display properly. In this case, the string is in Cyrillic, and translated into English it has the following meaning: “Wait, access to the video library requested...” Next, the variable v0 is initialized as the SmsManager object. In three places, the code then tries to send an SMS message. The function sendTextMessage has the following prototype:
In this case, the destinationAddress values are the numbers 3353 and 3354, whereas the text argument is 798657 in all three cases. The two numbers belong to a premium SMS service, which is charged at a higher rate than regular SMS, and the custom text message is probably used to identify the affiliate who should receive the money.
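When many decompiled files have to be reviewed, calls like these can be located automatically. The following Python sketch scans Java source text for sendTextMessage calls and extracts the destination number and message text. It assumes the five-argument form documented for SmsManager (destinationAddress, scAddress, text, sentIntent, deliveryIntent) and only handles calls whose arguments are string literals. The embedded source fragment is a hypothetical, abbreviated stand-in, not the actual decompiled listing.

```python
import re
from typing import List, Tuple

# Matches sendTextMessage("<destination>", <scAddress>, "<text>", ...
SEND_CALL = re.compile(
    r'sendTextMessage\(\s*"([^"]*)"\s*,\s*[^,]+,\s*"([^"]*)"'
)

# Hypothetical fragment modeled on the values discussed above.
SAMPLE_SOURCE = '''
SmsManager v0 = SmsManager.getDefault();
v0.sendTextMessage("3353", null, "798657", null, null);
v0.sendTextMessage("3354", null, "798657", null, null);
'''


def find_sms_calls(java_source: str) -> List[Tuple[str, str]]:
    """Return (destination, text) pairs for literal sendTextMessage calls."""
    return SEND_CALL.findall(java_source)


if __name__ == "__main__":
    for destination, text in find_sms_calls(SAMPLE_SOURCE):
        print("SMS to {} with text {}".format(destination, text))
```

Calls whose arguments are built at run time, for example by string concatenation or decryption, will not be matched, so an empty result does not prove that the application sends no messages.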
The code definitely doesn’t look like a movie player application, and a quick look at other decompiled files shows very little code and nothing that could indicate anything related to advertised functionality. This kind of malware is very common on phones because it can bring immediate financial gain to the authors.
Black-box emulator environments are very useful tools for monitoring malware samples and understanding their functionality without reading code. Droidbox is a modified Android image that offers API monitoring functionality. It uses baksmali/smali to rewrite the application and a custom Android emulator image to log all the monitored APIs with their arguments. This approach is a good first step for understanding the malicious applications or for confirming the findings from the static analysis approach.
Example 20-2: Black-Box APK Monitoring with Droidbox
NOTE This exercise is provided as an example rather than as a lab due to the fact that in order to perform the exercise, malicious code is needed.
Droidbox comes with a modified Android image and can be easily started after the Droidbox image archive is unpacked. The first step is running the custom Android image, as follows:
After the image has booted up, it is time to run the malicious application inside the emulator and collect the logs. The application can be instrumented in the emulator via the droidbox.sh script, like so:
After an arbitrary amount of time has passed, you can stop the monitoring by pressing CTRL-C, which will output logs in JSON format. The output in the previous listing was reduced for brevity. To format the JSON in a nicer way, use the following command:
From the output, it quickly becomes evident that the application is sending three SMS messages, as we have already discussed. The ability to observe the application’s activity and gain insight into it so easily makes this approach very useful for malware-analysis purposes. It should be noted that this approach cannot be used by itself and has to be accompanied by the reverse engineering of the application. Black-box approaches like this one don’t guarantee that malicious functionality will be executed during the time of monitoring, so they can miss some or all of the malicious code. In such cases, it is possible to wrongly assume that the application is not malicious while in fact it is just hiding that functionality.
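JSON logs of this kind are also easy to post-process in a script. The following Python sketch pretty-prints a JSON log and lists the events recorded under one section. The sample log is hypothetical and is not real Droidbox output: the section name "sendsms" and the field names "number" and "message" are assumptions made for illustration and should be checked against an actual log.

```python
import json
from typing import Any, Dict, List

# Hypothetical log fragment for illustration only; NOT real Droidbox output.
# The "sendsms", "number", and "message" key names are assumptions.
SAMPLE_LOG = (
    '{"sendsms": {'
    '"event-1": {"number": "3353", "message": "798657"}, '
    '"event-2": {"number": "3354", "message": "798657"}}}'
)


def pretty(raw_json: str) -> str:
    """Re-serialize a JSON document with indentation and sorted keys."""
    return json.dumps(json.loads(raw_json), indent=4, sort_keys=True)


def section_entries(raw_json: str, section: str) -> List[Dict[str, Any]]:
    """Return the events stored under one top-level section, if present."""
    log = json.loads(raw_json)
    return list(log.get(section, {}).values())


if __name__ == "__main__":
    print(pretty(SAMPLE_LOG))
    for event in section_entries(SAMPLE_LOG, "sendsms"):
        print(event["number"], event["message"])
```

Passing the section name as a parameter keeps the helper usable for whichever event categories a given log actually contains.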
For best results, it is recommended that you use both static analysis of application code and black-box monitoring.
Black-box malware analysis is a cheap way to get an overview of malware functionality. It can be used to find interesting entry points for deeper static analysis. Droidbox is a simple-to-use black-box Android analysis system. It can easily be extended and turned into an automatic analysis system to classify and process a large number of samples and build knowledge on top of the resulting reports.
As consumers adopt new technologies and make them part of their lives, malware authors change their approach and migrate to these technologies. The smartphone, an omnipresent device that makes the Internet always available, is a growing malware concern. Trojans trying to steal personal data, backdoors trying to allow attackers to access the device, and adware trying to generate revenue for its authors are just some of the potential threats present in the smartphone world. Android, as one of the most popular smartphone platforms, is a perfect target for such malicious activities.
Android malware analysis and reverse engineering follow mostly the traditional Windows malware analysis approaches, but they also bring some new challenges. Understanding the Android ecosystem and design differences will allow you to efficiently analyze applications and determine any malicious intent. As malware shifts its focus to new technologies, it is important that malware researchers also follow and develop adequate analysis tools and techniques.
For Further Reading
Android manifest introduction developer.android.com/guide/topics/manifest/manifest-intro.html.
Android application signing process developer.android.com/tools/publishing/app-signing.html.
DEX file format source.android.com/devices/tech/dalvik/dex-format.html.
Jarsigner documentation docs.oracle.com/javase/7/docs/technotes/tools/windows/jarsigner.html.
Smali syntax highlight for various editors sites.google.com/site/lohanplus/files/.
Smali syntax highlight for Sublime github.com/strazzere/sublime-smali.
SmsManager API documentation developer.android.com/reference/android/telephony/SmsManager.html.
Study on Android Auto-SMS www.symantec.com/connect/blogs/study-android-auto-sms.
Various Android analysis tools: