
C++ For Dummies (2014)

Part VI

The Part of Tens

Chapter 30

Ten Ways to Protect Your Programs from Hackers

In This Chapter

Protecting yourself from user input

Handling failures in your code

Maintaining a program log

Following good development process

Practicing good version control

Authenticating users securely

Managing your sessions

Obfuscating your code

Signing your code

Using encryption securely

Chapter 28 describes things you should do in your code to avoid writing programs that are vulnerable to hackers. It also describes features that you can enable if your operating system supports them, such as Address Space Layout Randomization (ASLR) and Data Execution Prevention (DEP). This chapter describes further steps you can take as part of the software development process to defend yourself from the “hackerata.”

Don't Make Assumptions about User Input

The programmer has a frame of mind when writing a program. She's thinking about the problem that she's trying to solve. Given that a person can keep only so many things in her mind at one time, she's probably not thinking much beyond the immediate problem.

Programmer's tunnel vision is okay during the early development phase. At some point, however, the programmer (or, better yet, some other programmer who had nothing to do with the development of the code) needs to sit back and forget about the immediate problem. She needs to ask herself, “How will this program react to illegal input?”

For example, in the field for the username, suppose someone enters several thousand garbage characters. How will the program react? Ideally, you want the program to respond with an error message like, “What the heck are these several thousand characters doing in the place where I expected a name?” Barring that, a simple “I don't understand what you mean” is fine.

In fact, anything short of crashing or corrupting data is acceptable. This is because a crash indicates a possible intrusion vector that a hacker can exploit to get your program to do something that you don't want it to do.

Not every crash is exploitable, but many crashes are. In fact, throwing lots of garbage input at a program and looking for the crashes is called fuzzing the program and is often the first step to finding exploits in deployed applications.

Here are some of the rules for checking input:

· Make no assumptions about the length of the input.

· Don't accept more input than you have room for in your fixed-length buffers (or use variable-size buffers).

· Check the range of every numerical value to make sure that it makes sense.

· Check for and filter out special characters that may be used by a hacker to inject code.

· Don't pass raw input onto another service, such as a database server.

And perform all of the same checks on the values returned from remote services; the hacker may not be on the input side, he may be on the response side. The short sketch that follows shows a few of these checks in action.
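
Here is a minimal sketch of what such checking might look like for a username field. The length limit, the allowed characters, and the function name are examples of my own, not requirements:

#include <cctype>
#include <cstddef>
#include <string>

// Maximum length accepted for a username - an example limit only.
const std::size_t MAX_USERNAME = 64;

bool isReasonableUserName(const std::string& input)
{
    // Make no assumptions about length; reject anything empty or too long.
    if (input.empty() || input.length() > MAX_USERNAME)
    {
        return false;
    }

    // Filter out special characters a hacker might use to inject code.
    // Here only letters, digits, periods, and underscores are allowed.
    for (char c : input)
    {
        if (!std::isalnum(static_cast<unsigned char>(c)) &&
            c != '.' && c != '_')
        {
            return false;
        }
    }
    return true;
}

Anything that fails a check like this gets rejected before it is ever passed along to a database or any other service.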

Handle Failures Gracefully

By this, I don't mean “Don't be a sore loser.” What I mean is that your program should respond reasonably to failures that occur within the program. For example, if your call to a library function returns a nullptr, the program should detect this and do something reasonable.

Reasonable here is to be understood fairly liberally. I don't necessarily mean that the program needs to sniff around to figure out exactly why the function didn't return a reasonable address. It could be that the request was for way too much memory due to unreasonable input. Or it could be that the constructor detected some type of illegal input. It doesn't matter. The point is that the program should restore its state as best it can and set up for the next bit of input without crashing or corrupting existing data structures such as the heap.

In general this means the following:

· Check for illegal input to interface functions and throw an exception when you detect it.

· Catch and handle exceptions at the proper points.

Some programmers are good at checking for illegal input and throwing exceptions when problems occur, but they're not so good at catching exceptions and handling them properly. I would give this type of programmer a B-. At least their programs are hard to exploit, but they're easy to crash, making them vulnerable to the relatively less dangerous Denial of Service attacks.

Another rule of thumb is to fail secure. For example, if you are trying to check someone's password and the user inputs 2,000 characters replete with embedded SQL statements, not only is it necessary to reject this garbage, but you must also not approve this nonsense as if it were a valid password. Whenever you catch an exception, you should not assume things about the state of the system (like the fact that the user has been properly identified and credentialed) that may not be true. It is far better to require the user to reenter his credentials after a major failure than it would be to assume that an invalid user has already been approved.
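
The following sketch shows the shape of this idea. The function names and the global flag are mine, for illustration only:

#include <stdexcept>
#include <string>

bool userAuthenticated = false;   // session state, global only for this sketch

void logIn(const std::string& user, const std::string& password)
{
    // Check for illegal input at the interface and throw when you detect it.
    if (user.empty() || password.length() > 128)
    {
        throw std::invalid_argument("unreasonable credentials input");
    }
    // ...validate the password here and set userAuthenticated accordingly...
}

void handleLogInRequest(const std::string& user, const std::string& password)
{
    try
    {
        logIn(user, password);
    }
    catch (const std::exception& e)
    {
        // Fail secure: assume nothing about the state of the session.
        userAuthenticated = false;
        // ...log the error and ask the user to reenter his credentials...
    }
}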

Maintain a Program Log

Create and maintain runtime logs that allow someone to reconstruct what happened in the event of a security failure. (Actually, this is just as true in the event of any type of failure.) For example, you probably want to log every time someone signs into or out of your system. You'll definitely want to know who was logged into your system when a security event occurred — these are the people most at risk in a breach and the first ones to look at when searching for culprits. In addition, you'll want to log any system errors, which would include most exceptions.

A real-world production program contains a large number of calls that look something like the following:

log(DEBUG, "User %s entered legal password", sUser);

This is just an example. Every program will need some type of log function. Whether or not it's actually called log() is immaterial.

This call writes the string User xxx entered legal password, where xxx is replaced by the name contained in the variable sUser, to the program log file when the system is in Debug mode. When the program is not in Debug mode, this call does nothing — this avoids paying a performance penalty for logging too much information when everything is going smoothly.

System log functions usually support anywhere from two to five levels of severity that dictate whether log messages get written out or not. There are some failures that are always written to the log file:

if (validate(sUser, sPassword) == true)
{
    log(DEBUG, "User %s entered legal password", sUser);
}
else
{
    log(ALWAYS, "User %s entered illegal password", sUser);
}

Here, the program logs a valid user password only when the system is in Debug mode, but it always logs an invalid user password just in case this represents an attempt by someone to break into the system by guessing passwords.
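
A bare-bones version of such a log() function might look like the following sketch. The severity levels, the log file name, and the global threshold are placeholders for whatever your real logging facility provides; note that with a printf-style interface like this one, a std::string argument would be passed as sUser.c_str():

#include <cstdarg>
#include <cstdio>
#include <ctime>

enum Severity { DEBUG, INFO, ALWAYS };

// In a real program this would come from a configuration setting;
// set it to DEBUG during development and ALWAYS in production.
Severity currentLevel = ALWAYS;

void log(Severity severity, const char* format, ...)
{
    // Skip messages below the current severity threshold.
    if (severity < currentLevel)
    {
        return;
    }

    FILE* pLog = std::fopen("program.log", "a");
    if (pLog == nullptr)
    {
        return;                 // fail quietly rather than crash
    }

    // Time-stamp the entry so events can be reconstructed later.
    std::time_t now = std::time(nullptr);
    char stamp[32];
    std::strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S",
                  std::localtime(&now));
    std::fprintf(pLog, "%s ", stamp);

    va_list args;
    va_start(args, format);
    std::vfprintf(pLog, format, args);
    va_end(args);

    std::fprintf(pLog, "\n");
    std::fclose(pLog);
}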

Log files must be maintained. Generally, that means running a job automatically at midnight or some other time of decreased use that closes the current log file, moves it into a separate directory along with the day's date, and opens a new log file. This keeps a single log file from getting too big and unwieldy. It also makes it a lot easier to go back to past log files to find a particular event.

In addition, reviewing log files is a boring, thankless, and therefore error-prone job. This makes it a job best performed by computers. Most large systems have special programs that scan the log file looking for anomalies that may indicate a problem. For example, one or two invalid passwords per hour is probably nothing to worry about — people fat-finger their passwords all the time. But a few thousand invalid passwords in an hour is probably worth getting excited about. This may indicate an attempt at forced entry by brute-force guessing a password.
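
A log-scanning tool can be as simple as the following sketch, which counts the failed-password entries in a day's log file. The file name, the message text, and the threshold of 1,000 are examples only:

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream logFile("program.log");   // yesterday's log, for example
    std::string line;
    int badPasswords = 0;

    // Count the lines that record a failed password attempt.
    while (std::getline(logFile, line))
    {
        if (line.find("entered illegal password") != std::string::npos)
        {
            badPasswords++;
        }
    }

    // A handful of failures is normal; thousands deserve attention.
    if (badPasswords > 1000)
    {
        std::cout << "Possible brute-force attack: "
                  << badPasswords << " failed login attempts" << std::endl;
    }
    return 0;
}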

An entire For Dummies book could be written on log file maintenance. There is way more to this topic than I can cover here. Log files must be backed up daily and cleaned out periodically lest they grow forever. In addition, someone or some program needs to monitor these log files to detect problems such as repeated attempts to guess someone's password.

Maintaining a system log gives the system administrator the raw material that she needs to reconstruct what happened in the event that the unthinkable happens and a hacker makes it into the system.

Follow a Good Development Process

Every program should follow a well thought out, formal development process. This process should include at least the following steps:

· Collect and document requirements, including security requirements.

· Review the design.

· Adhere to a coding standard.

· Perform unit testing.

· Conduct formal acceptance tests that are based on the original requirements.

In addition, peer reviews should be conducted at key points to verify that the requirements, design, code, and test procedures are high quality and meet company standards.

I am not against New Age development techniques such as iterative and agile development. But agile is not a synonym for sloppy — just because you are using an agile development process does not mean that you can skip any of the preceding development steps.

I am also not a Process Nazi. The preceding steps do not need to be as formal or as drawn out for a small program involving one or two developers as they would be for a project that employs dozens of systems analysts, developers, and testers. However, even small programs can contain hackable security flaws, and it takes only one for your computer to become somebody's bot.

Implement Good Version Control

Version control is a strange thing. It's natural not to worry about version 1.1 when you're under the gun to get version 1.0 out the door and into the waiting users' outstretched hands. However, version control is an important topic that must be addressed early because it must be built into the program's initial design and not bolted on later.

One almost trivial aspect of version control is knowing which version of the program a user is using. Now this sounds kind of stupid, but believe me, it's not. When a user calls up and says, “It does this when I click on that,” the help desk really needs to know which version of the program the user is using. He could be describing a problem in his version that's already been fixed in the current version.

A program should have an overall version number that either gets displayed when the program starts or is easily retrievable by the user (or both). Usually the version number is displayed as part of the help system. In the code, this can be as simple as maintaining a global variable with the version number that gets displayed when the user selects Help⇒About. The programmer is responsible for updating this version number whenever a new version gets pushed out to production. This Help window should also display the version number of any Web services that the program uses, if possible.
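
In code this can be as simple as the following sketch; the version string and the name of the display function are up to you:

#include <iostream>
#include <string>

// Update this string every time a new version goes out to production.
const std::string VERSION = "1.2.3";

// Called when the user selects Help⇒About.
void displayAboutBox()
{
    std::cout << "MyApplication version " << VERSION << std::endl;
    // ...also display the version of any Web services the program uses...
}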

A more pernicious aspect of version control is how to roll new versions of the application out to users. This includes both the code itself and changes to the data structures, such as database tables, that the application may access.

The code roll-out problem is trivial with browser-based Web applications — you simply load a new version onto the server, and the next time the user clicks on your page, he gets the new version. However, the problem is much more difficult for applications that install onto the user's computer, and it presents a great opportunity for hackers to exploit your application.

Suppose for example that you have devised a really cool update feature in your application. The user clicks on Update Now, and the application goes back to the server and checks for a new version. If a new version is available, the application automatically downloads the update and installs it.

If a hacker figures out the protocol that your application uses for downloading updates and if that protocol is not sufficiently secured, a hacker can convince your application on other people's computers to download a specially modified version that she's created, complete with malware of her own creation. Pretty soon your entire user base is infected with some type of malware that you know nothing about.

I'm not saying that automatic updates can't be done securely — obviously they can, or companies like Microsoft and Apple wouldn't do them. I'm just saying that if you do choose the automatic-update route, you need to be very careful about how you implement security.

Even if you go the old fashioned route and have the user download a new MyApplication_Setup.exe that installs the new application, you need to worry about whether some hacker may have uploaded a version of your program laced with malware. Most download Web sites are pretty careful about checking applications for malware. Another approach is to calculate a secure checksum and include it with the download file. The user can then recalculate that checksum on the file that he downloads. If the number he calculates doesn't match the number that you uploaded, then the executable file may have been tampered with and the user shouldn't install the program. Although this approach is pretty secure, very few users bother.

Authenticate Users Securely

User authentication should be straightforward: The user provides an account name and a password, and your program looks the account name up in a table and compares the passwords. If the passwords match, the user is authenticated. But when it comes to antihacking, nothing is that simple.

First, never store the passwords themselves in the database. This is called storing them in the clear and is considered very bad form. It's far too easy for a hacker to get his hands on the password file. Instead, save off a secure transform of the password.

This is known as a secure hash, and there are several such algorithms defined; the most common are MD5, SHA1, and SHA256. All of these hash functions share several common properties: It is very unlikely that two passwords will generate the same hash value, it is virtually impossible to figure out the original password from the hashed value, and even a small change in the password results in a wildly different hashed value.

Unfortunately no secure hash function is included in the standard C++ library, but there are standard implementations of the most common algorithms in many open source libraries. In addition, the algorithms, along with sample code for each of them, are available on Wikipedia.

In practice, these hash functions work as follows:

1. The user creates a password when he registers with the application.

2. The application appends a random string (known as a salt) to either the front or the end of the password the user enters.

3. The application runs the resulting string through one of the secure hash algorithms.

For example, the SHA256 algorithm generates a 64-digit hexadecimal (256-bit) result. The application stores this result and the salt string in a database table along with the user's name, as the following sketch illustrates.
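
In code, registration might look something like this. The sha256() and generateRandomSalt() functions are stand-ins for whatever your chosen open-source library provides; they are not part of standard C++:

#include <cstddef>
#include <string>

// Assumed helpers from an open source crypto library - not standard C++.
std::string sha256(const std::string& text);          // returns a 64-digit hex string
std::string generateRandomSalt(std::size_t length);   // cryptographically random string

struct UserRecord
{
    std::string name;
    std::string salt;
    std::string hashedPassword;
};

// Called when a new user registers with the application.
UserRecord registerUser(const std::string& name, const std::string& password)
{
    UserRecord record;
    record.name = name;

    // 1. Generate a long, random salt for this user alone.
    record.salt = generateRandomSalt(30);

    // 2. Append the salt to the password and hash the result.
    record.hashedPassword = sha256(password + record.salt);

    // 3. Store the name, the salt, and the hash - never the password itself.
    return record;
}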


Why salt? It's bad for the heart

The salt value adds security to the process by making it difficult for a hacker to deduce the password, even if he intercepts the hashed value by sniffing the line. It does this in two ways:

· One successful technique at guessing passwords is to construct a table of pre-hashed common passwords. This is known as a password dictionary.

When the hashed password comes over the line, the hacker looks up the hashed value in the dictionary — this is a very quick operation. If it is found, then the hacker knows the user's password. However, this technique will not work, even if the user picks a common password, if a random long salt value has been added.

· A salt can help make up for passwords that are too short.

For example, a hacker who knows that a user is lazy and doesn't use passwords with more than six characters has no trouble trying all possible six-letter combinations to reconstruct a password from its hash. But a six-character password combined with a random 30-character salt cannot be calculated in advance. Lastly, a random salt value gives two different users with the same password a different hash value.

However, salts aren't magic. The salt value is transmitted in the clear. If the user chooses a sufficiently short password (say, four characters) and the hacker knows this, the hacker can still generate the hashes of all possible four-letter passwords combined with the salt string and break the lazy user's password.

Remember that for a salt to be effective it needs to have three properties:

· It must be generated separately for each user — using the same salt value over and over doesn't add any security.

· It must be sufficiently long (say 20 or 30 characters).

· It must be random.


When the user logs in, the application goes through the following steps to verify the user's password:

1. Uses the username to look up the salt value and the hashed password in the user table.

2. Adds the salt string to the user's password and calculates the secure hash.

3. Compares this newly calculated value to the value stored in the table. If the two match, then the user is authenticated.

This algorithm has the advantage that if a hacker were to get hold of the password table, it would still be difficult for him to create a password that would match one of the existing hashes.
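
Using the same hypothetical sha256() helper as in the registration sketch, the comparison step boils down to a few lines:

#include <string>

// Assumed helper from an open source crypto library, as before.
std::string sha256(const std::string& text);

struct UserRecord          // same structure as in the registration sketch
{
    std::string name;
    std::string salt;
    std::string hashedPassword;
};

// Returns true if the entered password matches the stored hash.
bool checkPassword(const UserRecord& record, const std::string& enteredPassword)
{
    // Add the stored salt, rehash, and compare against the stored value.
    return sha256(enteredPassword + record.salt) == record.hashedPassword;
}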

It may be difficult, but it is not impossible, to find a password that matches a given hash value — the MD5 algorithm, though popular, is particularly susceptible to this type of attack. If you suspect that the password table has been compromised, you must invalidate all of the user accounts and force people to securely reregister with the application.

Once you've authenticated a user, your application should assign that user a role that specifies the types of things that he is allowed to do in the application. Consider a weekly status report application. A normal user should only be able to edit entries that he creates. He may or may not be able to read other users' entries. A person assigned the role of supervisor may be able to edit other users' entries. Only the few users assigned the role of administrator should be able to edit the tables, register new users, or change the roles of other users.

By keeping the number of administrators to a minimum, you reduce the number of vulnerabilities that your application exposes. For example, if a hacker breaks into a normal user's account, he can do little more than edit that user's status information. Bad but certainly not a disaster.

And, finally, your application should keep statistics on user log-ins. You should consider deactivating the accounts of users that haven't used the system in a long time. In addition, the application should react automatically to repeated attempts to log in with the wrong password by either permanently, or at least temporarily, locking out the account. This will make it much more difficult for a computer on the other end of a connection to run through thousands or millions of possible passwords attempting to brute-force guess one.
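
Tracking failed login attempts and locking out an account can be sketched as follows; the limit of five attempts is an arbitrary example:

#include <map>
#include <string>

const int MAX_FAILED_ATTEMPTS = 5;            // example limit only
std::map<std::string, int> failedAttempts;    // failed logins per account

// Call this every time a login attempt fails; returns true if the
// account should now be locked out.
bool recordFailedLogin(const std::string& userName)
{
    failedAttempts[userName]++;
    return failedAttempts[userName] >= MAX_FAILED_ATTEMPTS;
}

// Call this after a successful login to reset the counter.
void recordSuccessfulLogin(const std::string& userName)
{
    failedAttempts[userName] = 0;
}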

Manage Remote Sessions

You can make certain assumptions when all of your application runs on a single computer. For one thing, once the user has authenticated himself, you don't need to worry about him being transformed into a different person (unless your application is intended to run at Hogwarts Academy of Witchcraft and Wizardry). Applications that communicate with a remote server can't make this assumption — a hacker who is listening on the line can wait until the user authenticates himself and then hijack the session.

What can the security-minded programmer do to avoid this situation? You don't want to repeatedly ask the user for his password just to make sure that the connection hasn’t been hijacked. The alternative solution is to establish and manage a session. You do this by having the server send the remote application a session cookie once the user has successfully authenticated himself.

The term cookie is not very descriptive. It is actually a string of digits or characters. A cookie can be in a file on the hard disk, or it can be held in RAM. Session cookies are generally held in RAM for extra security. All modern browsers include support for cookies.

A cookie may include information such as a hash of the user's password and the time that the session started. Throughout the session, the server periodically challenges the remote application for a copy of the cookie. As long as the remote application presents a valid cookie, the server can be reasonably certain that the person on the other end of the connection is the same person who entered the correct password in the first place.

If at any time the remote application cannot produce a cookie that matches the cookie provided to it at the beginning of the session, the server application automatically logs the user off and refuses to listen to the remote application until it can log in again with valid username and password. If the connection between the server and the application is lost, the server invalidates the cookie, thereby forcing the application to reauthenticate. When the user logs out, the session cookie is also invalidated.

In this way the server can be reasonably certain that some nefarious application hasn't managed to hijack the user's session. But what about in the other direction? How does the application know whether it can trust the server? A way to solve this problem is for the remote application to generate a cookie back to the server that uniquely identifies it as a legitimate server.
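
At its simplest, a session cookie on the server side is nothing more than a long random string that the server remembers and the client must echo back. The following sketch shows the idea; the random-token helper is assumed, and a real implementation would also track expiration times:

#include <cstddef>
#include <map>
#include <string>

// Assumed helper that returns a long, cryptographically random string.
std::string generateRandomToken(std::size_t length);

std::map<std::string, std::string> activeSessions;   // user name -> session cookie

// Issue a cookie once the user has successfully authenticated.
std::string startSession(const std::string& userName)
{
    std::string cookie = generateRandomToken(64);
    activeSessions[userName] = cookie;
    return cookie;                                    // sent back to the client
}

// Challenge the remote application during the session.
bool isValidSession(const std::string& userName, const std::string& presentedCookie)
{
    auto it = activeSessions.find(userName);
    return it != activeSessions.end() && it->second == presentedCookie;
}

// Invalidate the cookie on logout or when the connection is lost.
void endSession(const std::string& userName)
{
    activeSessions.erase(userName);
}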

A second approach, one that is much more secure, is to establish a secure session using a standard protocol like Secure Sockets Layer (SSL) or Transport Layer Security (TLS). While the details are well beyond the scope of this book, these protocols allow the server and the remote application to exchange encryption keys in a secure fashion. These keys are then used to encrypt all communication between the two for the remainder of the session. This encryption precludes a hacker from intercepting the session — without the keys, the hacker can't understand what the server is saying nor trick the server into accepting its output. Further, since the messages are encrypted with keys that are exchanged securely, a hacker can't even understand what information is being exchanged between the server and the remote app if she happens to be listening on the line.

Obfuscate Your Code

Code obfuscation is the act of making the executable as difficult for a hacker to understand as possible.

To obfuscate means to make obscure or unclear.

The logic is simple. The easier it is for a hacker to understand how your code works, the easier it will be for the hacker to figure out vulnerabilities.

The single easiest step you can take is to make sure that you only ever distribute a Release version of your program that does not include debug symbol information. When you first create the project file, be sure to specify that both a Debug and a Release version should be created, as shown in Figure 30-1. The only real difference between these two is the compiler switches: The Debug version includes the -g switch, which tells the compiler to include symbol table information in the executable file. This symbol information tells the debugger where each line of code is located within the executable and where each variable is stored. This information is necessary in order to set breakpoints and display the value of variables. But if this information is available in the version distributed to customers, then the hacker will be given a blueprint to each line of your source code.


Figure 30-1: The wizard used to create programs allows you to create both a Debug and a Release version of the project.

The Release version of Code::Blocks does not include the gcc -g switch, so no symbol information is included in its executable.

The Release version may also include enhanced code optimizations to generate faster or smaller executable files via one of the various gcc -O switches.
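
If you build from the command line rather than from within Code::Blocks, the difference between the two configurations comes down to the switches you pass to the compiler. Something along these lines is typical; the exact options are an example, not a requirement:

# Debug build: include symbol information for the debugger
g++ -g -o myprogram_debug main.cpp

# Release build: no -g; optimize and strip any remaining symbols
g++ -O2 -s -o myprogram main.cpp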

You can add a Release version to an existing project in Code::Blocks by selecting Project⇒Properties and then selecting the Build Targets tab to reveal a dialog box like the one shown in Figure 30-2. Select Add and fill in the name of the new target. Make sure that the settings in this top-level dialog box match the Debug settings (for example, make sure that the type is Console Application and that the target executable and object directories are filled in). Then select Build options and make sure that the Release target does not have the -g compiler switch set.


Figure 30-2: You can add a new build target from the Project⇒Properties window.

You will need to build the Debug version during unit test and debug; but before final test and release, you should tell Code::Blocks to generate the Release version of the program by selecting Build⇒Select Target⇒Release and then Build⇒Rebuild. To keep things straight, Code::Blocks puts the Release executable in a separate target directory.

Never, ever, distribute versions of your application with symbol information included.

You should always endeavor to make your source code as simple, clean, and clear as you possibly can. However, you can purchase a commercial code obfuscator that mangles your program to make it more difficult to reverse engineer. Some obfuscators work on the machine code, and some work at the source level, generating a C++ program from your C++ that even you would be hard pressed to follow. The critical thing about any obfuscator is that while it makes the code more difficult for a human to follow, it does not change the meaning of the code in any way.

Don't put too much faith in code obfuscators. They may make the code harder to reverse engineer, but they don't make it impossible. Given enough time, a determined hacker can reverse engineer any program.

Sign Your Code With a Digital Certificate

Code signing works by generating a secure hash of the executable code and combining it with a certificate issued by a valid certificate authority. The process works like this: The company creating the program must first register itself with one of the certificate authorities. Let's use the example of my great retirement hope, My Company, Inc. I must first convince one of the commercially available certificate authorities that My Company, Inc., is in fact a real company and not just some den of thieves or figment of my imagination. I do this by revealing its address, its phone numbers, the names and addresses of its officers, the URL of its Web site, and so on. I may also be asked to produce My Company's tax filings for the past few years to prove that this isn't some bogus claim.

Once the certificate authority is convinced that My Company is a valid software entity, it issues a certificate. This is a long number that anyone can use to verify that the holder of this certificate is the famous My Company of San Antonio.

Executables generated by My Company can then be signed with this certificate. Signing an executable does two things:

· It creates a secure hash that would make it very difficult (as close to impossible as possible) for a hacker to modify the executable without being detected by the user's computer.

· It assures the user that this program was created by a legitimate software development company.

When a user runs my program for the first time, the application presents its certificate and secure hash combination to the operating system. The OS first calculates a hash of the executable and compares it to the hash presented. If these match, then the OS is reasonably certain that the executable is the same one that My Company shipped out of its doors. The OS then checks the certificate to make sure that it is valid and hasn't been revoked. If it is, the OS presents a dialog box to the user that states this application is a valid executable from My Company, Inc., and asks whether it should continue executing it.

Use Secure Encryption Wherever Necessary

Like any good warning, this admonition has several parts. First, “Use encryption wherever necessary.” This tends to bring to mind thoughts of communicating bank account information over the Internet, but you should think more generally than that. Data that's being communicated, whether over the Internet or over some smaller range, is known generally as Data in Motion. Data in Motion should be encrypted unless it would be of no use to a hacker.

Data stored on the disk is known as Data at Rest. This data should also be encrypted if there is a chance of the disk being lost, stolen, or copied. Businesses routinely encrypt the hard disks on their company laptops in case a laptop gets stolen at the security scanner in the airport or left in a taxi somewhere. Small portable storage devices such as thumb drives are especially susceptible to being lost — data on these devices should be encrypted.

Encryption is not limited to data — the entire communication session should be encrypted if a hacker could coax your application into revealing secrets by spoofing either the remote application or the server.

But this section’s title says “Use Secure Encryption.” Don't make up your own encryption scheme or try to improve upon existing schemes because the results won't be secure. Encryption algorithms go through years of testing and evaluation by experts before they are adopted by the public. Don't think that you are going to improve on them on your own. You are more likely to just mess them up.

A good example of this lies in the Wi-Fi in your phone, tablet, or laptop. The original definition for Wi-Fi (known as 802.11b) used a reasonably secure published algorithm for securing the packets of information sent over the airwaves. It called this standard WEP (Wired Equivalent Privacy, sometimes erroneously labeled Wireless Encryption Protocol). Unfortunately, the designers of 802.11b didn't implement the protocol correctly, which left 802.11b hopelessly vulnerable to hacking. By 2004, programs were available on the Web that could break a WEP-encrypted data stream in three minutes or less. This flaw was recognized fairly quickly, and subsequent standards replaced WEP with the more secure WPA2 (Wi-Fi Protected Access).

When the holes in WEP were first discovered, the Wi-Fi Alliance, keepers of the Wi-Fi standard, knew they had a problem. The replacement encryption standard that they wanted could not be implemented on many of the existing Wi-Fi Access Points that were built to support WEP. Therefore, the Wi-Fi Alliance decided to release an intermediate standard known as WPA1. This protocol implemented the original encryption algorithm the way it should have been implemented in the first place. Since it was so similar to WEP, WPA1 could be implemented on existing hardware with relatively minor changes to the firmware. Nevertheless, the fixes resulted in a significant increase in security over the flawed WEP implementation. However, WPA1 was never intended to be anything more than a stop-gap. The completely new WPA2 standard was introduced in 2004 and required on all devices built after 2006. Secure applications no longer allow the use of WEP. For example, the Payment Card Industry outlawed its use in 2008. Though I saw support for WPA1 as late as 2010, WPA2 is the state of the art as of this writing.