Microsoft Excel 2016 BIBLE (2016)
Working with Formulas and Functions
Creating Formulas That Manipulate Text
IN THIS CHAPTER
1. Seeing how Excel handles text entered into cells
2. Looking at Excel worksheet functions that handle text
3. Getting examples of advanced text formulas
Excel is, of course, best known for its ability to crunch numbers. It's also quite versatile, however, with handling text. As you know, you can enter text for such things as row and column headings, customer names and addresses, part numbers, and just about anything else. In addition (as you may expect), you can use formulas to manipulate the text contained in cells.
This chapter contains many examples of formulas that use a variety of functions to manipulate text. Some of these formulas perform feats that you may not have thought possible.
A Few Words About Text
When you enter data into a cell, Excel immediately goes to work and determines whether you're entering a formula, a number (including a date or time), or anything else. That “anything else” is considered text.
You may hear the term string used instead of text. You can use these terms interchangeably. Sometimes they even appear together, as in text string.
A single cell can hold up to 32,000 characters — roughly equlvalent to the number of characters in this chapter. But Excel is not a word processor, and I can't think of a reason why anyone would need to even come close to that number.
If you need to display lots of text in a worksheet, consider using a text box. Choose Insert Text Text Box, click the worksheet to create the text box, and then start typing. Working with large amounts of text in a text box is easier than editing cells. In addition, you can easily move, resize, or change the dimensions of a text box. However, if you need to work with the text using formulas and functions, the text must reside in cells.
When a Number Isn't Treated as a Number
If you import data into Excel, you may be aware of a common problem: sometimes the imported values are treated as text.
Depending on your error-checking settings, Excel may display error indicators to identify numbers stored as text. An error indicator appears as a small rectangle in the upper-left corner of the cells. Activate the cell and click the icon, which expands to show a list of options. To force the number to be treated as an actual number, select Convert to Number from the list of options.
To control which error-checking rules are in effect, choose File Options, and then select the Formulas tab. You can enable any or all of the nine error types.
Here's another way to convert these nonnumbers to actual values. Activate any empty cell and choose Home Clipboard Copy (or press Ctrl+C). Then select the range that contains the values you need to fix. Choose Home Clipboard Paste Special. In the Paste Special dialog box, select the Add operation and then click OK. This procedure essentially adds zero to each cell — and, in the process, forces Excel to treat the nonnumbers as actual values.
Excel has an excellent assortment of worksheet functions that can handle text. You can access these functions just where you'd expect: from the Text control in the Function Library group of the Formulas tab.
A few other functions that are relevant to text manipulation appear in other function categories.
See Appendix A, “Worksheet Function Reference,” for a listing of the functions in the Text category. Or you can peruse these functions in the Insert Function dialog box. Activate an empty cell, and choose Formulas Function Library y Insert Function. In the Insert Function dialog box, select the Text category and scroll through the list. To find out more about a particular function, click the Help on This Function link.
Many of the text functions are not limited to text: They can also operate with cells that contain numeric values. You'll find that Excel is very accommodating when it comes to treating numbers as text.
The examples discussed in this section demonstrate some common (and useful) things you can do with text. You may need to adapt some of these examples for your own use.
Working with character codes
Every character you see on your screen has an associated code number. For Windows systems, Excel uses the standard ANSI character set. The ANSI character set consists of 255 characters, numbered (not surprisingly) from 1 through 255. An ANSI character requires one byte of storage. Excel also supports an extended character set known as Unicode, in which each character requires two bytes of storage.
Figure 11.1 shows an Excel worksheet that displays all 255 ANSI characters. This example uses the Wingdings 3 font. (Other fonts may have different characters.)
Figure 11.1 The ANSI character set (for the Wingdings 3 font).
This book's website, www.wiley.com/go/excel2016bible, includes a copy of this workbook, which also includes some simple VBA macros that enable you to display the character set for any font installed on your system. The file is namedcharacter set.xlsm.
Two functions come into play when dealing with character codes: CODE and CHAR. These functions may not be very useful by themselves, but they can prove quite useful in conjunction with other functions. I discuss these functions in the following sections.
Excel 2013 introduced two new functions that are similar to CODE and CHAR but work with Unicode characters. The new functions are UNICODE and UNICHAR.
The CODE function
The Excel CODE function returns the character code for its argument. The formula that follows returns 65, the character code for uppercase A:
If the argument for CODE consists of more than one character, the function uses only the first character. Therefore, this formula also returns 65:
The CHAR function
The CHAR function is essentially the opposite of the CODE function. Its argument should be a value between 1 and 255, and the function returns the corresponding character. The following formula, for example, returns the letter A:
To demonstrate the opposing nature of the CODE and CHAR functions, try entering this formula:
This formula, which is illustrative rather than useful, returns the letter A. First, it converts the character to its code value (65), and then it converts this code back to the corresponding character.
Assume that cell A1 contains the letter A (uppercase). The following formula returns the letter a (lowercase):
This formula takes advantage of the fact that all the alphabetic characters appear in alphabetical order within the character set; lowercase letters follow uppercase letters (with a few other characters tossed in between). Each lowercase letter is exactly 32 character positions higher than its corresponding uppercase letter.
Inserting Special Characters
If you need to insert special characters not found on your keyboard, you can use the Symbol dialog box (choose Insert Symbols Symbol). This dialog box simplifies inserting special characters (including Unicode characters) into cells. For example, you may want to display the Greek letter pi (π) in your worksheet. From the Symbol dialog box, select the Symbol font (see the accompanying figure). Examine the characters, locate the pi character, and click Insert. You'll see (in the Character Code area of the Symbol dialog box) that this character has a numerical code of 112. You can use this function to display the pi symbol (as long as the cell is formatted to use the Symbol font:
In addition, Excel has several built-in AutoCorrect symbols. For example, if you type (c) followed by a space or the Enter key, Excel converts it to a copyright symbol (©).
To see the other symbols that you can enter this way, display the AutoCorrect dialog box. To display this dialog box, choose File Options and select the Proofing tab in the Excel Options dialog box. Then click the AutoCorrect Options button. You can then scroll through the list to see which autocorrections are enabled (and delete those that aren't useful to you).
If you find that Excel makes an autocorrection that you don't want, press Ctrl+Z immediately to undo the autocorrection.
Determining whether two strings are identical
You can create a simple logical formula to determine whether two cells contain the same entry. For example, use this formula to determine whether cell A1 has the same contents as cell A2:
This formula will return either TRUE or FALSE, depending on the contents of cells A1 and A2. However, Excel is a bit lax in its comparisons when text is involved. Consider the case in which A1 contains the word January (initial capitalization), and A2 contains JANUARY(all uppercase). You'll find that the previous formula returns TRUE even though the contents of the two cells are not really the same. In other words, the comparison is not case sensitive.
Often, you don't need to worry about the case of the text. If you need to make an exact, case-sensitive comparison, though, use the EXACT function. The following formula returns TRUE only if cells A1 and A2 contain exactly the same entry:
When you compare text, be careful with trailing space characters, which are often difficult to identify. The following formula returns FALSE because the first string contains a trailing space:
When an extra space is at the end of text in a cell, it's impossible to tell that it's there just by looking at the cell. So if your text comparison formulas don't seem to be working, a trailing space could be the problem.
Joining two or more cells
Excel uses an ampersand (&) as its concatenation operator. Concatenation is simply a fancy term that describes what happens when you join the contents of two or more cells. For example, if cell A1 contains the text Tucson and cell A2 contains the text Arizona, the following formula will return TucsonArizona:
Notice that the two strings are joined without an intervening space. To add a space between the two entries (to get Tucson Arizona), use a formula like this one:
Or, even better, use a comma and a space to produce Tucson, Arizona:
If you'd like to force the second string to be on a new line, concatenate the strings using CHAR(10), which inserts a line break character. Also, make sure that you apply the Wrap Text format to the cell. The following example joins the text in cell A1 and the text in cell B1, with a line break in between:
To apply Wrap Text formatting, select the cells and then choose Home Alignment Wrap Text.
You can also concatenate characters returned by the CHAR function. The following formula returns the string Stop by concatenating four characters returned by the CHAR function:
Here's a final example of using the & operator. In this case, the formula combines text with the result of an expression that returns the maximum value in column C:
="The largest value in Column C is " & MAX(C:C)
Excel also has a CONCATENATE function, which takes up to 255 arguments. For example:
This function simply combines the arguments into a single string. You can use this function if you like, but using the & operator results in shorter formulas.
Displaying formatted values as text
The TEXT function enables you to display a numeric value in a specific number format. Figure 11.2 shows a simple worksheet. The formula in cell D3 is
="The net profit is " & B3
Figure 11.2 The formula in D3 doesn't display the formatted number.
This formula essentially combines a text string with the contents of cell B3 and displays the result. Note, however, that the formula displays the contents of B3 as a raw value (no formatting). To improve readability, you might want to display the contents of B3 by using a Currency number format.
Contrary to what you might expect, applying a number format to the cell that contains the formula has no effect. This is because the formula returns a string, not a value. You can, however, use the TEXT function to apply formatting. The TEXT function supports most (but not all) of Excel's standard custom number formatting strings.
Here's a revised formula that uses the TEXT function to apply formatting to the value in B3:
="The net profit is " & TEXT(B3," $#,##0")
This formula displays the text along with a nicely formatted value:
The net profit is $281,252
The second argument for the TEXT function consists of a standard Excel number format string. You can enter any valid number format code for this argument.
The preceding example uses a simple cell reference (B3). Of course, you can use an expression instead. Here's an example that combines text with a number resulting from a computation:
="Average Expenditure: "& TEXT(AVERAGE(A:A),"$#,##0.00")
This formula might return a string, such as the following:
Average Expenditure: $7,794.57
Here's another example that uses the NOW function (which returns the current date and time). The TEXT function displays the date and time, nicely formatted:
="Report printed on "&TEXT(NOW(),"mmmm d, yyyy at h:mm AM/PM")
The formula might display the following:
Report printed on March 22, 2016 at 3:23 PM
See Chapter 25, “Using Custom Number Formats,” for details on Excel number formats.
Displaying formatted currency values as text
The DOLLAR function converts a number to text using the currency format. It takes two arguments: the number to convert and the number of decimal places to display. The DOLLAR function uses the regional currency symbol — for example, a dollar sign ($).
You can sometimes use the DOLLAR function in place of the TEXT function. The TEXT function, however, is much more flexible because it doesn't limit you to a specific number format.
The following formula returns Total: $1,287.37. (The second argument for the DOLLAR function specifies the number of decimal places.)
="Total: "&DOLLAR(1287.367, 2)
If you're looking for a function that converts a number into spelled-out text (such as “One hundred twelve and 32/100”), you won't find it. Well, Excel does have a BAHTTEXT function, but it converts the number into the Thai language. Why Excel doesn't include an English language version of this function remains a mystery.
Repeating a character or string
The REPT function repeats a text string (first argument) any number of times you specify (second argument). For example, this seasonal formula returns HoHoHo:
You can also use this function to create crude horizontal dividers between cells. This example displays a squiggly line, 20 characters in length:
Creating a text histogram
A clever use for the REPT function is to create a simple histogram (or frequency distribution chart) directly in a worksheet. Figure 11.3 shows an example of such a histogram. You'll find this type of graphical display especially useful when you need a visual summary of many values and a standard chart is unwieldy.
Figure 11.3 Using the REPT function to create a histogram in a worksheet range.
You can also use the Data Bars conditional formatting feature to display a simple histogram directly in cells. (See Chapter 21, “Visualizing Data Using Conditional Formatting,” for details.)
The formulas in column D graphically depict the sales numbers in column B by displaying a series of characters in the Wingdings 2 font. This example uses character code 162 (a solid rectangle). A formula using the REPT function determines the number of characters displayed. The formula in cell D2 follows:
Assign the Wingdings font to cell D2, and then copy the formulas down the column to accommodate all the data. Depending on the numerical range of your data, you may need to change the scaling. Experiment by replacing the 100 value in the formulas. You can substitute any character you like for the solid rectangle in the formula to produce a different character in the chart.
The workbook shown in Figure 11.3 is available at this book's website at www.wiley.com/go/excel2016bible. The file is named text histogram.xlsx; it also contains another example of this technique.
Padding a number
You're probably familiar with a common security measure (frequently used on printed checks) in which numbers are padded with asterisks on the right. The following formula displays the value in cell A1, along with enough asterisks to make a total of 24 characters:
=(A1 & REPT("*",24-LEN(A1)))
If you'd prefer to pad the number with asterisks on the left instead, use this formula:
The following formula displays 12 asterisks on both sides of the number:
The preceding formulas are a bit deficient because they don't show any number formatting. This revised version displays the value in A1 (formatted), along with the asterisk padding on the right:
Figure 11.4 shows this formula in action.
Figure 11.4 Using a formula to pad a number with asterisks.
You can also pad a number by using a custom number format. To repeat the next character in that format until it fills the column width, include an asterisk (*) in the custom number format code. For example, use this number format to pad the number with dashes:
To pad the number with asterisks, use two asterisks in the number format code, like this:
See Chapter 25 for more information about custom number formats, including additional examples using the asterisk format code.
Removing excess spaces and nonprinting characters
Often, data imported into an Excel worksheet contains excess spaces or strange (often unprintable) characters. Excel provides you with two functions to help whip your data into shape:
· TRIM removes all leading and trailing spaces and replaces internal strings of multiple spaces with a single space.
· CLEAN removes all nonprinting characters from a string. These “garbage” characters often appear when you import certain types of data.
This example uses the TRIM function. The formula returns Fourth Quarter Earnings (with no excess spaces):
=TRIM(" Fourth Quarter Earnings ")
Counting characters in a string
The LEN function takes one argument and returns the number of characters in the argument. For example, assume that the string September Sales is contained in cell A1.
The following formula returns 15:
Notice that space characters are included in the character count.
The following formula returns the total number of characters in the range A1:A3:
You see example formulas that demonstrate how to count the number of specific characters within a string later in this chapter. Chapter 13, “Creating Formulas That Count and Sum,” covers counting techniques in greater detail.
Changing the case of text
Excel provides three handy functions to change the case of text:
· UPPER converts the text to ALL UPPERCASE.
· LOWER converts the text to all lowercase.
· PROPER converts the text to Proper Case (the first letter in each word is capitalized, as in a proper name).
These functions are straightforward. The formula that follows, for example, converts the text in cell A1 to proper case:
If cell A1 contained the text MR. JOHN Q. PUBLIC, the formula would return Mr. John Q. Public.
These functions operate only on alphabetic characters; they simply ignore all other characters and return them unchanged.
These functions aren't perfect, and they sometimes produce undesired results. For example, this formula returns Don'T:
Apparently, the PROPER function is programmed to always capitalize the letter following an apostrophe. If the argument is "o'reilly", the function works perfectly.
Transforming Data with Formulas
Many of the examples in this chapter describe how to use functions to transform data in some way. For example, you can use the UPPER function to transform text into uppercase. Often, you'll want to replace the original data with the transformed data. Specifically, follow these steps:
1. Insert a new temporary column for formulas to transform the original data.
2. Create your formulas in the temporary column.
3. Select the formula cells.
4. Choose Home Clipboard Copy (or press Ctrl+C).
5. Select the original data cells.
6. Choose Home Clipboard Paste Values (V).
This procedure replaces the original data with the transformed data; then you can delete the temporary column that holds the formulas.
Extracting characters from a string
Excel users often need to extract characters from a string. For example, you may have a list of employee names (first and last names) and need to extract the last name from each cell. Excel provides several useful functions for extracting characters:
· LEFT returns a specified number of characters from the beginning of a string.
· RIGHT returns a specified number of characters from the end of a string.
· MID returns a specified number of characters beginning at a specified position within a string.
The following formula returns the last ten characters from cell A1; if A1 contains fewer than ten characters, the formula returns all text in the cell:
This next formula uses the MID function to return five characters from cell A1, beginning at character position 2. In other words, it returns characters 2 through 6.
The following example returns the text in cell A1 with only the first letter in uppercase. It uses the LEFT function to extract the first character and convert it to uppercase. This character then concatenates to another string that uses the RIGHT function to extract all but the first character (converted to lowercase). Here's what it looks like:
If cell A1 contained the text FIRST QUARTER, the formula would return First quarter.
This is different from the result obtained using the PROPER function. The PROPER function makes the first character in each word uppercase.
Replacing text with other text
In some situations, you may need a formula to replace a part of a text string with some other text. For example, you may import data that contains asterisks, and you need to convert the asterisks to some other character. You could use choose Home Editing Find & Select Replace to make the replacement. If you prefer a formula-based solution, you can take advantage of either of two functions:
· SUBSTITUTE replaces specific text in a string. Use this function when you know the character(s) to be replaced but not the position.
· REPLACE replaces text that occurs in a specific location within a string. Use this function when you know the position of the text to be replaced but not the actual text.
The following formula uses the SUBSTITUTE function to replace 2015 with 2016 in the string 2015 Budget. The formula returns 2016 Budget:
The following formula uses the SUBSTITUTE function to remove all spaces from a string. In other words, it replaces all space characters with an empty string. The formula returns 2015OperatingBudget:
=SUBSTITUTE("2015 Operating Budget"," ","")
The following formula uses the REPLACE function to replace one character beginning at position 5 with nothing. In other words, it removes the fifth character (a hyphen) and returns Part544:
Finding and searching within a string
The FIND and SEARCH functions enable you to locate the starting position of a particular substring within a string:
· FIND finds a substring within another text string and returns the starting position of the substring. You can specify the character position at which to begin searching. Use this function for case-sensitive text comparisons. Wildcard comparisons are not supported.
· SEARCH finds a substring within another text string and returns the starting position of the substring. You can specify the character position at which to begin searching. Use this function for non-case-sensitive text or when you need to use wildcard characters.
The following formula uses the FIND function and returns 7, the position of the first m in the string. Notice that this formula is case sensitive:
=FIND("m","Big Mama Thornton",1)
The formula that follows, which uses the SEARCH function, returns 5, the position of the first m (either uppercase or lowercase):
=SEARCH("m","Big Mama Thornton",1)
You can use the following wildcard characters within the first argument for the SEARCH function:
· Question mark (?) matches any single character.
· Asterisk (*) matches any sequence of characters.
If you want to find an actual question mark or asterisk character, type a tilde (˜) before the question mark or asterisk.
The next formula examines the text in cell A1 and returns the position of the first three-character sequence that has a hyphen in the middle of it. In other words, it looks for any character followed by a hyphen and any other character. If cell A1 contains the text Part-A90, the formula returns 4:
Searching and replacing within a string
You can use the REPLACE function in conjunction with the SEARCH function to replace part of a text string with another string. In effect, you use the SEARCH function to find the starting location used by the REPLACE function.
For example, assume that cell A1 contains the text Annual Profit Figures. The following formula searches for the six-letter word Profit and replaces it with the word Loss:
This next formula uses the SUBSTITUTE function to accomplish the same effect in a more efficient manner:
Advanced Text Formulas
The examples in this section appear more complex than the examples in the preceding section. As you can see, though, these examples can perform some useful text manipulations. Space limitations prevent a detailed explanation of how these formulas work, but this section gives you a basic introduction.
You can access all the examples in this section by downloading the file from this book's website at www.wiley.com/go/excel2016bible. The file is named text formula examples.xlsx.
Counting specific characters in a cell
This formula counts the number of Bs (uppercase only) in the string in cell A1:
This formula works by using the SUBSTITUTE function to create a new string (in memory) that has all the Bs removed. Then the length of this string is subtracted from the length of the original string. The result reveals the number of Bs in the original string.
The following formula is a bit more versatile: it counts the number of Bs (both uppercase and lowercase) in the string in cell A1. Using the UPPER function to convert the string makes this formula work with both uppercase and lowercase characters:
Counting the occurrences of a substring in a cell
The formulas in the preceding section count the number of occurrences of a particular character in a string. The following formula works with more than one character. It returns the number of occurrences of a particular substring (contained in cell B1) within a string (contained in cell A1). The substring can consist of any number of characters:
For example, if cell A1 contains the text Blonde On Blonde and B1 contains the text Blonde, the formula returns 2.
The comparison is case sensitive, so if B1 contains the text blonde, the formula returns 0. The following formula is a modified version that performs a case-insensitive comparison by converting the characters to uppercase:
Extracting the first word of a string
To extract the first word of a string, a formula must locate the position of the first space character and then use this information as an argument for the LEFT function. The following formula does just that:
This formula returns all the text prior to the first space in cell A1. However, the formula has a slight problem: it returns an error if cell A1 consists of a single word. A slightly more complex formula that checks for the error using the IFERROR function solves that problem:
The preceding formula uses the IFERROR function, which was introduced in Excel 2007. If your workbook will be used with previous versions of Excel, use this formula:
=IF(ISERR(FIND(" ",A1)),A1,LEFT(A1,FIND(" ",A1)-1))
Extracting the last word of a string
Extracting the last word of a string is more complicated because the FIND function only works from left to right. Therefore, the problem is locating the last space character. The formula that follows, however, solves this problem by returning the last word of a string (all text following the last space character):
This formula, however, has the same problem as the first formula in the preceding section: it fails if the string does not contain at least one space character. The following modified formula uses the IFERROR function to test for an error (that is, no spaces). If the first argument returns an error, the formula returns the complete contents of cell A1:
Following is a modification that doesn't use the IFERROR function. This formula works for all versions of Excel:
Extracting all but the first word of a string
The following formula returns the contents of cell A1, except for the first word:
If cell A1 contains 2016 Operating Budget, the formula returns Operating Budget.
The following formula, which uses the IFERROR function, returns the entire contents of cell A1 if the cell doesn't have a space character:
Here's a modification that works in all versions of Excel:
=IF(ISERR(FIND(" ",A1)),A1,RIGHT(A1,LEN(A1)-FIND(" ",A1,1)))
Extracting first names, middle names, and last names
Suppose you have a list consisting of people's names in a single column. You have to separate these names into three columns: one for the first name, one for the middle name or initial, and one for the last name. This task is more complicated than you may think because it must handle the situation for a missing middle initial. However, you can still do it.
The task becomes a lot more complicated if the list contains names with titles (such as Mr. or Dr.) or names followed by additional details (such as Jr. or III). In fact, the following formulas will not handle these complex cases. However, they still give you a significant head start if you're willing to do a bit of manual editing to handle special cases. For a way to remove these titles, see the next section, “Removing titles from names.”
The following formulas all assume that the name appears in cell A1.
You can easily construct a formula to return the first name:
This formula returns the last name:
The next formula extracts the middle name and requires that you use the other formulas to extract the first name and the last name. It assumes that the first name is in B1 and the last name is in D1. Here's what it looks like:
As you can see in Figure 11.5, the formulas work fairly well. There are a few problems, however, notably names that contain one word or four words. But, as I mentioned earlier, you can clean up these cases manually.
Figure 11.5 This worksheet uses formulas to extract the first name, last name, and middle name (or initial) from a list of names in column A.
This workbook, named extract names.xlsx, is available on this book's website at www.wiley.com/go/excel2016bible.
Excel provides two methods to extract text from strings without using formulas: the Text to Columns feature and the Flash Fill feature). Refer to Chapter 32, “Importing and Cleaning Data,” for more information about these features.
Removing titles from names
You can use the following formula to remove three common titles (Mr., Ms., and Mrs.) from a name. For example, if cell A1 contains Mr. Fred Munster, the formula would return Fred Munster:
N(A1) -FIND(" ",A1)),A1)
Creating an ordinal number
An ordinal number is an adjective form of a number. Examples include 1st, 2nd, 5th, 23rd, and so on.
The formula that follows displays the value in cell A1 as an ordinal number:
The formula is rather complex because it must determine whether the number will end in th, st, nd, or rd. This formula also uses literal arrays (enclosed in brackets).
See Chapter 18, “Performing Magic with Array Formulas,” for more on literal arrays.
Counting the number of words in a cell
The following formula returns the number of words in cell A1:
=LEN(TRIM(A1))-LEN(SUBSTITUTE( (A1)," ",""))+1
The formula uses the TRIM function to remove excess spaces. It then uses the SUBSTITUTE function to create a new string (in memory) that has all the space characters removed. The length of this string is subtracted from the length of the original (trimmed) string to get the number of spaces. This value is then incremented by 1 to get the number of words.
Note that this formula will return 1 if the cell is empty. The following modification solves that problem: