CompTIA Linux+ / LPIC-1 Cert Guide (Exams LX0-103 & LX0-104/101-400 & 102-400) (2016)
Chapter 11. Customizing Shell Environments
This chapter covers the following topics:
Working Within the Shell
Extending the Shell
Localization and Internationalization
This chapter covers the following exam topics:
Customize and use the shell environment: 105.1
Localization and Internationalization: 107.3
The shell lets you wield amazing power over your systems. You can perform simple tasks, such as copying files and running programs. You can combine many tasks into one, perform repetitive tasks with a few keystrokes, and even offload simple decision making to the shell. At first glance it’s an imposing interface. With a bit of knowledge you can start using these advanced features.
“Do I Know This Already?” Quiz
The “Do I Know This Already?” quiz enables you to assess whether you should read this entire chapter or simply jump to the “Exam Preparation Tasks” section for review. If you are in doubt, read the entire chapter. Table 11-1 outlines the major headings in this chapter and the corresponding “Do I Know This Already?” quiz questions. You can find the answers in Appendix A, “Answers to the ‘Do I Know This Already?’ Quizzes and Review Questions.”
Table 11-1 “Do I Know This Already?” Foundation Topics Section-to-Question Mapping
1. Which are proper ways of assigning the value bar to the environment variable FOO? (Choose two.)
a. set FOO=bar
b. export FOO=bar
c. FOO=bar
d. export FOO bar
2. Consider the following script:
#!/bin/bash
echo You are $AGE years old.
You run the following:
$ AGE=42
$ ./script
You are years old.
Why did the script not display the age?
a. AGE should be in lowercase.
b. The script should have been run as . ./script.
c. The script was not marked as executable.
d. You should have set age with the set keyword.
3. Which of the following scripts are sourced when you have a non-login session?
a. ~/.bashrc
b. ~/.bash_profile
c. ~/.profile
d. /etc/profile
4. Which of the following are true about Bash aliases and functions? (Choose two.)
a. Both can save you typing.
b. Both can accept parameters.
c. Functions can span multiple lines while aliases can’t.
d. Functions must be exported.
5. You’re about to call a friend in Thunder Bay but you don’t know what time zone they are in. How could you tell what time it is in Thunder Bay?
a. tzselect
b. TZ=America/Thunder_Bay date
c. LC_TIME=America/Thunder_Bay date
d. date --timezone=America/Thunder_Bay
6. One of your coworkers in the Asia-Pacific office sent you a file containing instructions on how to set up the new application he is working on. When you look at the file in your editor, it’s a mess. You run file on the message and see:
message: Big-endian UTF-16 Unicode text
How can you decode this file?
a. It’s impossible to decode; the coworker must send it in ISO-8859.
b. LANGUAGE=en_US.UTF-16 cat message
c. tr -d UTF-16 < message
d. iconv -f UTF-16 -t ASCII message
Foundation Topics
Working Within the Shell
The shell is the desktop of the text interface. The shell takes the commands you enter, sends them to the operating system, and displays the results back to you. Along the way the shell implements features such as globbing, which matches files with wildcards and makes small commands more powerful. You can also add your own commands to the shell to further cut down on your work.
So far in this book you’ve seen many shell commands such as those to manipulate files. Here’s where you learn how to customize the shell environment to make what you’ve learned so far more powerful.
Unless you change the defaults, the Linux shell is normally the bash shell. Of course, many other shells exist. A partial list is shown in Table 11-2.
Table 11-2 A Sampling of Linux Shells
Among other pieces of information, the user’s default shell is specified in the /etc/passwd entry for that user. If the shell field is empty, the user gets /bin/sh by default; if it names a shell that does not exist, the login fails.
Special shells can be specified, such as /bin/false (which returns a nonzero error code, effectively blocking access by a user attempting to log in) or /sbin/nologin (which also blocks logins but gives a polite message).
Choosing a shell is a personal matter. Most shells are related to one another somehow. bash, the Bourne Again Shell, is an improvement on the Bourne shell, which itself is an improvement on the original Thompson shell from Unix. The Korn shell is derived from Bourne but implements some of the good stuff from the C-shell.
This book will describe the use of bash. The concepts can be applied to all other shells.
Environment Variables
A variable is a bucket that stores information. If you need to store the process ID of that web server you just launched, put it in a variable called WEBSERVER_PID and it’ll be there when you need it next, as shown in Example 11-1.
Example 11-1 Example of a Variable
$ WEBSERVER_PID=29565
$ echo $WEBSERVER_PID
29565
$ ps -ef | grep $WEBSERVER_PID
nginx 14747 29565 0 Apr10 ? 00:04:09 nginx: worker process
root 29565 1 0 Mar27 ? 00:00:00 nginx: master process
The first line assigns the value 29565 to the variable called WEBSERVER_PID. Prefixing the variable name with a dollar sign tells the shell that it is to substitute the value of the variable at that spot. The next two commands in the preceding example use the variable by first displaying it with the echo command and then looking for processes containing that number with ps and grep.
Remember: Use the variable name by itself for assignment. Prepend a dollar sign to use the value inside the variable. It’s also important that there are no spaces around the equals sign.
A variable name can be upper- or lowercase and contain letters, numbers, and the underscore (_). By convention, anything you set in the shell is in uppercase. Variables within a script are lowercase or mixed case.
Variables can have meaning to you, or they can have meaning to the shell. $HOME, for example, is set automatically to the current user’s home directory, so you never need to assign it yourself.
If you have spaces in the right side of the assignment, you need to escape the space or quote it. These two commands have the same effect:
$ GREETING="Hello World"
$ GREETING=Hello\ World
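The quoting rules above can be checked with a short script. This is a minimal sketch; the names greeting, same_greeting, and comparison are hypothetical and chosen only for illustration.

```shell
# Hypothetical names, used only to illustrate the assignment rules.
greeting="Hello World"        # quoted: the space is part of the value
same_greeting=Hello\ World    # escaped: produces the identical value

# Quote the expansions so each value is compared as a single word.
if [ "$greeting" = "$same_greeting" ]; then
    comparison="identical"
else
    comparison="different"
fi
echo "The two assignments are $comparison"
```

Writing the assignment with spaces around the equals sign would instead make the shell look for a command named greeting.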
Variable Scope
A confusing property of environment variables is that setting them in one shell applies only to that shell and not parents, children, or other shells. For example, if you had a program that looked at an environment variable called WORK_FACTOR to see how much work it could do, this wouldn’t work:
$ WORK_FACTOR=3
$ ./my_command
This is because running ./my_command starts a child process with a new environment. Variables you set aren’t passed to children. You need to export that variable:
$ WORK_FACTOR=3
$ export WORK_FACTOR
$ ./my_command
The export command marks the variable as something that gets passed to children. A less verbose way of accomplishing this is to put export on the same line as the assignment: export WORK_FACTOR=3.
Alternatively, if you just want to pass the variable to the child and not set it in the current environment, you can set it on the same line as the command:
$ WORK_FACTOR=3 ./my_command
$ echo $WORK_FACTOR
$
The echo shows that even though you set WORK_FACTOR to 3 for the child, the current process didn’t keep it around.
By itself, export prints a list of variables currently exported to child environments.
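The three cases described in this section can be compared side by side. The sketch below uses sh -c as a stand-in for ./my_command, which is an assumption made only so the example is self-contained; the child simply prints what it sees.

```shell
# Start from a clean slate in case WORK_FACTOR is already in the environment.
unset WORK_FACTOR

WORK_FACTOR=3                          # set in this shell, but not exported
before_export=$(sh -c 'echo "${WORK_FACTOR:-unset}"')

export WORK_FACTOR                     # now passed to children
after_export=$(sh -c 'echo "${WORK_FACTOR:-unset}"')

# A one-off assignment affects only the child started on that line.
one_off=$(WORK_FACTOR=5 sh -c 'echo "$WORK_FACTOR"')

echo "child before export: $before_export"
echo "child after export:  $after_export"
echo "child with one-off:  $one_off (parent still has $WORK_FACTOR)"
```

The ${WORK_FACTOR:-unset} form prints the word unset when the variable is missing, which makes the first case visible instead of showing an empty line.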
Setting Variables from a Child
A common pattern is to put the assignments in a separate configuration file and use that file in a script or the command line. This adds consistency and makes commands easier to type. However the variables would be set in the child environment, which is thrown away when the process exits.
Suppose you have a configuration file called config containing a single assignment, WORK_FACTOR=3.
If you were to run that file as a script, your current shell wouldn’t know about the variable:
$ ./config
$ echo $WORK_FACTOR
$
What you need is to make the assignments in the current environment. This is the job of source:
$ source ./config
$ echo $WORK_FACTOR
3
Sourcing a file executes it in the current shell environment instead of creating a separate child environment. Thus variables set in the script are available in the current environment. If the script didn’t export the variable, it will not be available to child processes later on.
There is an alternate way to source a file, which is to replace the word source with a period:
$ . ./config
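The difference between running and sourcing can be reproduced without touching your own files. In this sketch a temporary file created with mktemp stands in for the config file; that substitution is an assumption made for the sake of a self-contained example.

```shell
config_file=$(mktemp)                 # hypothetical stand-in for ./config
echo 'WORK_FACTOR=3' > "$config_file"
unset WORK_FACTOR

sh "$config_file"                     # runs in a child; the assignment is lost
after_run=${WORK_FACTOR:-unset}

. "$config_file"                      # sourced; the assignment stays
after_source=${WORK_FACTOR:-unset}

rm -f "$config_file"
echo "after running:  $after_run"
echo "after sourcing: $after_source"
```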
Setting and Unsetting Variables
Bash has two built-in commands, set and unset, that are related in some sense but in another sense are just confusing.
The easy one is unset. Example 11-2 shows how to use this to destroy a variable.
Example 11-2 Using unset to Destroy a Variable
$ FOO=bar
$ echo $FOO
bar
$ unset FOO
$ echo $FOO
Note that when you’re unsetting, refer to the variable name without the dollar sign.
set is a multitalented command. By itself it shows all variables and functions in the environment. Contrast this to export, which only shows exported variables.
The next use of set is to enable or disable features in the shell. For example, set -x tells the shell to print each command as it is executed. At the shell this isn’t that useful, but you can use this in a shell script for debugging. To revert the behavior, use set +x. The alternate way to set this is with set -o xtrace.
A useful option is called noclobber, which is enabled with set -C or set -o noclobber, as shown in Example 11-3. This tells the shell not to overwrite existing files with redirects.
Example 11-3 Using noclobber
$ echo hi > test
$ echo hi > test
$ set -o noclobber
$ echo hi > test
-bash: test: cannot overwrite existing file
The first command in the preceding example puts the string hi into a file named test. It’s done once more to show that the shell will let you overwrite a file. The noclobber option is enabled and the same test is run, which results in an error. To disable noclobber run set +o noclobber.
The final use of set is to assign positional parameters. Within the shell are reserved variable names. Among those are $1 and other numbered variables, which are used to pass information to scripts from the command line and other functions. Anything passed to set that is not recognized as an option is assigned to one of these positional parameters:
$ set a b c d e f g h i j k l m
$ echo $1
a
$ echo ${10}
j
Note the curly braces when trying to expand $10. Without them the shell thinks you’re trying to expand $1 and then appending a zero, so you would end up with a0.
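The same assignment works inside a script. The sketch below adds two things that do not appear in the text above: the -- separator, which tells set that everything after it is a parameter even if it starts with a dash, and $#, which holds the number of positional parameters.

```shell
# Assign thirteen positional parameters; -- ends option processing.
set -- a b c d e f g h i j k l m

first=$1          # single digit: no braces needed
tenth=${10}       # two digits: braces required
count=$#          # how many parameters are set

echo "first=$first tenth=$tenth count=$count"
```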
Subshells
You’ve already seen how parent and child shells interact with respect to environment variables. There are times when you may want to temporarily use child shells to ensure you have a clean slate or to temporarily set some variables without changing your current shell.
The easiest way to get a subshell is to enclose the commands in parentheses. Copy Example 11-4 into a file called test.sh and run it.
Example 11-4 A Shell Script to Demonstrate Subshells
echo "In parent $BASH_SUBSHELL pid $$"
FOO=bar
echo $FOO
(FOO=baz; echo -n "In subshell $BASH_SUBSHELL "; echo $$)
echo $FOO
The first thing the script does is write some information to the console. The two variables are built in to Bash: $BASH_SUBSHELL gives the current subshell level and $$ gives the process ID of the shell running the script.
After that the script sets a variable FOO and prints it to the screen.
Following that is a group of commands run inside a subshell. FOO is assigned a new value and the current values of BASH_SUBSHELL and the pid are printed. A subshell can span multiple lines, but for compactness the commands are separated with semicolons (;).
Finally, the script prints the contents of FOO. Running the script produces the output shown in Example 11-5.
Example 11-5 A Script to Demonstrate BASH_SUBSHELL Levels
$ sh test.sh
In parent 0 pid 11966
bar
In subshell 1 11966
bar
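The value of FOO hasn’t changed in the parent shell because it was only changed in the subshell. Note that the pid printed inside the subshell is the same as the parent’s. That does not mean no new process was created: $$ always expands to the process ID of the original shell, even inside a subshell. Bash’s $BASHPID variable reports the pid of the process actually running the commands.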
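A subshell isolates more than variables; a change of directory inside it is also discarded. The following sketch captures what the subshell prints so the two views can be compared. The variable names are hypothetical, and the outer $( ) is used only to capture the output.

```shell
start_dir=$(pwd)
FOO=bar

# The inner parentheses create the subshell; $( ) captures what it prints.
inside=$( ( FOO=baz; cd / && echo "$FOO in $(pwd)" ) )

# Back in the parent, neither the variable nor the directory has changed.
outside="$FOO in $(pwd)"

echo "inside:  $inside"
echo "outside: $outside"
```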
The env Wrapper
The subshell is a method to make sure some work doesn’t pollute your current environment. Sometimes you want to do some work in a known environment, which is where env comes in.
You’ve already seen an example of this need with the WORK_FACTOR=3 ./my_command example.
The env command, which is a separate program rather than a shell builtin, allows you to modify the environment prior to running a command. By itself it displays all environment variables.
The most common use of env is at the beginning of a script. A script usually starts with the shebang line that tells Linux which interpreter to use: #!/usr/bin/ruby.
The name shebang comes from the first two characters, hash (#) and bang (!). The interpreter named after those two characters is used to run the script; in this case it’s /usr/bin/ruby. But what if ruby is somewhere other than /usr/bin? Using #!/usr/bin/env ruby makes env search for ruby in the PATH (more on this later).
Another use is to wipe out the current environment before continuing. For example, to ensure a predictable PATH you can wipe out the environment and then run a new shell:
$ env -i sh -c 'echo $PATH'
/usr/local/bin:/bin:/usr/bin
The current environment is kept if -i is not used:
$ env sh -c 'echo $PATH'
/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/
sean/bin
As running env by itself prints the whole environment, you can verify that env -i wipes the environment first:
$ env -i env
$
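After wiping the environment, env can also rebuild it with only the variables you name. The sketch below assumes /bin/sh exists and uses a hypothetical variable called LEAKY to show what does and does not reach the child.

```shell
export LEAKY=visible     # hypothetical variable exported by the parent

# Without -i the child inherits the parent's environment.
kept=$(env /bin/sh -c 'echo "${LEAKY:-unset}"')

# With -i the child starts with an empty environment.
wiped=$(env -i /bin/sh -c 'echo "${LEAKY:-unset}"')

# Assignments after -i become the child's entire environment.
rebuilt=$(env -i WORK_FACTOR=3 /bin/sh -c 'echo "${WORK_FACTOR:-unset}"')

echo "kept=$kept wiped=$wiped rebuilt=$rebuilt"
```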
Extending the Shell
The shell is customizable. If you want to change the way certain commands operate, such as to always request confirmation when the root user deletes a file, you can do that. If you don’t like the way the prompt looks you can change it. If you have a few commands you use repeatedly, you can reduce them down to a single word.
Global and User Settings
When users log in to the system, either from a remote shell session or via the console, a series of scripts are run that can modify the environment. Some of the scripts are global to all users on the system and some exist only for a single user. These scripts are all sourced so that they can make changes to the user’s environment.
The scripts that get run depend on how the shell was invoked.
A Login Shell Session
A login shell is one that is executed when logging in to the system.
The /etc/profile script is the global configuration file that affects all users’ environments if they use the bash shell. It’s sourced (read) every time a user starts a login shell. This file is a script and is executed right before the user’s profile script. After this, any files inside of /etc/profile.d are sourced.
The user’s ~/.bash_profile script, if it exists, is the next script sourced. This file contains variables, code, and settings that directly affect that user’s—and only that user’s—environment. If .bash_profile doesn’t exist, the shell looks for .bash_login or .profile, stopping once a match has been found.
The .bash_profile, or alternative script if found, sources .bashrc. Note that this isn’t behavior in the shell; it’s a convention that makes everything easier later.
When the user logs out, the shell sources the .bash_logout file. This file is often used to issue the clear command, so text from any previous command is not left on the user’s screen after he logs out. It can also clean up anything that may have been launched as part of the session.
Be careful on the exam because a lot of test-takers do not pick the .bash_logout file as part of the user’s login session. It’s definitely one of the more missed elements in the shell section.
An example of the user’s login session might be the following:
1. The user logs in with a username and password.
2. The /etc/profile is sourced.
3. Files under /etc/profile.d are sourced.
4. The user’s ~/.bash_profile is sourced.
5. The user’s ~/.bashrc is sourced from within the ~/.bash_profile.
6. The user conducts her business.
7. The user initiates a logout with the logout or exit command or by pressing Ctrl+D.
8. The user’s .bash_logout script is sourced.
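Step 5 in the list above depends on a few lines inside ~/.bash_profile, because the shell does not source .bashrc for a login session on its own. Many distributions ship something like the following; treat it as a typical sketch rather than the exact contents of any particular distribution’s file.

```shell
# ~/.bash_profile (typical contents; details vary by distribution)

# Pull in the settings shared with non-login shells.
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
```

Keeping aliases, functions, and prompt settings in .bashrc and sourcing it this way means login and non-login shells behave the same.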
A Non-Login Shell Session
Non-login shell sessions are typically the root user using the su command to temporarily become another user or a sysadmin using su to become the root user without loading the entire environment of the root or other user. Non-login shells are also started when you open new terminals from within a graphical session.
Note
The su command creates a non-login shell. If you need a login shell, place a dash after the su command, for example, su - sean.
When a user executes a non-login session, the only file sourced is the target account’s ~/.bashrc file. (On Red Hat machines, the first action in the ~/.bashrc is to source the /etc/bashrc file if it exists. Other distributions have different files that they run.)
Upon exiting that shell, no logout files or scripts are sourced, nor are the source account’s scripts run again.
The PATH
When you run a command the shell looks through the directories in the PATH variable to find the program to run. This lets you run /usr/local/bin/firefox just by typing firefox. Without the path you would always have to qualify your commands, such as to run ./firefox, /usr/local/bin/firefox, or ../bin/firefox depending on which directory you were in.
The path is a series of directories separated by colons (:) and is stored in an environment variable called PATH.
$ echo $PATH
/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/
sean/bin:/home/sean/cxoffice/bin:/opt/IBMJava2-142/jre/bin/
The shell looks through each directory in turn, stopping with the first match, or gives you a command not found error if none of the directories contain a match. If you give a relative or fully qualified path name the path is ignored.
Your system will have a default path set, so usually you want to add to it instead of replacing it. In your .bash_profile or .bashrc add export PATH=$HOME/bin:$PATH.
Here $HOME expands to your home directory and $PATH expands to the current path, so this example prepends the bin directory underneath your home directory to the search path.
There is no implicit check for the current directory in your path. You must explicitly run ./program, /full/path/to/program, or add . to your path to run programs from your current directory. Make the shell check the current directory last with export PATH=$PATH:.
Again, $PATH expands to the current path, and then you’re appending the current directory.
Putting the current directory in the path can open up security holes. The thinking is that if someone has an evil program called ls in his home directory or maybe in /tmp and the root user is in that directory and runs ls, the evil program could run instead of /bin/ls.
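The effect of prepending a directory can be observed with a throwaway program. This sketch creates a scratch directory with mktemp -d and a script named hello_demo; both names are hypothetical. The command -v builtin reports where the shell would find a command.

```shell
demo_dir=$(mktemp -d)                       # hypothetical scratch directory
printf '#!/bin/sh\necho "hello from demo"\n' > "$demo_dir/hello_demo"
chmod +x "$demo_dir/hello_demo"

old_path=$PATH
before=$(command -v hello_demo || echo "not found")

PATH=$demo_dir:$PATH                        # prepend, keeping the default path
after=$(command -v hello_demo)
output=$(hello_demo)

PATH=$old_path                              # restore the original path
rm -rf "$demo_dir"

echo "before: $before"
echo "after:  $after"
echo "output: $output"
```

Because the shell stops at the first match, a directory placed at the front of PATH can shadow system commands of the same name, which is the same mechanism behind the security concern described above.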
Aliases and Functions
Aliases and functions are two related features that let you run more complicated commands with fewer keystrokes.
An alias replaces part of your command with something else. If you wanted to save a few keystrokes on ls -l you could make an alias: alias ll="ls -l".
Any time you type ll by itself it will be expanded to ls -l. For example ll, ll -h, or ll *.doc are all good uses, giving you a long listing, a long listing with human readable sizes, and a long listing of all the .doc files, respectively. Typing all will not trigger the alias because aliases only operate on distinct words.
Aliases take priority over searching the path so you can make sure you always include options when running a command by redefining the command: alias rm="rm -i".
This redefines the rm command to always run rm -i, which asks the user for confirmation before deleting instead of deleting silently.
You can defeat the alias temporarily by either fully qualifying your command (/bin/rm) or escaping it (\rm). It’s assumed that if you’re smart enough to override an alias, you’re smart enough to know the consequences of what you’re doing!
Functions
Functions are invoked just like aliases except that they run Bash code instead of a simple substitution. If you wanted to take a complicated directory copy command like tar -cf - * | (cd somedir && tar -xf -) and vary the name of somedir, an alias won’t do it.
The form of a function is function name() { commands }. The commands can span multiple lines. So the function to copy a directory using the tar command is
function dircp () {
tar -cf - * | ( cd "$1" && tar -xf - )
}
The special variable, $1, is expanded to the first argument passed to the function. $2 is the second, and so forth. You can refer to all the parameters at once with $*.
You then call the function as if it were a command, such as with dircp /tmp/.
Like an alias, the function name must appear at the beginning of the command. Unlike an alias, the rest of the command up until any redirections or semicolons is passed to the function as arguments. You’re creating a mini program instead of a simple substitution.
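A slightly more defensive variant of dircp is sketched below. It differs from the version above in three ways, all of them choices made for this example: it uses the portable name() form of the definition, it checks that the destination is a directory, and it archives . instead of * so that dotfiles are copied too. The scratch directories are hypothetical.

```shell
# Portable definition; "function dircp ()" also works in bash.
dircp() {
    if [ ! -d "$1" ]; then
        echo "dircp: $1 is not a directory" >&2
        return 1
    fi
    # "." also picks up dotfiles, which "*" would skip.
    tar -cf - . | ( cd "$1" && tar -xf - )
}

# Try it on two scratch directories.
src_dir=$(mktemp -d)
dest_dir=$(mktemp -d)
echo "hello" > "$src_dir/hello.txt"

# Run from inside the source directory without changing the caller's directory.
( cd "$src_dir" && dircp "$dest_dir" )
copied=$(cat "$dest_dir/hello.txt")
echo "copied file contains: $copied"

rm -rf "$src_dir" "$dest_dir"
```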
PS1
If you’re going to spend a lot of time at a shell you might as well enjoy it, right? You can embed information into your command prompt to give you context. Do you work on multiple servers throughout the day? Put the name of the server in the command prompt and never face the embarrassment of running the right command on the wrong server!
The command prompt is stored in an environment variable called PS1.
$ PS1="Your wish master? "
Your wish master?
While that command prompt is sure to stroke your ego it is not that informative. Bash implements several special macros that are expanded each time the prompt is rendered.
$ PS1="\h:\w\$ "
bob:~/tmp$
In the preceding example the \h is expanded to the short name of the host, the \w becomes the current directory, and the \$ is $ for regular users and # for root.
The list of possible macros is as follows:
Note
It’s unreasonable to need to memorize all these options for an exam, so this list is for reference. You will want to know how to construct a PS1 command line and know that the macros begin with a backslash.
Adding More Dynamic Content
Is what you want not there? You can run commands or functions in your prompt, as shown in Example 11-6.
Example 11-6 Running Commands and Functions in Your Prompt
$ function logged_in_users() {
> w_lines=$(w | wc -l)
> echo -n $(( $w_lines - 2))
> }
$ PS1='$(id) $(logged_in_users) users logged in\$ '
uid=500(sean) gid=500(sean) 2 users logged in$
It’s important to use single quotes in this case; otherwise, the commands are run once, when the variable is assigned, and the prompt always shows the same value. The $() operator runs the command inside the parentheses and is discussed more in Chapter 12, “Shell Scripting.”
It should go without saying to be careful when running commands in your shell prompt. There are many good situations in which to use this feature, but it can also slow you down as it runs commands on each showing of the prompt. You cannot assume you know which directory you’re in, so any paths should be fully qualified.
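The effect of the quoting style can be seen outside of a prompt. In this sketch the variables counter, eager, and lazy are hypothetical; the point is only what ends up stored in each string.

```shell
counter=1

eager="count: $(echo $counter)"    # double quotes: the command runs right now
lazy='count: $(echo $counter)'     # single quotes: stored as literal text

counter=2

echo "eager: $eager"               # still reflects the old value
echo "lazy:  $lazy"                # unexpanded text, ready to be evaluated later
```

When the single-quoted form is stored in PS1, Bash expands it each time the prompt is drawn, which is why the prompt stays current.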
PS2
PS2 is another prompt you can change in your shell. Its contents are displayed when you type a command that spans more than one line.
$ PS2="more? "
$ ls |
more?
This one is not used as much as PS1 but can be helpful for novice users for whom > is not a good indication that the shell is waiting for more input.
Creating New Users (skeleton)
Now that you have some helpful and cool additions to your Bash startup scripts you may want to share them with users. There are two ways to do this: Move the code into the systemwide /etc/profile or copy the files over when you create a user.
The first option is possible but would make it difficult for users to override what you did if they have something that suits them better. You also don’t want to make big changes on existing users.
That leaves making the changes part of the new user setup. One of the things that the useradd tool does when it creates a user is to copy the skeleton directory over to the new user’s home directory. By default this is /etc/skel, as shown in Example 11-7, though you can override this in /etc/default/useradd.
Example 11-7 Using /etc/skel
# ls -la /etc/skel/
total 64
drwxr-xr-x. 4 root root 4096 Mar 19 2012 .
drwxr-xr-x. 197 root root 20480 Mar 5 07:47 ..
-rw-r--r-- 1 root root 18 Jun 22 2011 .bash_logout
-rw-r--r-- 1 root root 193 Jun 22 2011 .bash_profile
-rw-r--r-- 1 root root 124 Jun 22 2011 .bashrc
-rw-r--r-- 1 root root 201 Feb 23 2012 .kshrc
The default skeleton directory includes sample bash logout, profile, and rc files, and a ksh default script. This varies from distribution to distribution, however.
Any changes you make to these files are copied over when you create a new user. You can also add new files or get rid of items.
Localization and Internationalization
You’ll have particular ways of writing numbers, currency, and time wherever you live. In the United States it’s convention to write the date with the month first. Travel north to Canada, and you’ll see the month in different spots. North America is standardized on commas to group the thousands in a number and the period separates the integers from the decimals. Fly over to France and they use commas to separate the integers from the decimals. And everyone has their own currency!
Internationalization and localization are two concepts that allow a computer to store information one way but display it in a way that suits the conventions of the user. Internationalization is the feature that allows a system to display information in different ways, and localization is the process that bundles up all the regional changes for a single location into a locale.
Time Zones
The most readily visible localization features have to do with time. Every location on the planet belongs to a time zone. A time zone is defined as an offset from Coordinated Universal Time (UTC). UTC+0 is also known as Greenwich Mean Time (GMT) because it’s centered around Greenwich, London.
Many of those locations observe some form of daylight saving time (DST). DST is a system where clocks move ahead an hour in the summer to take advantage of longer days. Each DST zone sets the dates when the hour is added and removed.
A Linux machine may be physically located in one time zone, but the users may connect remotely from other time zones. Unix has a simple solution to this problem: All timestamps are stored in UTC, and each user is free to set the time zone of her choosing. The system then adds or removes the time zone offset before displaying the value to the user.
Unix stores time as seconds since midnight UTC of January 1, 1970. This special date is known as the epoch, or the birthday of Unix. As of early 2015 the current Unix timestamp was over 1.4 billion.
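The timestamp can be inspected from the command line. This sketch assumes GNU date, whose -d @SECONDS form converts a count of seconds back into a calendar date; other date implementations may use a different option.

```shell
# Seconds elapsed since the epoch, right now.
now=$(date +%s)

# Timestamp 0 is the epoch itself, shown here in UTC.
epoch_start=$(date -u -d @0 +"%Y-%m-%d %H:%M:%S")

echo "current timestamp: $now"
echo "timestamp 0 is:    $epoch_start UTC"
```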
Displaying Time
A date and time must include the time zone to have enough meaning to be compared with other dates and times. The date command shows the current date and time.
$ date
Sun Mar 8 21:15:01 CDT 2015
The time zone is displayed as CDT, Central Daylight Time, which is UTC-5. You can specify different formats with the plus sign (+) and percent encodings:
$ date +"%Y-%m-%dT%H:%M:%z"
2015-03-08T21:19:-0500
Instead of the time zone as a word, this date format uses the offset itself in a format known as ISO 8601. With date -u the time is displayed in UTC, which is written as +0000 or sometimes Z, short for Zulu, a way of referring to UTC in military and aviation circles.
The percent encodings in the date commands each have special meaning:
%Y—Four-digit year
%m—Two-digit month
%d—Two-digit day
%H—Two-digit hour in 24-hour time
%M—Two-digit minute
%z—Time zone offset
Other characters, such as the T, colon, and dashes, are displayed as is. Run man date to get a full list of encodings.
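Formatting a fixed timestamp makes the encodings easier to study because the output does not change from run to run. The sketch below assumes GNU date for the -d @SECONDS form and adds %S (two-digit seconds), which is not in the list above. The timestamp 1000000000 is an arbitrary choice.

```shell
stamp=1000000000     # arbitrary fixed moment, in seconds since the epoch

# Full date, time, and offset, displayed in UTC.
iso_utc=$(date -u -d "@$stamp" +"%Y-%m-%dT%H:%M:%S%z")

# Individual fields can be pulled out the same way.
year=$(date -u -d "@$stamp" +%Y)
month=$(date -u -d "@$stamp" +%m)

echo "timestamp $stamp is $iso_utc (year $year, month $month)"
```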
Setting Time Zones
The configuration for each time zone, and by extension any daylight saving time, is stored in the zoneinfo files under /usr/share/zoneinfo. For example, /usr/share/zoneinfo/America/Winnipeg has all the configuration for the city of Winnipeg. Unlike most configuration files in Linux, these files contain binary data and can’t be directly viewed.
The system time zone is stored in a file called /etc/localtime. It is either a copy of the appropriate zoneinfo or a symlink to the appropriate file, depending on the distribution. Symlinks are typically used so that it’s clear which time zone is in use. For example you could set your current time zone to Winnipeg with:
ln -sf /usr/share/zoneinfo/America/Winnipeg /etc/localtime
Users who don’t have a time zone set get this zone as their default. They can override the setting through the TZ environment variable.
# date +%z
-0500
# TZ=Asia/Hong_Kong date +%z
+0800
In the preceding example the system time zone is UTC-5 in the first command but is overridden just for that command to the Hong Kong time zone in the second. The user could just as easily set TZ in her .bash_profile file to make the changes persistent.
Your distribution includes tools to set the time zone from a menu. Depending on the distribution, this may be tzselect, tzconfig, or dpkg-reconfigure tzdata. The tzselect command merely helps you find the name of the time zone you want and leaves the work of making it permanent up to you. The other two commands make the changes to the /etc/localtime file for you.
Additionally, your distribution may store the time zone as a word in other files. Two such examples are /etc/timezone for Debian and /etc/sysconfig/clock for Red Hat.
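Because TZ only changes how a stored time is displayed, the same timestamp can be shown in several zones. This sketch assumes GNU date and checks that the Hong Kong zoneinfo file is present before relying on it; the timestamp is an arbitrary fixed value.

```shell
stamp=1000000000     # arbitrary fixed moment, in seconds since the epoch
zone_file=/usr/share/zoneinfo/Asia/Hong_Kong

# The same instant, displayed in UTC.
utc_time=$(TZ=UTC date -d "@$stamp" +"%H:%M %z")

# And in Hong Kong, if that zone is installed on this system.
if [ -e "$zone_file" ]; then
    hk_time=$(TZ=Asia/Hong_Kong date -d "@$stamp" +"%H:%M %z")
else
    hk_time="zoneinfo for Asia/Hong_Kong is not installed"
fi

echo "UTC:       $utc_time"
echo "Hong Kong: $hk_time"
```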
Character Encoding
Once upon a time computers used ASCII, the American Standard Code for Information Interchange. ASCII encodes characters into 7 bits, which gives 128 possible characters. This would be fine if all you ever used was English, but eventually the need for accented characters in different languages filled up the possible characters.
Most systems store information in at least 8 bits so computers started using the previously ignored bit to store special characters, giving 128 new spots for a character.
Vendors then started making their own character sets in the spots not already used by the English characters and punctuation, calling them code pages. Character 200 might be an accented N in one code page and a Greek letter in another code page. If you wanted to use different characters, you had to switch to a different code page.
Some of these code pages were codified in the ISO-8859 standard, which defines the standard code pages. ISO-8859-1 is the Latin alphabet with English characters, ISO-8859-9 is the Latin alphabet with Turkish characters. Confusingly, ISO-8859-3 has some of the Turkish characters along with some other languages.
In the early 1990s it was clear that this was a mess and people got together to come up with a new standard that could handle everything. Thus Unicode, a universal encoding, was born.
Unicode defines each possible character as a code point, which is a number. The original ASCII set is mapped into the first 128 values for compatibility. Originally Unicode characters were encoded into 2 bytes giving 65,536 possible characters. This encoding, called UCS-2 (2-byte universal character set), ended up not being able to hold the number of characters needed when you look at all the languages and symbols on the planet. UTF-16 (16-bit Unicode Transformation Format) fixed this by allowing any code point that does not fit in 2 bytes to be represented with a second pair of bytes.
Around the same time UTF-8 was being developed. The 2 bytes per character minimum is not compatible with existing ASCII files. The UTF-8 encoding type allows from 1 to 4 bytes to be used to encode a character (the original design allowed up to 6), with the length of the character cleverly encoded in the high order bits of the first byte. UTF-8 is fully compatible with the original 128 ASCII characters but can still represent any Unicode code point.
UTF-8 is by and large the dominant encoding type.
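The size differences between the encodings can be measured with iconv and wc. This sketch assumes iconv is installed and accepts the encoding names UTF-8 and UTF-16BE; the octal escapes \303\251 are the two UTF-8 bytes for an accented e.

```shell
# A plain ASCII letter takes one byte in UTF-8.
ascii_bytes=$(printf 'A' | wc -c | tr -d ' ')

# An accented e takes two bytes in UTF-8.
accented_bytes=$(printf '\303\251' | wc -c | tr -d ' ')

# The same ASCII letter takes two bytes in UTF-16.
utf16_bytes=$(printf 'A' | iconv -f UTF-8 -t UTF-16BE | wc -c | tr -d ' ')

# Converting to UTF-16 and back gives the original text.
roundtrip=$(printf 'hello' | iconv -f UTF-8 -t UTF-16BE | iconv -f UTF-16BE -t UTF-8)

echo "A in UTF-8: $ascii_bytes byte, accented e in UTF-8: $accented_bytes bytes"
echo "A in UTF-16: $utf16_bytes bytes, round trip: $roundtrip"
```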
Representing Locales
Each locale is represented in terms of two or three variables:
Language code (ISO 639)
Country code (ISO 3166)
Encoding (optional)
It may seem odd to have both a language and a country, but consider that multiple languages may be spoken in a country and that two countries sharing a common language may speak different dialects. Just ask anyone from France what they think about how French is spoken in Quebec, Canada!
Thus, the language and country are different. ISO 639 describes language names, such as en for English, de for German, or es for Spanish. ISO 3166 is for the country. While German and Germany happen to share the same letters (de for the language and DE for the country), that’s not always the case. The United States and Canada, which both speak English, are US and CA, respectively.
The encoding further describes how the characters are stored in the locale file. A particular locale file may use the old ISO-8859 encoding or the more robust Unicode, and even within Unicode there are multiple variants such as UTF-8, UTF-16, or UTF-32.
American English is in the en_US.UTF-8 locale, and Spanish is in es_ES.utf8. See what locales are installed on your system with the locale -a command, as shown in Example 11-8.
Example 11-8 Using the locale -a Command to See the Locales Installed on Your System
# locale -a
C
C.UTF-8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
... output omitted ...
es_ES.utf8
es_GT.utf8
es_HN.utf8
es_MX.utf8
es_NI.utf8
POSIX
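The pieces of a locale name can be pulled apart with ordinary shell parameter expansion. This sketch assumes names in the language_COUNTRY.encoding form described earlier; split_locale is a hypothetical function name, and modifiers such as @euro that some locale names carry are not handled.

```shell
# split_locale is a hypothetical helper for this sketch.
# It prints the language, country, and encoding parts of a locale name.
split_locale() {
    name=$1

    # Everything after the first dot is the encoding, if there is a dot at all.
    encoding=${name#*.}
    if [ "$encoding" = "$name" ]; then
        encoding=""
    fi

    # Everything before the first dot is language_COUNTRY.
    base=${name%%.*}
    language=${base%%_*}

    # The country follows the underscore, if there is one.
    country=${base#*_}
    if [ "$country" = "$base" ]; then
        country=""
    fi

    echo "language=$language country=$country encoding=$encoding"
}

split_locale en_US.UTF-8
split_locale es_ES.utf8
split_locale C
```

The first call prints language=en country=US encoding=UTF-8. For a name such as C, which has neither a country nor an encoding, the last two fields are left empty.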
Fallback Locales
Sometimes you don’t want to deal with locales, especially if you’re writing a script that parses the output of other programs, which could change based on the user’s locale. In this case you can temporarily use the C locale. C, which can also be called POSIX, is a generic, minimal locale based on ASCII.
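A common way to apply the fallback is to set LC_ALL=C for a single command, so that only that command sees the C locale. The sketch below (sort_bytewise is a hypothetical name) sorts its input by raw byte value, which gives the same order for every user regardless of locale settings.

```shell
# sort_bytewise is a hypothetical helper for this sketch.
# LC_ALL=C applies only to this one sort command, not to the rest of the shell.
sort_bytewise() {
    LC_ALL=C sort
}

printf 'b\nA\na\nB\n' | sort_bytewise
```

In the C locale the output is A, B, a, b, one per line, because the uppercase letters have lower ASCII values than the lowercase ones.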
Contents of a Locale
Each locale file contains instructions on how to display or translate a variety of items:
Addresses—Ordering of various parts, zip code format
Collation—How to sort, such as the ordering of accented characters or if capitalized words are grouped together or separately from lowercase
Measurement—Display of various units
Messages—Translations for system messages and errors
Monetary—How currency symbols are displayed and named
Names—Conventions for displaying people’s names
Numeric—How to display numbers such as the thousands and decimal separators
Paper—Paper sizes used in the country
Telephone—How telephone numbers are displayed
Time—Date and time formats such as the ordering of year, month, and date, or 24-hour clock versus using AM and PM
These locale files are usually distributed with the operating system as separate packages to save on space. If you don’t need the translations, you can generate the rest of the items without installing any packages by using locale-gen on systems that support it, as shown in Example 11-9.
Example 11-9 Using locale-gen
# locale-gen fr_FR.UTF-8
Generating locales...
fr_FR.UTF-8... done
Generation complete.
# locale -a | grep FR
fr_FR.utf8
How Linux Uses the Locale
Internationalization in Linux is handled with the GNU gettext library. If programmers write their applications with that library and annotate their messages correctly, the user can change the behavior with environment variables.
Because multiple things can be localized, such as numbers and messages, gettext has a series of environment variables that it checks to see which locale is appropriate. In order, these are
LANGUAGE
LC_ALL
LC_XXX
LANG
The LANGUAGE variable is only consulted when printing messages. It is ignored for formatting. It can also hold a colon-separated (:) list of locales for the system to try in order when displaying a message. LC_ALL is a way to force the locale even if some of the other variables are set.
LC_XXX gives the administrator the power to override a locale for a particular element. For example, if LANG were set to en_US.UTF-8 the user could override currency display by setting LC_MONETARY. The locale command displays the current settings, as shown in Example 11-10.
Example 11-10 Using locale
# locale
LANG=en_CA.UTF-8
LANGUAGE=en_CA:en
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER="en_CA.UTF-8"
LC_NAME="en_CA.UTF-8"
LC_ADDRESS="en_CA.UTF-8"
LC_TELEPHONE="en_CA.UTF-8"
LC_MEASUREMENT="en_CA.UTF-8"
LC_IDENTIFICATION="en_CA.UTF-8"
LC_ALL=
This example is from a typical English system. You can override just parts of it:
# LC_TIME=fr_FR.UTF8 date
samedi 7 mars 2015, 23:11:23 (UTC-0600)
# LC_MESSAGES=fr_FR.UTF8 man
What manual page do you want?
# LANGUAGE='' LC_MESSAGES=fr_FR.UTF8 man
Quelle page de manuel voulez-vous ?
In the preceding example, the first command switches the time setting to the French locale, so the date is displayed in French. The second command sets the messages setting to French, but the English variant is used because the higher-priority LANGUAGE is set. A French error message is used once LANGUAGE is set to nothing.
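The lookup order for the formatting categories can be written out as a small shell function. This is only an illustration of the precedence described in this section, not how the C library implements it: effective_locale is a hypothetical name, and LANGUAGE is left out because it applies only to messages.

```shell
# effective_locale is a hypothetical helper for this sketch.
# It prints the locale that would apply to one category, such as LC_TIME.
effective_locale() {
    category=$1

    # Refuse anything that is not a plain uppercase variable name before using eval.
    case $category in
        *[!A-Z_]* | "") return 1 ;;
    esac

    # Read the value of the variable whose name is stored in $category.
    eval "specific=\${$category:-}"

    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"          # LC_ALL overrides everything else
    elif [ -n "$specific" ]; then
        echo "$specific"        # then the per-category variable
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"            # then the general default
    else
        echo "C"                # nothing set at all
    fi
}

effective_locale LC_TIME            # what your current session would use
LC_ALL=C effective_locale LC_TIME   # LC_ALL wins, so this prints C
```

Trying the function with different combinations of LANG, LC_TIME, and LC_ALL is a quick way to check your understanding of which variable takes priority.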
Converting Files Between Encodings
Sometimes you get a file encoded with an encoding you are not expecting, and that causes errors with your scripts. The iconv tool converts files from one encoding to another. For example, if you get an ASCII file that contains funny characters, you can convert the file into a UTF-8 file with those characters stripped out.
iconv -c -f ASCII -t UTF-8 datafile.txt > datafile.utf8.txt
In order, the options are
-c—Clear any unknown characters
-f ASCII—From ASCII
-t UTF-8—To UTF-8
iconv -l—Returns a list of all the available encodings
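The same options convert between any pair of encodings that iconv -l lists. As a sketch, the function below (latin1_to_utf8 is a hypothetical name) converts ISO-8859-1 input to UTF-8, and od shows the resulting bytes. It assumes your iconv knows both encoding names, which is the case with the version that ships with the GNU C library.

```shell
# latin1_to_utf8 is a hypothetical helper for this sketch.
# It reads ISO-8859-1 text on standard input and writes UTF-8 to standard output.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# \351 is the single byte for an accented e in ISO-8859-1.
printf '\351' | latin1_to_utf8 | od -An -tx1
```

The single input byte comes out as the two bytes c3 and a9, which is how UTF-8 stores that character.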
Exam Preparation Tasks
As mentioned in the section “How to Use This Book” in the Introduction, you have a couple of choices for exam preparation: the exercises here, Chapter 21, “Final Preparation,” and the practice exams on the DVD.
Review All Key Topics
Review the most important topics in this chapter, noted with the Key Topics icon in the outer margin of the page. Table 11-3 lists a reference of these key topics and the page numbers on which each is found.
Table 11-3 Key Topics for Chapter 11
Define Key Terms
Define the following key terms from this chapter and check your answers in the glossary:
variable
sourcing
positional parameters
login shell
non-login shell
alias
function
localization
internationalization
locale
time zone
Greenwich Mean Time
daylight saving time
epoch
ASCII
code pages
ISO-8859 standard
code point
UCS-2
UTF-16
UTF-8
Review Questions
The answers to these review questions are in Appendix A.
1. On a default Linux system, if a new user is created with user-related commands, what is the default shell?
a. vsh
b. ksh
c. bash
d. sh
2. Which file, if present, is sourced from the user’s ~/.bash_profile file during a normal default user’s login session?
a. /etc/profile
b. ~/.bashrc
c. /etc/bashrc
d. ~/.bash_logout
e. ~/.exrc
3. You want a particular variable that was just declared to be available to all subshells. Which command would you use to ensure this?
a. set
b. export
c. bash
d. source
4. You are in the user named user1’s home directory, which is not in the system’s path. You want to run an executable named tarfoo in the current directory. It won’t run when you type just the command. Which of the following executes the program? (Choose all that apply.)
a. ../.tarfoo
b. ./tarfoo
c. ~tarfoo
d. /home/user1/tarfoo
5. You are the sysadmin for a mid-sized company and have a particular user who consistently overwrites important files with meaningless content. The files cannot be set read-only because they are written to by the shell and its functions. Which of the following options, when used with the set command, fixes this problem?
a. append-only
b. noclobber
c. hashall
d. monitor
6. Tired of missing dot files such as .bash_profile when running a directory listing, you decide you want to always include the -a flag when running ls. Which command will be most helpful?
a. export LS_OPTS="-a"
b. LS_OPTS="-a"
c. alias ls="ls -a"
d. alias ls=" -a"
7. You’re vacationing in Hawaii. Ignoring why you chose to bring your work computer on vacation, how will you change the time zone for everything on your system?
a. export TZ=US/Hawaii
b. ln -sf /usr/share/zoneinfo/US/Hawaii /etc/localtime
c. ln -sf /usr/share/zoneinfo/US/Hawaii /etc/timezone
d. echo "US/Hawaii" > /etc/profile
8. For a particular server you want to make sure all users have /usr/local/cad/bin in their search paths. Where is the best place to put this?
a. /etc/profile
b. ~/.bashrc
c. /etc/path
d. /usr/local/bin
9. Consider this output:
# locale
LANG=en_US.UTF-8
LC_TIME="es_ES.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_ALL="de_DE.UTF-8"
If you were to run the date command, which locale would be used for the formatting?
a. American (US) English
b. Spanish (ES)
c. Canadian (CA) English
d. German (DE)