Ubuntu 15.04 Server with systemd: Administration and Reference (2015)

Part V. Shells

Chapter 20. Working with files and directories

In Linux, all files are organized into directories that, in turn, are hierarchically connected to each other in one overall file structure. A file is referenced not according to just its name, but also according to its place in this file structure. You can create as many new directories as you want, adding more directories to the file structure. The Linux file commands can perform sophisticated operations, such as moving or copying whole directories along with their subdirectories. You can use file operations such as find, cp, mv, and ln to locate files and copy, move, or link them from one directory to another. Desktop file managers, such as Konqueror and Nautilus used on the KDE and GNOME desktops, provide a graphical user interface to perform the same operations using icons, windows, and menus (see Chapter 3). This chapter focuses on the commands you use on the shell command line to manage files, such as cp and mv. However, whether you use the command line or a desktop file manager, the underlying file structure is the same.

Though not part of the Linux file structure, there are also special tools you can use to access Windows partitions and floppy disks. These follow much the same format as Linux file commands.

Archives are used to back up files or to combine them into a package, which can then be transferred as one file over the Internet or posted on an FTP site for easy downloading. The standard archive utility used on Linux and UNIX systems is tar, for which several desktop graphical front ends exist. You have several compression programs to choose from, including GNU zip (gzip), Zip, bzip2, and compress.

Linux Files

You can name a file using any letters, underscores, and numbers. You can also include periods and commas. Except in certain special cases, you should never begin a filename with a period. The slash is reserved as the directory separator and cannot be part of a filename. Other characters, such as question marks or asterisks, are special characters for the shell and should not be part of a filename. On most Linux file systems, filenames can be up to 255 bytes long (255 characters for plain ASCII names). Filenames can also include spaces, though to reference such filenames from the command line, be sure to enclose them in quotes. On a desktop like GNOME or KDE you do not need to use quotes.

You can include an extension as part of a filename. A period is used to distinguish the filename proper from the extension. Extensions can be useful for categorizing your files. You are probably familiar with certain standard extensions that have been adopted by convention. For example, C source code files conventionally have a .c extension. Files that contain compiled object code have an .o extension. You can, of course, make up your own file extensions. The following examples are all valid Linux filenames. Keep in mind that to reference the name with spaces on the command line, you would have to enclose it in quotes, as in "New book review":

preface
chapter2
9700info
New_Revisions
calc.c
intro.bk1
New book review
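The effect of quoting a name that contains spaces can be sketched as follows. This is a minimal illustration, not part of the original example set; the scratch directory created with mktemp is an assumption used to keep the commands away from your real files.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Quoted: the shell passes one argument, so one file is created.
touch "New book review"

# Unquoted: the shell splits on the spaces and passes three arguments,
# creating three separate files named New, book, and review.
touch New book review

# Count the directory entries: the quoted name plus the three split names.
count=$(ls | wc -l)
ls -1
```

Without the quotes, a command such as rm or cat would act on three different names rather than on the single file you meant.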

Special initialization files are also used to hold shell configuration commands. These are the hidden, or dot, files, which begin with a period. Dot files used by commands and applications have predetermined names, such as the .mozilla directory used to hold your Mozilla data and configuration files. Recall that when you use ls to display your filenames, the dot files will not be displayed. To include the dot files, you need to use ls with the -a option.
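A small sketch of the difference between ls and ls -a. The file name .myconfig and the scratch directory are hypothetical and used only for illustration.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch preface .myconfig      # .myconfig is a hypothetical dot file

plain=$(ls)                  # dot files are left out of a plain listing
all=$(ls -a)                 # -a also lists ., .., and .myconfig

echo "ls shows:"
echo "$plain"
echo "ls -a shows:"
echo "$all"
```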

The ls -l command displays detailed information about a file. First, the permissions are displayed, followed by the number of links, the owner of the file, the name of the group assigned to the file, the file size in bytes, the date and time the file was last modified, and the name of the file. Permissions indicate who can access the file: the user, members of a group, or all other users. The group name indicates the group permitted to access the file object. In the listing that follows, the file type for mydata is that of an ordinary file. Only one link exists, indicating the file has no other names and no other links. The owner’s name is chris, the same as the login name, and the group name is weather. Other users probably also belong to the weather group. The size of the file is 207 bytes, and it was last modified on February 20 at 11:55 A.M. The name of the file is mydata.

If you want to display this detailed information for all the files in a directory, simply use the ls -l command without an argument.

$ ls -l
-rw-r--r-- 1 chris weather 207 Feb 20 11:55 mydata
-rw-rw-r-- 1 chris weather 568 Feb 14 10:30 today
-rw-rw-r-- 1 chris weather 308 Feb 17 12:40 monday
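If you want to pick out the individual fields of such a listing, one possible approach is the stat command. The sketch below assumes GNU stat (from coreutils, as installed on Ubuntu) and a scratch copy of mydata; neither appears in the original example.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
printf 'computers\n' > mydata    # 10 bytes, including the newline
chmod 644 mydata                 # owner read/write, group and others read

ls -l mydata

# GNU stat format sequences: %A permissions, %h link count, %U owner,
# %G group, %s size in bytes, %n file name.
fields=$(stat -c '%A %h %U %G %s %n' mydata)
echo "$fields"
```

The owner and group printed depend on the account that runs the commands.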

All files in Linux have one physical format, a byte stream, which is simply a sequence of bytes. This allows Linux to apply the file concept to every data component in the system. Directories are classified as files, as are devices. Treating everything as a file allows Linux to organize and exchange data more easily. The data in a file can be sent directly to a device such as a screen because a device interfaces with the system using the same byte-stream file format used by regular files.

This same file format is used to implement other operating system components. The interface to a device, such as the screen or keyboard, is designated as a file. Other components, such as directories, are themselves byte-stream files, but they have a special internal organization. A directory file contains information about a directory, organized in a special directory format. Because these different components are treated as files, they can be said to constitute different file types. A character device is one file type. A directory is another file type. The number of these file types may vary according to your specific implementation of Linux. Five common types of files exist, however: ordinary files, directory files, first-in first-out (FIFO) pipes, character device files, and block device files. Although you may rarely reference a file’s type, it can be useful when searching for directories or devices.
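The file type is visible as the first character of each ls -l line. The following sketch creates one example of several types in a scratch directory and prints that character for each; the name mypipe is hypothetical, and /dev/null is assumed to exist as a character device, as it does on a standard Linux system.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch monday          # ordinary file
mkdir reports         # directory
mkfifo mypipe         # FIFO (named pipe)

# First character of the long listing: - ordinary, d directory,
# p FIFO, c character device, b block device.
types=""
for f in monday reports mypipe /dev/null; do
    type_char=$(ls -ld "$f" | cut -c1)
    echo "$type_char  $f"
    types="$types$type_char"
done
```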

Although all ordinary files have a byte-stream format, they may be used in different ways. The most significant difference is between binary and text files. Compiled programs are examples of binary files. However, even text files can be classified according to their different uses. You can have files that contain C programming source code or shell commands, or even a file that is empty. The file could be an executable program or a directory file. The Linux file command helps you determine what a file is used for. It examines the first few lines of a file and tries to determine a classification for it. The file command looks for special keywords or special numbers in those first few lines, but it is not always accurate. In the next example, the file command examines the contents of two files and determines a classification for them:

$ file monday reports
monday: text
reports: directory

If you need to examine the entire file byte by byte, you can do so with the od (octal dump) command, which performs a dump of a file. By default, it prints the file's contents in octal (as two-byte units). However, you can also specify a character, decimal, or hexadecimal representation. The od command is helpful when you need to detect any special character in your file or if you want to display a binary file.
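As a brief illustration, the following sketch dumps the three bytes H, i, and newline in several representations. The input string is an arbitrary example.

```shell
# Default dump: octal, with byte offsets in the left column.
printf 'Hi\n' | od

# Character view: printable bytes as themselves, others as escapes such as \n.
printf 'Hi\n' | od -c

# Hexadecimal, one byte per unit (-t x1), without the offset column (-An).
hex=$(printf 'Hi\n' | od -An -t x1)
printf '%s\n' "$hex"
```

The hexadecimal dump shows the values 48, 69, and 0a, which makes the trailing newline easy to spot.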

The File Structure

Linux organizes files into a hierarchically connected set of directories. Each directory may contain either files or other directories. In this respect, directories perform two important functions. A directory holds files, much like files held in a file drawer, and a directory connects to other directories, much as a branch in a tree is connected to other branches. Because of the similarities to a tree, such a structure is often referred to as a tree structure.

The Linux file structure branches into several directories beginning with a root directory, /. Within the root directory, several system directories contain files and programs that are features of the Linux system. The root directory also contains a directory called /home that contains the home directories of all the users in the system. Each user’s home directory, in turn, contains the directories the user has made for their own use. Each of these can also contain directories. Such nested directories branch out from the user’s home directory.

Note: The user’s home directory can be any directory, though it is usually the directory that bears the user’s login name. This directory is located in the directory named /home on your Linux system. For example, a user named dylan will have a home directory called dylan located in the system’s /home directory, giving it the pathname /home/dylan.

Home Directories

When you log in to the system, you are placed within your home directory. The name given to this directory by the system is the same as your login name. Any files you create when you first log in are organized within your home directory. Within your home directory, you can create more directories. You can then change to these directories and store files in them. The same is true for other users on the system. Each user has a home directory, identified by the appropriate login name. Users, in turn, can create their own directories.

You can access a directory either through its name or by making it your working directory. Each directory is given a name when it is created. You can use this name in file operations to access files in that directory. You can also make the directory your working directory. If you do not use any directory names in a file operation, the working directory will be accessed. The working directory is the one from which you are currently working. When you log in, the working directory is your home directory, which usually has the same name as your login name. You can change the working directory by using the cd command to move to another directory.

Pathnames

The name you give to a directory or file when you create it is not its full name. The full name of a directory is its pathname. The hierarchically nested relationship among directories forms paths and these paths can be used to identify and reference any directory or file uniquely or absolutely. Each directory in the file structure can be said to have its own unique path. The actual name by which the system identifies a directory always begins with the root directory and consists of all directories nested below that directory.

In Linux, you write a pathname by listing each directory in the path separated from the last by a forward slash. A slash preceding the first directory in the path represents the root. The pathname for the chris directory is /home/chris. If the chris directory has a subdirectory called reports, then the full pathname for the reports directory would be /home/chris/reports. Pathnames also apply to files. When you create a file within a directory, you give the file a name. The actual name by which the system identifies the file, however, is the filename combined with the path of directories from the root to the file’s directory. As an example, the pathname for monday is /home/chris/reports/monday (the root directory is represented by the first slash). The path for the monday file consists of the root, home, chris, and reports directories and the filename monday.

Directory         Function
/                 Begins the file system structure, called the root.
/home             Contains users’ home directories.
/bin              Holds all the standard commands and utility programs.
/usr              Holds those files and commands used by the system; this directory breaks down into several subdirectories.
/usr/bin          Holds user-oriented commands and utility programs.
/usr/sbin         Holds system administration commands.
/usr/lib          Holds libraries for programming languages.
/usr/share/doc    Holds Linux documentation.
/usr/share/man    Holds the online Man files.
/var/spool        Holds spooled files, such as those generated for printing jobs and network transfers.
/sbin             Holds system administration commands for booting the system.
/var              Holds files that vary, such as mailbox files.
/dev              Holds file interfaces for devices such as the terminals and printers (dynamically generated by udev, do not edit).
/etc              Holds system configuration files and any other system files.

Table 20-1: Standard System Directories in Linux

Pathnames may be absolute or relative. An absolute pathname is the complete pathname of a file or directory beginning with the root directory. A relative pathname begins from your working directory; it is the path of a file relative to your working directory. The working directory is the one you are currently operating in. Using the previous example, if chris is your working directory, the relative pathname for the file monday is reports/monday. The absolute pathname for monday is /home/chris/reports/monday.

The absolute pathname from the root to your home directory can be especially complex and, at times, even subject to change by the system administrator. To make it easier to reference, you can use the tilde (~) character, which represents the absolute pathname of your home directory. You must specify the rest of the path from your home directory. In the next example, the user references the monday file in the reports directory. The tilde represents the path to the user’s home directory, /home/chris, and then the rest of the path to the monday file is specified.

$ cat ~/reports/monday
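The three ways of naming the same file can be compared side by side. This is a sketch only: it points HOME at a scratch directory so that the example is self-contained, and the file contents are hypothetical.

```shell
# Use a scratch directory as a stand-in home directory.
HOME=$(mktemp -d)
export HOME
mkdir "$HOME/reports"
echo "monday notes" > "$HOME/reports/monday"

cd "$HOME" || exit 1
rel=$(cat reports/monday)            # relative to the working directory
abs=$(cat "$HOME/reports/monday")    # absolute, starting from the root
tilde=$(cat ~/reports/monday)        # the tilde expands to the home directory

echo "$rel"
echo "$abs"
echo "$tilde"
```

All three commands read the same file; only the way the path is written differs.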

System Directories

The root directory that begins the Linux file structure contains several system directories that contain files and programs used to run and maintain the system. Many also contain other subdirectories with programs for executing specific features of Linux. For example, the directory /usr/bin contains the various Linux commands that users execute, such as lpr. The directory /bin holds system-level commands. Table 20-1 lists the basic system directories.

Listing, Displaying, and Printing Files: ls, cat, more, less, and lpr

One of the primary functions of an operating system is the management of files. You may need to perform certain basic output operations on your files, such as displaying them on your screen or printing them. The Linux system provides a set of commands that perform basic file-management operations, such as listing, displaying, and printing files, as well as copying, renaming, and erasing files. These commands are usually made up of abbreviated versions of words. For example, the ls command is a shortened form of “list” and lists the files in your directory. The lpr command is an abbreviated form of “line print” and will print a file. The cat, less, and more commands display the contents of a file on the screen. Table 20-2 lists these commands with their different options. When you log in to your Linux system, you may want a list of the files in your home directory. The ls command, which outputs a list of your file and directory names, is useful for this. The ls command has many possible options for displaying filenames according to specific features.

Displaying Files: cat, less, and more

You may also need to look at the contents of a file. The cat and more commands display the contents of a file on the screen. The name cat stands for concatenate.

$ cat mydata
computers
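Because cat concatenates, it can also be given several filenames at once. The sketch below assumes a scratch directory and hypothetical contents for the preface file.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
echo "computers" > mydata
echo "introduction" > preface    # hypothetical contents

# With several arguments, cat writes the files one after another.
cat mydata preface

# Redirecting that output joins the files into a new one.
cat mydata preface > combined
lines=$(wc -l < combined)
echo "combined has $lines lines"
```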

The cat command outputs the entire text of a file to the screen at once. This presents a problem when the file is large because its text quickly speeds past on the screen. The more and less commands are designed to overcome this limitation by displaying one screen of text at a time. You can then move forward or backward in the text at your leisure. You invoke the more or less command by entering the command name followed by the name of the file you want to view (less is a more powerful and configurable display utility).

$ less mydata

When more or less opens a file, the first screen of text is displayed. To continue to the next screen, you press the F key or the SPACEBAR. To move back in the text, you press the B key. You can quit at any time by pressing the Q key.

Command or Option   Execution
ls                  This command lists file and directory names.
cat filenames       This filter can be used to display a file. It can take filenames for its arguments. It outputs the contents of those files directly to the standard output, which, by default, is directed to the screen.
more filenames      This utility displays a file screen by screen. Press the SPACEBAR to continue to the next screen and q to quit.
less filenames      This utility also displays a file screen by screen. Press the SPACEBAR to continue to the next screen and q to quit.
lpr filenames       Sends a file to the line printer to be printed; a list of files may be used as arguments. Use the -P option to specify a printer.
lpq                 Lists the print queue for printing jobs.
lprm                Removes a printing job from the print queue.

Table 20-2: Listing, Displaying, and Printing Files

Printing Files: lpr, lpq, and lprm

With the printer commands such as lpr and lprm, you can perform printing operations such as printing files or canceling print jobs (see Table 20-2). When you need to print files, use the lpr command to send files to the printer connected to your system. In the next example, the user prints the mydata file:

$ lpr mydata

If you want to print several files at once, you can specify more than one file on the command line after the lpr command. In the next example, the user prints out both the mydata and preface files:

$ lpr mydata preface

Printing jobs are placed in a queue and printed one at a time in the background. You can continue with other work as your files print. You can see the position of a particular printing job at any given time with the lpq command, which gives the owner of the printing job (the login name of the user who sent the job), the print job ID, the size in bytes, and the temporary file in which it is currently held.

If you need to cancel an unwanted printing job, you can do so with the lprm command, which takes as its argument either the ID number of the printing job, or the owner’s name. It then removes the print job from the print queue. For this task, lpq is helpful, for it provides you with the ID number and owner of the printing job you need to use with lprm.

Managing Directories: mkdir, rmdir, ls, cd, pwd

You can create and remove your own directories, as well as change your working directory, with the mkdir, rmdir, and cd commands. Each of these commands can take as its argument the pathname for a directory. The pwd command displays the absolute pathname of your working directory. In addition to these commands, the special characters represented by a single dot, a double dot, and a tilde can be used to reference the working directory, the parent of the working directory, and the home directory, respectively. Taken together, these commands enable you to manage your directories. You can create nested directories, move from one directory to another, and use pathnames to reference any of your directories. Those commands commonly used to manage directories are listed in Table 20-3.

Command                   Execution
mkdir directory           Creates a directory.
rmdir directory           Erases an empty directory.
ls -F                     Lists directory names with a trailing slash.
ls -R                     Lists working directory as well as all subdirectories.
cd directory name         Changes to the specified directory, making it the working directory. cd without a directory name changes back to the home directory.
                          $ cd reports
pwd                       Displays the pathname of the working directory.
directory name/filename   A slash is used in pathnames to separate each directory name. In the case of pathnames for files, a slash separates the preceding directory names from the filename.
..                        References the parent directory. You can use it as an argument or as part of a pathname.
                          $ cd ..
                          $ mv ../larisa oldarticles
.                         References the working directory. You can use it as an argument or as part of a pathname.
                          $ ls .
~/pathname                The tilde is a special character that represents the pathname for the home directory. It is useful when you need to use an absolute pathname for a file or directory.
                          $ cp monday ~/today

Table 20-3: Directory Commands

Creating and Deleting Directories

You create and remove directories with the mkdir and rmdir commands. In either case, you can also use pathnames for the directories. In the next example, the user creates the directory reports. Then the user creates the directory articles using a pathname:

$ mkdir reports
$ mkdir /home/chris/articles

You can remove a directory with the rmdir command followed by the directory name. In the next example, the user removes the directory reports with the rmdir command:

$ rmdir reports

To remove a directory and all its subdirectories, you use the rm command with the -r option. This is a very powerful command and could easily be used to erase all your files. If rm is run with the -i option (some systems alias rm to rm -i), you will be prompted for each file. To simply remove all files and subdirectories without prompts, add the -f option. The following example deletes the reports directory and all its subdirectories:

$ rm -rf reports
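The difference between rmdir and rm -r can be tried safely in a scratch directory. The sketch below relies on the fact that rmdir removes only empty directories; the directory layout is made up for illustration.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

mkdir reports
mkdir "$work/articles"          # same operation, using an absolute pathname
touch reports/monday

# rmdir refuses to remove a directory that still contains files.
if rmdir reports 2>/dev/null; then
    rmdir_result=removed
else
    rmdir_result=refused
fi
echo "rmdir reports: $rmdir_result"

rmdir articles                  # empty, so rmdir succeeds
rm -r reports                   # removes reports and everything inside it
remaining=$(ls | wc -l)
echo "entries left: $remaining"
```

Trying rmdir first is a reasonable habit: it fails harmlessly if the directory turns out not to be empty.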

Displaying Directory Contents

You have seen how to use the ls command to list the files and directories within your working directory. To distinguish between file and directory names, however, you need to use the ls command with the -F option. A slash is then placed after each directory name in the list.

$ ls
weather reports articles
$ ls -F
weather reports/ articles/

The ls command also takes as an argument any directory name or directory pathname. This enables you to list the files in any directory without first having to change to that directory. In the next example, the ls command takes as its argument the name of a directory, reports. Then the ls command is executed again, only this time the absolute pathname of reports is used.

$ ls reports
monday tuesday
$ ls /home/chris/reports
monday tuesday
$

Moving Through Directories

The cd command takes as its argument the name of the directory to which you want to move. The name of the directory can be the name of a subdirectory in your working directory or the full pathname of any directory on the system. If you want to change back to your home directory, you need to enter only the cd command by itself, without a directory argument.

$ cd reports
$ pwd
/home/chris/reports
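A short sketch of moving into a directory and back home again. It assumes a scratch directory standing in for the home directory, so the pathnames printed will differ from /home/chris.

```shell
HOME=$(mktemp -d)     # scratch stand-in for a home directory
export HOME
mkdir "$HOME/reports"

cd "$HOME/reports" || exit 1
in_reports=$(pwd)     # absolute pathname of the working directory
echo "$in_reports"

cd || exit 1          # no argument: back to the home directory
in_home=$(pwd)
echo "$in_home"
```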

Referencing the Parent Directory

A directory always has a parent (except, of course, for the root). For example, in the preceding listing, the parent for reports is the chris directory. When a directory is created, two entries are made: one represented with a dot (.), and the other with double dots (..). The dot represents the pathname of the directory, and the double dots represent the pathname of its parent directory. Double dots, used as an argument in a command, reference a parent directory. The single dot references the directory itself.

You can use the single dot to reference your working directory, instead of using its pathname. For example, to copy a file to the working directory retaining the same name, the dot can be used in place of the working directory’s pathname. In this sense, the dot is another name for the working directory. In the next example, the user copies the weather file from the chris directory to the reports directory. The reports directory is the working directory and can be represented with the single dot.

$ cd reports
$ cp /home/chris/weather .

The .. symbol is often used to reference files in the parent directory. In the next example, the cat command displays the weather file in the parent directory. The pathname for the file is the .. symbol (for the parent directory) followed by a slash and the filename.

$ cat ../weather
raining and warm

Tip: You can use the cd command with the .. symbol to step back through successive parent directories of the directory tree from a lower directory.
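The dot and double-dot references can be combined in one short session. The following sketch recreates the weather and reports layout in a scratch directory, which is an assumption made so the commands can be run anywhere.

```shell
work=$(mktemp -d)
mkdir "$work/reports"
echo "raining and warm" > "$work/weather"

cd "$work/reports" || exit 1
cp ../weather .               # copy from the parent into the working directory
copied=$(cat ./weather)       # ./weather is the new copy
parent=$(cat ../weather)      # ../weather is the original

cd .. || exit 1               # step up to the parent directory
here=$(basename "$(pwd)")
echo "$copied"
```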

File and Directory Operations: find, cp, mv, rm, ln

As you create more and more files, you may want to back them up, change their names, erase some of them, or even give them added names. Linux provides several file commands that you can use to search for files, copy files, rename files, or remove files (see Table 20-5). If you have a large number of files, you can also search them to locate a specific one. The commands are shortened forms of full words, consisting of only two characters. The cp command stands for “copy” and copies a file, mv stands for “move” and renames or moves a file, rm stands for “remove” and erases a file, and ln stands for “link” and adds another name for a file, often used as a shortcut to the original. One exception to the two-character rule is the find command, which performs searches of your filenames to find a file. All these operations can be handled by the GUI desktops, like GNOME and KDE.

Searching Directories: find

Once a large number of files have been stored in many different directories, you may need to search them to locate a specific file, or files, of a certain type. The find command enables you to perform such a search from the command line. The find command takes as its arguments directory names followed by several possible options that specify the type of search and the criteria for the search; it then searches within the directories listed and their subdirectories for files that meet these criteria. The find command can search for a file by name, type, owner, and even the time of the last update.

$ find directory-list -option criteria

The -name option has as its criteria a pattern and instructs find to search for the filename that matches that pattern. To search for a file by name, you use the find command with the directory name followed by the -name option and the name of the file.

$ find directory-list -name filename

The find command also has options that merely perform actions, such as outputting the results of a search. If you want find to display the filenames it has located, you simply include the -print option on the command line along with any other options. The -print option is an action that instructs find to write to the standard output the names of all the files it locates (you can also use the -ls option instead to list files in the long format). In the next example, the user searches for all the files in the reports directory with the name monday. Once located, the file, with its relative pathname, is printed.

$ find reports -name monday -print
reports/monday

The find command prints out the filenames using the directory name specified in the directory list. If you specify an absolute pathname, the absolute path of the found files will be output. If you specify a relative pathname, only the relative pathname is output. In the preceding example, the user specified a relative pathname, reports, in the directory list. Located filenames were output beginning with this relative pathname. In the next example, the user specifies an absolute pathname in the directory list. Located filenames are then output using this absolute pathname.

$ find /home/chris -name monday -print
/home/chris/reports/monday

Tip: Should you need to find the location of a specific program or configuration file, you could use find to search for the file from the root directory. Log in as the root user (or run the command with sudo) and use / as the directory. The following command searches the entire file system for the more command and any other files named more: find / -name more -print.

Searching the Working Directory

If you want to search your working directory, you can use the dot in the directory pathname to represent your working directory. The double dots would represent the parent directory. The next example searches all files and subdirectories in the working directory, using the dot to represent the working directory. If your working directory is your home directory, this is a convenient way to search through all your own directories. Notice that the located filenames that are output begin with a dot.

$ find . -name weather -print
./weather
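The way the starting directory shapes the output can be seen by running the same search three ways. This sketch uses a scratch directory in place of /home/chris.

```shell
work=$(mktemp -d)
mkdir -p "$work/reports"
touch "$work/reports/monday" "$work/weather"
cd "$work" || exit 1

rel=$(find reports -name monday -print)    # relative start, relative result
abs=$(find "$work" -name monday -print)    # absolute start, absolute result
dot=$(find . -name weather -print)         # results begin with ./

echo "$rel"
echo "$abs"
echo "$dot"
```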

Command or Option   Execution
find                Searches directories for files according to search criteria. This command has several options that specify the type of criteria and actions to be taken.
-name pattern       Searches for files with the pattern in the name.
-lname pattern      Searches for symbolic links whose target matches the pattern.
-group name         Searches for files belonging to the group name.
-gid num            Searches for files belonging to a group according to group ID.
-user name          Searches for files belonging to a user.
-uid num            Searches for files belonging to a user according to user ID.
-size numc          Searches for files with the size num in blocks. If c is added after num, the size in bytes (characters) is searched for.
-mtime num          Searches for files last modified num days ago.
-newer file         Searches for files modified more recently than file.
-context scontext   Searches for files according to security context (SELinux).
-print              Outputs the result of the search to the standard output. The result is usually a list of filenames, including their full pathnames.
-type filetype      Searches for files with the specified file type. File type can be b for block device, c for character device, d for directory, f for file, or l for symbolic link.
-perm permission    Searches for files with certain permissions set. Use octal or symbolic format for permissions.
-ls                 Provides a detailed listing of each file, with owner, permission, size, and date information.
-exec command       Executes command on each file found.

Table 20-4: The find Command

You can use shell wildcard characters as part of the pattern criteria for searching files. The special character must be quoted, however, to avoid evaluation by the shell. In the next example, all files (indicated by the asterisk, *) with the .c extension in the programs directory are searched for and then displayed in the long format using the -ls action:

$ find programs -name '*.c' -ls
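A runnable variation on this search, using -print so the result is easy to inspect. The scratch directory and the file names main.c and notes.txt are hypothetical.

```shell
work=$(mktemp -d)
mkdir "$work/programs"
touch "$work/programs/calc.c" "$work/programs/main.c" "$work/programs/notes.txt"
cd "$work" || exit 1

# The quotes keep the shell from expanding *.c itself, so find receives
# the pattern and applies it in every directory it searches.
matches=$(find programs -name '*.c' -print | sort)
echo "$matches"
```

Only the two .c files are listed; notes.txt does not match the pattern.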

Locating Directories

You can also use the find command to locate other directories. In Linux, a directory is officially classified as a special type of file. Although all files have a byte-stream format, some files, such as directories, are used in special ways. In this sense, a file can be said to have a file type. The find command has an option called -type that searches for a file of a given type. The -type option takes a one-character modifier that represents the file type. The modifier that represents a directory is a d. In the next example, both the directory name and the directory file type are used to search for the directory called travel:

$ find /home/chris -name travel -type d -print
/home/chris/articles/travel
$
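To see what -type d adds, the following sketch places an ordinary file and a directory with the same name in a scratch tree (an arrangement invented for this illustration) and searches with and without the type test.

```shell
work=$(mktemp -d)
mkdir -p "$work/articles/travel"
touch "$work/travel"            # an ordinary file that shares the name
cd "$work" || exit 1

by_name=$(find . -name travel -print | sort)        # file and directory
dirs_only=$(find . -name travel -type d -print)     # directory only

echo "$by_name"
echo "$dirs_only"
```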

File types are not so much different types of files, as they are the file format applied to other components of the operating system, such as devices. In this sense, a device is treated as a type of file, and you can use find to search for devices and directories, as well as ordinary files. Table 20-4 lists the different types available for the find command’s -type option.

You can also use the find command to search for files by ownership or security criteria, like those belonging to a specific user or those with a certain security context. The -user option lets you locate all files belonging to a certain user. The following example lists all files that the user chris has created or owns on the entire system. To list those just in the users’ home directories, you would use /home for the starting search directory. This would find all those in a user’s home directory as well as any owned by that user in other user directories.

$ find / -user chris -print
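A search of the whole system can take a long time, so the sketch below limits itself to a scratch directory. It passes a numeric user ID obtained from id -u, which find's -user option accepts in place of a login name.

```shell
work=$(mktemp -d)
touch "$work/mydata"

me=$(id -u)     # numeric ID of the current user
owned=$(find "$work" -user "$me" -name mydata -print)
echo "$owned"
```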

Copying Files

To make a copy of a file, you simply give cp two filenames as its arguments (see Table 20-5). The first filename is the name of the file to be copied, the one that already exists. This is often referred to as the source file. The second filename is the name you want for the copy. This will be a new file containing a copy of all the data in the source file. This second argument is often referred to as the destination file. The syntax for the cp command follows:

$ cp source-file destination-file

In the next example, the user copies a file called proposal to a new file called oldprop:

$ cp proposal oldprop

You could unintentionally destroy another file with the cp command. If a file with the destination name already exists, cp replaces its contents with the copied data, and the original data is lost. The -i option makes cp check for an existing file by that name and ask for confirmation before overwriting it. Some systems alias cp to cp -i for this reason; you can run the alias command to see whether yours does, and add the option yourself if it does not. This risk applies when the second argument names a file. To copy a file from your working directory to another directory, you use that directory name as the second argument, and the copy is placed inside it. In the next example, the proposal file already exists and is overwritten by the newprop file.

$ cp newprop proposal
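The difference between a guarded and an unguarded copy can be sketched as follows. The script feeds the answer n to the cp -i prompt through a pipe, which stands in for a user typing at the terminal; the file contents are made up for the illustration.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

echo "old text" > proposal
echo "new text" > newprop

# Answer "n" to the overwrite prompt, so proposal is left alone.
# Some cp versions report a declined copy as a failure, hence "|| true".
printf 'n\n' | cp -i newprop proposal || true
kept=$(cat proposal)

# Without -i the destination is replaced with no questions asked.
cp newprop proposal
replaced=$(cat proposal)

echo "after declined cp -i: $kept"
echo "after plain cp:       $replaced"
```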

You can use any of the wildcard characters to generate a list of filenames to use with cp or mv. For example, suppose you need to copy all your C source code files to a given directory. Instead of listing each one individually on the command line, you could use an * character with the .c extension to match and generate a list of C source code files (all files with a .c extension). In the next example, the user copies all source code files in the current directory to the sourcebks directory:

$ cp *.c sourcebks

If you want to copy all the files in a given directory to another directory, you could use * to match and generate a list of all those files in a cp command. In the next example, the user copies all the files in the props directory to the oldprop directory. Notice the use of a props pathname preceding the * special character. In this context, props is a pathname that is placed before each filename in the list that * generates.

$ cp props/* oldprop

You can, of course, use any of the other special characters, such as ? or []. In the next example, the user copies both source code and object code files (.c and .o) to the projbk directory:

$ cp *.[oc] projbk
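It is the shell, not cp, that expands these patterns; cp only ever sees the resulting list of names. The following sketch uses a few hypothetical files to show what each pattern selects.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

touch main.c util.c main.o notes.txt
mkdir sourcebks projbk

echo *.c                # the shell expands this to: main.c util.c
cp *.c sourcebks        # same as: cp main.c util.c sourcebks
cp *.[oc] projbk        # .c and .o files, but not notes.txt

copied=$(ls projbk | wc -l | tr -d ' ')
echo "$copied files copied to projbk"
```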

When you copy a file, you can give the copy a name that is different from the original. To do so, place the new filename after the directory name, separated by a slash.

$ cp filename directory-name/new-filename

Command                 Execution
cp filename filename    Copies a file. cp takes two arguments: the original file and the name of the new copy. You can use pathnames for the files to copy across directories.
cp -r dirname dirname   Copies a subdirectory from one directory to another. The copied directory includes all its own subdirectories.
mv filename filename    Moves (renames) a file. The mv command takes two arguments: the first is the file to be moved. The second argument can be the new filename or the pathname of a directory. If it is the name of a directory, then the file is literally moved to that directory, changing the file’s pathname.
mv dirname dirname      Moves directories. In this case, the first and last arguments are directories.
ln filename filename    Creates added names for files, referred to as links. A link can be created in one directory that references a file in another directory.
rm filenames            Removes (erases) a file. Can take any number of filenames as its arguments. Literally removes links to a file. If a file has more than one link, you need to remove all of them to erase a file.

Table 20-5: File Operations

Moving Files

You can use the mv command to either rename a file or to move a file from one directory to another. When using mv to rename a file, you simply use the new filename as the second argument. The first argument is the current name of the file you are renaming. If you want to rename a file when you move it, you can specify the new name of the file after the directory name. In the next example, the proposal file is renamed with the name version1:

$ mv proposal version1

As with cp, it is easy for mv to erase a file accidentally. When renaming a file, you might accidentally choose a filename already used by another file. In this case, that other file will be erased. The mv command also has an -i option that checks first to see if a file by that name already exists.

You can also use any of the special characters to generate a list of filenames to use with mv. In the next example, the user moves all source code files in the current directory to the newproj directory:

$ mv *.c newproj

If you want to move all the files in a given directory to another directory, you can use * to match on and generate a list of all those files. In the next example, the user moves all the files in the reports directory to the repbks directory:

$ mv reports/* repbks
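One detail worth knowing when moving a whole directory's contents this way: by default the shell's * does not match names that begin with a period, so hidden files stay behind. A short sketch with hypothetical filenames:

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir reports repbks
touch reports/jan reports/feb reports/.draft

mv reports/* repbks     # moves jan and feb only
ls -A reports           # -A also lists hidden entries: .draft is still here
```

To move the hidden file as well, name it explicitly, as in mv reports/.draft repbks.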

Note: The easiest way to copy files to a CD-R/RW or DVD-R/RW disc is to use the built-in Nautilus burning capability. Just insert a blank disc, open it as a folder, and drag-and-drop files onto it. You will be prompted automatically to burn the files.

Copying and Moving Directories

You can also copy or move whole directories at once. Both cp and mv can take as their first argument a directory name, enabling you to copy or move subdirectories from one directory into another (see Table 20-5). The first argument is the name of the directory to be moved or copied, and the second argument is the name of the directory within which it is to be placed. The same pathname structure used for files applies to moving or copying directories.

You can just as easily copy subdirectories from one directory to another. To copy a directory, the cp command requires you to use the -r option, which stands for “recursive.” It directs the cp command to copy a directory, as well as any subdirectories it may contain. In other words, the entire directory subtree, from that directory on, will be copied. In the next example, the travel directory is copied to the oldarticles directory. Now two travel subdirectories exist, one in articles and one in oldarticles.

$ cp -r articles/travel oldarticles
$ ls -F articles
travel/
$ ls -F oldarticles
travel/
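The following sketch, using a hypothetical subtree, shows both halves of the rule: cp refuses to copy a directory without -r, and with -r it copies the whole subtree, including files several levels down.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p articles/travel/europe oldarticles
echo "notes" > articles/travel/europe/paris

# Without -r, cp skips the directory and reports an error.
if ! cp articles/travel oldarticles 2>/dev/null; then
    echo "plain cp did not copy the directory"
fi

cp -r articles/travel oldarticles
ls -F oldarticles                 # a trailing slash marks a directory
find oldarticles -type f          # files below the copied directory
```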

Erasing Files and Directories: the rm Command

As you use Linux, you will find the number of files you use increases rapidly. Generating files in Linux is easy. Applications such as editors, and commands such as cp, can easily be used to create files. Eventually, many of these files may become outdated and useless. You can then remove them with the rm command. The rm command can take any number of arguments, enabling you to list several filenames and erase them all at the same time. In the next example, the file oldprop is erased:

$ rm oldprop

Be careful when using the rm command, because it is irrevocable. Once a file is removed, it cannot be restored (there is no undo). With the -i option, you are prompted separately for each file and asked whether you really want to remove it. If you enter y, the file will be removed. If you enter anything else, the file is not removed. In the next example, the rm command is instructed to erase the files proposal and oldprop. The rm command then asks for confirmation for each file. The user decides to remove oldprop, but not proposal.

$ rm -i proposal oldprop
Remove proposal? n
Remove oldprop? y
$
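The same exchange can be sketched non-interactively by piping the answers to rm -i, one per line, in the order the files are named. This is only an illustration of how the prompts are answered; in normal use you type the answers yourself.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

echo "draft" > proposal
echo "older draft" > oldprop

# First answer (n) applies to proposal, second (y) to oldprop.
printf 'n\ny\n' | rm -i proposal oldprop
ls                                # only proposal remains
```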

Links: the ln Command

You can give a file more than one name using the ln command. You might do this because you want to reference a file using different filenames to access it from different directories. The added names are often referred to as links. Linux supports two different types of links, hard and symbolic. Hard links are literally another name for the same file, whereas symbolic links function like shortcuts referencing another file. Symbolic links are much more flexible and can point to files on other file systems, while a hard link must reside on the same file system as the file it names. Furthermore, hard links introduce security concerns, as they allow direct access from a link that may have public access to an original file that you may want protected. Links are usually implemented as symbolic links.

Symbolic Links

To set up a symbolic link, you use the ln command with the -s option and two arguments: the name of the original file and the new, added filename. The ls operation lists both filenames, but only one physical file will exist.

$ ln -s original-file-name added-file-name

In the next example, the today file is given the additional name weather. It is just another name for the today file.

$ ls
today
$ ln -s today weather
$ ls
today weather

You can give the same file several names by using the ln command on the same file many times. In the next example, the file today is assigned the names weather and weekend:

$ ln -s today weather
$ ln -s today weekend
$ ls
today weather weekend

If you list the full information about a symbolic link and its file, you will find the information displayed is different. In the next example, the user lists the full information for both lunch and /home/george/veglist using the ls command with the -l option. The first character in the line specifies the file type. Symbolic links have their own file type, represented by an l. The file type for lunch is l, indicating it is a symbolic link, not an ordinary file, and the pathname it refers to is shown after the -> at the end of the line. The number after the term “group” is the size of the file. Notice the sizes differ. The size of the lunch file is only 20 bytes. This is because lunch is only a symbolic link, a file that holds the pathname of another file, and its size is the length of that pathname (/home/george/veglist is 20 characters long). It is not a direct hard link to the veglist file.

$ ls -l lunch /home/george/veglist
lrwxrwxrwx 1 chris group 20 Feb 14 10:30 lunch -> /home/george/veglist
-rw-rw-r-- 1 george group 793 Feb 14 10:30 /home/george/veglist

To erase a file, you need to remove only its original name (and any hard links to it). If any symbolic links are left over, they will be unable to access the file. In this case, a symbolic link would hold the pathname of a file that no longer exists.
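A dangling link of this kind can be reproduced with a short sketch. It uses the readlink command to print the pathname a link holds, and the shell tests -L (is a symbolic link) and -e (exists, following links); the file contents are hypothetical.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

echo "sunny" > today
ln -s today weather

target=$(readlink weather)        # the pathname stored in the link
echo "weather points to: $target"
cat weather                       # reads today through the link

rm today                          # remove the original name
if [ -L weather ] && [ ! -e weather ]; then
    echo "weather is now a dangling link"
fi
```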

Hard Links

You can give the same file several names by using the ln command on the same file many times. To set up a hard link, you use the ln command with no -s option and two arguments: the name of the original file and the new, added filename. The ls operation lists both filenames, but only one physical file will exist.

$ ln original-file-name added-file-name

In the next example, the monday file is given the additional name storm. It is just another name for the monday file.

$ ls
monday
$ ln monday storm
$ ls
monday storm

To erase a file that has hard links, you need to remove all its hard links. The name of a file is actually considered a link to that file—hence the command rm that removes the link to the file. If you have several links to the file and remove only one of them, the others stay in place and you can reference the file through them. The same is true even if you remove the original link—the original name of the file. Any added links will work just as well. In the next example, the today file is removed with the rm command. However, a link to that same file exists, called weather. The file can then be referenced under the name weather.

$ ln today weather
$ rm today
$ cat weather
The storm broke today
and the sun came out.
$
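One way to confirm that two hard links really name the same file is to compare their inode numbers with ls -i and to watch the link count, the second field of ls -l. The sketch below does both; the variable names are chosen only for illustration.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

printf 'The storm broke today\n' > today
ln today weather                                  # hard link: no -s

inode_today=$(ls -i today | awk '{print $1}')
inode_weather=$(ls -i weather | awk '{print $1}')
links=$(ls -l today | awk '{print $2}')           # link count while both names exist
echo "inodes: $inode_today $inode_weather, link count: $links"

rm today                                          # removes one name only
remaining=$(ls -l weather | awk '{print $2}')
echo "link count after rm: $remaining"
cat weather                                       # the data is still there
```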

Archiving and Compressing Files

Archives are used to back up files or to combine them into a package, which can then be transferred as one file over the Internet or posted on an FTP site for easy downloading. The standard archive utility used on Linux and Unix systems is tar, for which several GUI front ends exist. You have several compression programs to choose from, including GNU zip (gzip), Zip, bzip2, and compress. Table 20-6 lists the commonly used archive and compression applications.

Applications                    Description
tar                             Archive creation and extraction; www.gnu.org/software/tar/manual/tar.html
File Roller (Archive Manager)   GNOME front end for tar and gzip/bzip2
gzip                            File, directory, and archive compression; www.gnu.org/software/gzip/manual/
bzip2                           File, directory, and archive compression; www.bzip.org/docs.html
zip                             File, directory, and archive compression

Table 20-6: Archive and Compression Applications

Archiving and Compressing Files with File Roller

GNOME provides the File Roller tool (accessible from the Accessories menu, labeled Archive Manager) that operates as a GUI front end to archive and compress files, letting you perform Zip, gzip, tar, and bzip2 operations from the desktop. You can examine the contents of archives, extract the files you want, and create new compressed archives. When you create an archive, you determine its compression method by specifying its filename extension, such as .gz for gzip or .bz2 for bzip2. You can select the different extensions from the File Type menu or enter the extension yourself. To both archive and compress files, you can choose a combined extension like .tar.bz2, which both archives with tar and compresses with bzip2. Click Add to add files to your archive. To extract files from an archive, open the archive to display the list of archive files. You can then click Extract to extract particular files or the entire archive.

Tip: File Roller can also be used to examine the contents of an archive file easily. From the file manager, right-click the archive and select Open With Archive Manager. The list of files and directories in that archive will be displayed. For subdirectories, double-click their entries. This method also works for DEB software files, letting you browse all the files that make up a software package.

Archive Files and Devices: tar

The tar utility creates archives for files and directories. With tar, you can archive specific files, update them in the archive, and add new files as you want to that archive. You can even archive entire directories with all their files and subdirectories, all of which can be restored from the archive. The tar utility was originally designed to create archives on tapes (the term “tar” stands for tape archive). However, you can create archives on any device, such as a floppy disk, or you can create an archive file to hold the archive. The tar utility is ideal for making backups of your files or combining several files into a single file for transmission across a network (File Roller is a GUI front end for tar). For more information on tar, check the man page or the online manual at www.gnu.org/software/tar/manual/tar.html.

Note: As an alternative to tar, you can use pax, which is designed to work with different kinds of Unix archive formats such as cpio, bcpio, and tar. You can extract, list, and create archives. The pax utility is helpful if you are handling archives created on Unix systems that are using different archive formats.

Commands                             Execution
tar options files                    Backs up files to tape, device, or archive file.
tar optionsf archive_name filelist   Backs up files to a specific file or device specified as archive_name. filelist can be filenames or directories.

Options                              Execution
c                                    Creates a new archive.
t                                    Lists the names of files in an archive.
r                                    Appends files to an archive.
u                                    Updates an archive with new and changed files; adds only those files modified since they were archived or files not already present in the archive.
--delete                             Removes a file from the archive.
w                                    Waits for a confirmation from the user before archiving each file; enables you to update an archive selectively.
x                                    Extracts files from an archive.
m                                    When extracting a file from an archive, does not restore its archived modification time; the file is given the time of extraction instead.
M                                    Creates a multiple-volume archive that may be stored on several floppy disks.
f archive-name                       Saves the tape archive to the file archive-name, instead of to the default tape device. When given an archive name, the f option saves the tar archive in a file of that name.
f device-name                        Saves a tar archive to a device such as a floppy disk or tape. /dev/fd0 is the device name for your floppy disk; the default device is held in /etc/default/tar-file.
v                                    Displays each filename as it is archived.
z                                    Compresses or decompresses archived files using gzip.
j                                    Compresses or decompresses archived files using bzip2.

Table 20-7: File Archives: tar

Displaying Archive Contents

Both file managers in GNOME and KDE have the capability to display the contents of a tar archive file automatically. The contents are displayed as though they were files in a directory. You can list the files as icons or with details, sorting them by name, type, or other fields. You can even display the contents of files. Clicking a text file opens it with a text editor, and an image is displayed with an image viewer. If the file manager cannot determine what program to use to display the file, it prompts you to select an application. Both file managers can perform the same kinds of operations on archives residing on remote file systems, such as tar archives on FTP sites. You can obtain a listing of their contents and even read their readme files. The Nautilus file manager (GNOME) can also extract an archive. Right-click the Archive icon and select Extract.

Creating Archives

On Linux, tar is often used to create archives on devices or files. You can direct tar to archive files to a specific device or a file by using the f option with the name of the device or file. The syntax for the tar command using the f option is shown in the next example. The device or filename is often referred to as the archive name. When creating a file for a tar archive, the filename is usually given the extension .tar. This is a convention only and is not required. You can list as many filenames as you want. If a directory name is specified, all its subdirectories are included in the archive.

$ tar optionsf archive-name.tar directory-and-file-names

To create an archive, use the c option. Combined with the f option, c creates an archive on a file or device. You enter this option before and right next to the f option. Notice no dash precedes a tar option. Table 20-7 lists the different options you can use with tar. In the next example, the directory mydir and all its subdirectories are saved in the file myarch.tar. In this example, the mydir directory holds two files, mymeeting and party, as well as a directory called reports that has three files: weather, monday, and friday.

$ tar cvf myarch.tar mydir
mydir/
mydir/reports/
mydir/reports/weather
mydir/reports/monday
mydir/reports/friday
mydir/mymeeting
mydir/party

Extracting Archives

The user can later extract the directories from the archive using the x option. The xf option extracts files from an archive file or device. The tar extraction operation generates all subdirectories. In the next example, the xf option directs tar to extract all the files and subdirectories from the tar file myarch.tar:

$ tar xvf myarch.tar
mydir/
mydir/reports/
mydir/reports/weather
mydir/reports/monday
mydir/reports/friday
mydir/mymeeting
mydir/party
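A complete round trip can be sketched as follows. The -C option, which is not covered above, tells tar to change to the named directory before extracting, so the restored copy does not land on top of the original; the restore directory and file contents are hypothetical.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p mydir/reports restore
echo "agenda" > mydir/mymeeting
echo "rain" > mydir/reports/weather

tar cf myarch.tar mydir           # create the archive
tar xf myarch.tar -C restore      # extract it under restore/

# diff -r compares the two trees; it succeeds only if they are identical.
diff -r mydir restore/mydir && echo "restored copy matches the original"
```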

You use the r option to add files to an already-created archive. The r option appends the files to the archive. In the next example, the user appends the directory mydocs and its files to the myarch.tar archive:

$ tar rvf myarch.tar mydocs
mydocs/
mydocs/doc1

Updating Archives

If you change any of the files in your directories you previously archived, you can use the u option to instruct tar to update the archive with any modified files. The tar command compares the time of the last update for each archived file with those in the user’s directory and copies into the archive any files that have been changed since they were last archived. Any newly created files in these directories are also added to the archive. In the next example, the user updates the myarch.tar file with any recently modified or newly created files in the mydir directory. In this case, the gifts file was added to the mydir directory.

$ tar uvf myarch.tar mydir
mydir/
mydir/gifts

If you need to see what files are stored in an archive, you can use the tar command with the t option. The next example lists all the files stored in the myarch.tar archive:

$ tar tvf myarch.tar
drwxr-xr-x root/root 0 2000-10-24 21:38:18 mydir/
drwxr-xr-x root/root 0 2000-10-24 21:38:51 mydir/reports/
-rw-r--r-- root/root 22 2000-10-24 21:38:40 mydir/reports/weather
-rw-r--r-- root/root 22 2000-10-24 21:38:45 mydir/reports/monday
-rw-r--r-- root/root 22 2000-10-24 21:38:51 mydir/reports/friday
-rw-r--r-- root/root 22 2000-10-24 21:38:18 mydir/mymeeting
-rw-r--r-- root/root 22 2000-10-24 21:36:42 mydir/party
drwxr-xr-x root/root 0 2000-10-24 21:48:45 mydocs/
-rw-r--r-- root/root 22 2000-10-24 21:48:45 mydocs/doc1
drwxr-xr-x root/root 0 2000-10-24 21:54:03 mydir/
-rw-r--r-- root/root 22 2000-10-24 21:54:03 mydir/gifts
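The append, update, and list operations can be tried together in a scratch directory. This is a sketch with hypothetical files; as the listing above shows, an updated archive may contain more than one entry for the same name, and the later entry is the one that takes effect on extraction.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir mydir mydocs
echo "balloons" > mydir/party
echo "letter" > mydocs/doc1

tar cf myarch.tar mydir           # c: create
tar rf myarch.tar mydocs          # r: append a second directory

echo "ribbon" > mydir/gifts       # a file created after the archive was made
tar uf myarch.tar mydir           # u: add new or changed files only

listing=$(tar tf myarch.tar)      # t: list the member names
echo "$listing"
```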

Note: To back up files using several CD/DVD-ROMs, you would first create a split archive, one consisting of several files, using the -M option, the multi-volume option. The tape size for an ISO DVD would be specified with the tape-length option, --tape-length=2294900.

Compressing Archives

By itself, the tar operation does not perform compression on archived files. If you want a compressed archive, you can instruct tar to invoke the gzip utility. With the lowercase z option, tar passes the archive through gzip as it is written, so the archive as a whole is compressed. The same z option invokes gzip to decompress the archive when extracting files.

$ tar czf myarch.tar.gz mydir

To use bzip2 instead of gzip to compress the archive, you use the j option. The same j option invokes bzip2 to decompress the archive when extracting files.

$ tar cjf myarch.tar.bz2 mydir
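To get a feel for what the z option buys you, the sketch below archives the same directory with and without compression and compares the sizes. The sample data is deliberately repetitive so that it compresses well; real savings depend on the files. It also lists the compressed archive in two equivalent ways, once with tar's z option and once by decompressing with gzip -dc and piping the result to tar.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir mydir
yes "the same line again" | head -n 2000 > mydir/data   # repetitive sample data

tar cf plain.tar mydir            # uncompressed archive
tar czf myarch.tar.gz mydir       # gzip-compressed archive

plain=$(wc -c < plain.tar | tr -d ' ')
small=$(wc -c < myarch.tar.gz | tr -d ' ')
echo "$plain bytes uncompressed, $small bytes compressed"

listing=$(tar tzf myarch.tar.gz)              # list via tar's z option
piped=$(gzip -dc myarch.tar.gz | tar tf -)    # same listing via gzip and a pipe
echo "$listing"
```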

Remember, a difference exists between compressing individual files in an archive and compressing the entire archive as a whole. Often, an archive is created for transferring several files at once as one tar file. To shorten transmission time, the archive should be as small as possible. You can use the compression utility gzip on the archive tar file to compress it, reducing its size, and then send the compressed version. The person receiving it can decompress it, restoring the tar file. Using gzip on a tar file often results in a file with the extension .tar.gz. The extension .gz is added to a compressed gzip file. The next example creates a compressed version of myarch.tar using the same name with the extension .gz:

$ gzip myarch.tar
$ ls
myarch.tar.gz

Instead of retyping the tar command for different files, you can place the command in a script and pass the files to it. Be sure to make the script executable. In the following example, a simple myarchprog script is created that will archive filenames listed as its arguments.

myarchprog

#!/bin/sh
tar cvf myarch.tar "$@"

A run of the myarchprog script with multiple arguments is shown here:

$ ./myarchprog mydata preface
mydata
preface
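The following sketch creates such a script, makes it executable, and runs it. It passes the arguments to tar as "$@" (in double quotes), which hands each argument over exactly as typed, so a filename containing a space still reaches tar as a single name. The file my notes is a hypothetical example of such a name.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Write the script; the quoted 'EOF' keeps "$@" from being expanded here.
cat > myarchprog <<'EOF'
#!/bin/sh
# Archive every name given on the command line into myarch.tar.
tar cf myarch.tar "$@"
EOF
chmod +x myarchprog               # make the script executable

echo "one" > mydata
echo "two" > "my notes"           # a name containing a space

./myarchprog mydata "my notes"
listing=$(tar tf myarch.tar)
echo "$listing"
```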

Archiving to Tape

If you have a default device specified, such as a tape, and you want to create an archive on it, you can simply use tar without the f option and a device or filename. This can be helpful for making backups of your files. The name of the default device is held in a file called /etc/default/tar. The syntax for the tar command using the default tape device is shown in the following example. If a directory name is specified, all its subdirectories are included in the archive.

$ tar option directory-and-file-names

In the next example, the directory mydir and all its subdirectories are saved on a tape in the default tape device:

$ tar c mydir

In this example, the mydir directory and all its files and subdirectories are extracted from the default tape device and placed in the user’s working directory:

$ tar x mydir

Note: There are other archive programs you can use such as cpio, pax, and shar. However, tar is the one most commonly used for archiving application software.

File Compression: gzip, bzip2, and zip

Several reasons exist for reducing the size of a file. The two most common are to save space or, if you are transferring the file across a network, to save transmission time. You can effectively reduce a file size by creating a compressed copy of it. Anytime you need the file again, you decompress it. Compression is used in combination with archiving to enable you to compress whole directories and their files at once. Decompression generates a copy of the archive file, which can then be extracted, generating a copy of those files and directories. File Roller provides a GUI for these tasks. For more information on gzip, check the man page or the online manual at www.gnu.org/software/gzip/manual/. For bzip2, also check its man page or the online documentation at www.bzip.org/docs.html.

Compression with gzip

Several compression utilities are available for use on Linux and Unix systems. Most software for Linux systems uses the GNU gzip and gunzip utilities. The gzip utility compresses files, and gunzip decompresses them. To compress a file, enter the command gzip and the filename. This replaces the file with a compressed version of it with the extension .gz.

Option              Execution
-c                  Sends compressed version of file to standard output; each file listed is separately compressed: gzip -c mydata preface > myfiles.gz
-d                  Decompresses a compressed file; or you can use gunzip: gzip -d myfiles.gz or gunzip myfiles.gz
-h                  Displays help listing.
-l file-list        Displays compressed and uncompressed size of each file listed: gzip -l myfiles.gz
-r directory-name   Recursively searches for specified directories and compresses all the files in them; the search begins from the current working directory. When used with gunzip, compressed files of a specified directory are uncompressed.
-v file-list        For each compressed or decompressed file, displays its name and the percentage of its reduction in size.
-num                Determines the speed and size of the compression; the range is from -1 to -9. A lower number gives greater speed but less compression, resulting in a larger file that compresses and decompresses quickly. Thus -1 gives the quickest compression but with the largest size; -9 results in a very small file that takes longer to compress and decompress. The default is -6.

Table 20-8: The gzip Options

$ gzip mydata
$ ls
mydata.gz

To decompress a gzip file, use either gzip with the -d option or the command gunzip. These commands decompress a compressed file with the .gz extension and replace it with a decompressed version with the same root name but without the .gz extension. When you use gunzip, you needn’t even type in the .gz extension; gunzip and gzip -d assume it. Table 20-8 lists the different gzip options.

$ gunzip mydata.gz
$ ls
mydata
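A round trip through gzip and gunzip can be checked with cmp, which compares two files byte for byte. The sketch below keeps a reference copy under the hypothetical name original so the comparison is possible, since gzip replaces the file it compresses.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

yes "compress me" | head -n 1000 > mydata
cp mydata original                # reference copy for the comparison

gzip mydata                       # mydata is replaced by mydata.gz
gzip -l mydata.gz                 # compressed and uncompressed sizes
gunzip mydata                     # the .gz extension is assumed

cmp mydata original && echo "decompressed file is identical"
```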

Tip: On your desktop, you can extract the contents of an archive by locating it with the file manager and double-clicking it. You can also right-click and choose Open with Archive Manager. This will start the File Roller application, which will open the archive, listing its contents. You can then choose to extract the archive. File Roller will use the appropriate tools to decompress the archive (bzip2, zip, or gzip) if compressed, and then extract the archive (tar).

You can also compress archived tar files. This results in files with the extension .tar.gz. Compressed archived files are often used for transmitting extremely large files across networks.

$ gzip myarch.tar
$ ls
myarch.tar.gz

You can also have tar compress an archive as it creates it, using the tar z option that invokes gzip. With the z option, tar passes the archive through gzip, producing the same kind of .tar.gz file. A compressed archive, however, cannot be updated, nor is it possible to append files to it. All the files must be added when the archive is created.

The compress and uncompress Commands

You can also use the compress and uncompress commands to create compressed files. They generate a file that has a .Z extension and use a different compression format from gzip. The compress and uncompress commands are not that widely used, but you may run across .Z files occasionally. You can use the uncompress command to decompress a .Z file. The gzip utility is the standard GNU compression utility and should be used instead of compress.

Compressing with bzip2

Another popular compression utility is bzip2. It compresses files using the Burrows-Wheeler block-sorting text compression algorithm and Huffman coding. The command line options are similar to gzip by design, but they are not exactly the same. (See the bzip2 man page for a complete listing.) You compress files using the bzip2 command and decompress with bunzip2. The bzip2 command creates files with the extension .bz2. You can use bzcat to output decompressed data to the standard output. The bzip2 command compresses files in blocks and enables you to specify their size (larger blocks give you greater compression). As when using gzip, you can use bzip2 to compress tar archive files. The following example compresses the mydata file into a bzip2-compressed file with the extension .bz2:

$ bzip2 mydata
$ ls
mydata.bz2

To decompress, use the bunzip2 command on a bzip2 file:

$ bunzip2 mydata.bz2

Using Zip

Zip is a compression and archive utility modeled on PKZIP, which was used originally on DOS systems. Zip is a cross-platform utility used on Windows, Mac, MS-DOS, OS/2, Unix, and Linux systems. The zip and unzip commands can work with archives created by PKZIP, and PKZIP can use archives created by zip. You compress a file using the zip command, giving the name of the Zip archive first and then the files to add to it. If the archive name has no extension, zip adds the .zip extension. Unlike gzip, zip leaves the original files in place. You can use - as the archive name to have zip write the archive to the standard output, or as a filename to have zip read data from the standard input. To compress a directory, you include the -r option. The first example archives and compresses a file:

$ zip mydata.zip mydata
$ ls
mydata mydata.zip

The next example archives and compresses the reports directory:

$ zip -r reports.zip reports

A full set of archive operations is supported. With the -f option, you can update a particular file in the Zip archive with a newer version. The -u option replaces or adds files, and the -d option deletes files from the Zip archive. Options also exist for encrypting files, making DOS-to-Unix end-of-line translations and including hidden files.

To decompress and extract the Zip file, you use the unzip command.

$ unzip mydata.zip