Working with compressed files - Linux Nitty Gritty: Working at the Ubuntu Command-Line Prompt (2011)


Working with compressed files

The typical Ubuntu user might encounter a variety of file compression and archive types. However, the two main types are zip files, as widely used in the world of Windows, and compressed tar archives.

Zip files

The following will create a new zip file called report.zip, and add report.doc to it:

zip report.zip report.doc

To zip a folder full of files, add the -r (recursive) command option. The following will create reports.zip, containing the reports folder and everything within it:

zip -r reports.zip reports

NOTE If any file or folder names contain spaces or unusual characters, enclose them in quotation marks.

To unzip files, the unzip command is used. The following will extract the files from archive.zip:

unzip archive.zip

To list files in an archive prior to unzipping, use the -l command option:

unzip -l archive.zip

To extract a single file from the archive, specify its filename after the archive name. If the file is contained within a subfolder inside the zip, include that path too, exactly as unzip -l lists it. The following will extract only report.doc from archive.zip:

unzip archive.zip report.doc

tar archives

The tar command is both powerful and multi-faceted. It was originally designed for backup purposes, but works equally well for individual file/folder archiving.

This section describes only the basic creation and extraction of tar archives. Curious readers should search online for more complete guides, of which there are a great many.

Creating a tar archive

The following will create a simple tar archive called archive.tar, containing the reports folder and its contents:

tar cf archive.tar reports

NOTE Remember that tar archives are not automatically compressed. They are simply container files, so a tar file’s size is almost exactly the combined size of the files it contains.

The -c command option tells tar to create an archive, and the -f command option tells tar that the archive’s filename immediately follows. Because of this, -f should always come last in the group of command options, immediately before the archive’s filename.
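To see the point about size for yourself, the following sketch uses a hypothetical 1,100,000-byte text file as sample data, creates an uncompressed archive of it and prints both sizes in bytes with wc -c. The tar file should come out only slightly larger, the difference being tar’s own headers and padding.

```shell
set -e
cd "$(mktemp -d)"                 # work in a throwaway directory

# Hypothetical sample data: 50,000 lines of 22 bytes each (1,100,000 bytes).
mkdir reports
awk 'BEGIN { for (i = 0; i < 50000; i++) print "quarterly report line" }' \
    > reports/report.txt

tar cf archive.tar reports        # c = create, f = filename follows

# Compare sizes in bytes. The archive adds headers and padding,
# but nothing is compressed.
wc -c reports/report.txt archive.tar
```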

NOTE You might be wondering why the hyphen isn’t used before command options with the tar command. The answer is that it’s optional and so most people leave it out. A minority of commands make the hyphen optional, but most require it.
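A quick way to confirm that the hyphen makes no difference is to create the same archive both ways and compare the listings. The archive and file names here are hypothetical.

```shell
set -e
cd "$(mktemp -d)"                 # work in a throwaway directory

mkdir reports
echo "Draft report" > reports/report.doc   # hypothetical sample file

tar cf  no-hyphen.tar   reports   # bundled options without a hyphen
tar -cf with-hyphen.tar reports   # the same options with a hyphen

# Both archives should list the same entries.
tar tf no-hyphen.tar
tar tf with-hyphen.tar
```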

To additionally compress the archive, add the -j or -z option for bzip2 or gzip compression, respectively. Bzip2 compression is generally more efficient than gzip (it produces smaller files), although it is slower. Note that you must add the .bz2 or .gz file extension to the archive name yourself, because tar doesn’t add it automatically. The standard convention with compressed tar files is to use two file extensions: one to indicate that the file is a tar archive, and one to indicate the type of compression, as in archive.tar.bz2 or archive.tar.gz.

The following will create a bzip2 tar archive of the reports folder:

tar cjvf archive.tar.bz2 reports

The -v command option has also been added above. This provides verbose feedback, explaining what tar is doing. Without it, tar provides no feedback at all unless something goes wrong.
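The following sketch, using hypothetical sample data, builds an uncompressed, a gzip and (where possible) a bzip2 version of the same folder and prints their sizes. Because tar’s -j option relies on the separate bzip2 program, which may be missing on minimal installations, the sketch only attempts it when bzip2 is installed.

```shell
set -e
cd "$(mktemp -d)"                 # work in a throwaway directory

# Hypothetical sample data: highly repetitive text compresses very well.
mkdir reports
awk 'BEGIN { for (i = 0; i < 50000; i++) print "quarterly report line" }' \
    > reports/report.txt

tar cf  archive.tar    reports    # no compression
tar czf archive.tar.gz reports    # gzip (-z); the .gz is added by hand

# tar -j calls the external bzip2 program, so only try it if present.
if command -v bzip2 >/dev/null 2>&1; then
    tar cjvf archive.tar.bz2 reports   # bzip2 (-j), verbose (-v)
fi

# Size in bytes of each archive that was created.
wc -c archive.tar*
```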

Extracting from a tar archive

Extracting files from an archive works in much the same way as creating one. Instead of the -c (create) command option, the -x (extract) command option is used. The same -j or -z option should be added for bzip2 or gzip compression, respectively, and the -f command option should again come last in the group of command options to indicate that the filename follows.

The following will extract the contents of archive.tar.bz2:

tar xjvf archive.tar.bz2

Again, the -v option has been added so that the user is provided with verbose feedback.

The following will extract the contents of archive.tar.gz:

tar xzvf archive.tar.gz
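A round trip makes a useful sanity check. This sketch, again with hypothetical sample files, creates a gzip-compressed archive, moves the original folder aside, extracts the archive and uses diff -r to confirm that nothing changed. For a .tar.bz2 archive, swap -z for -j.

```shell
set -e
cd "$(mktemp -d)"                 # work in a throwaway directory

# Hypothetical sample data, archived with gzip compression.
mkdir reports
echo "Draft report"  > reports/report.doc
echo "Meeting notes" > reports/notes.txt
tar czf archive.tar.gz reports

mv reports original               # keep the originals for comparison

tar xzvf archive.tar.gz           # recreates ./reports, naming each file

# diff -r prints nothing and succeeds when the folders are identical.
diff -r original reports && echo "Extracted files match the originals"
```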

NOTE If you’re worried that these chains of command options will be hard to learn, don’t be. They’ll slip into your memory surprisingly easily after you’ve used them a few times.

To list the files in an archive without extracting them, use the -t option, adding -j or -z as appropriate:

tar tjf archive.tar.bz2

To extract a single file, specify it after the archive name. The following will extract report.doc from the archive.tar.bz2 archive:

tar xjvf archive.tar.bz2 report.doc

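If the file you want to extract is inside a subfolder within the tar file, you’ll need to include that path in the filename, exactly as the -t option lists it. For example, in the archive created earlier from the reports folder, the file is stored as reports/report.doc, so that is the name to give tar.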

As with zip, enclose any filenames or paths that contain spaces or unusual characters in quotation marks.
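Putting listing, paths and quoting together, the following sketch lists a gzip-compressed archive built from a hypothetical reports folder and then extracts a single file whose name contains a space, giving its full stored path in quotation marks. The same commands work for bzip2 archives with -j in place of -z.

```shell
set -e
cd "$(mktemp -d)"                 # work in a throwaway directory

# Hypothetical sample data: an archive built from the reports folder.
mkdir reports
echo "Draft report"   > reports/report.doc
echo "Annual summary" > "reports/annual report.doc"
tar czf archive.tar.gz reports
rm -r reports                     # start again from an empty folder

tar tzf archive.tar.gz            # list the stored paths, e.g. reports/report.doc

# Extract just one file, quoting its full stored path because of the space.
tar xzvf archive.tar.gz "reports/annual report.doc"

ls reports                        # only the requested file has been extracted
```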

TIP If you’re interested in using tar archives for backup purposes, take a look at the Simple Backup Suite software. This automates the procedure via a GUI. Just install the sbackup package.