15.3 tar

tar(1) is the GNU tape archiver. It takes several files or directories and combines them into one large file. Since gzip and bzip2 each compress only a single file, archiving with tar first is what makes it possible to compress an entire directory tree. tar has many command line options, which are explained in its man page. This section covers only the most common uses of tar.

The most common use for tar is to decompress and unarchive a package that you've downloaded from a web site or ftp site. Most files will come with a .tar.gz extension. This is commonly known as a “tarball”. It means that several files were archived using tar and then compressed using gzip. You might also see a .tar.Z file. This is the same idea, except that the archive was compressed with the older compress(1) program rather than gzip; it is usually encountered on older Unix systems.

Alternatively, you might find a .tar.bz2 file somewhere. Kernel source is distributed as such because it is a smaller download. As you might have guessed, this is several files archived with tar and then bzipped.

You can get to all the files in this archive by making use of tar and some command line arguments. Unarchiving a tarball makes use of the -z flag, which means to first run the file through gunzip and decompress it. The most common way to decompress a tarball is like so:

% tar -xvzf filename.tar.gz

That's quite a few options. So what do they all mean? The -x means to extract. This is important, as it tells tar exactly what to do with the input file. In this case, we'll be splitting it back up into all the files that it came from. -v means to be verbose. This will list all the files that are being unarchived. It is perfectly acceptable to leave this option off, if somewhat boring. Alternatively, you could use -vv to be very verbose and list even more information about each file being unarchived. The -z option tells tar to run filename.tar.gz through gunzip first. And finally, the -f option tells tar that the next string on the command line is the file to operate on.
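To see these options in action without touching a real download, the following is a minimal sketch that builds a tiny tarball in a scratch directory, lists it, and then extracts it. The directory and file names (docs, readme.txt, demo.tar.gz) are invented for the demonstration, and the -t option, which lists an archive's members instead of extracting them, is not covered above.

```shell
#!/bin/sh
# Sketch only: every path below is made up for the demonstration.
set -e

work=$(mktemp -d)                      # scratch directory
mkdir -p "$work/src/docs" "$work/out"
echo "hello" > "$work/src/docs/readme.txt"

# Build a small gzipped tarball to experiment with.
(cd "$work/src" && tar -czf "$work/demo.tar.gz" docs)

# -t lists the members of the archive without extracting anything.
listing=$(tar -tzf "$work/demo.tar.gz")
echo "$listing"

# -x extracts into the current directory, so change into the target first.
cd "$work/out"
tar -xzf "$work/demo.tar.gz"
extracted=$(cat docs/readme.txt)
echo "$extracted"

# Clean up the scratch directory.
cd /
rm -rf "$work"
```

Listing an unfamiliar tarball before extracting it is a cheap way to check whether it unpacks into its own subdirectory or scatters files into the current one.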

There are a few other ways to write this same command. On older systems lacking a decent copy of GNU tar, you might see it written like so:

% gunzip -c filename.tar.gz | tar -xvf -

This command line will uncompress the file and send the output to tar. The -c option tells gunzip to write the decompressed data to standard output instead of replacing filename.tar.gz with filename.tar. The pipe then sends that data to tar for unarchiving. The “-” means to operate on standard input. tar will unarchive the stream of data that it gets from gunzip and write the files to the disk.
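The pipeline form can be tried out safely with a throwaway archive. This is a minimal sketch, assuming gunzip is given -c so that the decompressed stream goes to standard output and the compressed file is left untouched. The file names are invented for the demonstration.

```shell
#!/bin/sh
# Sketch only: every path below is made up for the demonstration.
set -e

work=$(mktemp -d)
mkdir -p "$work/src" "$work/out"
echo "piped" > "$work/src/note.txt"
(cd "$work/src" && tar -czf "$work/demo.tar.gz" note.txt)

# gunzip -c writes the decompressed archive to standard output;
# "tar -xf -" reads that archive from standard input and unpacks it.
(cd "$work/out" && gunzip -c "$work/demo.tar.gz" | tar -xf -)

piped=$(cat "$work/out/note.txt")

# The compressed archive is still present after the pipeline has run.
if [ -f "$work/demo.tar.gz" ]; then
    archive_kept=yes
else
    archive_kept=no
fi
echo "$piped (archive kept: $archive_kept)"

rm -rf "$work"
```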

Another way to write the first command line is to leave off the dash before the options, like so:

% tar xvzf filename.tar.gz

You might also encounter a bzipped archive. The version of tar that comes with Slackware Linux can handle these the same as gzipped archives. Instead of the -z command line option, you'd use -j:

% tar -xvjf filename.tar.bz2

It is important to note that tar will place the unarchived files in the current directory. So, if you had an archive in /tmp that you wanted to decompress into your home directory, there are a few options. First, the archive could be moved into your home directory and then run through tar. Second, you could specify the path to the archive file on the command line. Third, you can use the -C option to “explode” the tarball in a specified directory.

% cd $HOME
% cp /tmp/filename.tar.gz .
% tar -xvzf filename.tar.gz

% cd $HOME
% tar -xvzf /tmp/filename.tar.gz

% cd /
% tar -xvzf /tmp/filename.tar.gz -C $HOME

All the above statements are equivalent. In each case, the archive is unpacked inside your home directory and the original compressed archive is left in place.
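The -C variant is the easiest one to get wrong, so here is a minimal sketch of it using a scratch directory in place of /tmp and $HOME. The names (target, file.txt, demo.tar.gz) are invented for the demonstration. Note that the directory given to -C must already exist; tar will not create it.

```shell
#!/bin/sh
# Sketch only: every path below is made up for the demonstration.
set -e

work=$(mktemp -d)
mkdir -p "$work/src" "$work/target"    # the -C target must already exist
echo "data" > "$work/src/file.txt"
(cd "$work/src" && tar -czf "$work/demo.tar.gz" file.txt)

# Run tar from an unrelated directory; -C decides where the files land.
cd /
tar -xzf "$work/demo.tar.gz" -C "$work/target"

result=$(cat "$work/target/file.txt")
echo "$result"

rm -rf "$work"
```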

So what good is being able to uncompress these archives if you can't make them? Well, tar handles that too. In most cases it's as easy as removing the “-x” option and replacing it with the “-c” option.

% tar -cvzf filename.tar.gz .

In this command line, the -c option tells tar to create an archive, while the -z option runs the resulting archive file through gzip to compress it. filename.tar.gz is the file that you want to create.
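As a minimal sketch of the create step, the following archives a small directory tree and then lists the result to confirm that everything was captured. The names (project, a.txt, sub/b.txt) are invented for the demonstration. As a precaution, the sketch writes the archive outside the directory being archived, so that tar cannot try to include the half-written archive in itself.

```shell
#!/bin/sh
# Sketch only: every path below is made up for the demonstration.
set -e

work=$(mktemp -d)
mkdir -p "$work/project/sub"
echo "one" > "$work/project/a.txt"
echo "two" > "$work/project/sub/b.txt"

# Archive the current directory (".") but write the tarball one level up,
# outside the tree that is being archived.
cd "$work/project"
tar -czf "$work/project.tar.gz" .

# List the archive to confirm that both files were captured.
members=$(tar -tzf "$work/project.tar.gz")
echo "$members"

cd /
rm -rf "$work"
```

Because the directory was given as ".", the member names are stored with a leading "./" and the archive unpacks directly into whatever directory it is extracted in.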

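Specifying the “-f” option isn't always necessary, but is typically good practice anyway. Without it, GNU tar uses the archive named in the TAPE environment variable or, failing that, a default chosen when tar was compiled. On many Linux systems that default is standard output, which is what you want when piping tar's output to another program, like so: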

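% tar -cv . | gpg --encrypt > filename.tar.gpg

That command creates an uncompressed tar archive of the current directory and pipes it through gpg, which compresses and encrypts it; the shell redirection saves the result as filename.tar.gpg. Note that no archive name is given to tar here: without -f, any file name on the command line is treated as something to add to the archive. The encrypted tarball is realistically impossible to read for anyone who does not hold the secret key.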
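The gpg example needs a key pair before it can be tried, so here is a minimal sketch of the same piping technique with a different consumer: a second tar that unpacks the stream into another directory, which copies a whole directory tree. The directory names (from, to) are invented for the demonstration, and the sketch passes “-f -” explicitly rather than relying on tar's default output.

```shell
#!/bin/sh
# Sketch only: every path below is made up for the demonstration.
set -e

work=$(mktemp -d)
mkdir -p "$work/from/sub" "$work/to"
echo "copy me" > "$work/from/sub/file.txt"

# The first tar writes the archive to standard output ("-f -");
# the second tar reads it from standard input and unpacks it in "to".
cd "$work/from"
tar -cf - . | (cd "$work/to" && tar -xf -)

copied=$(cat "$work/to/sub/file.txt")
echo "$copied"

cd /
rm -rf "$work"
```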