Jan 03 2011

I recently had to move about 50 GB of data, split into hundreds of thousands of small files, from one machine to another. There was no free space on the machine to create a compressed tar archive and then move it comfortably, and when I tried scp, after 45 minutes it had moved only about 2 GB of data: too slow.

So I started looking at some of tar’s more advanced options.

But first of all, what is tar?

From Wikipedia:

In computing, tar (derived from tape archive and commonly referred to as “tarball”) is both a file format (in the form of a type of archive bitstream) and the name of a program used to handle such files. The format was created in the early days of Unix and standardized by POSIX.1-1988 and later POSIX.1-2001.
Initially developed to be written directly to sequential I/O devices for tape backup purposes, it is now commonly used to collect many files into one larger file for distribution or archiving, while preserving file system information such as user and group permissions, dates, and directory structures.

Conventionally, uncompressed tar archive files have names ending in “.tar”. Unlike ZIP archives, tar files (somefile.tar) are commonly compressed as a whole rather than piecemeal. Applying a compression utility such as gzip, bzip2, lzip, lzma or compress to a tar file produces a compressed tar file, typically named with an extension indicating the type of compression (e.g.: somefile.tar.gz).
Popular tar programs like the BSD and GNU versions of tar support the command line options -z (gzip), and -j (bzip2) to automatically compress or decompress the archive file it is currently working with. GNU tar from version 1.20 onwards also supports --lzma (LZMA). 1.21 also supports lzop via --lzop, 1.22 adds support for xz via --xz or -J, and 1.23 adds support for lzip via --lzip. Both will automatically extract compressed gzip and bzip2 archives with or without these options.

The basic usage of tar is:

Tar up a directory and all of its subdirectories using:

tar cf archive.tar dir

Then, from another directory, extract it with:

tar xf archive.tar

Or, to list the contents of a tar file:

tar tf archive.tar
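The three commands above can be tried end to end with a short script. This is a minimal sketch: the directory names (`dir`, `restore`) and the sample files are hypothetical, and it uses `-C`, which tells tar to change into a directory before extracting, so the copy lands somewhere other than the current directory.

```shell
#!/bin/sh
set -e

# Work in a scratch directory so nothing else is touched
work=$(mktemp -d)
cd "$work"

# Hypothetical sample tree: dir/ with a file and a subdirectory
mkdir -p dir/sub
echo "hello" > dir/a.txt
echo "world" > dir/sub/b.txt

# c = create, f = the next argument is the archive file name
tar cf archive.tar dir

# t = list the archive's contents without extracting anything
tar tf archive.tar

# x = extract; -C restore makes tar switch to restore/ first
mkdir restore
tar xf archive.tar -C restore

# The extracted copy should be identical to the original (diff prints nothing)
diff -r dir restore/dir && echo "copy OK"
```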

Legend of the main options:

1. c = Create
2. x = Extract
3. t = List content
4. v = Verbose: report each file as it is processed
5. f = File: use the archive named by the next argument (- means standard input or output)
6. z = Compress and pack simultaneously (gzip).
7. j = Compress and pack simultaneously (bzip2).
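The letters combine freely, so `czf` creates a gzip-compressed archive and `tzvf` gives a verbose listing of it. A minimal sketch, assuming GNU tar with gzip and bzip2 installed; all file names are hypothetical. The last extraction relies on GNU tar recognizing the compression of an archive it reads from a file, the behaviour described in the Wikipedia excerpt above.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

mkdir dir
seq 1 10000 > dir/numbers.txt     # hypothetical, easily compressible sample data

tar czf dir.tar.gz dir            # c + z: create a gzip-compressed archive
tar cjf dir.tar.bz2 dir           # c + j: create a bzip2-compressed archive

tar tzvf dir.tar.gz               # t + z + v: verbose listing of the gzip archive

mkdir out_gz out_bz2
tar xzf dir.tar.gz -C out_gz      # explicit z when extracting
tar xf dir.tar.bz2 -C out_bz2     # no j: GNU tar detects bzip2 by itself here
```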

Take a directory containing files of different types:

/tmp/test$ ls
AC.11.2010.pdf	accesslog_linuxaria.com_10_30_2010  bookmarks-2010-01-27.json  semiogramm_farbe.pdf
/tmp/test$ du -hs .
68M	.

From this directory we can create a plain tar file, a gzip-compressed tar and a bzip2-compressed tar:

/tmp/test$ tar -cvf ../tar.tar *
/tmp/test$ tar -czvf ../tar.tar.gz *
/tmp/test$ tar -cjvf ../tar.tar.bzip2 *
/tmp/test$ ls -lh ../tar*
-rw-r--r-- 1 tar tar 5.9K 2011-01-02 21:32 ../tar.jpg
-rw-r--r-- 1 tar tar  68M 2011-01-02 22:09 ../tar.tar
-rw-r--r-- 1 tar tar  30M 2011-01-02 22:07 ../tar.tar.bzip2
-rw-r--r-- 1 tar tar  30M 2011-01-02 22:09 ../tar.tar.gz
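To repeat this comparison on your own data, the sizes can be printed in bytes and as a share of the plain tar. A minimal sketch, assuming GNU tar, gzip, bzip2 and coreutils; the sample data is hypothetical, and the ratios depend entirely on what is being compressed (text and logs shrink far more than PDFs, which are often compressed already). It uses the conventional `.tar.bz2` extension.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical sample data: plain text compresses well, like the logs above
mkdir test
seq 1 200000 > test/numbers.txt

tar -cf  archive.tar     test     # no compression
tar -czf archive.tar.gz  test     # gzip
tar -cjf archive.tar.bz2 test     # bzip2

# Print each size in bytes and as a percentage of the uncompressed tar
plain=$(wc -c < archive.tar)
for f in archive.tar archive.tar.gz archive.tar.bz2; do
    size=$(wc -c < "$f")
    echo "$f: $size bytes ($((100 * size / plain))%)"
done
```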

So far we have only seen the standard uses of tar; now let’s look at some “special” ones:

Remove all files from a previously extracted archive

tar -tf archive.tar | xargs rm -r
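That one-liner breaks on file names containing spaces, and `rm -r` on a directory entry also deletes anything added to that directory after extracting. A more careful sketch, assuming GNU tar, xargs and rmdir (for `-d '\n'` and `--ignore-fail-on-non-empty`), removes only the archive’s files and then the directories that are left empty. The sample tree and `keep.txt` are hypothetical.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical setup: an archive extracted into the current directory,
# plus one file of our own that must survive the cleanup
mkdir -p "dir/sub dir"
echo one > "dir/file one.txt"
echo two > "dir/sub dir/two.txt"
tar cf archive.tar dir
echo mine > dir/keep.txt

# 1) Remove the non-directory entries (names not ending in /).
#    -d '\n' makes GNU xargs split on newlines only, so spaces are safe.
tar tf archive.tar | grep -v '/$' | xargs -d '\n' rm -f

# 2) Remove the archive's directories, deepest first, but only when empty.
tar tf archive.tar | grep '/$' | LC_ALL=C sort -r \
    | xargs -d '\n' rmdir --ignore-fail-on-non-empty
```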

Copy an entire directory structure with tar:

tar cf - dir1 | (cd dir2 && tar xf -)
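As one of the readers points out in the comments, tar’s `-C` option can replace the subshell: if `dir2` does not exist, the receiving tar fails instead of unpacking into the current directory. A minimal sketch with hypothetical directory names; `p` asks tar to keep the original permissions even when not running as root.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical source tree with a non-default mode, and an empty destination
mkdir -p dir1/sub dir2
echo data > dir1/sub/file.txt
chmod 640 dir1/sub/file.txt

# The first tar writes the archive to stdout (f -), the second reads it
# from stdin; -C dir2 changes into dir2 before extracting.
tar cf - dir1 | tar -C dir2 -xpf -
```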

Copy an entire directory structure over the network:

tar cf - dir1 | ssh remote_host "( cd /path/to/dir2 && tar xf - )"

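This is the command I used to move my 50 GB (in about 2 hours). It is faster than scp because tar sends everything as one continuous stream, while scp transfers the files one at a time and waits for an acknowledgement after each one; with hundreds of thousands of small files, those round trips add up.
On a fast LAN (I have just tested it on gigabit Ethernet) it is faster not to compress the data.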
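The same pipeline can be wrapped in a small shell function that optionally compresses the stream. This is a sketch, not a standard tool: `tar_copy` and `COMPRESS` are hypothetical names, and the receiving command defaults to `sh -c` so it can be tried on a single machine; pass `"ssh remote_host"` to use it across the network. The destination path is wrapped in single quotes, so it must not contain one itself.

```shell
#!/bin/sh
set -e

# tar_copy SRC DEST_DIR [RUNNER]   (hypothetical helper, not part of tar)
# Streams directory SRC into DEST_DIR as a single tar stream.
# RUNNER runs the receiving side, e.g. "ssh remote_host"; default "sh -c".
# COMPRESS=z gzips the stream: useful on slow links, usually a loss on a fast LAN.
tar_copy() {
    src=$1
    dest=$2
    runner=${3:-sh -c}
    z=${COMPRESS:-}
    # $runner is deliberately unquoted so "ssh remote_host" splits into two words
    tar "c${z}f" - "$src" | $runner "cd '$dest' && tar x${z}f -"
}

# Local demo with hypothetical paths
work=$(mktemp -d)
cd "$work"
mkdir -p dir1 plain gzipped
echo payload > dir1/file.txt
tar_copy dir1 "$work/plain"
COMPRESS=z tar_copy dir1 "$work/gzipped"

# Over the network it would be, for example:
#   tar_copy dir1 /path/to/dir2 "ssh remote_host"
```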

Similar, but using GNU tar’s own support for remote archives:

tar --rsh-command=/usr/bin/ssh -cvf username@remotehost:/path/to/dest/archive.tar dir1

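This command uses ssh to write the contents of directory dir1 directly into a tar file on the remote host. GNU tar runs the rmt helper program on the remote side, so rmt needs to be installed there.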

Copy from host1 to host2, through your own host:

ssh root@host1 "cd /somedir/tocopy/ && tar -cf - ." | ssh root@host2 "cd /samedir/tocopyto/ && tar -xf -"

This is useful when you can reach both host1 and host2, but the two hosts cannot reach each other.
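The relay works because the local machine only forwards bytes between two pipes. A sketch of the same idea as a function, where `relay_copy` is a hypothetical name and both runners default to `sh -c` so it can be tested locally; in real use they would be `"ssh root@host1"` and `"ssh root@host2"`.

```shell
#!/bin/sh
set -e

# relay_copy SRC_DIR DEST_DIR [SRC_RUNNER] [DEST_RUNNER]   (hypothetical helper)
# Packs SRC_DIR on the source side and unpacks it on the destination side;
# the machine running this function only relays the tar stream.
relay_copy() {
    src=$1
    dest=$2
    from=${3:-sh -c}
    to=${4:-sh -c}
    # Runners are unquoted on purpose so "ssh root@host1" splits into words
    $from "cd '$src' && tar -cf - ." | $to "cd '$dest' && tar -xf -"
}

# Local demo with hypothetical paths
work=$(mktemp -d)
mkdir -p "$work/tocopy/sub" "$work/tocopyto"
echo relayed > "$work/tocopy/sub/file.txt"
relay_copy "$work/tocopy" "$work/tocopyto"

# Across machines:
#   relay_copy /somedir/tocopy /somedir/tocopyto "ssh root@host1" "ssh root@host2"
```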

References: http://www.linuxjournal.com/content/stupid-tar-tricks



  12 Responses to “Tar Tricks on Linux”

  1. For me the bottleneck is always the processing power on each end, so I’ve found it much faster to use netcat from one host to another on a secure LAN. If you don’t like giving up the encryption though, you can use ssh with “-oCiphers=blowfish-cbc,arcfour256” for a more efficient encryption algorithm.

  2. Always useful, thanks.
    I’m adding it to my bookmarks. =)

  3. Thx a lot, added to fav

  4. > tar cf - dir1 | (cd dir2; tar xf -)

    Check out tar’s “-C” command line option. If you prefer to use “cd”, then at least write the more reliable “( cd dir2 && tar xf - )”.

    > -rw-r--r-- 1 tar tar 30M 2011-01-02 22:07 ../tar.tar.bzip2

    Standard extension for the files is .tar.bz2, not .tar.bzip2

    Also, for an article written in 2011, a mention of -J/--xz and .tar.xz is a must.

    • Thanks for the feedback, you’re right. I’m not used to the -J option, but I’ve heard it can save more space than bzip2; I’ll give it a try for sure.

  5. For transferring a large number of files from one host to another, I’d rather use rsync over ssh.
    rsync has an update feature in case the transfer gets interrupted and you want to continue where it stopped.

  6. I use cpio – a much neglected but very useful tool. To duplicate a directory on the same box:

    find mydir -depth -print | cpio -pdum /new/dir

    Over a network:

    find mydir -depth -print | cpio -o | ssh newhost "cd /new/dir; cpio -idum"

  7. Useful post… I’ve never used tar from host to host via ssh.
    Anyway, when the amount of data is very large I usually use rsync+ssh (which lets me resume the upload).

  8. Another vote for rsync. So much more robust and flexible!

  9. This is how you tar a folder with a dot in front:

    tar -cvf archive.tar .secretdir/
