I recently had to move about 50 GB of data, spread across hundreds of thousands of small files, from one machine to another. I had no spare space on the source machine to build a compressed tar archive and move that comfortably. I tried scp first, but after 45 minutes it had moved only around 2 GB, which was far too slow.
So I started looking at the more advanced options of tar.
But first of all, what is tar?
In computing, tar (derived from tape archive and commonly referred to as “tarball”) is both a file format (in the form of a type of archive bitstream) and the name of a program used to handle such files. The format was created in the early days of Unix and standardized by POSIX.1-1988 and later POSIX.1-2001.
Initially developed to be written directly to sequential I/O devices for tape backup purposes, it is now commonly used to collect many files into one larger file for distribution or archiving, while preserving file system information such as user and group permissions, dates, and directory structures.
Conventionally, uncompressed tar archive files have names ending in “.tar”. Unlike ZIP archives, tar files (somefile.tar) are commonly compressed as a whole rather than piecemeal. Applying a compression utility such as gzip, bzip2, lzip, lzma or compress to a tar file produces a compressed tar file, typically named with an extension indicating the type of compression (e.g.: somefile.tar.gz).
Popular tar programs like the BSD and GNU versions of tar support the command line options -z (gzip) and -j (bzip2) to automatically compress or decompress the archive file they are currently working with. GNU tar from version 1.20 onwards also supports --lzma (LZMA). 1.21 also supports lzop via --lzop, 1.22 adds support for xz via --xz or -J, and 1.23 adds support for lzip via --lzip. Both will automatically extract compressed gzip and bzip2 archives with or without these options.
The basic usage of tar is:
Tar up a directory and all of its subdirectories using:
tar cf archive.tar dir
Then, extract it in another directory with:
tar xf archive.tar
or if you want to list the contents of a tar file:
tar tf archive.tar
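The three commands above can be tried safely in a scratch directory. This is only a sketch: the directory layout, the file contents and the variable names (work, listing, extracted) are made up for the example.

```shell
#!/bin/sh
# Round trip in a throwaway directory: create, list, extract.
work=$(mktemp -d)                        # scratch area, removed at the end
mkdir -p "$work/src/dir/sub" "$work/dest"
echo "hello" > "$work/src/dir/a.txt"
echo "world" > "$work/src/dir/sub/b.txt"

# c = create, f = write to the named file
(cd "$work/src" && tar cf archive.tar dir)

# t = list the contents without extracting anything
listing=$(cd "$work/src" && tar tf archive.tar)

# x = extract, here into a different directory
(cd "$work/dest" && tar xf "$work/src/archive.tar")

extracted=$(cat "$work/dest/dir/sub/b.txt")
echo "$listing"
echo "extracted file contains: $extracted"

rm -rf "$work"
```

The subshells keep each cd local, so the script ends in the directory it started in.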
Legend of the main options:
1. c = Create
2. x = Extract
3. t = List content
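4. v = Verbose: print each file name as it is processed
5. f = Use the archive file (or device) named right after the option; - means standard input or output
6. z = Compress or decompress through gzip while archiving or extracting.
7. j = Compress or decompress through bzip2 while archiving or extracting.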
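To see the z option and the automatic detection on extraction in action, here is a small sketch. It assumes GNU or BSD tar with gzip installed; the file names (plain.tar, packed.tar.gz, log.txt) and the test data are invented for the example.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/data" "$work/out"

# 2000 identical lines: highly repetitive, so it compresses very well
i=0
while [ "$i" -lt 2000 ]; do
  echo "the same line again and again"
  i=$((i + 1))
done > "$work/data/log.txt"

(cd "$work" && tar cf plain.tar data)        # archive only
(cd "$work" && tar czf packed.tar.gz data)   # archive and gzip in one step

# tr strips the padding some wc implementations add
plain_size=$(wc -c < "$work/plain.tar" | tr -d ' ')
packed_size=$(wc -c < "$work/packed.tar.gz" | tr -d ' ')
echo "plain: $plain_size bytes, gzip: $packed_size bytes"

# No z here: the compression is detected when the file is read
(cd "$work/out" && tar xf "$work/packed.tar.gz")
restored_lines=$(wc -l < "$work/out/data/log.txt" | tr -d ' ')
echo "restored lines: $restored_lines"

rm -rf "$work"
```

How much you gain depends entirely on the data: already compressed files such as PDFs or JPEGs shrink far less than logs or text.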
Take a directory with files of different types:
/tmp/test┌- ls
AC.11.2010.pdf accesslog_linuxaria.com_10_30_2010 bookmarks-2010-01-27.json semiogramm_farbe.pdf
/tmp/test┌- du -hs .
68M .
From this directory we can create a plain tar file, a gzipped tar and a tar compressed with bzip2:
└┌(%:/tmp/test)┌- tar -cvf ../tar.tar *
AC.11.2010.pdf
accesslog_linuxaria.com_10_30_2010
bookmarks-2010-01-27.json
semiogramm_farbe.pdf
└┌(%:/tmp/test)┌- tar -czvf ../tar.tar.gz *
AC.11.2010.pdf
accesslog_linuxaria.com_10_30_2010
bookmarks-2010-01-27.json
semiogramm_farbe.pdf
└┌(%:/tmp/test)┌- tar -cjvf ../tar.tar.bzip2 *
AC.11.2010.pdf
accesslog_linuxaria.com_10_30_2010
bookmarks-2010-01-27.json
semiogramm_farbe.pdf
└┌(%:/tmp/test┌- ls -lh ../tar*
-rw-r--r-- 1 tar tar 5.9K 2011-01-02 21:32 ../tar.jpg
-rw-r--r-- 1 tar tar 68M 2011-01-02 22:09 ../tar.tar
-rw-r--r-- 1 tar tar 30M 2011-01-02 22:07 ../tar.tar.bzip2
-rw-r--r-- 1 tar tar 30M 2011-01-02 22:09 ../tar.tar.g
So far we have only seen the standard uses of tar; now let’s look at some “special” uses:
Remove all files from a previously extracted archive
tar -tf archive.tar | xargs rm -r
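Piping the listing into rm -r has two sharp edges: xargs splits names on whitespace, and rm -r on a listed directory also deletes anything else that has landed in it since the extraction. A more cautious variant, as a sketch: it removes only the files named in the archive and then only the directories that end up empty. The paths and the flag variables (archived_gone, unrelated_kept) are made up for the demonstration, and names containing newlines are not handled.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/src/dir1" "$work/dest"
echo one > "$work/src/dir1/my file.txt"          # note the space in the name
(cd "$work/src" && tar cf "$work/archive.tar" dir1)
(cd "$work/dest" && tar xf "$work/archive.tar")
echo keep > "$work/dest/dir1/unrelated.txt"      # created after extraction

(
  cd "$work/dest" || exit 1

  # Pass 1: delete files and symlinks that appear in the listing
  tar -tf "$work/archive.tar" | while IFS= read -r entry; do
    if [ -f "$entry" ] || [ -L "$entry" ]; then
      rm -f -- "$entry"
    fi
  done

  # Pass 2: delete listed directories, deepest first, only if they are empty
  tar -tf "$work/archive.tar" | sort -r | while IFS= read -r entry; do
    if [ -d "$entry" ]; then
      rmdir -- "$entry" 2>/dev/null || true
    fi
  done
)

archived_gone=no
[ ! -e "$work/dest/dir1/my file.txt" ] && archived_gone=yes
unrelated_kept=no
[ -f "$work/dest/dir1/unrelated.txt" ] && unrelated_kept=yes
echo "archived file removed: $archived_gone, unrelated file kept: $unrelated_kept"

rm -rf "$work"
```

Run it from the same directory where the archive was extracted, otherwise the relative names in the listing will not match anything.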
Move an entire directory structure with tar:
tar cf - dir1 | (cd dir2 && tar xf -)
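The same pipe can be written without a subshell by using -C, which both GNU and BSD tar accept to change directory inside tar itself. A small sketch follows; the scratch layout, the 640 permission and the variable names (copied, mode) are only there to make the result visible.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/dir1/sub" "$work/dir2"
echo data > "$work/dir1/sub/file.txt"
chmod 640 "$work/dir1/sub/file.txt"

# Left side: archive dir1 (relative to $work) to standard output.
# Right side: extract from standard input into dir2;
# p asks tar to restore the permissions stored in the archive.
tar cf - -C "$work" dir1 | tar xpf - -C "$work/dir2"

copied=$(cat "$work/dir2/dir1/sub/file.txt")
mode=$(ls -l "$work/dir2/dir1/sub/file.txt" | cut -c1-10)
echo "content: $copied, mode: $mode"

rm -rf "$work"
```

If the target directory does not exist, the extracting tar stops with an error instead of unpacking into the current directory.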
Move an entire directory structure over the network
tar cf - dir1 | ssh remote_host "cd /path/to/dir2 && tar xf -"
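This is the command I used to move my 50 GB (in around 2 hours). It is faster than scp because tar sends everything as one continuous stream over a single ssh session, while scp handles each file separately and pays a round trip per file, which adds up with hundreds of thousands of small files.
If you are on a fast LAN (I have only tested Gigabit Ethernet), it is faster not to compress the data.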
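On a slow link the trade-off may go the other way, and adding z on both ends of the pipe is enough to compress the stream. The sketch below wraps that in a hypothetical helper, send_tree, which is not part of tar or ssh. To keep the example runnable without a second machine, sh -c stands in for ssh remote_host; both take the command line to run as their last argument.

```shell
#!/bin/sh
# send_tree SRC_PARENT NAME DEST_DIR RUNNER...
# Streams the directory NAME (found under SRC_PARENT) as a gzip-compressed
# tar through RUNNER, which must run the command line it is given.
send_tree() {
  src_parent=$1
  name=$2
  dest=$3
  shift 3
  # z on both sides: compress before the pipe, decompress after it
  tar czf - -C "$src_parent" "$name" | "$@" "cd '$dest' && tar xzf -"
}

work=$(mktemp -d)
mkdir -p "$work/dir1" "$work/dir2"
echo payload > "$work/dir1/file.txt"

# Local stand-in for the network hop. Over a real link this would be:
#   send_tree /parent/of dir1 /path/to/dir2 ssh remote_host
send_tree "$work" dir1 "$work/dir2" sh -c

received=$(cat "$work/dir2/dir1/file.txt")
echo "received: $received"

rm -rf "$work"
```

The destination path is wrapped in single quotes for the remote shell, so the helper breaks on paths that themselves contain a single quote.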
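As above, but using the options of GNU tar:
tar --rsh-command=/usr/bin/ssh -cvf username@remotehost:/path/to/dest/archive.tar dir1
This command uses ssh to write the contents of directory dir1 directly into a tar file on the remote host. Adjust the path to ssh if it lives elsewhere on your system; GNU tar also expects the rmt helper to be installed on the remote side.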
Copy from host1 to host2, through your host
ssh root@host1 "cd /somedir/tocopy/ && tar -cf - ." | ssh root@host2 "cd /samedir/tocopyto/ && tar -xf -"
Useful if you have access to both host1 and host2 but the two hosts cannot see each other.