Feb 26, 2012

Recently I had to download a lot of files from a remote FTP server. The best solution in cases like this is to log in on the remote server and create a compressed archive of all the files (for example with tar -zcvf archivename.tgz /path/to/archive/); this way you only have to download a single file, which is also compressed, and FTP handles that perfectly well.
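Just to sketch that approach (all paths, names and credentials here are placeholders):

# on the remote server, with shell access
tar -zcvf archivename.tgz /path/to/archive/
# then a single transfer from the local machine is enough, e.g. over FTP
wget ftp://myusername:mypassword@ftp.yoursite.com/archivename.tgz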

But this time I had no shell on the remote server, just an FTP account, so what's the best way to download a large number of files recursively?



As a first thing I took a look at the manual page of ftp. Ah, I forgot to say that the machine where I have to download all these files is a headless server, so no graphical interface or handy graphical FTP clients. Looking at the ftp man page, the closest thing to what I needed was the command mget:

mget remote-files
Expand the remote-files on the remote machine and do a get
for each file name thus produced. See glob for details on
the filename expansion. Resulting file names will then be
processed according to case, ntrans, and nmap settings.
Files are transferred into the local working directory, which
can be changed with ‘lcd directory’; new local directories
can be created with ‘! mkdir directory’.
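For reference, a minimal mget session looks roughly like this (host, user and directories are placeholders); note that it only fetches the files in the current remote directory:

ftp ftp.yoursite.com
# after logging in with user name and password:
ftp> prompt                    # turn off the per-file confirmation
ftp> lcd /local/download/dir   # choose where the files land locally
ftp> mget *                    # get every file in the current remote directory
ftp> quit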

Useful, but not in my case, where I have multiple subdirectories. A quick search on Google revealed that the FTP protocol simply doesn't support recursive download, so you have to rely on the client's options to do it. Let's see how to do it with Wget.

Wget

GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive commandline tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
So this seems the perfect tool to use on a server; as a plus, wget is available in practically every Linux distribution's repository, which makes installing it trivial.
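If wget is not already installed, it is usually a single command away, depending on your distribution, for example:

apt-get install wget      # Debian/Ubuntu
yum install wget          # CentOS/Red Hat/Fedora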

The basic syntax for wget is

wget ftp://myusername:mypassword@ftp.yoursite.com/yourfile

With a command like this one you use the FTP protocol with the account myusername and the password mypassword to download the file yourfile from ftp.yoursite.com.
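As a side note, if the password contains characters that are special to the shell or to the URL, the credentials can also be passed as separate options instead of being embedded in the URL (same placeholder names as above):

wget --ftp-user=myusername --ftp-password='mypassword' ftp://ftp.yoursite.com/yourfile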
But we need some extra options to get a recursive download from that FTP site.

Extra Options

-r, --recursive
Turn on recursive retrieving.

-l depth, --level=depth
Specify recursion maximum depth level depth. The default maximum depth is 5.

So our command becomes:

wget -r --level=99 ftp://myusername:mypassword@ftp.yoursite.com/

In this way, starting from the root directory, wget downloads recursively down to 99 levels (or you can use inf for infinite depth).
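For example, to not worry about the depth limit at all:

wget -r --level=inf ftp://myusername:mypassword@ftp.yoursite.com/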


Or you can use the -m option (which stands for mirror).
The -m option turns on mirroring, i.e. it enables recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings:

wget -m ftp://myusername:mypassword@ftp.yoursite.com/

If, like me, you have a really big site, I suggest running the command with nohup in front of it and sending it to the background.
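Something along these lines (the log file name is just an example):

nohup wget -m ftp://myusername:mypassword@ftp.yoursite.com/ > wget.log 2>&1 &

This way the download keeps going even after you log out of the server, and you can check the progress in wget.log.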

A final tip for wget: if you have to re-run it against the same site, you can also use the -nc option, so the files will not be downloaded twice.

-nc, --no-clobber
If a file is downloaded more than once in the same directory, Wget’s behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.

When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.
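In practice, since -m turns on time-stamping (-N), which wget refuses to combine with -nc, a re-run would look something like this (same placeholders as before):

wget -r --level=inf -nc ftp://myusername:mypassword@ftp.yoursite.com/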


  8 Responses to “How to download recursively from an FTP site”

  1. Your third paragraph talks about mget, then you switch to wget. Which one is it?

  2. I’m a communication researcher and sometimes my students and I want to analyze posts to public message boards. To have a local copy of the posts we’ve been having to copy and paste each page individually. We generally need thousands of posts, so this is tedious. Is there some way to use wget or a similar utility to do this in a more automated way? We can work from Linux, OS X, or Windows.

  3. Many thanks, DD. I’m going to try it in the morning.

  4. mget is a popular command in FTP sessions and is short for “multiple get” (“get” downloads 1 file, “put” uploads 1).
    A utility like wget offers much more flexibility than the standard “ftp” utility, like different protocols (ftp, http,…), recursive downloading, automatic retries, time-stamping to get only newer files, …

    Alternatives to wget are lftp (http://lftp.yar.ru/) or ncftp on linux, but I’m sure there are tons of these around.

    @Dale Hample: wget can also download html pages in full, with some options to convert links so they work locally and so on.

  5. yafc
    get -r dir
    bingo

  6. Hi,

    I’m new here.

    I was going to recommend HTTRACK but someone has already beaten me to it; you only have to enter the top domain name and it will download all the files contained, or you can go down levels.
    You can also set it in a cron tab so that it will keep the local copy up to date. I used to use this method for syncing my two lampp stacks; it may even work with FTP.
