Monday 30 January 2012

How to Wget

* Download Single File with wget
# wget http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2

* Download and Store With a Different Filename Using wget -O
# wget -O taglist.zip http://www.vim.org/scripts/download_script.php?src_id=7701

Without -O, wget would save the file under the name taken from the URL (download_script.php?src_id=7701); -O lets you choose the output filename.

* Specify Download Speed / Download Rate Using wget --limit-rate
# wget --limit-rate=200k http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2

* Continue the Incomplete Download Using wget -c
# wget -c http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2

* Download in the Background Using wget -b
# wget -b http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
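When started in the background, wget writes its progress to wget-log in the current directory (unless a log file is given with -o); you can follow it with:

# tail -f wget-log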

* Mask the User Agent and Make wget Look Like a Browser Using wget --user-agent
# wget --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008092416 Firefox/3.0.3" URL-TO-DOWNLOAD

Some websites refuse requests after identifying that the user agent is not a browser. You can mask the user agent with the --user-agent option so that wget identifies itself as a browser, as shown above.

* Test Download URL Using wget --spider
# wget --spider DOWNLOAD-URL

Before scheduling a download, you should verify that it will succeed at the scheduled time. To do so, copy the command exactly from the schedule and add the --spider option; wget will then check that the URL is reachable without downloading anything.
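For example, suppose a cron job fetches a nightly backup (the URL and path here are only placeholders). Verify the URL first:

# wget --spider http://example.com/backup/db-dump.sql.gz

If the check reports that the remote file exists, the scheduled crontab entry, here running at 2 AM, should work:

0 2 * * * wget -q -P /var/backups http://example.com/backup/db-dump.sql.gz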

* Increase Total Number of Retry Attempts Using wget --tries 
# wget --tries=75 DOWNLOAD-URL

If the internet connection is unreliable and the file is large, the download may fail. By default, wget retries 20 times before giving up. If needed, you can increase the number of retry attempts with the --tries option, as shown above.
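On a flaky connection it also helps to combine --tries with -c, so each retry resumes the partial file instead of starting over, and with --waitretry, which pauses up to the given number of seconds between attempts (the URL below is a placeholder):

# wget --tries=75 --waitretry=10 -c http://example.com/large-file.iso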

* Download Multiple Files / URLs Using wget -i
First, store all the download URLs in a text file:

# cat > download-file-list.txt 
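Type one URL per line and press Ctrl-D to finish. For example, the file might look like this (the URLs below are only placeholders):

http://example.com/file1.iso
http://example.com/file2.tar.gz
ftp://example.org/pub/file3.zip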

Next, pass download-file-list.txt to wget with the -i option, as shown below.

# wget -i download-file-list.txt

* Download a Full Website Using wget --mirror
The following is the command to execute when you want to download a full website and make it available for local viewing.

# wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL

--mirror : turn on options suitable for mirroring.
-p : download all files that are necessary to properly display a given HTML page.
--convert-links : after the download, convert the links in the documents for local viewing.
-P ./LOCAL-DIR : save all the files and directories to the specified directory.
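When mirroring a large site, it is considerate to throttle the transfer; for example (example.com stands in for the real site), add a pause between requests and a bandwidth cap:

# wget --mirror -p --convert-links --wait=2 --limit-rate=200k -P ./LOCAL-DIR http://example.com/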

* Reject Certain File Types while Downloading Using wget --reject
# wget --reject=gif WEBSITE-TO-BE-DOWNLOADED
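--reject also accepts a comma-separated list, so several file types can be excluded in one run; note that it takes effect during recursive downloads:

# wget -r --reject=gif,jpg,png WEBSITE-TO-BE-DOWNLOADED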

* Log messages to a log file instead of stderr Using wget -o
# wget -o download.log DOWNLOAD-URL
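If you want to keep adding to the same log across runs, use -a instead of -o; it appends to the log file rather than overwriting it:

# wget -a download.log DOWNLOAD-URL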

* Quit Downloading When it Exceeds a Certain Size Using wget -Q
To stop a download once it exceeds 5 MB, use the following wget command line.
# wget -Q5m -i FILE-WHICH-HAS-URLS

This quota has no effect when you download a single URL: when you specify a single file, everything is downloaded regardless of the quota size. The quota applies only to recursive downloads.
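For example, the following recursive download stops once the 10 MB quota is exceeded (example.com is a placeholder):

# wget -Q10m -r http://example.com/docs/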

* Download Only Certain File Types Using wget -r -A
# wget -r -A.pdf http://url-to-webpage-with-pdfs/
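-A also takes a comma-separated list when you need more than one file type:

# wget -r -A pdf,epub http://url-to-webpage-with-docs/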

* FTP Download With wget
Anonymous FTP download using wget:
# wget ftp-url

FTP download using wget with username and password authentication.
# wget --ftp-user=USERNAME --ftp-password=PASSWORD DOWNLOAD-URL

HTTP download using wget with username and password authentication:
# wget --user=username --password=password downloadURL
These parameters can be overridden using the --ftp-user and --ftp-password options for FTP connections and the --http-user and --http-password options for HTTP connections.
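Passing a password on the command line exposes it in the shell history and process list. Newer versions of wget can prompt for it instead via --ask-password:

# wget --user=username --ask-password downloadURL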
