William Jiang

JavaScript, PHP, Node, Perl, LAMP Web Developer: http://williamjxj.com; https://github.com/williamjxj?tab=repositories

curl vs. wget vs. lynx

On Linux there are usually three commands for fetching content from the web: curl, wget and lynx.
For example, to download the RSS feed of my blog and save it as index.html, the following three commands do the same thing:


$ lynx -source https://williamjxj.wordpress.com/feed/ > index.html
$ curl https://williamjxj.wordpress.com/feed/ > index.html
$ wget https://williamjxj.wordpress.com/feed/    # saved as index.html because the URL ends in "/"
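
If a script has to run on machines where only some of these tools are installed, a small wrapper can pick whichever one is available. This is a minimal sketch assuming a POSIX shell; the function name fetch_url is hypothetical, not part of any of the three tools.

```shell
#!/bin/sh
# fetch_url URL OUTFILE: hypothetical helper that saves URL to OUTFILE using
# curl, wget or lynx, whichever is found first on the PATH.
fetch_url() {
  url=$1
  out=$2
  if command -v curl >/dev/null 2>&1; then
    # -s: no progress meter, -f: fail on HTTP errors, -L: follow redirects
    curl -s -f -L -o "$out" "$url"
  elif command -v wget >/dev/null 2>&1; then
    # -q: quiet, -O: write to the named file instead of a name derived from the URL
    wget -q -O "$out" "$url"
  elif command -v lynx >/dev/null 2>&1; then
    # -source: dump the raw page source to stdout instead of rendered text
    lynx -source "$url" > "$out"
  else
    echo "fetch_url: none of curl, wget or lynx is installed" >&2
    return 127
  fi
}

# Example (needs network access):
# fetch_url https://williamjxj.wordpress.com/feed/ index.html
```

curl is tried first only because it gives the most predictable exit status; the order is a design choice, not a requirement.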

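In my Cygwin installation all three commands live in /usr/bin/, and by file size lynx > wget > curl. However, the curl binary is small partly because much of its functionality lives in the shared libcurl library, and curl is also the newest of the three, so it should be the most powerful.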

  1. lynx
    lynx is a text-based web browser for cursor-addressable character-cell terminals such as the vt100 and vt220 (which I used a lot in my previous job), and it is very configurable.

    By running lynx URL, we can browse the web in a purely text-based environment: page up and down, follow links and so on, much like a web browser on Windows, but without JavaScript or Ajax support.
    By running lynx -help, we can get a long list of its options.
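
    Besides interactive browsing, lynx can render a page to plain text and exit, which is handy in scripts. A minimal sketch, assuming lynx is installed; it writes a small made-up local HTML page so it runs without network access, but a URL works in its place.

```shell
#!/bin/sh
# A throwaway local page (made up for this example).
dir=$(mktemp -d)
page="$dir/page.html"
printf '%s\n' '<html><body><h1>Hello</h1>' \
  '<p>See <a href="http://example.com/">example</a>.</p></body></html>' > "$page"

# -dump: print the rendered page and exit; -nolist: omit the numbered link list
lynx -dump -nolist "$page"

# -dump -listonly: print only the list of links found on the page
lynx -dump -listonly "$page"
```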

  2. wget
    wget downloads files from the web over HTTP, HTTPS and FTP. It has a lot of options for controlling downloads, though some of them look a bit odd:

    
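    # Recursive download, up to 10 tries per file, messages written to the file "log"
    wget -r --tries=10 http://fly.srk.fer.hr/ -o log
    # Continue (-c) a partially downloaded file
    wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
    # Turn off wget's own cookie handling and send a hand-written Cookie header
    wget --no-cookies --header "Cookie: <name>=<value>"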
    

    The following example shows how to log in to a server using POST and then proceed to download the desired pages, presumably only accessible to authorized users:

    
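    # Log in to the server.  This can be done only once.
    # --keep-session-cookies also saves session cookies, which logins often use.
    wget --save-cookies cookies.txt \
         --keep-session-cookies \
         --post-data 'user=foo&password=bar' \
         http://server.com/auth.php
    
    # Now grab the page or pages we care about.
    # -p also fetches the images and stylesheets the page needs.
    wget --load-cookies cookies.txt \
         -p http://server.com/interesting/article.php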
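
    For comparison, the same log-in-then-download flow can be sketched with curl. The host, paths and form fields are the placeholders from the wget example above, and login_and_fetch is a hypothetical name. curl has no counterpart to wget's -p, so only the article itself is saved.

```shell
#!/bin/sh
# login_and_fetch BASE_URL: hypothetical helper; the form fields and paths are
# the placeholders from the wget example, not a real endpoint.
login_and_fetch() {
  base=$1
  jar=$(mktemp)   # cookie jar shared by both requests
  # -d sends a POST body (like --post-data); -c writes received cookies,
  # session cookies included, to the jar.
  curl -s -f -c "$jar" -d 'user=foo&password=bar' -o /dev/null "$base/auth.php" || return $?
  # -b sends the saved cookies back (like --load-cookies);
  # -O saves the page under its remote name, article.php.
  curl -s -f -b "$jar" -O "$base/interesting/article.php"
}

# Example:
# login_and_fetch http://server.com
```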
    

    The wget and curl features below are taken from the comparison page by Daniel Stenberg, the author of curl.

    • Wget is command-line only. There is no library or anything like it.
    • Recursive! Wget’s major strength compared to curl is its ability to download recursively, or even just download everything that is referred to from a remote resource, be it an HTML page or an FTP directory listing.
    • Older. Wget can be traced back to 1995, while curl dates back no earlier than 1997.
    • Less developer activity. While this can be debated, I consider three metrics here: mailing list activity, source code commit frequency and release frequency. Anyone following these two projects can see that the curl project has a much higher pace in all these areas, and it has indeed been so for several years.
    • HTTP 1.0. At the time of writing, Wget still did its HTTP operations using HTTP 1.0, and while that works remarkably well and is hardly ever troublesome for end users, it is still a fact. curl has done HTTP 1.1 since March 2001 (while still offering optional 1.0 requests). Wget later moved to HTTP 1.1 in version 1.13.
    • GPL. Wget is 100% GPL v3. curl is MIT licensed.
    • GNU. Wget is part of the GNU project and all copyrights are assigned to the FSF. The curl project is entirely stand-alone and independent, with no parent organization at all; almost all copyrights are owned by Daniel.
    • Wget requires no extra options to simply download a remote URL to a local file, while curl requires -o or -O. However trivial, this fact is often mentioned to me when people explain why they prefer downloading with wget.
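
    The output-file difference in the last point, side by side. A minimal sketch assuming a POSIX shell with mktemp; it serves a made-up local file through a file:// URL so it runs offline. wget does not handle file:// URLs, so its equivalents appear only in the comments.

```shell
#!/bin/sh
# Scratch directory with a made-up "remote" file, reachable as a file:// URL.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/sample.txt"
url="file://$workdir/sample.txt"
mkdir "$workdir/out"
cd "$workdir/out" || exit 1

curl -s "$url"              # no option: the body is written to stdout
curl -s -O "$url"           # -O: save as sample.txt, like a bare "wget URL"
curl -s -o copy.txt "$url"  # -o: pick the name, like "wget -O copy.txt URL"

ls                          # lists copy.txt and sample.txt
```
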
  3. curl
    curl is a tool to transfer data from or to a server. It is designed to work without user interaction.

    • Library. curl is powered by libcurl, a cross-platform library with a stable API that anyone can use. This difference is major since it creates a completely different attitude on how to do things internally. It is also slightly harder to make a library than a “mere” command line tool.
    • Pipes. curl works more in the traditional Unix style: it sends more to stdout and reads more from stdin, in an “everything is a pipe” manner.
    • Return codes. curl returns a range of defined and documented return codes for various (error) situations.
    • Single shot. curl is basically made to do single-shot transfers of data. It transfers just the URLs that the user specifies, and does not contain any recursive downloading logic nor any sort of HTML parser.
    • More protocols. curl supports FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, LDAP, LDAPS, FILE, POP3, IMAP, SMTP, RTMP and RTSP at the time of this writing. Wget supports HTTP, HTTPS and FTP.
    • More portable. Ironically, curl builds and runs on many more platforms than wget, in spite of wget's attempts to keep things conservative. For example: OS/400, TPF and other more “exotic” platforms that aren’t straightforward Unix clones.
    • More SSL libraries and SSL support. curl can be built with one out of four or five different SSL/TLS libraries, and it offers more control and wider support for protocol details.
    • curl (or rather libcurl) supports more HTTP authentication methods, especially over HTTP proxies.
    • Bidirectional. curl offers upload and sending capabilities. Wget only offers plain HTTP POST support.
    • HTTP multipart/form-data sending, which allows users to do HTTP “upload” and in general emulate browsers and do HTTP automation to a wider extent.
    • Compression. curl supports the gzip and deflate Content-Encodings and decompresses them automatically when asked to with --compressed.
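
    Two of these points, pipes and return codes, are easiest to see in a script. A minimal sketch: page_title and check_url are hypothetical helper names, the sed expression is a rough title extractor that only handles a <title> on a single line, and the exit codes handled are a few of those documented in the EXIT CODES section of the curl man page.

```shell
#!/bin/sh
# Pipes: curl writes the body to stdout, so it can feed sed directly.
# Rough sketch: only finds a <title>...</title> that sits on one line.
page_title() {
  curl -s "$1" | sed -n 's:.*<title>\(.*\)</title>.*:\1:p'
}

# Return codes: turn curl's exit status into a message, then pass it on.
check_url() {
  curl -s -f --max-time 10 -o /dev/null "$1"
  rc=$?
  case $rc in
    0)  echo "ok" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "could not connect" ;;
    22) echo "HTTP error status (reported because of -f)" ;;
    28) echo "timed out" ;;
    *)  echo "curl exit code $rc" ;;
  esac
  return $rc
}

# Examples:
# page_title https://williamjxj.wordpress.com/
# check_url http://127.0.0.1:1/    # nothing listening there: "could not connect"
```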

3 responses to “curl vs. wget vs. lynx”

  1. Tony Su 12/18/2010 at 3:54 pm

    Good comparison. I actually use cURL a lot in PHP programming; as a library it is very powerful. I have used lynx as a basic text browser since my first day on Linux. Wget is easy and fast for downloading anything from the web, such as big installation files or XML files.

    • williamjxj 12/19/2010 at 9:59 am

      Yeah, curl is commonly used in PHP development. My colleagues also use it for scraping.

      For web scraping, I think the three most important things are a ‘communication tool’ like curl, ‘array functionality’, and ‘regular expressions’. curl for PHP seems good, but the other two features are not quite outstanding in PHP.

      In that case, Perl is excellent. My choices are Perl’s Mechanize module (WWW::Mechanize), Perl’s powerful hash tables, and Perl’s regular expressions.

      These components make scraping much easier and let you go further: scraping page by page and link by link, as deep as you want; retrieving millions of records with a SAX parser; and inserting or updating them in a database, all in a couple of scripts!

  2. Ajith 01/14/2013 at 5:46 pm

    Thanks for the article. I am interested in learning and using command-line tools for browsing. I just started to use lynx, and I have used wget before. Could you please tell me about such tools, and perhaps write a tutorial on them?
    Here is the list I have:

    wget, lynx, curl, pastebinit, mutt, pine, irssi

    Thank you
