Tag: cURL

curl vs. wget vs. lynx

Under Linux, there are usually three commands to fetch content from the web: curl, wget, and lynx.
For example, to download the RSS feed from my blog and save it as index.html, the following three commands do the same thing:


$ lynx -source https://williamjxj.wordpress.com/feed/ >index.html
$ curl https://williamjxj.wordpress.com/feed/ >index.html
$ wget https://williamjxj.wordpress.com/feed/

In my Cygwin installation, all three commands live in /usr/bin/; by binary size, lynx > wget > curl. However, curl is backed by libcurl.so and is the newest of the three, so it is arguably the most powerful.

  1. lynx
    lynx is a text-based web browser for use on cursor-addressable character-cell terminals, such as vt100 and vt220 (which I used a lot in my previous job), and it is very configurable.

    By running lynx URL, we can browse the web in a pure text-based environment: page up and down, follow links, and so on, much like a desktop web browser but without JavaScript and Ajax support.
    Running lynx -help prints a long list of its options.
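
    For example, to browse interactively or to save a rendered page as plain text (the URL is simply my blog, as above; -dump writes the rendered page rather than the raw HTML):

    $ lynx https://williamjxj.wordpress.com/                    # browse interactively in the terminal
    $ lynx -dump https://williamjxj.wordpress.com/ > page.txt   # save the rendered page as plain text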

  2. wget
    wget downloads files from the web and supports the HTTP, HTTPS and FTP protocols. It also has many options to control downloads, although some of them look a bit odd:

    
    wget -r --tries=10 http://fly.srk.fer.hr/ -o log
    wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
    wget --no-cookies --header "Cookie: <name>=<value>"
    

    The following example shows how to log in to a server using POST and then proceed to download the desired pages, presumably only accessible to authorized users:

    
    # Log in to the server.  This can be done only once.
    wget --save-cookies cookies.txt \
         --post-data 'user=foo&password=bar' \
         http://server.com/auth.php

    # Now grab the page or pages we care about.
    wget --load-cookies cookies.txt \
         -p http://server.com/interesting/article.php
    

    I extracted the following comparison of wget's and curl's features from Daniel Stenberg's webpage.

    • Wget is command line only. There’s no lib or anything.
    • Recursive! Wget's major strong side compared to curl is its ability to download recursively, or even just download everything that is referred to from a remote resource, be it an HTML page or an FTP directory listing.
    • Older. Wget traces back to 1995, while curl can be tracked back no earlier than 1997.
    • Less developer activity. While this can be debated, I consider three metrics here: mailing list activity, source code commit frequency and release frequency. Anyone following these two projects can see that the curl project moves at a much higher pace in all these areas, and it has indeed been so for several years.
    • HTTP 1.0. Wget still does its HTTP operations using HTTP 1.0, and while that is still working remarkably fine and hardly ever is troublesome to the end-users, it is still a fact. curl has done HTTP 1.1 since March 2001 (while still offering optional 1.0 requests).
    • GPL. Wget is 100% GPL v3. curl is MIT licensed.
    • GNU. Wget is part of the GNU project and all copyrights are assigned to FSF. The curl project is entirely stand-alone and independent with no organization parenting at all – with almost all copyrights owned by Daniel.
    • Wget requires no extra options to simply download a remote URL to a local file, while curl requires -o or -O. However trivial, this fact is often mentioned to me when people explain why they prefer downloading with wget.
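
    To illustrate the last point, here is roughly what the difference looks like on the command line (the local file names are just examples):

    $ wget https://williamjxj.wordpress.com/feed/               # saved to a local file automatically
    $ curl https://williamjxj.wordpress.com/feed/               # written to stdout
    $ curl -o feed.xml https://williamjxj.wordpress.com/feed/   # -o: save under a chosen name
    $ curl -O ftp://sunsite.doc.ic.ac.uk/ls-lR.Z                # -O: keep the remote file name
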
  3. curl
    curl is a tool to transfer data from or to a server. It is designed to work without user interaction.

    • Library. curl features and is powered by libcurl – a cross-platform library with a stable API that can be used by anyone. This difference is major, since it creates a completely different attitude towards how to do things internally. It is also slightly harder to make a library than a “mere” command-line tool.
    • Pipes. curl works more in the traditional unix style: it sends more stuff to stdout and reads more from stdin, in an “everything is a pipe” manner.
    • Return codes. curl returns a range of defined and documented return codes for various (error) situations.
    • Single shot. curl is basically made to do single-shot transfers of data. It transfers just the URLs that the user specifies, and does not contain any recursive downloading logic nor any sort of HTML parser.
    • More protocols. curl supports FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, LDAP, LDAPS, FILE, POP3, IMAP, SMTP, RTMP and RTSP at the time of this writing. Wget supports HTTP, HTTPS and FTP.
    • More portable. Ironically, curl builds and runs on many more platforms than wget, in spite of their attempts to keep things conservative. For example: OS/400, TPF and other more “exotic” platforms that aren't straightforward unix clones.
    • More SSL libraries and SSL support. curl can be built with one out of four or five different SSL/TLS libraries, and it offers more control and wider support for protocol details.
    • curl (or rather libcurl) supports more HTTP authentication methods, especially when going through HTTP proxies.
    • Bidirectional. curl offers upload and sending capabilities. Wget only offers plain HTTP POST support.
    • HTTP multipart/form-data sending, which allows users to do HTTP “uploads” and, in general, emulate browsers and do HTTP automation to a wider extent.
    • Compression. curl supports gzip and inflate Content-Encoding and does automatic decompression.
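
    A few of these points sketched as command lines (the hosts, credentials and file names below are placeholders, not real servers):

    # Return codes: the exit status identifies the failure (e.g. 6 = could not resolve host)
    $ curl -sS https://no.such.host.example/; echo "exit=$?"

    # Authentication and upload (bidirectional): send a local file to an FTP server
    $ curl --user name:password -T report.pdf ftp://ftp.example.com/upload/

    # multipart/form-data: emulate a browser form upload
    $ curl -F "file=@photo.jpg" -F "name=foo" http://example.com/upload.php

    # Compression: request gzip/deflate and decompress automatically
    $ curl --compressed https://williamjxj.wordpress.com/feed/ > feed.xml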

PHP cURL Library

cURL (client URL library) is a library that allows you to connect and communicate with many different types of servers using many different protocols. Using cURL you can:

  • Implement payment gateways’ payment notification scripts.
  • Download and upload files from remote servers.
  • Login to other websites and access members only sections.
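
The third use case (logging in and then fetching a members-only page) is essentially the PHP counterpart of the wget cookie example above. A minimal sketch, with the URLs and form field names reused from that example as placeholders:

<?php
// Log in once and save the session cookies.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://server.com/auth.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'user=foo&password=bar');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');     // write cookies here after the request
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);

// Reuse the saved cookies to fetch a members-only page.
curl_setopt($ch, CURLOPT_URL, 'http://server.com/interesting/article.php');
curl_setopt($ch, CURLOPT_HTTPGET, true);                // switch back from POST to GET
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');    // read cookies from here
$page = curl_exec($ch);
curl_close($ch);
?>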

The PHP cURL library is definitely the odd man out. Unlike other PHP libraries, where a whole plethora of functions is made available, PHP cURL wraps up a major part of its functionality in just four functions.

Typical PHP cURL usage follows this sequence of steps:

  1. curl_init – Initializes the session and returns a cURL handle which can be passed to other cURL functions.
  2. curl_setopt – This is the main workhorse of the cURL library. It is called multiple times and specifies what we want the cURL library to do.
  3. curl_exec – Executes a cURL session.
  4. curl_close – Closes the current cURL session.

Below are some examples that should make the working of cURL clearer. The first uses cURL to download Google’s RSS feed.

<?php
/**
* Initialize the cURL session
*/
$ch = curl_init();
/**
* Set the URL of the page or file to download.
*/
curl_setopt($ch, CURLOPT_URL,
'http://news.google.com/news?hl=en&topic=t&output=rss');
/**
* Ask cURL to return the contents in a variable
* instead of simply echoing them to the browser.
*/
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
/**
* Execute the cURL session
*/
$contents = curl_exec ($ch);
/**
* Close cURL session
*/
curl_close ($ch);
?>

As you can see, curl_setopt is the pivot around which the main cURL functionality revolves. cURL's behavior is controlled by passing predefined options and values to this function.

The above code uses two such options.

  • CURLOPT_URL: Use it to specify the URL which you want to process. This could be the URL of the file you want to download or it could be the URL of the script to which you want to post some data.
  • CURLOPT_RETURNTRANSFER: Setting this option to 1 will cause the curl_exec function to return the contents instead of echoing them to the browser.
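
As a quick sketch of what CURLOPT_RETURNTRANSFER changes (reusing the feed URL from the first section):

<?php
$ch = curl_init('https://williamjxj.wordpress.com/feed/');  // curl_init() also accepts the URL directly

// Without CURLOPT_RETURNTRANSFER, curl_exec() would echo the body and return true/false.
// With it set to 1, the body comes back as a string we can store and process.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);

echo strlen($contents) . " bytes downloaded\n";
?>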

The following is another code sample that downloads a whole web page from a website. Here DownloadUrl() is a method of a class that stores the target URL in $this->url:

function DownloadUrl() {
    $Url = $this->url;                                // the target URL is stored on the object
    if (!function_exists('curl_init')) {
        die('cURL is not installed!');
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_REFERER, "http://www.imdb.com/title/tt0120338/");
    curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
    curl_setopt($ch, CURLOPT_HEADER, 0);              // do not include response headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the page as a string
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);            // give up after 10 seconds
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}