William Jiang

JavaScript,PHP,Node,Perl,LAMP Web Developer – http://williamjxj.com; https://github.com/williamjxj?tab=repositories


Auto Scraping Web Sites

Auto scraping, automated downloading, and spidering all refer to the same concept: automatically downloading information from a website and extracting it as local data.

I have a lot of experience with this kind of work. So far, I have completed more than 10 projects that automatically download data from different websites.

(1) Tools: Perl, CPAN modules, shell scripts (to run the routine automatically), and MySQL (optionally Oracle, MS Excel, or even MS Access).

(2) The process is like this:
(2.a) With Perl and CPAN, this kind of task becomes easy. A script can download daily data page by page, tab by tab, and link by link, looping into linked subdomains.
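
Step (2.a) can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the Perl + CPAN code from my projects. The names `LinkCollector`, `fetch_page`, `crawl`, and the `max_pages` limit are hypothetical, and the rule "stay on the start host and its subdomains" is an assumption about what a typical job needs.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Default fetcher: download one page over HTTP and decode it as text."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, fetch=fetch_page, max_pages=50):
    """Breadth-first crawl, link by link, limited to the start host and its subdomains."""
    root_host = urlparse(start_url).hostname or ""
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            parts = urlparse(link)
            host = parts.hostname or ""
            same_site = host == root_host or host.endswith("." + root_host)
            if parts.scheme in ("http", "https") and same_site and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The `fetch` parameter is there so the crawl logic can be tried against a dictionary of canned pages before pointing it at a real site. A real job would also need error handling, polite delays between requests, and respect for the site's robots.txt.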

(2.b) The downloaded data can be in HTML or XML format. It is then parsed to extract the useful data from the mixed material (HTML, CSS, JavaScript, Flex, applets, etc.).
The data could be email addresses, phone numbers, addresses, stock quotes, financial rates, or any other kind of data.
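
To give an idea of the parsing in step (2.b), here is a small sketch in Python (standard library only, standing in for the Perl modules I normally use). It drops the contents of `<script>` and `<style>` and then looks for email addresses and phone numbers in the remaining text. The names `TextExtractor` and `extract_contacts` are hypothetical, and both patterns are deliberately simple assumptions: the phone pattern only covers the North American 3-3-4 digit layout.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


# Simplified patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")


def extract_contacts(html):
    """Return the unique emails and phone numbers found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    text = " ".join(parser.chunks)
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "phones": sorted(set(PHONE_RE.findall(text))),
    }
```

Other kinds of data (stock quotes, financial rates) usually sit in tables or tagged elements, so they need rules written for each site's layout rather than a general pattern like these.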

(2.c) The extracted data is stored in a MySQL database by default, and can also be presented in the following formats: Word document, PDF, Excel, XML, etc.
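
Step (2.c) might look something like the sketch below. To keep the example self-contained, it uses Python's built-in `sqlite3` module in place of MySQL, and shows only the XML export. The table name `scraped`, its columns, and the functions `save_records` and `export_xml` are hypothetical and not taken from my applications.

```python
import sqlite3
import xml.etree.ElementTree as ET


def save_records(conn, records):
    """Insert (source_url, kind, value) rows, ignoring exact duplicates."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS scraped (
               source_url TEXT NOT NULL,
               kind       TEXT NOT NULL,
               value      TEXT NOT NULL,
               UNIQUE (source_url, kind, value)
           )"""
    )
    # "INSERT OR IGNORE" is SQLite syntax; MySQL spells it "INSERT IGNORE".
    conn.executemany(
        "INSERT OR IGNORE INTO scraped (source_url, kind, value) VALUES (?, ?, ?)",
        records,
    )
    conn.commit()


def export_xml(conn):
    """Return every stored row as one XML document string."""
    root = ET.Element("records")
    rows = conn.execute(
        "SELECT source_url, kind, value FROM scraped "
        "ORDER BY source_url, kind, value"
    )
    for source_url, kind, value in rows:
        item = ET.SubElement(root, "record", {"kind": kind, "source": source_url})
        item.text = value
    return ET.tostring(root, encoding="unicode")
```

The unique constraint means the same record can be saved again on the next scheduled run without creating duplicates. Word, PDF, and Excel output would need third-party libraries, so they are left out here.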

(3) The data is accurate, timely, and comprehensive, and can be refreshed daily, weekly, monthly, or even hourly.

As far as I know, such functions are very useful for business, data analysis, advertising, sales, or any other needs.
So I have prepared a sample application that can be shared with others, if you are interested.

If you have such a requirement, find me in the ‘About’ tab and contact me. I will be glad to provide related information.
