William Jiang

JavaScript,PHP,Node,Perl,LAMP Web Developer – http://williamjxj.com; https://github.com/williamjxj?tab=repositories

Tag Archives: scraper

Auto Scraping Web Sites

Auto scraping, automated downloading, and spidering all refer to the same concept: automatically downloading information from a website and extracting it as local data.

I have a lot of experience with this kind of work. So far, I have completed more than 10 projects that automatically download data from different websites.

(1) Tools: Perl, CPAN modules, shell scripts (to run the routine automatically), MySQL (optionally Oracle, MS Excel, or even MS Access).

(2) The process is like this:
(2.a) With Perl and CPAN, this kind of work becomes easy. A script can download data daily, page by page, tab by tab, link by link, following links into subdomains.

(2.b) The downloaded data can be in HTML or XML format. It is parsed to extract the useful data from the mixed material (HTML, CSS, JavaScript, Flex, applets, etc.).
The data could be emails, phone numbers, addresses, stock prices, financial rates, or any other kind of data.

(2.c) By default, the extracted data is stored in a MySQL database. It can also be presented in other formats: Word document, PDF, Excel, XML, etc.

(3) The data is accurate, timely, and comprehensive, and is refreshed daily, weekly, monthly, or even hourly.
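
The process in step (2) can be sketched in a few lines. This is only a minimal illustration, not the sample application mentioned in this post: it is written in Python rather than Perl, it uses SQLite from the standard library as a stand-in for MySQL, and the names (fetch, extract, store, the scraped table) and the two regular expressions are assumptions made up for the example. The phone pattern only covers North American style numbers.

```python
import re
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Deliberately simple patterns; real data usually needs stricter rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")


class PageParser(HTMLParser):
    """Collects visible text and links, skipping script and style content."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links so they can be followed later.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def fetch(url):
    """Step (2.a): download one page and decode it to text."""
    with urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract(html, base_url=""):
    """Step (2.b): pull emails, phone numbers, and links out of the markup."""
    parser = PageParser(base_url)
    parser.feed(html)
    text = " ".join(parser.text_parts)
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "phones": sorted(set(PHONE_RE.findall(text))),
        "links": parser.links,
    }


def store(conn, url, found):
    """Step (2.c): save the extracted values, ignoring duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped "
        "(url TEXT, kind TEXT, value TEXT, UNIQUE(url, kind, value))"
    )
    rows = [(url, "email", e) for e in found["emails"]]
    rows += [(url, "phone", p) for p in found["phones"]]
    conn.executemany("INSERT OR IGNORE INTO scraped VALUES (?, ?, ?)", rows)
    conn.commit()
```

A run over one page would look like store(conn, url, extract(fetch(url), url)), and the returned links can be fed back into fetch to go link by link. Scheduling the script daily or hourly is left to cron or a shell script, as in item (1).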

As far as I know, this is very useful for business: data analysis, advertising, sales, and other needs.
So I have prepared a sample application that I can share with others, if you are interested.

If you have such a requirement, find me in the ‘About’ tab and contact me. I will be glad to provide related information.


What I can do for web applications


Here is what I can do at this stage as a web developer:

(1) Design and development of interactive websites

(a.1) Users enter data in a web page; the data is stored in a database and presented in web pages where users can modify, remove, or add records.
The application can also generate PDF, Excel, CSV, and XML files for users to download.

(a.2) Integrating multiple data sources (reports, logs, Excel files, etc.) into a web application.

(a.3) Bulk upload of files and photos, such as uploading a whole directory of pictures.

(a.4) Online searching, editing, pagination, and sorting functions.

(a.5) Google Maps applications.

(2) Design and development of Ajax and dynamic web pages, such as http://www.onlinejobshunter.com/. Given more time, I can develop more advanced features like those on http://www.canada.com/.

(3) Automatic website scraping
Automatically scrape one or more websites to download data, extract it, and present it in any format: HTML, Word document, PDF, Excel, or stored in a database.

The data could be emails, phone numbers, addresses, scores, stock prices, financial rates, or any other kind of data.

The scraping runs automatically on a schedule: daily, weekly, monthly, or even hourly.

(4) Design and development of social networking sites, such as dating websites.

(5) Design and development of e-commerce and business applications, such as online sales with a shopping cart and payment system.

(6) Social network integration for Facebook and Twitter, such as http://www.facebook.com/toyota.

Here is a sample I wrote that shows the development details.

(7) Drupal, WordPress, and phpBB integration. For example, using Drupal to develop a CMS application, WordPress for a blog, and phpBB for a forum.