A screen scraper extracts character-based data from a program’s display output and presents it in a richer format. It searches through a website’s code, filters out the markup that exists only for presentation, and collects the useful data. Screen scrapers are used by businesses to generate reports and presentations, but can also be used for spamming. There is debate about their legality and ethics. Some website owners implement tools to prevent their sites from being scraped.
A screen scraper is a computer program that collects character-based data from another program’s display output. Screen scrapers can extract the data they are looking for and present it in a richer format, such as with graphs or tables, or simply index the data for archiving. There are many other names for a screen scraper, including website scraper, content miner, website ripper, website extractor, automated data collector, and HTML scraper.
A screen scraper searches through a website’s code and filters out the extraneous markup whose only job is to present the page attractively in the visitor’s browser. That markup is needed to display the page in its intended layout, but a scraper is simply looking for useful data. This data is collected and presented as a simple database, without the bells and whistles of the original HTML.
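The filtering step described above can be sketched with Python’s standard `html.parser` module. This is a minimal illustration, not how any particular scraper works: the `TextScraper` class, the `scrape_text` helper, and the `SAMPLE_PAGE` string are hypothetical names invented for this example, and a real scraper would fetch the page over the network and target specific elements.

```python
from html.parser import HTMLParser


class TextScraper(HTMLParser):
    """Collects visible text and ignores tags, scripts, and styles."""

    SKIPPED = {"script", "style"}  # elements whose content is never shown as text

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside <script> and <style>.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def scrape_text(html):
    """Return the list of text fragments found in an HTML string."""
    parser = TextScraper()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical sample page standing in for a downloaded document.
SAMPLE_PAGE = """<html><head><style>td { color: red; }</style></head>
<body><div class="nav">Home</div>
<table><tr><td>Widget</td><td>9.99</td></tr></table>
<script>trackVisit();</script></body></html>"""

if __name__ == "__main__":
    print(scrape_text(SAMPLE_PAGE))  # ['Home', 'Widget', '9.99']
```

The layout markup, the style rules, and the script are all discarded; only the text a reader would see is kept.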
Search engine spiders are a good example of a screen scraper in action. These spiders access hundreds of thousands of websites, each of which contains numerous pages. Keyword data from these sites is collected and indexed, then presented to the end user as search engine results.
Most screen scrapers scour a website’s HTML to get their information, but they can also read scripts embedded in the page, such as JavaScript. Server-side code such as PHP runs before the page is sent, so a scraper sees only the output it generates. The extracted data can then be presented as HTML itself, so that the user can access it with their web browser, or stored as text data that the user can access offline.
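The two output options mentioned above, HTML for viewing in a browser or plain text for offline use, might look like the following sketch. The function names `to_html_table` and `to_csv_text` and the sample rows are assumptions made for illustration; CSV is used here as one convenient text format, not the only possible one.

```python
import csv
import io
from html import escape


def to_html_table(rows):
    """Render scraped rows as a bare HTML table for viewing in a browser."""
    lines = ["<table>"]
    for row in rows:
        # Escape each value so scraped text cannot break the generated markup.
        cells = "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def to_csv_text(rows):
    """Render scraped rows as CSV text that can be saved and read offline."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical scraped data.
    rows = [["Product", "Price"], ["Widget", "9.99"]]
    print(to_html_table(rows))
    print(to_csv_text(rows))
```

CSV text of this kind can also be opened directly in a spreadsheet program.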
Companies use screen scrapers to extract data from a variety of keyword-related websites in order to generate charts, tables, spreadsheets, and comparison data for use in reports and presentations. The screen scraper saves an extraordinary amount of time, as an employee performing the same task would have to search for relevant sites, click links, and navigate each site individually to find and record the applicable data they need. A screen scraper can also be used when information is stored on a system that can no longer be accessed due to compatibility issues with newer hardware or software.
Screen scrapers can be both a blessing and a curse for site owners and web surfers. While they provide a perfectly functional service for businesses, search engines, and others, a screen scraper can also be used for less than altruistic purposes. For example, companies or individuals who use spam as an advertising method can use a screen scraper to extract email addresses from websites.
While a screen scraper can be a useful tool, there is some debate in the web community about the legality and ethics of using one. Copyright issues get blurry when a screen scraper takes someone’s hard work and presents it in another format on another website, and sites that depend on advertising to generate revenue lose out when their ads are stripped from the scraped content. As a result, some website owners have started implementing tools that prevent their sites from being scraped.
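The text above does not name any specific prevention tool. One widely used convention is a robots.txt file, in which a site owner lists the paths automated clients are asked to avoid. It is advisory: a well-behaved scraper checks it voluntarily, and it does not technically block anything. The sketch below uses Python’s standard `urllib.robotparser`; the rules in `ROBOTS_TXT`, the `ExampleScraper` user agent, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content a site owner might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a reusable rule checker."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def may_scrape(rules, user_agent, url):
    """Return True if the site's rules allow this agent to fetch the URL."""
    return rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_rules(ROBOTS_TXT)
    for url in ("https://example.com/public/page.html",
                "https://example.com/private/data.html"):
        print(url, may_scrape(rules, "ExampleScraper", url))
```

In practice the file would be downloaded from the site rather than embedded as a string.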