THttpScan VCL component for Delphi and C++ Builder. THttpScan recursively analyzes HTML pages, then extracts and reports all the links it finds.
With THttpScan you access web sites as a collection of links to files and data, instead of as graphics and text. THttpScan recursively analyzes HTML pages and reports all the links it finds to a text file: html, mail, jpg, mpeg, mp3, etc. THttpScan navigates through the HTML pages in the neighborhood of the initial URL. A link that appears several times is processed only once.

The LinkScan property lets you limit the scanning to the initial site or to the initial URL path. The LinkReport property lets you report only the links that belong to the current site, or only those that share the same path name. The DepthSearchLevel property lets you limit how many levels of pages are scanned, starting from the initial page, which is especially useful when the scanning is not limited to one site. By combining the LinkScan and LinkReport properties with a high DepthSearchLevel value, you can easily scan a whole site or only a subdirectory of a web site.

Events are generated for each link found and each page read, returning the URL, meta tags, document type, referrer, host name, and more. Depending on your connection speed, you can grab thousands of links from a starting URL in a few minutes.

THttpScan saves you from having to tangle with HTML parsing. The most common parameters can be set directly from the Object Inspector. The component can be placed on any form and is visible only at design time. Full source code is optional.
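To make the scanning rules above concrete, here is a minimal standard C++ sketch of a depth-limited scan that reports each link once. It is not the THttpScan API: the names FakeWeb, SiteOf and ScanLinks are hypothetical, the "web" is an in-memory table instead of real HTTP requests, and the meaning given to maxDepth (1 = read the start page only) is an assumption, not the documented behavior of DepthSearchLevel.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the web: page URL -> links found on that page.
using FakeWeb = std::map<std::string, std::vector<std::string>>;

// Returns the "scheme://host" prefix of a URL, e.g. "http://a.example".
std::string SiteOf(const std::string& url) {
    std::size_t schemeEnd = url.find("://");
    std::size_t hostStart = (schemeEnd == std::string::npos) ? 0 : schemeEnd + 3;
    std::size_t hostEnd = url.find('/', hostStart);
    return url.substr(0, hostEnd);  // npos here means "up to the end"
}

// Breadth-first scan. Every link is reported once, in discovery order.
// maxDepth   : number of page levels that are read (assumed: 1 = start page only).
// stayOnSite : if true, only pages on the start URL's site are read;
//              links pointing to other sites are still reported.
std::vector<std::string> ScanLinks(const FakeWeb& web,
                                   const std::string& startUrl,
                                   int maxDepth,
                                   bool stayOnSite) {
    std::vector<std::string> reported;
    std::set<std::string> seen{startUrl};
    std::queue<std::pair<std::string, int>> pending;
    pending.push(std::make_pair(startUrl, 1));
    const std::string site = SiteOf(startUrl);

    while (!pending.empty()) {
        std::pair<std::string, int> item = pending.front();
        pending.pop();

        FakeWeb::const_iterator page = web.find(item.first);
        if (page == web.end()) continue;  // not a known HTML page: nothing to parse

        for (const std::string& link : page->second) {
            if (!seen.insert(link).second) continue;  // duplicate: already handled
            reported.push_back(link);

            bool sameSite = (SiteOf(link) == site);
            if (item.second < maxDepth && (!stayOnSite || sameSite)) {
                pending.push(std::make_pair(link, item.second + 1));
            }
        }
    }
    return reported;
}
```

In this sketch, raising maxDepth while keeping stayOnSite set to true walks a whole site, while setting stayOnSite to false lets the scan follow links to other hosts until the depth limit stops it.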
Tags: html pages analyze, cppbuilder, web scanner, scan links, search links, scanner component, C++Builder, component, analyzes html pages, crawler component, Delphi, extract links, recursively, http, html pages analysis