Web Archiving
Web archiving is the process of collecting and preserving information from the World Wide Web. Web archives generally collect all types of Web content, including HTML pages, style sheets, JavaScript, images, and video. Alongside the content, archivers record metadata such as access time, MIME type, and content length. This metadata helps establish the authenticity and provenance required for research and historical archiving.
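As an illustration of the metadata described above, the following minimal Python sketch builds a record for one captured resource. The function name, the field names, and the SHA-256 digest are assumptions made for this example; they do not follow any particular archive format.

```python
import hashlib
from datetime import datetime, timezone


def build_capture_record(url, body, mime_type, access_time=None):
    """Return a metadata record for one captured resource (hypothetical layout)."""
    if access_time is None:
        # Default to the current time in UTC when no capture time is supplied.
        access_time = datetime.now(timezone.utc)
    return {
        "url": url,
        "access_time": access_time.isoformat(),
        "mime_type": mime_type,
        "content_length": len(body),  # length in bytes of the captured body
        # Assumption: a digest is not mentioned in the text; it is included
        # here as one common way to support later integrity checks.
        "sha256": hashlib.sha256(body).hexdigest(),
    }


record = build_capture_record(
    "http://example.com/",
    b"<html><body>Hello</body></html>",
    "text/html",
    datetime(2020, 1, 1, tzinfo=timezone.utc),
)
print(record)
```

Storing such a record next to the captured bytes lets a later reader check when and in what form the content was retrieved.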
Large-scale Web archiving relies on Web crawlers for automated collection. The Internet Archive is the largest Web archiving organization that uses the crawling approach; it is a nonprofit that attempts to archive the entire Web. Several commercial software solutions and services are also available to help individuals and companies archive their own Web content for legal or regulatory purposes.
Methods of Archiving
Some of the popular methods of Web archiving include remote harvesting, on-demand archiving, database archiving, and transactional archiving.
Remote harvesting is the method of collecting information automatically using Web crawlers. Examples of popular Web crawlers include Heritrix, HTTrack, Offline Explorer, and Web Curator.
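The core loop of a harvesting crawler can be sketched in Python as a breadth-first traversal that stays on the seed's host. This is a simplified illustration, not how Heritrix or HTTrack are implemented. The names crawl, LinkExtractor, and SITE are hypothetical, and the fetch function is passed in so that the example runs against an in-memory site instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first harvest of pages on the seed's host.

    fetch(url) must return the page HTML as a string, or None on failure.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}
    archive = {}
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be retrieved
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return archive


# Hypothetical in-memory site standing in for real HTTP requests.
SITE = {
    "http://example.org/": '<a href="/a">A</a> <a href="http://other.example/x">X</a>',
    "http://example.org/a": '<a href="/">home</a> <a href="/b#top">B</a>',
    "http://example.org/b": "<p>leaf page</p>",
}

archive = crawl("http://example.org/", SITE.get)
print(list(archive))
```

A production crawler would additionally honor robots.txt, rate-limit its requests, and fetch embedded resources such as images and style sheets; those concerns are omitted here.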
On-demand Web archiving captures and retrieves specific Web content at a user's request. Popular services include WebCite, Archive-It, and Hanzo Archives.
Database archiving collects the underlying content of database-driven Web sites by extracting the database content into a standard schema using XML.
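A minimal Python sketch of this idea, assuming a SQLite database and a made-up XML layout (table, row, and field elements), might look as follows. The function name export_table_to_xml and the sample pages table are hypothetical; real database archiving tools define their own schemas.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_table_to_xml(conn, table):
    """Serialize every row of one table into an XML string (illustrative schema)."""
    # Only accept table names that exist in the database catalog, since the
    # name is interpolated into the query text below.
    known = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in known:
        raise ValueError(f"unknown table: {table!r}")

    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]

    root = ET.Element("table", name=table)
    for row in cursor:
        row_element = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            field = ET.SubElement(row_element, "field", name=column)
            if value is not None:
                field.text = str(value)  # NULL values become empty elements
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample database standing in for a site's content store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO pages (title) VALUES (?)", [("Home",), ("About",)])

xml_text = export_table_to_xml(conn, "pages")
print(xml_text)
```

Because the output no longer depends on the original database engine, it can be preserved and read back with any XML parser.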
Transactional archiving records the actual transactions between a Web server and a Web browser. This method is used to preserve evidence of the content that was viewed on a particular Web page on a given date.
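One way to picture transactional archiving is as a layer on the server that logs every request together with the response it produced. The Python sketch below does this as WSGI middleware. The class name TransactionRecorder, the log fields, and the demo application are assumptions for illustration; the response body is buffered in memory for simplicity, which a real deployment would likely avoid.

```python
import hashlib
from datetime import datetime, timezone


class TransactionRecorder:
    """WSGI middleware that logs each request/response pair it serves."""

    def __init__(self, app):
        self.app = app
        self.log = []  # in-memory log; a real archive would persist this

    def __call__(self, environ, start_response):
        captured = {}

        def recording_start_response(status, headers, exc_info=None):
            # Remember the status line before passing it on unchanged.
            captured["status"] = status
            return start_response(status, headers, exc_info)

        result = self.app(environ, recording_start_response)
        try:
            body = b"".join(result)  # simplification: buffer the whole body
        finally:
            if hasattr(result, "close"):
                result.close()

        self.log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "method": environ.get("REQUEST_METHOD"),
            "path": environ.get("PATH_INFO"),
            "status": captured.get("status"),
            "content_length": len(body),
            "sha256": hashlib.sha256(body).hexdigest(),
        })
        return [body]


def demo_app(environ, start_response):
    """Hypothetical application standing in for a real Web site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


recorder = TransactionRecorder(demo_app)
environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/index.html", "QUERY_STRING": ""}
response = recorder(environ, lambda status, headers, exc_info=None: None)
print(recorder.log[0])
```

Each log entry ties a timestamp to a digest of the bytes actually served, which is the kind of evidence this method is meant to preserve.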


