Web Archiving

Internet archiving overview, tools, policies, and resources for California State University, Fullerton.

Archive-IT Introduction

Overview of Domain-Level Archiving with Archive-IT

Archive-IT is a subscription-based service that allows institutions to have whole domains or subdomains crawled and archived at regular intervals (e.g., daily, weekly, or yearly). When Archive-IT crawls a page, it compares the content to the last archived version and saves a fresh copy only if that content has actually changed. This lets it track changes closely while keeping data storage efficient. It also crawls and archives files linked or embedded within web pages, including PDFs, documents, images, and videos, in essence re-creating a fully functional historical website rather than a mere collection of webpage snapshots.
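The compare-then-save behavior described above can be illustrated with a minimal Python sketch. This is not Archive-IT's actual implementation, which this guide does not document; it assumes that "changed" means the page body's hash differs from the most recent capture. The names CaptureStore, record, and stored_copies are hypothetical and exist only for this illustration.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a fingerprint of the page content (SHA-256 chosen as an assumption)."""
    return hashlib.sha256(body).hexdigest()


class CaptureStore:
    """Hypothetical in-memory archive that keeps a new copy only when content changes."""

    def __init__(self):
        self._visits = {}    # url -> list of (timestamp, digest), one entry per crawl
        self._payloads = {}  # (url, digest) -> stored page body

    def record(self, url: str, timestamp: str, body: bytes) -> bool:
        """Log a crawl of `url`; return True if a fresh copy was saved."""
        digest = content_digest(body)
        history = self._visits.setdefault(url, [])
        # Compare against the most recent capture of this URL, if there is one.
        changed = not history or history[-1][1] != digest
        if changed:
            self._payloads[(url, digest)] = body
        # Every crawl is logged, even when no new copy is stored.
        history.append((timestamp, digest))
        return changed

    def stored_copies(self) -> int:
        """Number of page bodies actually kept in storage."""
        return len(self._payloads)


store = CaptureStore()
url = "https://example.org/"  # placeholder URL for the illustration
first = store.record(url, "2024-01-01", b"<h1>Welcome</h1>")
second = store.record(url, "2024-01-02", b"<h1>Welcome</h1>")       # unchanged page
third = store.record(url, "2024-01-03", b"<h1>Welcome back</h1>")   # edited page
```

In this sketch three crawls are logged but only two copies are stored, which is the trade-off the paragraph describes: frequent checks without duplicate storage.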

Please see the screenshots below for an overview of some of the main features and benefits of Archive-IT.

Archive-IT Homepage Snapshot

The screenshot below shows the Archive-IT home page, where users can search the collections of web domain archives selected and curated by Archive-IT subscribing institutions.

Archive-IT Home Page Snapshot

Archive-IT Sample Archived Site Home Page: Oscars.org

The image below shows an example of an Archive-IT collection (domain) home page. It includes curated metadata about the collection, as well as information about how many times the site has been archived and what additional file types (e.g., videos) were included in the archiving.

Archive-IT Oscars.org archived site

Archive-IT Sample Domain: Oscars.org History of Captures

Archive-IT Domain History of Captures

Archive-IT Sample Timestamped Archived Webpage: Oscars.org

Archive-IT Sample Archived Webpage Oscars.org