Pollak Library

Web Archiving: Archiving Web Sites on the Internet Archive

Internet archiving overview, tools, policies, and resources for California State University, Fullerton.

The Internet Archive's Wayback Machine & Archive-IT

Wayback Machine

Wayback Machine: A Free Tool for Page-Level Web Archiving

The Internet Archive is a non-profit corporation that hosts the largest webpage repository in the world. One of its most used tools is the Wayback Machine, which allows anyone, free of charge, to create a static, permanent snapshot of a webpage at a specific moment in time. Each snapshot has its own unique URL (permalink), which can be used to create reliable hyperlinks or bibliographic citations in academic writing.

The screenshots below illustrate the key features and benefits of the Wayback Machine as a webpage archiving tool. The Internet Archive blog gives a fuller introduction to the Wayback Machine and its suite of related webpage archiving tools.

Wayback Machine: View All Past Webpage Snapshots

Use the Wayback Machine to see when every past snapshot of a specific webpage was taken.

Wayback Machine Page History View
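
The page history view can also be reached directly by URL. The sketch below builds two addresses: the browser calendar view, which uses `*` in place of a timestamp, and a query against the Wayback Machine's CDX index, which lists captures in machine-readable form. The helper names and the `example.com` address are illustrative assumptions, not part of this guide; the code only constructs URLs and makes no network requests.

```python
from urllib.parse import urlencode

WAYBACK_BASE = "https://web.archive.org"


def history_view_url(page_url: str) -> str:
    """Browser URL for the calendar view of all snapshots of a page."""
    # A "*" in the timestamp position asks for the full capture history.
    return f"{WAYBACK_BASE}/web/*/{page_url}"


def cdx_query_url(page_url: str, limit: int = 10) -> str:
    """URL for a CDX index query returning up to `limit` captures as JSON."""
    params = urlencode({"url": page_url, "output": "json", "limit": limit})
    return f"{WAYBACK_BASE}/cdx/search/cdx?{params}"


# example.com is a placeholder address used only for illustration.
print(history_view_url("example.com"))
print(cdx_query_url("example.com", limit=5))
```

Opening the first address in a browser shows the same calendar as the screenshot; the second is better suited to scripts that need a list of capture dates.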

Wayback Machine Snapshot

Wayback Machine: View a Specific Webpage Snapshot

Use the Wayback Machine to view and cite a webpage snapshot at a specific moment in time.

Wayback Machine webpage snapshot
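
A snapshot permalink combines a 14-digit capture timestamp (year, month, day, hour, minute, second) with the original page address. As a minimal sketch, the hypothetical helpers below build such a permalink and split one back into its parts, which can be handy when filling in the "archived on" date of a citation. The function names and the example address are assumptions for illustration, and permalinks that carry extra modifiers after the timestamp are not handled.

```python
import re
from datetime import datetime
from typing import Tuple

WAYBACK_PREFIX = "https://web.archive.org/web/"
# Matches permalinks of the form .../web/<14-digit timestamp>/<original URL>.
_SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def snapshot_url(page_url: str, when: datetime) -> str:
    """Build a permalink for `page_url` at the capture time `when`."""
    return f"{WAYBACK_PREFIX}{when:%Y%m%d%H%M%S}/{page_url}"


def parse_snapshot_url(permalink: str) -> Tuple[datetime, str]:
    """Split a permalink into its capture time and original page URL."""
    match = _SNAPSHOT_RE.match(permalink)
    if match is None:
        raise ValueError(f"not a timestamped Wayback permalink: {permalink}")
    stamp, original = match.groups()
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original


# Placeholder address and date, used only for illustration.
link = snapshot_url("https://example.com/", datetime(2020, 1, 2, 3, 4, 5))
captured, original = parse_snapshot_url(link)
print(link)
print(captured.date(), original)
```

If no capture exists at the exact timestamp requested, the Wayback Machine generally redirects to the nearest one, so the address shown in the browser after loading is the one to cite.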

Wayback Machine: Create a New Webpage Snapshot

Use the Wayback Machine to create a new webpage snapshot and time-specific permalink:

Wayback Machine: Create a New Webpage Snapshot
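
A new snapshot can also be requested from a script by calling the Wayback Machine's "Save Page Now" address, `https://web.archive.org/save/` followed by the page URL. The sketch below is an assumption-laden illustration rather than an official client: the helper names and the User-Agent string are made up, the service may rate-limit or refuse requests, and treating the final response URL as the snapshot permalink is an assumption about typical behavior.

```python
from urllib.request import Request, urlopen

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_request_url(page_url: str) -> str:
    """Address that asks the Wayback Machine to capture `page_url`."""
    return SAVE_ENDPOINT + page_url


def save_page_now(page_url: str, timeout: float = 60.0) -> str:
    """Request a capture and return the URL the service finally responds from.

    Assumption: after a successful capture, the final URL is the new
    snapshot's permalink. Verify it in a browser before citing it.
    """
    request = Request(
        save_request_url(page_url),
        headers={"User-Agent": "snapshot-example/0.1"},  # hypothetical identifier
    )
    with urlopen(request, timeout=timeout) as response:
        return response.geturl()


# Building the request address needs no network access.
print(save_request_url("https://example.com/"))
```

Calling `save_page_now("https://example.com/")` performs the actual request, so it is best run sparingly and only for pages that genuinely need a fresh snapshot.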

Archive-IT

Archive-IT Introduction

Overview

Overview of Domain-Level Archiving with Archive-IT

Archive-IT is a subscription-based service that allows institutions to have whole domains or subdomains crawled and archived at regular intervals (e.g., daily, weekly, or yearly). When Archive-IT crawls a page, it compares the page to the last archived version and saves a fresh copy only if the content has actually changed. This lets it track changes closely while storing data efficiently. It also crawls and archives files linked from or embedded in webpages, including PDFs, documents, images, and videos, in essence re-creating a fully functional historical website rather than a mere collection of webpage snapshots.
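
The "save only when changed" idea can be illustrated with a content digest: each crawl hashes the page, and a new copy is stored only when the hash differs from that of the last stored version. This is a minimal sketch of the general technique, not Archive-IT's actual implementation; the `ChangeOnlyArchive` class, its fields, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One stored version: (crawl date, content digest, content).
Version = Tuple[str, str, bytes]


@dataclass
class ChangeOnlyArchive:
    """Hypothetical archive that stores a page only when its content changes."""

    versions: Dict[str, List[Version]] = field(default_factory=dict)

    def crawl(self, url: str, crawl_date: str, content: bytes) -> bool:
        """Record a crawl; return True if a new copy was stored."""
        digest = hashlib.sha256(content).hexdigest()
        history = self.versions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # unchanged since the last stored version
        history.append((crawl_date, digest, content))
        return True


archive = ChangeOnlyArchive()
page = "https://example.com/"  # placeholder address
print(archive.crawl(page, "2024-01-01", b"<h1>Welcome</h1>"))  # first crawl
print(archive.crawl(page, "2024-01-02", b"<h1>Welcome</h1>"))  # same content
print(archive.crawl(page, "2024-01-03", b"<h1>Updated</h1>"))  # changed content
print(len(archive.versions[page]))
```

Three crawls here produce only two stored versions, which is the storage saving the paragraph above describes: frequent crawling costs little when pages rarely change.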

Please see the screenshots below for an overview of some of the main features and benefits of Archive-IT.

Archive-IT Sample

Archive-IT Sample Archived Site Home Page: Oscars.org

The image below shows an Archive-IT collection (domain) homepage. It includes curated metadata about the collection, as well as information about how many times the site has been archived and what additional file types (e.g., videos) were included in the archiving.

Archive-IT Oscars.org archived site

Archive-IT Snapshot

Archive-IT Homepage Snapshot

The screenshot below shows the Archive-IT home page, where users can search the collections of web domain archives selected and curated by Archive-IT subscribing institutions.

Archive-IT Home Page Snapshot