Web Archives: Home

Intro to Web Archives

What is Web Archiving?

Web archiving is the process of collecting, preserving, and providing enduring access to web content.

Why Archive the Web?

The web changes continually, and content disappears every day.

How does it work?

Web Crawlers

"Web crawlers are one widely used web archiving method. A crawler begins to archive when a user specifies a starting “seed” URL. It creates and saves a facsimile of the seed, then identifies, follows, and copies links leading out from that page. The crawler repeats these steps until it reaches a user-specified limit defined in terms of host domain, number of documents, data, page quantity, or time."

Wickner, Amy. “Recognizing Co-Creators in Four Configurations: Critical Questions for Web Archiving.” Journal of Contemporary Archival Studies 6, no. 1 (2019): Article 23. https://elischolar.library.yale.edu/jcas/vol6/iss1/23, preserved at https://perma.cc/9BYW-6VFD.
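The crawl loop described in the quotation above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular archiving crawler is implemented: the `FAKE_WEB` dictionary, the `LinkParser` class, the `crawl` function, and the `max_documents` parameter are all hypothetical names invented for this example, and the "web" is an in-memory dictionary so the sketch runs without network access. It applies two of the limits mentioned above, host domain and number of documents.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "https://example.org/": '<a href="/about">About</a> <a href="https://other.example/">Other</a>',
    "https://example.org/about": '<a href="/">Home</a> <a href="/news">News</a>',
    "https://example.org/news": '<a href="/about">About</a>',
    "https://other.example/": "<p>Off-host page</p>",
}


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_documents=10):
    """Breadth-first crawl from a seed URL.

    Stops when the queue is empty or max_documents pages are saved,
    and never leaves the seed's host.
    """
    seed_host = urlparse(seed).netloc
    queue = deque([seed])
    captured = {}  # URL -> saved copy of the page
    while queue and len(captured) < max_documents:
        url = queue.popleft()
        if url in captured or urlparse(url).netloc != seed_host:
            continue  # already saved, or outside the host limit
        body = fetch(url)
        if body is None:
            continue  # nothing to save at this URL
        captured[url] = body
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            queue.append(urljoin(url, href))  # resolve relative links
    return captured


archive = crawl("https://example.org/", FAKE_WEB.get, max_documents=2)
print(list(archive))
```

With `max_documents=2` the crawl saves the seed and the first page it links to on the same host, then stops. Raising the limit lets it reach the remaining same-host page, while the off-host link is never followed.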

Web ARChive format (WARC)

A WARC is "a file format for concatenating several resources, each consisting of a set of simple text headers and an arbitrary data block, into one long file." WARCs are used to save data captured by web crawlers.

Internet Archive. “Frequently Asked Questions.” Archive-It.org. https://archive-it.org/blog/products-and-services/archive-it-faqs, preserved at https://perma.cc/7FCZ-2DQH.
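The "simple text headers plus data block, concatenated into one long file" layout can be illustrated with a simplified sketch. The code below is an assumption-laden toy, not a complete implementation of the WARC standard: `build_record` and `read_records` are hypothetical helper names, only a handful of headers are written, and compression is ignored. Real projects normally rely on an established WARC library rather than hand-written code like this.

```python
import io
import uuid
from datetime import datetime, timezone


def build_record(target_uri, payload, record_type="resource"):
    """Return one simplified WARC-style record as bytes.

    A record is a version line, text headers, a blank line,
    the data block, and two trailing CRLFs.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        f"WARC-Type: {record_type}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


def read_records(data):
    """Split concatenated records back into (headers, block) pairs."""
    stream = io.BytesIO(data)
    records = []
    while True:
        line = stream.readline()
        if not line:
            break  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        # `line` is the version line; the named headers follow it.
        headers = {}
        while True:
            line = stream.readline().rstrip(b"\r\n")
            if not line:
                break  # blank line ends the header section
            name, _, value = line.decode("utf-8").partition(": ")
            headers[name] = value
        # Content-Length says exactly how many bytes the data block holds.
        block = stream.read(int(headers["Content-Length"]))
        records.append((headers, block))
    return records


# Concatenate two records into one "file", then read them back.
warc_bytes = (
    build_record("https://example.org/", b"<p>hi</p>")
    + build_record("https://example.org/about", b"line1\r\n\r\nline2")
)
records = read_records(warc_bytes)
for headers, block in records:
    print(headers["WARC-Target-URI"], len(block))
```

Because each record states its own `Content-Length`, a reader can step through the file record by record without needing to understand what is inside each data block.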

Web Archives Replay Mechanisms

Users of web archives rely on replay tools such as the Wayback Machine or ReplayWeb.page to view and navigate content captured by web crawlers.

Digital Services Manager

Dalton Alves

he/him

Contact:

2130 H Street NW, Washington DC 20052

202.994.6558

AskUs@gwu.edu