ARCHIVING & PRESERVING WEB CONTENT: THE INTERNET ARCHIVE


  1. ARCHIVING & PRESERVING WEB CONTENT

  2. THE INTERNET ARCHIVE What? A non-profit digital library and archive. Where? San Francisco, CA. When? Who? Founded in 1996 by Brewster Kahle. How? Officially designated a library by the state of California in 2007.

  3. THE WAYBACK MACHINE Online: https://archive.org/web/ The largest publicly available web archive in existence: > 280 billion pages, > 100 million websites, > 150 languages, ~ 1 billion URLs added per week.
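As an illustration of how an archived page is addressed, here is a minimal Python sketch that builds a Wayback Machine replay URL from an original URL and a point in time. The `web.archive.org/web/<timestamp>/<url>` pattern and the 14-digit UTC timestamp are assumptions about the public service, not something the slides state, and the function name `wayback_url` is hypothetical.

```python
from datetime import datetime, timezone

# Assumed replay prefix for the public Wayback Machine (not given in the slides).
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Return a replay URL asking for the capture nearest to `when`.

    `when` should be timezone-aware; it is converted to UTC and
    formatted as a 14-digit YYYYMMDDhhmmss timestamp.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


# Example: request the Archive-It home page as it looked around 1 January 2017.
example = wayback_url(
    "http://www.archive-it.org/",
    datetime(2017, 1, 1, tzinfo=timezone.utc),
)
print(example)
```

The service is expected to resolve such a URL to the closest available capture, so the timestamp does not have to match a capture exactly.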

  4. WEB ARCHIVING What is a web archive? A collection of archived URLs grouped by theme, event, subject area, or web address. A web archive contains as much as possible from the original resources and documents their change over time. It is a priority to recreate the same experience a user would have had if they had visited the live site on the day it was archived.
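The definition above can be pictured as a small data structure: a themed collection that keeps every capture time for each URL, so a page can be looked up as it stood on a given day. The following Python sketch is illustrative only; the class `WebArchive`, its methods, and the `example.org` address are hypothetical and do not describe how Archive-It or the Wayback Machine store data.

```python
from bisect import bisect_right, insort
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class WebArchive:
    """A themed collection of URLs, each with its capture times in sorted order."""

    theme: str
    captures: Dict[str, List[datetime]] = field(default_factory=dict)

    def add_capture(self, url: str, captured_at: datetime) -> None:
        # Keep each URL's capture times sorted so lookups can use bisection.
        insort(self.captures.setdefault(url, []), captured_at)

    def capture_as_of(self, url: str, day: datetime) -> Optional[datetime]:
        """Return the latest capture at or before `day`, or None if there is none."""
        times = self.captures.get(url, [])
        index = bisect_right(times, day)
        return times[index - 1] if index else None


# Hypothetical collection and URL, used only to show the lookup.
archive = WebArchive(theme="Alberta Floods June 2013")
archive.add_capture("http://example.org/", datetime(2013, 6, 25))
archive.add_capture("http://example.org/", datetime(2013, 6, 20))

found = archive.capture_as_of("http://example.org/", datetime(2013, 6, 22))
missing = archive.capture_as_of("http://example.org/", datetime(2013, 6, 1))
```

Keeping every capture rather than only the latest one is what lets a collection document change over time.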

  5. THE LIFESPAN OF A WEBSITE How long does a website last? In general, a typical web page can be expected to last ~90-100 days before changing, moving, or disappearing completely. > In 2013, our colleagues at Old Dominion University determined that over 10% of event-related content posted to social media platforms is lost after one year. > In 2014, a study by UCLA determined that 7 in 10 scholarly articles that include citations with hyperlinks suffer from reference rot.

  6. ARCHIVE-IT: A WEB ARCHIVING SERVICE A web-based application, launched in 2006, that allows users to create, manage, access, and store collections of web-based digital content. A fully hosted solution, including access and storage. A suite of tools for selecting, scoping, and cataloging. Provides the ability to capture content at 10 different frequencies. Archived web content includes: HTML, text, videos, audio, social media, PDF, images, password-protected content, static databases, and newspapers. Browse archived content 24 hours after a capture is complete; full-text search is available within 7 days. Private access options are available.

  7. HOW IS ARCHIVE-IT DIFFERENT THAN THE GENERAL/GLOBAL WAYBACK?

     | Archive-It | General/Global Wayback |
     | --- | --- |
     | Focused collections | One collection |
     | Control over scope and frequency | Snapshot |
     | Technical support | Automated |
     | All content and metadata indexed for search | Search and cataloging not available |
     | Archived data shipped/downloaded | Shipping/download not available |
     | Private access options | Public access only |
     | Available 24 hours after captured | Access varies |
     | Subscription service | Absolutely free |

  8. WHAT OUR PARTNERS ARE COLLECTING...

  9. ARCHIVE-IT USE CASES
     Create a thematic/topical web archive on a specific subject or event
     > Often related to traditional collecting activity around the same topical focus
     > Capture spontaneous events
     > Document different perspectives and social commentaries
     Fulfill a mandate to capture/preserve evolving web history
     > Construct a historical record of an institution or individual’s web/social media presence
     > Support an electronic records system to meet records retention requirements
     > Collect publications/documents that are no longer in print form
     Closure crawls
     > Document a public institution’s presence on the web before it changes or closes

  10. UNIVERSITY OF ALBERTA: ALBERTA FLOODS JUNE 2013 Use Case: Archive web content before, during, and after the 2013 Alberta floods > Institutional websites > Personal and institutional blogs > News articles

  11. WILFRID LAURIER UNIVERSITY Use Cases: > Archive the university’s web presence in order to meet required records retention mandates. > Document the university’s social media presence

  12. ACCESS TO COLLECTIONS Partners: > Can view through private web application with login/password General Public: > Can view from Archive-It’s website: http://www.archive-it.org/ > Search Archive-It data and metadata from institutional domains > Landing Pages : branded pages that link back to Archive-It hosted data

  13. EXAMPLES OF ORGANIZATIONS’ LANDING PAGES University of Texas at Austin Library of Virginia

  14. PRIVATE ACCESS OPTIONS > Entire account > Individual collections > Specific URLs > IP address

  15. STORAGE AND PRESERVATION Storage: > 2 copies (primary & backup) of archived data are stored at San Francisco data centers. > A third copy is transferred to the General Archive. > A copy of archived data can be shipped on a hard drive > Partners can always download their archived data from Internet Archive’s servers. Preservation partnerships: > 2008: LOCKSS > 2013: DuraCloud > 2017: Multiple in development...

  16. DATA REPOSITORY

  17. KEY ARCHIVE-IT FEATURES > Different levels of access for account users > Ten available capture frequencies (from twice daily to yearly) > Browse collections by URL, search by full-text and metadata > Detailed post crawl reports for analysis > Quality Assurance (QA) tools > Online Help Center and User Manual > Web Archivists and technical support > Hosting, access, and redundant storage

  18. SUBSCRIPTION MODEL > Annual, renewable subscription > Subscription levels vary by the amount of data archived > Factors include: type and number of sites, how large they are, and how frequently they are archived > All subscriptions include hosting, access, and perpetual storage (primary and backup)

  19. TIME COMMITMENTS [Chart: staff dedicated to web archiving program; segments of 19%, 5%, 5%, 58%, and 13%, segment labels not recoverable.] Source: NDSA, Web Archiving in the United States: A 2016 Survey

  20. THE WEB ARCHIVING LIFE CYCLE http://www.archive-it.org/publications

  21. COMPLIMENTARY TRIAL Create a collection of up to 5 websites, archive content, and view the results!

  22. ARCHIVE-IT WEB APPLICATION DEMO STO

  23. LEARN MORE Check out our blog: www.archive-it.org/blog Follow us on Twitter: @archiveitorg Like us on Facebook: https://www.facebook.com/ArchiveIt THANK YOU! Questions? ait@archive.org
