Link Ranking group, Archives Unleashed 4.0, Gregory Wiedeman (PowerPoint presentation)

SLIDE 1

Link Ranking group

Archives Unleashed 4.0

Gregory Wiedeman (University at Albany, SUNY); Mindaugas Vidmantas (The British Library); Peter Webster (Independent Scholar & Consultant); Kees Teszelszky (National Library of the Netherlands); Richard Deswarte (University of East Anglia)

SLIDE 2

Are All Links Created Equal?

  • Use WarcBase scripts to export manageable amounts of raw HTML
  • Load the HTML into the BeautifulSoup Python library
  • Identify the parent element of each <a> (link) tag
  • Should links with certain parent elements be weighted more heavily during link analysis?
  • Are navigational links (inside <li>) different from content links (inside <p> or <div>)?
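The parent-element step above can be sketched in a few lines. This is a minimal sketch, not the group's actual script: it swaps in Python's standard-library html.parser for BeautifulSoup so it runs without extra dependencies (with BeautifulSoup the equivalent would be reading `a.parent.name` for each tag returned by `soup.find_all("a", href=True)`), and it assumes "parent" means the immediate enclosing element. The names LinkParentParser and parent_distribution are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LinkParentParser(HTMLParser):
    """Hypothetical helper: record the immediate parent tag of every <a href>.

    Stands in for BeautifulSoup's a.parent.name lookup (an assumption,
    not the original WarcBase/BeautifulSoup workflow).
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, innermost last
        self.links = []   # (href, parent_tag) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip anchors such as <a name="top">
                parent = self.stack[-1] if self.stack else None
                self.links.append((href, parent))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed markup.
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break


def parent_distribution(html):
    """Count <a href> links by the tag name of their immediate parent."""
    parser = LinkParentParser()
    parser.feed(html)
    parser.close()
    return Counter(parent for _, parent in parser.links)
```

For example, a page with one link in a `<li>` menu item and one in a `<p>` paragraph yields one count for each tag. Because only the immediate parent is recorded, a link wrapped in `<strong>` inside a `<p>` is counted under `strong`; grouping it with content links would require looking further up the stack.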
SLIDE 3

Link Parent Element Type Distribution in 1 Rio WARC

SLIDE 4

Link Parent Element Type Distribution across 4 WARCs
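Combining per-WARC tallies into one distribution is simple arithmetic. A minimal sketch, assuming each WARC's links have already been counted by parent tag into a Counter; the function name combined_distribution is hypothetical.

```python
from collections import Counter


def combined_distribution(per_warc_counts):
    """Hypothetical helper: pool per-WARC Counters of link parent tags and
    return (tag, percent) pairs over all links combined, most common first."""
    total = Counter()
    for counts in per_warc_counts:
        total.update(counts)   # adds counts tag by tag
    n = sum(total.values())
    if n == 0:
        return []
    return [(tag, 100.0 * count / n) for tag, count in total.most_common()]
```

Pooling raw counts like this gives WARCs with more links more influence on the result; averaging each WARC's own percentages instead would weight every WARC equally, which may matter when the files differ greatly in size.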

SLIDE 5

Relative vs. hardcoded (absolute) links in Rio (2016)

SLIDE 6

Relative vs. hardcoded (absolute) links in Rio (2016)
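One way to separate relative from hardcoded links is to check whether each href carries its own scheme and host. This is a sketch under the assumption that "hardcoded" means an absolute http(s) URL, in line with the "Absolute and Relative Paths" comparison; the category names and the functions link_kind, kind_counts and resolve are hypothetical, while urlparse and urljoin are standard-library calls.

```python
from collections import Counter
from urllib.parse import urljoin, urlparse


def link_kind(href):
    """Hypothetical classifier for one href value."""
    href = href.strip()
    scheme = urlparse(href).scheme
    if scheme in ("http", "https"):
        return "absolute"           # hardcoded scheme and host
    if scheme:
        return "other"              # mailto:, javascript:, tel:, ...
    if href.startswith("//"):
        return "protocol-relative"  # host is hardcoded, scheme is not
    if not href or href.startswith("#"):
        return "other"              # empty link or same-page fragment
    return "relative"               # resolved against the page's own URL


def kind_counts(hrefs):
    """Tally link kinds for one page or one WARC's worth of hrefs."""
    return Counter(link_kind(h) for h in hrefs)


def resolve(page_url, href):
    """Turn any href into an absolute URL so all links can share one graph."""
    return urljoin(page_url, href)
```

Resolving relative links against the page URL before building a link graph means the two styles end up as the same target URL, so the relative/hardcoded split affects how links are written rather than which pages they point to.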

SLIDE 7

CPP Link Parent Element Type Distribution

SLIDE 8

Relative vs. hardcoded (absolute) links in CPP (2005)

SLIDE 9

Absolute and Relative Paths in 2005 CPP vs. 2016 Rio WARCs