WASP: Web Archiving and Search Personalized

SLIDE 1

WASP: Web Archiving and Search Personalized

Johannes Kiesel, Arjen P. de Vries, Matthias Hagen, Benno Stein and Martin Potthast
@KieselJohannes, @arjenpdevries, @matthias_hagen, @bennostein, @martinpotthast
DESIRES, August 29th 2018

1 @KieselJohannes 2018

SLIDE 2

The Personal Search Engine: Motivation

SLIDE 3

The Personal Search Engine: Motivation

SLIDE 4

The Personal Search Engine: Motivation

SLIDE 5

The Personal Search Engine: Motivation

SLIDE 6

The Personal Search Engine: Motivation

SLIDE 7

The Personal Search Engine: Motivation

SLIDE 8

The Personal Search Engine: Motivation

SLIDE 9

The Personal Search Engine: Motivation

SLIDE 10

The Personal Search Engine: Motivation

SLIDE 11

The Personal Search Engine: Motivation

SLIDE 12

The Personal Search Engine: Motivation

SLIDE 13

The Personal Search Engine: Inspiration

SLIDE 14

The Personal Search Engine: Inspiration

Personal search engine!

SLIDE 15

WASP

SLIDE 16

WASP

SLIDE 17

WASP

SLIDE 18

WASP

SLIDE 19

WASP

SLIDE 20

WASP

[Figure: WASP architecture with the components Browser, warcprox, World Wide Web, WARCs, Index, Search Interface, and pywb]

SLIDE 21

WASP

[Figure: WASP architecture with the components Browser, warcprox (proxy), World Wide Web, WARCs, Index, Search Interface, and pywb]

❑ All requests and responses exchanged while browsing are stored and indexed

❑ A page on localhost allows searching; the result page links to the archive, where...
❑ Visited pages are reproduced as they were at the time of the visit
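The slide does not show how a captured page becomes searchable. As a minimal sketch, assuming the proxy hands over each HTML response together with its URL and capture time, the following Python turns one capture into an index document. The IndexDocument layout and the 14-digit UTC timestamp (the YYYYMMDDhhmmss form used by Wayback-style archives) are assumptions for illustration, not WASP's actual index schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif self._skip_depth == 0:
            self.parts.append(text)


@dataclass
class IndexDocument:
    """Hypothetical record layout for one archived capture."""
    url: str
    timestamp: str  # 14-digit UTC capture time, YYYYMMDDhhmmss
    title: str
    text: str


def to_index_document(url: str, captured_at: datetime, html: str) -> IndexDocument:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Naive datetimes are interpreted as local time by astimezone().
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return IndexDocument(url, stamp, parser.title, " ".join(parser.parts))


doc = to_index_document(
    "https://example.org/",
    datetime(2018, 8, 29, 9, 30, tzinfo=timezone.utc),
    "<html><head><title>Example</title><style>p {}</style></head>"
    "<body><p>Hello <b>world</b></p><script>var x;</script></body></html>",
)
print(doc.timestamp, doc.title, doc.text)  # 20180829093000 Example Hello world
```

Storing the capture time alongside the text is what later lets a result link back to the page as it looked at that moment.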

SLIDE 22

WASP

[Figure: WASP architecture with the components Browser, warcprox, World Wide Web, WARCs, Index, Search Interface, and pywb; path label: /search]

❑ All requests and responses exchanged while browsing are stored and indexed

❑ A page on localhost allows searching; the result page links to the archive, where...
❑ Visited pages are reproduced as they were at the time of the visit
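To illustrate what the /search page has to do, here is a toy in-memory inverted index in Python. It is a stand-in chosen for brevity, not WASP's search backend; PersonalIndex, its summed term-frequency scoring, and the newest-first tie-breaking are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-cases text and splits it into word tokens."""
    return re.findall(r"\w+", text.lower())


class PersonalIndex:
    """Toy in-memory inverted index over archived captures (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.docs = []                     # doc_id -> (url, timestamp)

    def add(self, url: str, timestamp: str, text: str) -> None:
        doc_id = len(self.docs)
        self.docs.append((url, timestamp))
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query: str, k: int = 10) -> list[tuple[str, str]]:
        """Returns up to k (url, timestamp) pairs, best match first."""
        scores = defaultdict(int)
        for term in set(tokenize(query)):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        # Higher summed term frequency first; ties go to the newer capture.
        ranked = sorted(scores, key=lambda d: (scores[d], self.docs[d][1]), reverse=True)
        return [self.docs[d] for d in ranked[:k]]


index = PersonalIndex()
index.add("https://example.org/a", "20180829093000", "WASP archives web pages")
index.add("https://example.org/b", "20180830120000", "Search your archived web history")
index.add("https://example.org/c", "20180901080000", "An unrelated page")
print(index.search("web search"))
```

Each hit carries its capture timestamp, so the result page can link to the archived version rather than to the live web.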

SLIDE 23

WASP

[Figure: WASP architecture with the components Browser, warcprox, World Wide Web, WARCs, Index, Search Interface, and pywb; path label: /archive/<time>/<url>]

❑ All requests and responses exchanged while browsing are stored and indexed

❑ A page on localhost allows searching; the result page links to the archive, where...
❑ Visited pages are reproduced as they were at the time of the visit
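The /archive/<time>/<url> pattern can be sketched as a pair of helpers that build and parse such links. This assumes the 14-digit YYYYMMDDhhmmss timestamps used by Wayback-style replay tools such as pywb; the base address http://localhost:8001 and both function names are placeholders.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # 14-digit Wayback-style timestamp


def archive_link(base: str, visited_at: datetime, url: str) -> str:
    """Builds a replay link of the form <base>/archive/<time>/<url>."""
    stamp = visited_at.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)
    return f"{base.rstrip('/')}/archive/{stamp}/{url}"


def parse_archive_path(path: str) -> tuple[datetime, str]:
    """Splits an /archive/<time>/<url> path into capture time (UTC) and URL."""
    prefix = "/archive/"
    if not path.startswith(prefix):
        raise ValueError(f"not an archive path: {path}")
    # Split only at the first slash: the archived URL itself contains slashes.
    stamp, _, url = path[len(prefix):].partition("/")
    visited_at = datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    return visited_at, url


visited = datetime(2018, 8, 29, 9, 30, tzinfo=timezone.utc)
link = archive_link("http://localhost:8001", visited, "https://example.org/page?id=1")
print(link)  # http://localhost:8001/archive/20180829093000/https://example.org/page?id=1
print(parse_archive_path("/archive/20180829093000/https://example.org/page?id=1"))
```

Keeping the original URL verbatim after the timestamp, as the slide's pattern suggests, means the parser must not split on later slashes or strip the query string.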

SLIDE 24

WASP

Personal search engine!

SLIDE 25

WASP

SLIDE 26

Insight 1: Not Indexing Near-duplicates

❑ Which changes to a page warrant re-archiving it?
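One common way to make this question concrete, swapped in here for illustration and not taken from the slides, is word shingling with Jaccard similarity: a page is re-archived only if its new text differs enough from the last archived version. The 3-word shingles and the 0.9 threshold are hypothetical choices.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Returns the set of word n-grams ("shingles") of a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def should_rearchive(previous_text: str, current_text: str, threshold: float = 0.9) -> bool:
    """Re-archive only if the new text is not a near-duplicate of the last archived one."""
    return jaccard(shingles(previous_text), shingles(current_text)) < threshold


old = "WASP stores every page you visit. Visitors today: 41"
new = "WASP stores every page you visit. Visitors today: 42"
print(jaccard(shingles(old), shingles(new)))  # 0.75
print(should_rearchive(old, new))             # True
print(should_rearchive(old, old))             # False
```

On a page this short, a changed visitor counter alone drops the similarity to 0.75, so the page would be re-archived; on a longer page the same change would matter far less, which is why the threshold is a genuine design decision.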

SLIDE 27

Insight 2: Browsable and Deletable History
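The slide only names this feature. As a minimal sketch, assuming each capture is identified by its URL and a 14-digit timestamp, the following Python browses the history newest first and deletes captures by URL and/or time range. The History class is hypothetical; in a real deployment a deletion would presumably also have to reach the stored WARCs and the search index.

```python
from dataclasses import dataclass, field
from typing import Optional

LATEST = "99999999999999"  # upper bound for 14-digit timestamps


@dataclass(order=True)
class Capture:
    timestamp: str                   # 14-digit capture time; the sort key
    url: str = field(compare=False)  # ignored when ordering captures


class History:
    """Hypothetical browsable and deletable record of archived captures."""

    def __init__(self):
        self.captures: list[Capture] = []

    def record(self, url: str, timestamp: str) -> None:
        self.captures.append(Capture(timestamp, url))

    def browse(self, since: str = "", until: str = LATEST) -> list[Capture]:
        """Captures with since <= timestamp <= until, newest first."""
        hits = [c for c in self.captures if since <= c.timestamp <= until]
        return sorted(hits, reverse=True)

    def delete(self, url: Optional[str] = None, since: str = "", until: str = LATEST) -> int:
        """Removes captures matching the URL (if given) and time range; returns the count.

        Called without arguments, this clears the whole history.
        """
        def matches(c: Capture) -> bool:
            return (url is None or c.url == url) and since <= c.timestamp <= until

        before = len(self.captures)
        self.captures = [c for c in self.captures if not matches(c)]
        return before - len(self.captures)


h = History()
h.record("https://example.org/a", "20180829093000")
h.record("https://example.org/b", "20180829101500")
h.record("https://example.org/a", "20180830080000")
print([c.url for c in h.browse()])
print(h.delete(url="https://example.org/a"))  # 2
print([c.url for c in h.browse()])             # ['https://example.org/b']
```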

SLIDE 28

Insight 3: Easy (De-)activation of Archiving

Easy activation, deactivation, and status check
Patterns
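A minimal sketch of the on/off/status behavior, assuming the proxy consults a shared flag before archiving each exchange. ArchivingSwitch, its method names, and the "archiving"/"paused" labels are hypothetical; threading.Event is used so the flag can be flipped safely while the proxy handles requests on other threads.

```python
import threading
from typing import Callable


class ArchivingSwitch:
    """Hypothetical on/off switch the proxy checks before archiving an exchange."""

    def __init__(self, active: bool = True):
        self._active = threading.Event()  # thread-safe boolean flag
        if active:
            self._active.set()

    def activate(self) -> None:
        self._active.set()

    def deactivate(self) -> None:
        self._active.clear()

    def status(self) -> str:
        return "archiving" if self._active.is_set() else "paused"

    def maybe_archive(self, url: str, archive: Callable[[str], None]) -> bool:
        """Calls archive(url) only while archiving is active; reports whether it did."""
        if self._active.is_set():
            archive(url)
            return True
        return False


archived = []
switch = ArchivingSwitch()
switch.maybe_archive("https://example.org/a", archived.append)
switch.deactivate()
switch.maybe_archive("https://bank.example/account", archived.append)
print(switch.status(), archived)  # paused ['https://example.org/a']
```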

SLIDE 29

Insight 4: Combined Indexing of Sub-pages

❑ Should visited sub-pages of a single article be indexed as one document?
❑ If so, which sub-page should the result list link to?
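One possible answer to both questions, sketched under explicit assumptions: sub-pages are recognized by a page or p query parameter or a trailing /page/N path segment (hypothetical conventions; real sites vary widely), their texts are indexed together, and the result links to the lowest-numbered sub-page that was visited.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical pagination conventions; real sites vary widely.
PAGE_PARAMS = {"page", "p"}
PAGE_PATH = re.compile(r"/page/(\d+)/?$")


def article_key(url: str) -> tuple[str, int]:
    """Maps a sub-page URL to (article URL without pagination or fragment, page number)."""
    parts = urlsplit(url)
    page, query = 1, []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name in PAGE_PARAMS and value.isdigit():
            page = int(value)
        else:
            query.append((name, value))
    path = parts.path
    match = PAGE_PATH.search(path)
    if match:
        page = int(match.group(1))
        path = path[:match.start()] or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), "")), page


def combine_subpages(pages: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Groups {url: text} into {article: (link target, combined text)}.

    The link target is the lowest-numbered visited sub-page.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        key, page = article_key(url)
        groups[key].append((page, url, text))
    combined = {}
    for key, members in groups.items():
        members.sort()  # order sub-pages by page number
        combined[key] = (members[0][1], " ".join(text for _, _, text in members))
    return combined


visited = {
    "https://news.example/story?page=2": "second part",
    "https://news.example/story": "first part",
    "https://news.example/other/page/3": "elsewhere",
}
for article, (link, text) in combine_subpages(visited).items():
    print(article, "->", link, "|", text)
```

Linking to the first visited sub-page is only one option; linking to the sub-page that best matches the query would be another.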

SLIDE 30

Insight 5: Personalized Search

SLIDE 31

Insight 5: Personalized Search

SLIDE 32

WASP: Web Archiving and Search Personalized

Insights overview

❑ Not Indexing Near-duplicates
❑ Browsable and Deletable History
❑ Easy (De-)activation of Archiving
❑ Combined Indexing of Sub-pages
❑ Personalized Search

Code and instructions on GitHub: github.com/webis-de/wasp

Thank you for your attention!
