Concise Preservation by combining Managed Forgetting and Contextualized Remembering


Concise Preservation by combining Managed Forgetting and Contextualized Remembering. Research Talk, May 9, 2014, University of Twente, Enschede. Speaker: Nattiya Kanhabua, L3S Research Center / University of Hannover. ForgetIT Project Consortium.


slide-1
SLIDE 1

Speaker: Nattiya Kanhabua L3S Research Center / University of Hannover

Concise Preservation by combining Managed Forgetting and Contextualized Remembering

Research Talk, May 9, 2014 University of Twente, Enschede

slide-2
SLIDE 2

An interdisciplinary team of experts in:

  • Preservation, information management, information extraction
  • Multimedia analysis, storage computing, cognitive psychology

ForgetIT Project Consortium

slide-3
SLIDE 3

Overview of the ForgetIT project

  • Motivation
  • Example use cases

Work Package 3: Managed forgetting

  • Objective
  • Achievements in Year 1

Outline

slide-4
SLIDE 4

However, we are facing:

  • Dramatic increase in content creation (e.g. digital photos)
  • Increasing use of mobile devices with restricted capacity
  • Information overload and changing professional + private lives
  • Inadvertent forgetting for lack of systematic preservation

Forgetting plays a crucial role for human remembering and life (focus, stress on important information, forgetting of details)

A computer that forgets? Intentionally?? And in the context of preservation???

Shouldn't there be something like forgetting in digital memories as well?

Forget IT

slide-5
SLIDE 5

Motivation

Opportunities

  • major progress in preservation technology
  • maturing information extraction technology
  • storage as service (e.g. clouds)

Needs

  • increasing amount of digital content handled over decades
  • more or less systematic backup strategies used
  • non-paper practices for long-term perspective required

Major Obstacles

  • large gap for adoption
  • high up-front cost
  • no established practices
  • lack of understanding of benefit
  • reluctance to invest

slide-6
SLIDE 6

Vision: Building a Bridge

Opportunities

  • major progress in preservation technology
  • maturing information extraction technology
  • storage as service (e.g. clouds)

Needs

  • increasing amount of digital content handled over decades
  • more or less systematic backup strategies used
  • non-paper practices for long-term perspective required

ForgetIT

  • Enabling smooth transition to preservation
  • Creating immediate benefit + reducing effort
  • Opening alternatives to “keep it all” and “forgetting by accident”
  • Easing interpretation in the long run
  • Taking inspiration from and complementing human memory

Major Obstacles

  • large gap for adoption
  • high up-front cost
  • no established practices
  • lack of understanding of benefit
  • reluctance to invest

slide-7
SLIDE 7

Building the Bridge

Managed Forgetting

  • inspired by human forgetting
  • as opposed to the current “forgetting by accident”

Synergetic Preservation

  • couples information management and preservation management

Contextualized Remembering

  • bringing back information into active use in a meaningful way

slide-8
SLIDE 8
Simple Example: Holidays

Timeline (after trip): +1 month, +1 year, +5-10 years, +20 years

February 2015, Paris. Team: Me, Mary Christine, Tom

  • Trip to Paris with friends
  • Thousands of pictures

  • High awareness of trip details
  • Showing of pictures
  • Sorting out redundant pictures
  • Sub-grouping and sorting

  • Life goes on
  • Pictures go out of focus
  • Creation of a small diverse subset for showing occasionally

  • Creation of summary page
  • Addition of context info
  • Further reduction of redundancy
  • Rest of pictures into archive

  • Changes in life (e.g. marriage)
  • Addition/update of context information
  • Dealing with preservation issues

Context annotation: girlfriend


slide-10
SLIDE 10
Simple Example: Holidays

Timeline (after trip): +1 month, +1 year, +5-10 years, +20 years

February 2015, Paris. Team: Me, Mary Christine, Tom

  • Trip to Paris with friends
  • Thousands of pictures

  • High awareness of trip details
  • Showing of pictures
  • Sorting out redundant pictures
  • Sub-grouping and sorting

  • Life goes on
  • Pictures go out of focus
  • Creation of a small diverse subset for showing occasionally

  • Creation of summary page
  • Addition of context info
  • Further reduction of redundancy
  • Rest of pictures into archive

  • Changes in life (e.g. marriage)
  • Addition/update of context information
  • Dealing with preservation issues

  • Revisiting of trip photos
  • Re-integration into overall photo collection (link into context)

Context annotation: girlfriend, later updated to wife

slide-11
SLIDE 11

Managed Forgetting

Inspired by central role of human forgetting:

  • help in identifying and focusing on relevant information
  • support preservation content selection
  • replace inadvertent forgetting

Based on:

  • Careful information value assessment
  • Forgetting strategies via policies
  • Forgetting options to integrate final manual checking before deletion
  • Combination with multi-tier storage solution possible

Managed forgetting ≠ automatic deletion

Instead: range of forgetting options, e.g.

  • resource condensation
  • change of indexing & ranking
  • reduction of redundancy

[Figure: use of storage tiers with decreasing memory buoyancy]
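The idea that managed forgetting offers a range of options rather than automatic deletion can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration only: the `Resource` class, the two threshold values, the [0, 1] range for memory buoyancy, and the option labels are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the slides give no numeric values.
ACTIVE_THRESHOLD = 0.6
CONDENSE_THRESHOLD = 0.3


@dataclass
class Resource:
    name: str
    memory_buoyancy: float  # assumed to lie in [0, 1]


def forgetting_option(resource: Resource) -> str:
    """Map a memory buoyancy value to one of several forgetting options.

    Nothing is deleted automatically: the lowest tier only proposes
    archiving and leaves the final decision to a manual check.
    """
    if resource.memory_buoyancy >= ACTIVE_THRESHOLD:
        return "keep in active tier"
    if resource.memory_buoyancy >= CONDENSE_THRESHOLD:
        return "condense and lower ranking"
    return "move to archive tier, pending manual check"
```

A real policy would presumably be configurable per user or organization; fixed constants are used here only to keep the sketch short.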

slide-12
SLIDE 12

Contextualized Remembering

Aim:

  • Bring back information into active use in a meaningful way even if a lot of time has passed
  • Aim for semantic level of preservation

Based on:

  • Take into account relevant parts of context when moving to archive
  • Increase contextualization of preserved content
  • Consider context evolution over time (evolution-aware contextualization)

  • A. Ceroni, N. K. Tran, N. Kanhabua and C. Niederée, Bridging Temporal Context Gaps using Time-Aware Re-Contextualization, (To appear) SIGIR'2014

slide-13
SLIDE 13

Evolution-aware Contextualization & Re-contextualization

[Figure: evolution of the context of interpretation over time t (C, C′, C′′, C′′′) for a document D, between the Information System and the Archival Information System. Labelled steps: Contextualization, Context-aware Preservation, Semantic Evolution Detection, Evolution-aware Contextualization, Re-contextualization. Labelled changes: human forgetting, change in focus, structural changes; semantic evolution, structural evolution, terminology evolution. Archived states: Pres(D′) with Pres(C′), later Pres(D′) with Pres(C′′).]

slide-14
SLIDE 14

Work Package 3: Managed Forgetting

  • V. Mayer-Schönberger. Delete: The Virtue of Forgetting in the Digital Age. Princeton University Press, 2009.

slide-15
SLIDE 15

WP3 Objectives

  • Conceptual model for managed forgetting
      - Foundations of human-brain inspired managed forgetting
  • Development of managed forgetting methods
      - Information value assessment
      - Set of methods for Preserve-or-Forget
      - Policy-driven approach to managed forgetting (Y2)

Focus of Year 1

  • Conceptual model for managed forgetting
  • Design and implement the core managed forgetting process
  • Exploratory research of information value assessment

Objectives of WP3 and Year 1 Focus

slide-16
SLIDE 16

Role in Preserve-or-Forget Architecture

slide-17
SLIDE 17

Research questions and first ideas for complementing human memory (co-worked with WP2, D3.1)

  • Episodic memory: reconstruct lifetime memories and support reminiscence
  • Working memory: better focus in current information use

Information value assessment (co-worked with WP9, D3.2)

  • Data model and a computation method based on Semantic Web technologies
  • Integration into the PIMO semantic desktop and the Preserve-or-Forget middleware

Exploratory studies (D3.2)

  • Analyzing collective memory of public events in Wikipedia
  • Analyzing high-impact features for content retention in the Social Web
  • Feature selection for efficiency and scalability

Achievements in Year 1

slide-18
SLIDE 18

Goal: understand how to complement human memory processes

Focus on two types of memories:

  • Episodic memory: support reminiscence of long-term autobiographical events
  • Working memory: better focus in current information use, e.g. de-cluttering personal information spaces

Two information values: memory buoyancy, and preservation value

Complementing Human Memory: Our First Ideas

slide-19
SLIDE 19

Memory buoyancy

  • Information objects sinking down with decreasing importance, usage, etc.

Preservation value

  • Used to decide which information object will be preserved or archived

Information Value Assessment

Memory Buoyancy

  • Short-/mid-term current interests
  • E.g. meeting or travel documents

Preservation Value

  • Long-term need for future use
  • E.g. important life events

Subjective metrics

  + usage logs (views, edits, modifies)
  + time, e.g., aging or recency
  + social context, external influences

Objective metrics

  + diversity, coverage, quality

slide-20
SLIDE 20

Human episodic memory:

  • Rapidly forget details -> “less redundancy”
  • Reconstruct from similar events, context
  • Rely on common patterns -> “false memory”

Our first ideas:

  • Store the details that differ among similar event types, which human memory forgets
  • Event-centric organization of digital items can play an important role

Forgetting in Episodic Memory

slide-21
SLIDE 21

Memory bumps or peaks in the forgetting curve

The original memory is reminded or triggered by:

  • A physical object (e.g. a printed photo)
  • A digital memory system
  • Different subsequent events

Our ideas:

  • Propagate increased interest in an event to related events
  • Consider common things, e.g., same entities, or similar event types
  • Increase relevance level or use of memory buoyancy

Triggering of Memories

slide-22
SLIDE 22

Analyzing Collective Memory in Wikipedia

Identify catalysts for reviving memories

Analyze re-visiting behaviors

  • Page views of a large set of events
  • Time series analysis

11 Wikipedia categories

  • Number of triggering events
  • Number of events possibly triggered
slide-23
SLIDE 23

Temporal and spatial distributions

  • Strong focus on more recent events
  • Better coverage with increasing popularity
  • Most frequent locations depending on event types

Temporal and Spatial Distributions


slide-25
SLIDE 25

Our Approach and Results

Remembering score as a function of revisiting behavior (e.g., detecting co-peaks in views)

Correlate remembering scores vs. time and location similarities

[Figure: panels for Hurricane Sandy and the 2011 Christchurch earthquake]

Findings:

  • Hurricane Sandy triggers the 1991 Perfect Storm, which initially formed around the Canada area and is a high-impact (most destructive and costly) event
  • The 2011 Christchurch earthquake triggers recent events in the same region, i.e., the 2010 Canterbury earthquake
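One possible reading of "detecting co-peaks in views" is sketched below in Python. It is an assumption-laden illustration, not the method used in the study: the peak rule (views above mean plus `k` standard deviations), the default `k`, and the definition of the score as the share of the triggering event's peak days on which the earlier event also peaks are all hypothetical choices.

```python
from statistics import mean, pstdev


def peak_days(views, k=2.0):
    """Return the indices of days whose views exceed mean + k * std dev."""
    mu, sigma = mean(views), pstdev(views)
    return {day for day, count in enumerate(views) if count > mu + k * sigma}


def remembering_score(trigger_views, past_views, k=2.0):
    """Share of the trigger's peak days on which the past event also peaks.

    Both arguments are daily page-view counts aligned on the same dates.
    Returns 0.0 when the triggering event has no peak at all.
    """
    trigger_peaks = peak_days(trigger_views, k)
    if not trigger_peaks:
        return 0.0
    co_peaks = trigger_peaks & peak_days(past_views, k)
    return len(co_peaks) / len(trigger_peaks)
```

Scores computed this way for many event pairs could then be correlated with the time and location similarity of the pairs, as the slide describes.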

slide-26
SLIDE 26

Memory Buoyancy: Simplified Computation

[Figure: memory buoyancy MB(D, t) plotted over time, computed from access logs; time points t1 and t2 are marked]
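A simplified computation of MB(D, t) from access logs might look like the Python sketch below. It assumes an exponential time decay since the most recent access; the function name, the default `decay_rate`, the time unit, and the choice to return 0.0 before the first access are illustrative assumptions, since the figure itself does not survive in this transcript.

```python
import bisect


def memory_buoyancy(access_times, t, decay_rate=0.9):
    """Compute MB(D, t) from the access log of a single resource D.

    access_times: time stamps (e.g. day numbers) at which D was accessed.
    t: the time at which MB is evaluated.
    MB is 1.0 at the moment of an access and decays by a factor of
    decay_rate per time unit afterwards. Before the first access it is 0.0.
    """
    times = sorted(access_times)
    # Number of accesses at or before t.
    seen = bisect.bisect_right(times, t)
    if seen == 0:
        return 0.0
    last_access = times[seen - 1]
    return decay_rate ** (t - last_access)
```

With accesses at t1 and t2, the value drops after t1, jumps back to 1.0 at t2, and decays again afterwards.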


slide-29
SLIDE 29

Memory Buoyancy Assessment

Proposed MB assessment framework:

  • Initialize MB values of resources using a time-decay forgetting function:

    $$ mb^{(t)}(r) = \mathit{DecayRate}^{\,|t - t'|} $$

  • Incrementally update MB using Random Walk on resource graph:

    $$ mb^{(t+1)}(r) = \frac{1}{2}\left( \frac{mb^{(t)}(e_6)}{1} + \frac{mb^{(t)}(e_4)}{2} \right) $$

    Averaged value over two in-linked resources; less propagation to account for two out-links.

[Figure: example resource graph with nodes e1 to e6 and the labels Edfringe photo (2011), Photos @ iPhone, Folder @ computer, Shortcut folder @ desktop, Photo @ ForgetIT Meeting (2013), Whiskey photo (2012) and Whiskey Tour (2009), linked by contains, hasSamePlace and hasEntity edges]
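The propagation step can be sketched in Python under one stated assumption about the general rule: a resource's new MB is the average, over its in-linked resources, of each source's MB divided by that source's number of out-links. This matches the slide's annotations (averaging over in-links, less propagation for sources with more out-links), but the exact update used in the project may differ. The node name `x` and all numeric values are hypothetical.

```python
def propagate(mb, out_links):
    """Run one propagation step of memory buoyancy over a resource graph.

    mb: dict mapping resource name to its current MB value.
    out_links: dict mapping a source resource to the resources it links to.
    Resources without in-links keep their current value.
    """
    # Invert the adjacency lists to find the in-links of every resource.
    in_links = {}
    for source, targets in out_links.items():
        for target in targets:
            in_links.setdefault(target, []).append(source)

    updated = dict(mb)
    for node, sources in in_links.items():
        # Each source spreads its MB evenly over its out-links.
        shares = [mb[source] / len(out_links[source]) for source in sources]
        # The target takes the average of the shares it receives.
        updated[node] = sum(shares) / len(shares)
    return updated


# Hypothetical example: e4 has two out-links, e6 has one.
example_mb = {"e4": 0.8, "e6": 0.6, "r": 0.1, "x": 0.5}
example_links = {"e4": ["r", "x"], "e6": ["r"]}
example_result = propagate(example_mb, example_links)
```

In this example `r` receives 0.8 / 2 from `e4` and 0.6 / 1 from `e6`, and the two shares are averaged. Whether propagated values should replace or be blended with the time-decayed value is a design choice left open here.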

slide-30
SLIDE 30

Social Web apps gain popularity

Personal Web archives

Study: Identifying memorable content

  • 20 participants, 15 male and 5 female
  • Rate posts (3,330) by relevance for the future

Content Retention in Social Web Applications

Year in Review: photo from the Internet

slide-31
SLIDE 31

Machine learning techniques

  • Support vector machine, Bayesian network, and decision tree (J48)

80 features from categories:

  • Content types + meta data
  • Social interactions
  • Temporal
  • Privacy
  • Graph

Correlation-based feature selection (CFS)

  • Temporal: highest impact features
  • Graph: low impact for memorable posts
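Correlation-based feature selection scores a feature subset by rewarding correlation with the class and penalizing correlation among the features themselves. The Python sketch below uses Hall's merit formula, merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), with two simplifications that are assumptions of this sketch: absolute Pearson correlation stands in for the correlation measure, and the search is a plain greedy forward selection. The feature names in any usage are hypothetical, and this is not the tooling used in the study.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cfs_merit(features, labels, subset):
    """CFS merit of a non-empty feature subset.

    features: dict mapping feature name to its list of values.
    labels: list of numeric class labels (e.g. 1 = memorable, 0 = not).
    """
    k = len(subset)
    # Mean feature-class correlation.
    r_cf = sum(abs(pearson(features[f], labels)) for f in subset) / k
    # Mean feature-feature correlation (0.0 for a single feature).
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features, labels):
    """Add the feature that most improves merit until no feature helps."""
    selected, best_merit = [], 0.0
    remaining = sorted(features)
    while remaining:
        merit, candidate = max(
            (cfs_merit(features, labels, selected + [name]), name)
            for name in remaining
        )
        if merit <= best_merit:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_merit = merit
    return selected
```

A redundant feature adds little class correlation but raises the feature-feature term, so the merit drops and the feature is left out.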

Learning to Classify Memorable Content

slide-32
SLIDE 32

Classification results:

  • Baseline Features (CS): No. of likes, comments, and shares
  • Baseline 69% (F-Measure)
  • Top 9 features 79% (F-Measure)

Classification Results


slide-34
SLIDE 34

1. M. Georgescu, D. D. Pham, N. Kanhabua, S. Zerr, S. Siersdorfer and W. Nejdl, Temporal Summarization of Event-Related Updates in Wikipedia (demo), Proceedings of the 22nd International World Wide Web Conference (WWW'13), May, 2013.

2. M. Georgescu, N. Kanhabua, D. Krause, W. Nejdl and S. Siersdorfer, Extracting Event-Related Information from Article Updates in Wikipedia, Proceedings of the 35th European conference on Advances in Information Retrieval (ECIR'13), March, 2013.

3. N. Kanhabua and C. Niederée, Preservation and Forgetting: Friends or Foes?, In the First International Workshop on Archiving Community Memories (in conjunction with iPRES'2013), September, 2013.

4. N. Kanhabua, C. Niederée and W. Siberski, Towards Concise Preservation by Managed Forgetting: Research Issues and Case Study, Proceedings of the 10th International Conference on Preservation of Digital Objects (iPRES'2013), September, 2013.

5. K. D. Naini and I. S. Altingovde, Exploiting Result Diversification Methods for Feature Selection in Learning to Rank, Proceedings of the 36th European conference on Advances in Information Retrieval (ECIR'2014), April, 2014.

6. A. Ceroni and M. Fisichella, Towards an Entity-based Automatic Event Validation, Proceedings of the 36th European conference on Advances in Information Retrieval (ECIR'2014), April, 2014.

7. T. N. Nguyen and N. Kanhabua, Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification, Proceedings of the 36th European conference on Advances in Information Retrieval (ECIR'2014), April, 2014.

8. K. D. Naini, R. Kawase, N. Kanhabua and C. Niederée, Characterizing High-impact Features for Content Retention in Social Web Applications (poster), Proceedings of the 23rd International World Wide Web Conference (WWW'2014), Seoul, Korea, April, 2014.

9. T. A. Tran, M. Georgescu, X. Zhu and N. Kanhabua, Ars longa, vita brevis: Analysing the Duration of Trending Topics in Twitter Using Wikipedia (poster), (To appear) Proceedings of the ACM Web Science 2014 Conference (WebSci'2014), Bloomington, USA, June, 2014.

Publications

slide-35
SLIDE 35

Thank you for your attention!