Concise Preservation by combining Managed Forgetting and Contextualized Remembering
Research Talk, May 9, 2014
University of Twente, Enschede
Speaker: Nattiya Kanhabua
L3S Research Center / University of Hannover
ForgetIT Project Consortium
An interdisciplinary team of experts in:
- Preservation, information management, information extraction
- Multimedia analysis, storage computing, cognitive psychology
ForgetIT Project Consortium
Overview of the ForgetIT project
- Motivation
- Example use cases
Work Package 3: Managed forgetting
- Objective
- Achievements in Year 1
Outline
However, we are facing:
- Dramatic increase in content creation (e.g. digital photos)
- Increasing use of mobile devices with restricted capacity
- Information overload and changing professional + private lives
- Inadvertent forgetting due to a lack of systematic preservation
Forgetting plays a crucial role for human remembering and life (focus, stress on important information, forgetting of details)
A computer that forgets? Intentionally? And in the context of preservation?
Shouldn't there be something like forgetting in digital memories as well?
ForgetIT
Motivation
Opportunities:
- major progress in preservation technology
- maturing information extraction technology
- storage as a service (e.g. clouds)
Needs:
- increasing amount of digital content handled over decades
- more or less systematic backup strategies used
- non-paper practices for long-term perspective required
Major Obstacles:
- large gap for adoption
- high up-front cost
- no established practices
- lack of understanding of benefit
- reluctance to invest
Vision: Building a Bridge
ForgetIT:
- Enabling smooth transition to preservation
- Creating immediate benefit + reducing effort
- Opening alternatives to “keep it all” and “forgetting by accident”
- Easing interpretation in the long run
- Taking inspiration from and complementing human memory
Building the Bridge
Managed Forgetting:
- inspired by human forgetting
- as opposed to the current “forgetting by accident”
Synergetic Preservation:
- couples information management and preservation management
Contextualized Remembering:
- bringing back information into active use in a meaningful way
Simple Example: Holidays
Timeline (items below in chronological order): +1 month, +1 year after trip, +5-10 years, +20 years
- Trip to Paris with friends
- Thousands of pictures
- High awareness of trip details
- Showing of pictures
- Sorting out redundant pictures
- Sub-grouping and sorting
- Life goes on
- Pictures go out of focus
- Creation of a small diverse subset for showing occasionally
- Creation of summary page
- Addition of context info
- Further reduction of redundancy
- Rest of pictures into archive
- Changes in life (e.g. marriage)
- Addition/update of context information
- Dealing with preservation issues
- Revisiting of trip photos
- Re-integration into overall photo collection (link into context)
Summary page example: February 2015, Paris. Team: Me, Mary Christine, Tom
Context update example: girlfriend → wife
Managed Forgetting
Inspired by the central role of human forgetting:
- help in identifying and focusing on relevant information
- support preservation content selection
- replace inadvertent forgetting
Based on:
- Careful information value assessment
- Forgetting strategies via policies
- Forgetting options to integrate final manual checking before deletion
- Combination with multi-tier storage solution possible
Managed forgetting ≠ automatic deletion
Instead: a range of forgetting options for decreasing memory buoyancy, e.g.
- resource condensation
- change of indexing & ranking
- reduction of redundancy
- use of tiers
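The policy idea on this slide can be illustrated with a minimal sketch. The thresholds, the `Resource` fields and the option names are assumptions made for illustration and are not taken from the ForgetIT project. The sketch only shows that a policy can map low memory buoyancy to a graded forgetting option, and that deletion is proposed for manual checking rather than executed automatically.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    memory_buoyancy: float     # assumed to be normalised to [0, 1]
    preservation_value: float  # assumed to be normalised to [0, 1]


def forgetting_option(res, mb_low=0.2, mb_mid=0.5, pv_keep=0.6):
    """Map a resource to one forgetting option (hypothetical policy)."""
    if res.memory_buoyancy >= mb_mid:
        return "keep-active"            # still in current use
    if res.memory_buoyancy >= mb_low:
        return "lower-ranking"          # change of indexing & ranking
    if res.preservation_value >= pv_keep:
        return "move-to-archive-tier"   # use of storage tiers
    # Managed forgetting is not automatic deletion: a human confirms.
    return "propose-deletion-for-manual-check"
```

For example, `forgetting_option(Resource("holiday_0042.jpg", 0.1, 0.8))` yields `"move-to-archive-tier"` under these assumed thresholds.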
Contextualized Remembering
Aim: Bring back information into active use in a meaningful way even if a lot of time has passed
Aim for semantic level of preservation
Based on:
- Take into account relevant parts of context when moving to archive
- Increase contextualization of preserved content
- Consider context evolution over time (evolution-aware contextualization)
Reference: A. Ceroni, N. K. Tran, N. Kanhabua and C. Niederée, Bridging Temporal Context Gaps using Time-Aware Re-Contextualization, (To appear) SIGIR’2014
Evolution-aware Contextualization & Re-contextualization
[Figure: a document D and its Context of Interpretation C in the Information System; Contextualization and Context-aware Preservation store Pres(D') with Pres(C') in the Archival Information System. Over time t the context changes (C, C', C'', C''') through Human Forgetting, Change in focus and Structural changes. Semantic Evolution Detection covers Semantic evolution, Structural evolution and Terminology evolution. Evolution-aware Contextualization and Re-contextualization relate Pres(D') with Pres(C'') back to D.]
Work Package 3: Managed Forgetting
- V. Mayer-Schönberger. Delete - The Virtue of Forgetting in the Digital Age. Morgan Kaufmann Publishers, 2009.
WP3 Objectives
- Conceptual model for managed forgetting
  - Foundations of human-brain inspired managed forgetting
- Development of managed forgetting methods
  - Information value assessment
  - Set of methods for Preserve-or-Forget
  - Policy-driven approach to managed forgetting (Y2)
Focus of Year 1
- Conceptual model for managed forgetting
- Design and implement the core managed forgetting process
- Exploratory research of information value assessment
Objectives of WP3 and Year 1 Focus
Role in Preserve-or-Forget Architecture
Research questions and first ideas for complementing human memory (joint work with WP2, D3.1)
- Episodic memory: reconstruct lifetime memories and support reminiscence
- Working memory: better focus in current information use
Information value assessment (joint work with WP9, D3.2)
- Data model and a computation method based on Semantic Web technologies
- Integration into the PIMO semantic desktop and the Preserve-or-Forget middleware
Exploratory studies (D3.2)
- Analyzing collective memory of public events in Wikipedia
- Analyzing high-impact features for content retention in the Social Web
- Feature selection for efficiency and scalability
Achievements in Year 1
Goal: understand how to complement human memory processes
Focus on two types of memories:
- Episodic memory: support reminiscence of long-term autobiographical events
- Working memory: better focus in current information use, e.g. de-cluttering personal information spaces
Two information values: memory buoyancy and preservation value
Complementing Human Memory: Our First Ideas
Memory buoyancy
- Information objects sinking down with decreasing importance, usage, etc.
Preservation value
- Used to decide which information object will be preserved or archived
Information Value Assessment
Memory Buoyancy:
- Short-/mid-term current interests
- E.g. meeting or travel documents
Preservation Value:
- Long-term need for future use
- E.g. important life events
Subjective metrics:
- usage logs (views, edits, modifications)
- time, e.g., aging or recency
- social context, external influences
Objective metrics:
- diversity, coverage, quality
Human episodic memory:
- Rapidly forgets details -> “less redundancy”
- Reconstructs from similar events and context
- Relies on common patterns -> “false memory”
Our first ideas:
- Store the details that differ among similar event types and are forgotten in human memory
- Event-centric organization of digital items can play an important role
Forgetting in Episodic Memory
Memory bumps or peaks in the forgetting curve
The original memory is recalled or triggered by:
- A physical object (e.g. a printed photo)
- A digital memory system
- Different subsequent events
Our ideas:
- Propagate increased interest in an event to related events
- Consider common things, e.g., same entities, or similar event types
- Increase relevance level or use of memory buoyancy
Triggering of Memories
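The ideas listed on this slide can be sketched as follows. This is a minimal illustration and not the project's method: the Jaccard overlap of entity sets as the relatedness measure, the `boost` parameter, the cap at 1.0 and the event names are all assumptions.

```python
def jaccard(a, b):
    """Share of common entities between two events (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagate_interest(buoyancy, entities, triggered, boost, min_sim=0.0):
    """Boost the triggered event, then related events in proportion to overlap.

    buoyancy: dict event -> memory buoyancy in [0, 1] (assumed scale)
    entities: dict event -> set of entity names (hypothetical annotation)
    """
    updated = dict(buoyancy)
    updated[triggered] = min(1.0, updated[triggered] + boost)
    for event, ents in entities.items():
        if event == triggered:
            continue
        sim = jaccard(entities[triggered], ents)
        if sim > min_sim:
            # Related events receive a share of the increased interest.
            updated[event] = min(1.0, updated[event] + boost * sim)
    return updated


entities = {
    "paris_trip": {"Paris", "Mary", "Tom"},
    "paris_reunion": {"Paris", "Mary"},
    "office_party": {"Anna"},
}
buoyancy = {"paris_trip": 0.1, "paris_reunion": 0.1, "office_party": 0.1}
result = propagate_interest(buoyancy, entities, "paris_trip", boost=0.4)
```

Here the revisited event is raised directly, the event sharing two of three entities receives two thirds of the boost, and the unrelated event is left unchanged.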
Analyzing Collective Memory in Wikipedia
Identify catalysts for reviving memories
Analyze re-visiting behaviors
- Page views of a large set of events
- Time series analysis
11 Wikipedia categories
- Number of triggering events
- Number of events possibly triggered
Temporal and spatial distributions
- Strong focus on more recent events
- Better coverage with increasing popularity
- Most frequent locations depending on event types
Temporal and Spatial Distributions
Our Approach and Results
Remembering score as a function of revisiting behavior (e.g., detecting co-peaks in views)
Correlate remembering scores vs. time and location similarities
[Figure panels: Hurricane Sandy; 2011 Christchurch earthquake]
Findings:
- Hurricane Sandy triggers the 1991 Perfect Storm, which initially formed around the Canada area; the link is between high-impact (most destructive and costly) storms
- The 2011 Christchurch earthquake triggers recent events in the same region, i.e., the 2010 Canterbury earthquake
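The slide does not define the remembering score beyond "detecting co-peaks in views". One possible reading, given here only as a sketch, marks a day as a peak when its page views exceed the mean by `k` standard deviations and scores a past event by the share of the triggering event's peak days on which it also peaks. The threshold rule, the parameter `k` and the sample numbers are assumptions.

```python
from statistics import mean, pstdev


def peak_days(views, k=2.0):
    """Indices of days whose views exceed mean + k * standard deviation."""
    mu, sigma = mean(views), pstdev(views)
    return {i for i, v in enumerate(views) if v > mu + k * sigma}


def remembering_score(trigger_views, past_views, k=2.0):
    """Fraction of the trigger's peak days on which the past event also peaks."""
    trigger_peaks = peak_days(trigger_views, k)
    if not trigger_peaks:
        return 0.0
    past_peaks = peak_days(past_views, k)
    return len(trigger_peaks & past_peaks) / len(trigger_peaks)


# Hypothetical daily page views over ten days (not real Wikipedia data).
new_event = [10] * 9 + [1000]
old_related = [5] * 9 + [200]
old_unrelated = [5] * 10

score_related = remembering_score(new_event, old_related)
score_unrelated = remembering_score(new_event, old_unrelated)
```

Scores computed this way for many event pairs could then be correlated with time and location similarity, as the slide describes.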
Memory Buoyancy: Simplified Computation
[Figure: memory buoyancy MB(D, t) of a document D plotted over time, computed from access logs (accesses at t1 and t2)]
Memory Buoyancy Assessment
Proposed MB assessment framework:
- Initialize MB values of resources using a time-decay forgetting function:
  $mb^{(t)}(r) = \mathrm{DecayRate}^{\,|t - t'|}$
- Incrementally update MB using Random Walk on resource graph, shown for a resource $r$ with in-linked resources $e_4$ and $e_6$:
  $mb^{(t)}(r) = \frac{1}{2}\left(\frac{mb^{(t)}(e_4)}{|\mathrm{out}(e_4)|} + \frac{mb^{(t)}(e_6)}{|\mathrm{out}(e_6)|}\right)$
  - Averaged value over two in-linked resources (factor 1/2)
  - Less propagation to account for two outlinks (the value of a resource is divided by its number of outlinks)
[Figure: example resource graph with resource r and related resources e1 to e6. Node labels: Edfringe photo (2011), Photos @ iPhone, Folder @ computer, Shortcut folder @ desktop, Photo @ ForgetIT Meeting (2013), Whiskey photo (2012), Whiskey Tour (2009). Edge labels: contains, hasSamePlace, hasEntity.]
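The two steps of the framework (time-decay initialisation, then propagation over the resource graph) can be sketched as below. This is a minimal illustration under assumptions: the decay rate of 0.9, the use of days as the time unit, the example graph and the choice of which resource has two outlinks are hypothetical. The update follows the slide annotations, averaging over in-linked resources and dividing each contribution by the source's number of outlinks.

```python
def initial_mb(now, last_access, decay_rate=0.9):
    """Time-decay forgetting function: decay_rate ** |now - last_access|."""
    return decay_rate ** abs(now - last_access)


def propagate_step(mb, out_links):
    """One propagation step over the resource graph.

    mb: dict resource -> current memory buoyancy
    out_links: dict resource -> list of resources it links to
    Resources without in-links keep their current value.
    """
    in_links = {}
    for src, targets in out_links.items():
        for dst in targets:
            in_links.setdefault(dst, []).append(src)
    updated = dict(mb)
    for dst, sources in in_links.items():
        # Each source passes on its value divided by its number of outlinks.
        received = sum(mb[s] / len(out_links[s]) for s in sources)
        # Average over the in-linked resources.
        updated[dst] = received / len(sources)
    return updated


# Hypothetical graph: e4 links only to r, e6 links to r and e5.
out_links = {"e4": ["r"], "e6": ["r", "e5"]}
mb = {"r": 0.2, "e4": 0.8, "e6": 0.6, "e5": 0.1}
mb_next = propagate_step(mb, out_links)
```

With these numbers, r receives (0.8 / 1 + 0.6 / 2) / 2 = 0.55. A production random walk would normally also keep part of a resource's own value (a damping factor), which this sketch leaves out.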
Social Web apps gain popularity
Personal Web archives
Study: Identifying memorable content
- 20 participants, 15 male and 5 female
- Participants rated 3,330 posts by relevance for the future
Content Retention in Social Web Applications
[Figure: “Year in Review” example; photo from the Internet]
Machine learning techniques
- Support vector machine, Bayesian network, and decision tree (J48)
80 features from categories:
- Content types + meta data
- Social interactions
- Temporal
- Privacy
- Graph
Correlation-based feature selection (CFS)
- Temporal: highest impact features
- Graph: low impact for memorable posts
Learning to Classify Memorable Content
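To make the feature selection step concrete, here is a small pure-Python sketch of the correlation-based merit that CFS is built on: a subset scores high when its features correlate with the class but not with each other. Two simplifications are assumptions of this sketch and may differ from the study's setup: Pearson correlation is used for all correlations, and the search is a plain greedy forward selection. The feature names and values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def cfs_merit(features, labels, subset):
    """Merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], labels)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(features, labels):
    """Greedily add the feature that most improves the merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, name = max((cfs_merit(features, labels, selected + [f]), f) for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Hypothetical toy data: 1 = post rated memorable.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "recency": [1, 1, 1, 0, 0, 0, 1, 0],
    "recency_copy": [1, 1, 1, 0, 0, 0, 1, 0],  # redundant with "recency"
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
selected, merit = cfs_forward_select(features, labels)
```

On this toy data the redundant copy and the weakly correlated feature add nothing to the merit, so a single feature is selected.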
Classification results:
- Baseline Features (CS): No. of likes, comments, and shares
- Baseline 69% (F-Measure)
- Top 9 features 79% (F-Measure)
Classification Results
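The results are reported as F-measure. For reference, a short sketch of how the measure is computed from true positives, false positives and false negatives; the slides do not say which variant was used, so the balanced F1 (`beta=1.0`) is an assumption, and the counts in the example are made up.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from confusion counts; beta=1.0 gives the balanced F1."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 memorable posts found, 2 false alarms, 2 missed.
example_f1 = f_measure(tp=8, fp=2, fn=2)
```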
1. M. Georgescu, D. D. Pham, N. Kanhabua, S. Zerr, S. Siersdorfer and W. Nejdl, Temporal Summarization of Event-Related Updates in Wikipedia (demo), Proceedings of the 22nd International World Wide Web Conference (WWW'13), May, 2013.
2. M. Georgescu, N. Kanhabua, D. Krause, W. Nejdl and S. Siersdorfer, Extracting Event-Related Information from Article Updates in Wikipedia, Proceedings of the 35th European conference on Advances in Information Retrieval (ECIR'13), March, 2013.
3. N. Kanhabua and C. Niederée, Preservation and Forgetting: Friends or Foes?, In the First International Workshop on Archiving Community Memories (in conjunction with iPRES'2013), September, 2013.
4. N. Kanhabua, C. Niederée and W. Siberski, Towards Concise Preservation by Managed Forgetting: Research Issues and Case Study, Proceedings of the 10th International Conference on Preservation of Digital Objects (iPRES'2013), September, 2013.
5. K. D. Naini and I. S. Altingovde, Exploiting Result Diversification Methods for Feature Selection in Learning to Rank, Proceedings of the 36th European conference on Advances in Information Retrieval (ECIR'2014), April, 2014.
6. A. Ceroni and M. Fisichella, Towards an Entity-based Automatic Event Validation, Proceedings of the 36th European conference on Advances in Information Retrieval (ECIR'2014), April, 2014.
7. T. N. Nguyen and N. Kanhabua, Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification, Proceedings of the 36th European conference on Advances in Information Retrieval (ECIR'2014), April, 2014.
8. K. D. Naini, R. Kawase, N. Kanhabua and C. Niederée, Characterizing High-impact Features for Content Retention in Social Web Applications (poster), Proceedings of the 23rd International World Wide Web Conference (WWW'2014), Seoul, Korea, April, 2014.
9. T. A. Tran, M. Georgescu, X. Zhu and N. Kanhabua, Ars longa, vita brevis: Analysing the Duration of Trending Topics in Twitter Using Wikipedia (poster), (To appear) Proceedings of the ACM Web Science 2014 Conference (WebSci'2014), Bloomington, USA, June, 2014.