
Terms in Time and Times in Context: A Graph-based Term-Time Ranking Model

Terms in Time and Times in Context: A Graph-based Term-Time Ranking Model. Andreas Spitz, Jannik Strötgen, Thomas Bögel and Michael Gertz. Heidelberg University, Institute of Computer Science, Database Systems Research Group.


  1. Terms in Time and Times in Context: A Graph-based Term-Time Ranking Model. Andreas Spitz, Jannik Strötgen, Thomas Bögel and Michael Gertz. Heidelberg University, Institute of Computer Science, Database Systems Research Group. http://dbs.ifi.uni-heidelberg.de spitz@informatik.uni-heidelberg.de 5th Temporal Web Analytics Workshop, Florence, May 18, 2015

  2. [Sections: Motivation, Co-occurrence Graphs, Term-Ranking, Projection, Application, Summary] What happened on June 15, 1215? A simple question. How simple is the answer? [Footer: Terms in Time and Times in Context, Andreas Spitz, 1 of 24]

  3. With structured data: quite simple. Based on unstructured text data: much more challenging.

  4. Data Set and Approach
     A corpus of all English Wikipedia articles:
     • Only text is considered, no info-boxes
     • 3,079,620 documents with time expressions
     Problem statement, given such a corpus:
     • Extract and normalize temporal expressions (dates)
     • Find key terms that best summarize a given date

  5. Outline
     Outline of the approach:
     • Represent date-term co-occurrences efficiently
         • Extract and normalize temporal expressions (dates)
         • Extract content words that co-occur with dates
         • Generate an efficient data structure
     • Based on this representation
         • Identify relevant terms for any given date
         • Identify similar dates for any given date
     • Example applications

  6. Extraction of Temporal Expressions
     • Normalization, e.g., May 18, 2015 → 2015-05-18
     • Handling relative temporal expressions, e.g., in May
     • Considering the document type
     Source: Strötgen, Gertz: Multilingual and Cross-domain Temporal Tagging (2013)
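
The two kinds of expressions named on this slide can be illustrated with a small sketch. It is not the temporal tagger cited above; the function names `normalize_explicit` and `normalize_relative_month` are hypothetical, and taking the missing year from a caller-supplied reference year is an assumption made only for this example.

```python
import re
from datetime import date

# English month names mapped to their numbers (1-12).
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}


def normalize_explicit(text):
    """Normalize an explicit date such as 'May 18, 2015' to 'YYYY-MM-DD'.

    Returns None if the text does not have the expected shape.
    Raises ValueError for impossible dates such as 'May 40, 2015'.
    """
    match = re.fullmatch(r"(\w+) (\d{1,2}), (\d{1,4})", text.strip())
    if match is None or match.group(1) not in MONTHS:
        return None
    month = MONTHS[match.group(1)]
    return date(int(match.group(3)), month, int(match.group(2))).isoformat()


def normalize_relative_month(text, reference_year):
    """Resolve an underspecified expression such as 'in May' to 'YYYY-MM'.

    Assumption: the year comes from a reference date supplied by the caller,
    for example one derived from the surrounding document.
    """
    match = re.fullmatch(r"in (\w+)", text.strip())
    if match is None or match.group(1) not in MONTHS:
        return None
    return f"{reference_year:04d}-{MONTHS[match.group(1)]:02d}"
```

A real tagger additionally has to find the expressions in running text and choose the reference date according to the document type, which this sketch leaves out.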

  7. Coverage of Dates
     We use a combination of dates of three granularities:
     • YYYY-MM-DD (day)
     • YYYY-MM (month)
     • YYYY (year)
     [Figure: percentage of dates that are included in the data per year; x-axis: year (1000 to 2000), y-axis: coverage in %]

  8. Extraction of Terms and Representation: For all sentences s in any Wikipedia document:

  9. Extraction of Terms and Representation: Identify/normalize dates and remove stop words

  10. Extraction of Terms and Representation: Create a bipartite graph $G_s = (T_s \cup D_s, E_s)$ with weights $\omega_s$
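
A minimal sketch of the per-sentence construction, under stated assumptions: dates are already normalized tokens of the form YYYY, YYYY-MM or YYYY-MM-DD, the stop-word list is a tiny illustrative one, and every term-date pair in a sentence receives weight 1. The slide does not say how the weights are defined, so the unit weight is a placeholder, and `sentence_graph` is a hypothetical name.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the one used by the authors).
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "at", "by", "and", "to", "was"}

# Matches normalized dates of the three granularities: YYYY, YYYY-MM, YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}(?:-\d{2}){0,2}")


def sentence_graph(tokens):
    """Build a bipartite term-date graph for one tokenized sentence.

    Returns (terms, dates, weights), where weights maps each edge
    (term, date) to its weight. Assumption: every edge has weight 1.
    """
    dates = {tok for tok in tokens if DATE_PATTERN.fullmatch(tok)}
    terms = {tok.lower() for tok in tokens
             if tok.isalpha() and tok.lower() not in STOP_WORDS}
    weights = {(term, d): 1 for term in terms for d in dates}
    return terms, dates, weights


# Made-up example sentence with an already normalized date.
terms, dates, weights = sentence_graph(
    ["King", "John", "sealed", "Magna", "Carta", "on", "1215-06-15"])
```

Terms and dates form the two disjoint node sets, and edges only ever connect a term to a date, which is what makes the graph bipartite.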

  11. Extraction of Terms and Representation: Satisfy the inclusion condition for dates

  12. Extraction of Terms and Representation: Satisfy the inclusion condition for dates
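
The slide text does not spell out the inclusion condition. The sketch below assumes it means that a term connected to a date is also connected to every coarser date containing it (day, then month, then year), in line with the three granularities used in the deck. The names `included_dates` and `apply_inclusion` are hypothetical, and keeping the maximum weight when two fine-grained dates map to the same coarser date is an arbitrary choice for illustration.

```python
def included_dates(normalized_date):
    """Return the date followed by all coarser dates that contain it.

    Expects a normalized date: 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'.
    """
    parts = normalized_date.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def apply_inclusion(weights):
    """Extend edges (term, date) -> weight to all coarser dates.

    Assumption: if several finer dates collapse onto the same coarser
    date, the largest weight is kept.
    """
    extended = {}
    for (term, d), weight in weights.items():
        for coarser in included_dates(d):
            key = (term, coarser)
            extended[key] = max(extended.get(key, 0), weight)
    return extended
```

For example, an edge between a term and 1215-06-15 would also yield edges to 1215-06 and 1215 under this reading.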

  13. Graph aggregation
      Aggregate the sentence graphs $G_s$:
      • $T := \bigcup_s T_s$
      • $D := \bigcup_s D_s$
      • $E := \bigcup_s E_s$
      • $\omega(e) := \sum_s \omega_s(e)$
      We obtain $G = (T \cup D, E, \omega)$ with:
      • $|T|$ = 3,748,730 terms
      • $|D|$ = 210,375 dates
      • $|E|$ = 110,639,525 edges
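
The aggregation step can be sketched as follows: node and edge sets are unioned, and the weight of an edge is the sum of its weights over all sentence graphs. The tuple layout `(terms, dates, weights)` and the name `aggregate` are assumptions made for this example; the edge set is represented implicitly by the keys of the weight mapping.

```python
from collections import Counter


def aggregate(sentence_graphs):
    """Aggregate sentence graphs into one term-date graph.

    Each sentence graph is a tuple (terms, dates, weights) where weights
    maps an edge (term, date) to its weight in that sentence.
    """
    terms, dates, weights = set(), set(), Counter()
    for t_s, d_s, w_s in sentence_graphs:
        terms |= t_s          # union of term nodes
        dates |= d_s          # union of date nodes
        weights.update(w_s)   # Counter.update adds weights edge by edge
    return terms, dates, weights


# Two made-up sentence graphs that share one edge.
g1 = ({"magna", "carta"}, {"1215"},
      {("magna", "1215"): 1, ("carta", "1215"): 1})
g2 = ({"magna"}, {"1215", "1215-06"},
      {("magna", "1215"): 1, ("magna", "1215-06"): 1})
all_terms, all_dates, all_weights = aggregate([g1, g2])
```

Storing only the weight mapping keeps the structure compact, since an edge exists exactly when it has an entry there.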

  14. Formalising the Question: What happened on June 15, 1215? Which terms in the graph co-occur in a significant manner with the date 1215-06-15?
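
The slides in this excerpt do not define what "significant" means. As an illustration only, the sketch below uses a generic tf-idf-style score: the edge weight of a term to the query date, damped by the number of dates that term is connected to. This is an assumed stand-in, not the ranking function of the model, and `rank_terms` is a hypothetical name.

```python
import math
from collections import defaultdict


def rank_terms(weights, query_date, num_dates, top_k=5):
    """Rank terms adjacent to query_date with a tf-idf-style score.

    weights maps edges (term, date) to aggregated weights.
    Score (an assumption): weight * log(num_dates / degree(term)),
    where degree(term) is the number of dates the term is connected to.
    """
    degree = defaultdict(int)
    for term, _date in weights:
        degree[term] += 1
    scores = {term: weight * math.log(num_dates / degree[term])
              for (term, d), weight in weights.items() if d == query_date}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Made-up weights: "king" occurs with every date, "magna" with only one.
example = {("magna", "1215-06-15"): 3, ("king", "1215-06-15"): 3,
           ("king", "1216"): 2, ("king", "1217"): 1}
ranking = rank_terms(example, "1215-06-15", num_dates=3)
```

In this toy input both terms have the same weight for the query date, but the term that appears with every date is ranked lower, which is the intuition behind asking for significant rather than merely frequent co-occurrence.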
