
Temporal Information Extraction: Vinay Setty, Jannik Strötgen



  1. Advanced Topics in Information Retrieval: Temporal Information Extraction
     Vinay Setty (vsetty@mpi-inf.mpg.de), Jannik Strötgen (jannik.stroetgen@mpi-inf.mpg.de)
     ATIR – June 16, 2016

  2. Motivation | Time | Temporal Tagging | Evaluation | HeidelTime | Temponym Tagging | Pipelines
     Why is temporal information crucial for information retrieval?
     © Jannik Strötgen – ATIR-07, 2 / 84

  3. Time in queries
     Temporal information needs are frequent. Query log analyses:
     - 1.5% of queries with explicit temporal intent [Nunes et al. 2008]
     - 7% of queries with implicit temporal intent [Metzler et al. 2009]
     - 13.8% explicit, 17.1% implicit [Zhang et al. 2010]
     Different types of temporal information in IR (more next week):
     - time as dimension of relevance
     - time as query topic

  4. Thought experiment (Gedankenexperiment)
     What did Alexander von Humboldt do between the late 18th century and the early 19th century in Latin America?

  5. Let’s search ... Snippets tell us a lot ...

  6. Let’s search ... Highlighted: terms occurring in the query.

  7. Let’s search ... Not highlighted: expressions matching the query interval / region.

  8. Improved snippets: expressions matching the query interval / region.
     (Excerpt of the Wikipedia page "Alexander von Humboldt".)

  9. Problems of standard IR approaches
     - Temporal and geographic expressions (seem to be) treated as regular terms; their semantics is lost
       → should be extracted and normalized
     - Query functionality: how to search for time intervals? how to search for geographic regions?
       → should be defined and provided
     - Results: same ranking as for standard text search; no time-/geo-centric exploration features
       → special ranking is required
       → time-/geo-centric exploration should be possible

  10. Things that need to be done
     - next week: temporal information retrieval
     - today: temporal information extraction
     - maybe later: geographic and event-centric information retrieval

  11. Outline: 1 Temporal Information, 2 Temporal Tagging, 3 Evaluation, 4 HeidelTime, 5 Temponym Tagging, 6 NLP Pipeline Architectures

  13. Is time that important?
     Temporal information plays an important role in many types of text documents: news articles, narrative documents, biographies.

  14. Is time that important? Key characteristics of temporal information
     Temporal information is well-defined: expressions can be compared with each other.
     Allen’s interval algebra [Allen 1983]: given two intervals X and Y, one of 13 relations holds between them.
     Examples:
     - before: 2010 / 2016
     - overlap: 1960s / 1955 to 1965
     - during: June 2016 / 2016
     - ...

  15. Is time that important? Key characteristics of temporal information
     Temporal information is well-defined: expressions can be compared with each other.
     [Figure: Allen's interval relations drawn as bars for X and Y: 1) X before Y, 2) X equal Y, 3) X meets Y, 4) X overlaps Y, 5) X during Y, 6) X starts Y, 7) X finishes Y. Source: [Strötgen & Gertz 2016]]
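
The relations on this slide can be sketched in code. The following is a minimal illustration, not taken from the slides: it assumes intervals are half-open `(start, end)` pairs of `datetime.date` values, and the names `allen_relation` and `year` are hypothetical helpers. The six relations not drawn in the figure are reported as inverses of a base relation.

```python
from datetime import date


def _base_relation(x, y):
    """Return one of the seven base relations of x relative to y, or None."""
    (xs, xe), (ys, ye) = x, y
    if xe < ys:
        return "before"
    if xe == ys:
        return "meets"
    if xs == ys and xe == ye:
        return "equal"
    if xs == ys and xe < ye:
        return "starts"
    if xe == ye and xs > ys:
        return "finishes"
    if xs > ys and xe < ye:
        return "during"
    if xs < ys < xe < ye:
        return "overlaps"
    return None


def allen_relation(x, y):
    """Name the Allen relation between two half-open intervals (start < end)."""
    base = _base_relation(x, y)
    if base is not None:
        return base
    # If no base relation holds for (x, y), one must hold for (y, x).
    return "inverse of " + _base_relation(y, x)


def year(y):
    """Assumed helper: a calendar year as a half-open interval of days."""
    return (date(y, 1, 1), date(y + 1, 1, 1))


june_2016 = (date(2016, 6, 1), date(2016, 7, 1))
sixties = (date(1960, 1, 1), date(1970, 1, 1))
span_1955_1965 = (date(1955, 1, 1), date(1966, 1, 1))

print(allen_relation(year(2010), year(2016)))     # before
print(allen_relation(june_2016, year(2016)))      # during
print(allen_relation(span_1955_1965, sixties))    # overlaps
```

Note that argument order matters: in this sketch "1955 to 1965" overlaps the 1960s, while the 1960s stand in the inverse relation to "1955 to 1965".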

  16. Is time that important? Key characteristics of temporal information
     Temporal information can be normalized: expressions with the same semantics → same value.
     Examples: "June 16, 2016", "today", "heute, aujourd’hui, hoy, oggi, ..." → 2016-06-16
     TimeML TIMEX3 tags, value attribute: YYYY-MM-DD"T"HH:mm, e.g., 2016-06-16T14:33
     → Temporal information is term- and language-independent.

  17. Is time that important? Key characteristics of temporal information
     Temporal information can be normalized: expressions with the same semantics → same value.
     [Figure: timeline t with reference times t_ref 2015-10-11, 2015-10-12, 2015-10-15, and 2015-11-12; the expressions "tomorrow", "today", "heute", "hoy", "October 12, 2015", "last Monday", and "one month ago" are normalized to the value 2015-10-12. Source: [Strötgen & Gertz 2016]]
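
As a rough illustration of the figure, the sketch below normalizes each expression against a reference date. The pairing of expressions with reference times is an assumption inferred from the dates on the slide, and `normalize` is a hypothetical toy function covering only these few expressions, not how any particular tagger works.

```python
from datetime import date, datetime, timedelta

# Words for "today" in several languages, as listed on the slides.
TODAY_WORDS = {"today", "heute", "hoy", "oggi", "aujourd'hui"}


def normalize(expression, ref):
    """Map a temporal expression to an ISO date string, given reference date ref."""
    expr = expression.strip().lower().replace("’", "'")
    if expr in TODAY_WORDS:
        return ref.isoformat()
    if expr == "tomorrow":
        return (ref + timedelta(days=1)).isoformat()
    if expr == "last monday":
        # Most recent Monday strictly before ref (Monday has weekday() == 0).
        days_back = ref.weekday() or 7
        return (ref - timedelta(days=days_back)).isoformat()
    if expr == "one month ago":
        if ref.month > 1:
            year, month = ref.year, ref.month - 1
        else:
            year, month = ref.year - 1, 12
        # Simplification: raises ValueError if the day does not exist
        # in the previous month (e.g. March 31).
        return ref.replace(year=year, month=month).isoformat()
    # Fall back to an explicit English date such as "October 12, 2015"
    # (assumes an English/C locale for month names).
    return datetime.strptime(expression.strip(), "%B %d, %Y").date().isoformat()


examples = [
    ("tomorrow", date(2015, 10, 11)),
    ("today", date(2015, 10, 12)),
    ("heute", date(2015, 10, 12)),
    ("last Monday", date(2015, 10, 15)),
    ("one month ago", date(2015, 11, 12)),
    ("October 12, 2015", date(2015, 10, 12)),
]
for text, ref in examples:
    print(f"{text!r} at {ref} -> {normalize(text, ref)}")
```

Under these assumptions every example yields the same value, 2015-10-12, which is the point of the slide: the surface form and the language differ, the normalized value does not.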

  18. Is time that important? Key characteristics of temporal information
     Temporal information can be organized hierarchically: expressions of different granularities.
     - years: ..., 2014, 2015, 2016, ...
     - months: 2015-03, 2015-04, ...
     - days: 2015-03-11, 2015-03-12, ...

  19. Is time that important?
     [Figure: points in time on timelines of different granularities. Source: [Strötgen & Gertz 2016]]
     - t_decade: 1970s, 1980s, 1990s, 2000s, 2010s
     - t_year: 1990, 1991, 1992, 1993, ..., 1999
     - t_quarter: 1992-Q1, 1992-Q2, 1992-Q3, 1992-Q4, 1993-Q1
     - t_month: 1992-06, 1992-07, 1992-08, 1992-09, 1992-10
     - t_day: 1992-08-01, 1992-08-02, 1992-08-03, 1992-08-04, ..., 1992-08-31
     - t_hour: 1992-08-03T00, 1992-08-03T01, 1992-08-03T02, 1992-08-03T03, ..., 1992-08-03T23
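
The hierarchy of granularities can be illustrated with a small sketch that maps one point in time to its value on each timeline. The function name `granularity_values` and the label strings (for example "1990s" for the decade) are assumptions chosen to mirror the labels in the figure, not a standard API.

```python
from datetime import datetime


def granularity_values(ts):
    """Return the value of timestamp ts on each timeline, coarsest first."""
    quarter = (ts.month - 1) // 3 + 1
    day = ts.date().isoformat()
    return {
        "decade": f"{ts.year // 10 * 10}s",
        "year": f"{ts.year:04d}",
        "quarter": f"{ts.year:04d}-Q{quarter}",
        "month": f"{ts.year:04d}-{ts.month:02d}",
        "day": day,
        "hour": f"{day}T{ts.hour:02d}",
    }


for level, value in granularity_values(datetime(1992, 8, 3, 0)).items():
    print(f"{level:8s} {value}")
```

For 1992-08-03, hour 00, this gives 1990s, 1992, 1992-Q3, 1992-08, 1992-08-03, and 1992-08-03T00, matching the positions shown on the six timelines.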

  20. Temporal Tagging
     Temporal expressions:
     - a special type of "named entity"
     - extraction sometimes covered by NER tools
     - intuitively: normalization is very important
     Temporal tagging: extraction and normalization of temporal expressions.

  21. Outline: 1 Temporal Information, 2 Temporal Tagging, 3 Evaluation, 4 HeidelTime, 5 Temponym Tagging, 6 NLP Pipeline Architectures

  22. Temporal Tagging: the two tasks of temporal taggers
     1. Extraction of temporal expressions
     Main challenge: ambiguities, e.g., may, march, fall.

  23. Temporal Tagging: the two tasks of temporal taggers
     1. Extraction of temporal expressions
     2. Normalization of temporal expressions
        - tonight → 2011-09-20TNI
        - yesterday → 2011-09-19
        - next week → 2011-W39
        - Sept. 20, 2011 → 2011-09-20
        - next month → 2011-10
     Main challenge: normalization of relative and underspecified expressions.
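
The example values on this slide are consistent with a reference date of 2011-09-20 (an inference from "yesterday → 2011-09-19"; the slide does not state it). A minimal sketch of how such values of different granularity could be computed, with `normalize_relative` as a hypothetical helper limited to these four expressions:

```python
from datetime import date, timedelta


def normalize_relative(expression, ref):
    """Normalize a relative expression to a TIMEX3-style value, given ref."""
    expr = expression.strip().lower()
    if expr == "tonight":
        # Day value plus the part-of-day token NI (night), as on the slide.
        return ref.isoformat() + "TNI"
    if expr == "yesterday":
        return (ref - timedelta(days=1)).isoformat()
    if expr == "next week":
        # ISO 8601 week of the date one week after the reference date.
        iso = (ref + timedelta(weeks=1)).isocalendar()
        return f"{iso[0]:04d}-W{iso[1]:02d}"
    if expr == "next month":
        if ref.month < 12:
            year, month = ref.year, ref.month + 1
        else:
            year, month = ref.year + 1, 1
        return f"{year:04d}-{month:02d}"
    raise ValueError(f"unsupported expression: {expression!r}")


ref = date(2011, 9, 20)
for text in ["tonight", "yesterday", "next week", "next month"]:
    print(f"{text} -> {normalize_relative(text, ref)}")
```

The output granularity follows the expression: "next week" becomes a week value and "next month" a month value, rather than a specific day.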

  24. Temporal Expressions: different types of temporal expressions
     The temporal markup language TimeML defines four types [Pustejovsky et al. 2005] ( http://timeml.org/ ):
     - Dates: June 24, 2013; September 2000; two weeks ago
     - Times: 3 p.m.; yesterday morning; 2012-06-28T16:25
     - Durations: two weeks; 12.5 hours; several months
     - Sets: every day; annually; twice a month
     Dates and times are particularly valuable for IR.
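
To make the types concrete, the sketch below renders a few of the slide's examples as TIMEX3 annotations with `tid`, `type`, and `value` attributes. The helper `timex3`, the `tid` identifiers, and the duration value `P2W` (an ISO 8601 style duration) are illustrative assumptions; the slide lists the expressions but not their annotated values.

```python
import xml.etree.ElementTree as ET


def timex3(tid, timex_type, value, text):
    """Build a TIMEX3 element as a string for one annotated expression."""
    element = ET.Element("TIMEX3", {"tid": tid, "type": timex_type, "value": value})
    element.text = text
    return ET.tostring(element, encoding="unicode")


annotations = [
    timex3("t1", "DATE", "2013-06-24", "June 24, 2013"),
    timex3("t2", "DATE", "2000-09", "September 2000"),
    timex3("t3", "TIME", "2012-06-28T16:25", "2012-06-28T16:25"),
    timex3("t4", "DURATION", "P2W", "two weeks"),  # assumed value
]
for annotation in annotations:
    print(annotation)
```

Relative expressions such as "two weeks ago" or "yesterday morning" are left out here because their values depend on a reference time.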

  25. Temporal Expressions: different realizations of temporal expressions
     - explicit: June 24, 2013; the 20th century → easy to normalize
     - relative: two weeks ago; yesterday → reference time needed
     - implicit: Christmas 2012; Columbus Day 2006 → additional knowledge needed
     - underspecified: Monday; June 24 → reference time and relation to it needed
     Main challenge for temporal taggers: normalization of relative and underspecified expressions.
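
For underspecified expressions, the reference time alone is not enough: the tagger also has to decide whether the expression lies before or after it. The sketch below shows this for a bare weekday name. It assumes the relation is already known (how it is determined, for example from context, is outside this sketch), and `resolve_weekday` and the reference date are hypothetical.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_weekday(name, ref, relation):
    """Resolve a bare weekday name to a date strictly before or after ref."""
    target = WEEKDAYS.index(name.strip().lower())
    if relation == "before":
        days = (ref.weekday() - target) % 7 or 7
        return ref - timedelta(days=days)
    if relation == "after":
        days = (target - ref.weekday()) % 7 or 7
        return ref + timedelta(days=days)
    raise ValueError(f"unknown relation: {relation!r}")


ref = date(2015, 10, 15)  # a Thursday, chosen only for illustration
print(resolve_weekday("Monday", ref, "before"))  # 2015-10-12
print(resolve_weekday("Monday", ref, "after"))   # 2015-10-19
```

The same expression thus normalizes to two different values depending on the relation, which is why underspecified expressions are harder than explicit ones.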
