SLIDE 1

Enriching News Videos by Creating Semantic and Contextualized Snapshots

Raphael Troncy <raphael.troncy@eurecom.fr> Multimedia Semantics, EURECOM @rtroncy @peputo

SLIDE 2

The Use Case: Contextualizing News

http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8

(Media Fragment URI 1.0)
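The `#t=34.1,39.8` suffix is a temporal Media Fragment: it addresses the clip from 34.1 s to 39.8 s of the video. The following is a minimal Python sketch of reading that dimension, assuming the plain-seconds Normal Play Time form used in the URIs of this deck. The function name `parse_temporal_fragment` is hypothetical, and the other time formats allowed by the specification (hh:mm:ss, SMPTE, wall-clock) are deliberately not handled.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def parse_temporal_fragment(uri: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (start, end) in seconds from a '#t=start,end' fragment, or None.

    Sketch only: handles Normal Play Time given in plain seconds, optionally
    prefixed with 'npt:'. Other time formats are not parsed.
    """
    fragment = urlsplit(uri).fragment
    for pair in fragment.split("&"):
        name, _, value = pair.partition("=")
        if name != "t":
            continue
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        start_str, _, end_str = value.partition(",")
        start = float(start_str) if start_str else 0.0  # omitted start means 0
        end = float(end_str) if end_str else None        # omitted end means "to the end"
        return start, end
    return None  # no temporal dimension in this URI
```

For the BBC URI above this returns `(34.1, 39.8)`, so any subtitle-based entity extraction could, for instance, be restricted to the subtitles overlapping that interval.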

Named entities (NE) over subtitles: Edward Snowden, Sarah Harrison (WikiLeaks editor), Sheremetyevo (airport in Moscow)

14/03/2016 - Computational Journalism - Rennes

SLIDE 3

Going deep down…

It is always challenging

What is on top:

Entities explicitly appearing in the documents

Laura Poitras, Anatoly Kucherena, Edward Snowden

The News Semantic Snapshot (NSS)

SLIDE 4

NSS for Feeding Second Screen Applications

News Semantic Snapshot (NSS)

[Redondo_ICWE’15]

SLIDE 5

The News Semantic Snapshot: Gold Standard

◎ High level of detail, significant human intervention:

(Experts in the news domain + users)

◎ Entities in 5 dimensions (visual and text):

  • (1) Video subtitles
  • (2) Image in the video
  • (3) Text in the video image
  • (4) Suggestions of an expert
  • (5) Related articles

Complemented by a user survey. Example subtitle: “We don't have any extradition treaty with Russia. Broadly speaking our policy remains the same: that we'd like him returned” [Romero_TVX’14]

SLIDE 6

The News Semantic Snapshot: Gold Standard

Play with the data and help us extend it at: https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation


SLIDE 7

Generating the NSS: General Method

a) Entities from the seed document DS
b) Expanded entities, drawn from other documents similar to DS
c) News Semantic Snapshot

[Redondo_SNOW’14]
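Steps a) and b) can be sketched as follows, under explicit assumptions: `find_similar` (retrieving documents similar to DS) and `extract_entities` (a NER step) are hypothetical callables supplied by the caller, and each entity is reduced to its label. The result is the candidate pool, with document counts, from which step c) selects the NSS.

```python
from __future__ import annotations

from typing import Callable, Iterable

Entity = str  # hypothetical simplification: an entity is just its label


def expand_entities(
    seed_doc: str,
    find_similar: Callable[[str], Iterable[str]],
    extract_entities: Callable[[str], Iterable[Entity]],
) -> dict[Entity, int]:
    """Collect candidate entities from DS and similar documents, with document counts."""
    counts: dict[Entity, int] = {}
    documents = [seed_doc, *find_similar(seed_doc)]
    for doc in documents:
        for entity in set(extract_entities(doc)):  # count each entity once per document
            counts[entity] = counts.get(entity, 0) + 1
    return counts
```

Passing the retrieval and NER steps in as functions keeps the sketch independent of any particular search engine or extractor.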

SLIDE 8

Named Entity Recognition

  • 1 Ontology: http://nerd.eurecom.fr/ontology
  • 2 API: http://nerd.eurecom.fr/api/application.wadl
  • 3 UI: http://nerd.eurecom.fr

Example on http://data.linkedtv.eu/media/e2899e7f#t=840,900: S-Bahn (nerd:Product), Obama (nerd:Person), Michelle (nerd:Person), Berlin (nerd:Location)

ML: https://github.com/giusepperizzo/nerdml

[Rizzo_LREC’14]

SLIDE 9

Generating the NSS: Expansion Settings

Query:

  • Title
  • 5 W’s over subtitle entities

Web sites to be crawled:

  • Google
  • L1: a set of 10 international English-speaking newspapers
  • L2: a set of 3 international newspapers used in the GS

Temporal Window:

  • 1W:
  • 2W:

Annotation filtering

  • Schema.org

[Redondo_ICWE’15]

Parameters:


Available at: http://linkedtv.eurecom.fr/entitycontext/api/
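A sketch of how the temporal window could filter crawled documents before entity extraction. It assumes the window ends on the video's publication date and extends 1 or 2 weeks back; the slide only names the settings 1W and 2W, and the `Document` record and `within_window` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Document:  # hypothetical minimal record for a crawled article
    url: str
    published: date


def within_window(docs: list[Document], video_date: date, weeks: int) -> list[Document]:
    """Keep documents published in the `weeks` weeks up to and including the video date.

    Assumption: the window ends at the video's publication date.
    """
    start = video_date - timedelta(weeks=weeks)
    return [d for d in docs if start <= d.published <= video_date]
```

A wider window (2W) admits more documents and therefore more candidate entities, at the cost of more noise.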

SLIDE 10

Generating the NSS: Expansion Results

a) Entities from the seed document DS
b) Expanded entities
c) News Semantic Snapshot

  • Recall (entity expansion) = 0.91
  • Recall (NER on subtitles) = 0.42

[Redondo_SNOW’14]

SLIDE 11

Generating the NSS: The Selection problem

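Selecting the top N entities from the expansion for the NSS: does a ranking function $F_X(e_i)$ reproduce the ideal NSS ranking $F_{Ideal}(e_i)$?

$F_X(e_i) \stackrel{?}{=} F_{Ideal}(e_i)$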

SLIDE 12

Generating the NSS: Measures

1 Precision / Recall @ N

  • Popular
  • Easy to interpret

2 Mean Normalized Discounted Cumulative Gain (MNDCG) @ N:

  • Considers ranking
  • Rewards relevant entities at the top positions

3 Compactness for Recall R:

  • Trade-off between recall and NSS size
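A minimal sketch of the first two measures. It assumes binary relevance (an entity is relevant if it belongs to the gold-standard NSS) and the common log2(rank + 1) discount. The relevance grades and exact discount used in the experiments are not given on the slide. MNDCG@N is taken as NDCG@N averaged over the videos.

```python
from __future__ import annotations

import math


def precision_recall_at_n(ranked: list[str], gold: set[str], n: int) -> tuple[float, float]:
    """Precision@N and Recall@N of a ranked entity list against the gold NSS."""
    hits = sum(1 for e in ranked[:n] if e in gold)
    return hits / n, hits / len(gold)


def ndcg_at_n(ranked: list[str], gold: set[str], n: int) -> float:
    """NDCG@N with binary relevance and a log2(rank + 1) discount (rank starts at 1)."""
    dcg = sum(1 / math.log2(i + 2) for i, e in enumerate(ranked[:n]) if e in gold)
    ideal_hits = min(len(gold), n)  # best case: every top position is relevant
    idcg = sum(1 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


def mean_ndcg_at_n(runs: list[tuple[list[str], set[str]]], n: int) -> float:
    """MNDCG@N: NDCG@N averaged over videos, each given as (ranking, gold set)."""
    return sum(ndcg_at_n(r, g, n) for r, g in runs) / len(runs)
```

The discount is what makes NDCG sensitive to ranking: the same relevant entity contributes 1.0 at rank 1 but only about 0.63 at rank 2.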

SLIDE 13

Generating the NSS: Compactness Example

Recall: 22/33 ≈ 0.67 for all three NSS candidates A, B and C

Sizes: Sa = 27, Sb = 33, Sc = 54

Compactness: A > B > C
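The slides do not give a formula for compactness. One hypothetical way to operationalize "compactness for recall R" is the size of the smallest prefix of a ranked NSS that reaches recall R. In the example on this slide, A reaches the recall with 27 entities, so it is more compact than B (33) and C (54). The function name below is illustrative, and the ranking is assumed to contain no duplicates.

```python
from __future__ import annotations


def size_for_recall(ranked: list[str], gold: set[str], target_recall: float) -> int | None:
    """Smallest prefix length of `ranked` whose recall against `gold` reaches the target.

    Returns None if the full ranking never reaches the target recall.
    """
    hits = 0
    for size, entity in enumerate(ranked, start=1):
        if entity in gold:
            hits += 1
        if hits / len(gold) >= target_recall:
            return size
    return None
```

Under this reading, a smaller returned size at the same target recall means a more compact NSS.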

SLIDE 14

Generating the NSS: The Approaches

1 Frequency-Based Ranking

  • Leverages the biggest sample provided by the expansion
  • Prioritizes representativeness

2 Multidimensional Entity Relevance Ranking

  • Entity relevance is grounded in different dimensions

3 Concentric Based Approach

  • Core / Crust model
  • Alleviates the problem of dealing with many dimensions

[Redondo_SNOW’14] [Redondo_ICWE’15] [Redondo_KCAP’15A]

SLIDE 15

Generating the NSS: (1) Frequency-Based

[Redondo_SNOW’14]
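A minimal sketch of frequency-based ranking. It assumes "frequency" means the number of expansion documents in which an entity appears (document frequency), and that ties are broken alphabetically. Neither detail is stated on the slide, and `frequency_ranking` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter


def frequency_ranking(entities_per_doc: list[list[str]], top_n: int) -> list[tuple[str, int]]:
    """Rank entities by the number of expansion documents that mention them."""
    doc_freq = Counter()
    for entities in entities_per_doc:
        doc_freq.update(set(entities))  # each document counts at most once per entity
    # Descending frequency; ties broken alphabetically (an arbitrary, assumed choice).
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

Counting documents rather than mentions keeps a single article that repeats a name many times from dominating the ranking.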

SLIDE 16

Generating the NSS: (2) Multidimensional

[Redondo_ICWE’15]

SLIDE 17

Generating the NSS: (2) Multidimensional

POPULARITY (FPOP):

  • Based on Google Trends
  • w = 2 months
  • μ + 2σ (2.5%)

EXPERT RULES (FEXP), example:

  • [ Location, = 0.43]
  • [ Person, = 0.78]
  • [ Organization, = 0.95 ]
  • [ < 2 , = 0.0 ]
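The following sketches the two extra dimensions under stated assumptions. A Google Trends value is treated as a popularity peak when it exceeds μ + 2σ of the trend series over the window w. For roughly normal data only about 2.3% of points lie that high, close to the 2.5% quoted. The expert rule is reduced to a per-type weight using the example values above. The last rule ([ < 2 , = 0.0 ]) is not modeled because its variable is missing from the slide. Fetching the trend series is out of scope, and all names are hypothetical.

```python
from __future__ import annotations

import statistics

# Per-type weights copied from the slide's example; how they combine with
# other expert rules is not specified there, so this lookup is a sketch.
EXPERT_TYPE_WEIGHTS = {"Location": 0.43, "Person": 0.78, "Organization": 0.95}


def is_popularity_peak(series: list[float], value: float) -> bool:
    """True if `value` exceeds mean + 2 * population std of the trend series."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series)
    return value > mu + 2 * sigma


def expert_score(entity_type: str) -> float:
    """Hypothetical expert-rule score: a per-type weight, 0.0 for unlisted types."""
    return EXPERT_TYPE_WEIGHTS.get(entity_type, 0.0)
```

Using the population standard deviation (`pstdev`) is a choice made here; `statistics.stdev` would give a slightly higher threshold on short series.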

SLIDE 18

Experiment 1: Frequency vs Multidimensional

20 × 4 × 4 = 320 formulas

SLIDE 19

Experiment 1: Frequency vs Multidimensional

◎ News entity expansion & dimensions → generate NSS
◎ Frequency-based score: 0.473 MNDCG @ 10
◎ Best score: 0.698 MNDCG @ 10

  • Collection: CSE (Google + 2W + Schema.org)
  • Ranking: Expert Rules + Popularity

Multidimensional Nature of the NSS

SLIDE 20

Experiment 1: Frequency vs Multidimensional

Frequency-based NSS (FREQ): F(Laura Poitras) = 2, F(Glenn Greenwald) = 1

SLIDE 21

Experiment 1: Frequency vs Multidimensional

Multidimensional ranking over the expansion: FREQ + POP + EXP = NSS

$F_X(e_i) = F_{FREQ}(e_i) + F_{POP}(e_i) + F_{EXP}(e_i)$
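The additive combination can be sketched as follows. This assumes each dimension score has already been normalized to a comparable range (the slides do not say how) and treats a missing dimension as 0. The dictionary layout and function name are hypothetical.

```python
from __future__ import annotations


def multidimensional_ranking(scores: dict[str, dict[str, float]], top_n: int) -> list[str]:
    """Rank entities by F_X(e) = F_FREQ(e) + F_POP(e) + F_EXP(e).

    Assumes the per-dimension scores are already normalized; missing ones count as 0.
    """
    def combined(entity: str) -> float:
        dims = scores[entity]
        return dims.get("FREQ", 0.0) + dims.get("POP", 0.0) + dims.get("EXP", 0.0)

    return sorted(scores, key=combined, reverse=True)[:top_n]
```

Without normalization, a raw document count in FREQ would easily outweigh the 0 to 1 expert weights, which is why that assumption matters.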

SLIDE 22

Experiment 2: Multidimensional ++

MNDCG @ 10:

  • 1. Exploit Google relevance (+1.80%)
  • 2. Promote subtitle entities (+2.50%)
  • 3. Exploit the named entity extractor’s confidence (+0.20%)
  • 4. Interpret the popularity dimension (+1.40%)
  • 5. Perform clustering before filtering (-0.60%)

NO SIGNIFICANT IMPROVEMENT

SLIDE 23

Experiment 2: Multidimensional ++

Tune function X (FREQ, POP, EXP): re-shuffled vs. original NSS

SLIDE 24

Re-thinking the problem: measures

MNDCG:

  • Too focused on success at the first positions (decay function)
  • NSS intends to be flexible, ranking is application-dependent

COMPACTNESS:

  • Prioritizes coverage over ranking while minimizing NSS size

SLIDE 25

Re-thinking the problem: dimensions

Duality in the news entity spectrum:

  • Representative entities: driving the plot of the story
  • Relevant entities: related to the former for specific reasons
  • Exploit the semantic relations between entities

Unexpected?

SLIDE 26

Generating the NSS: (3) Concentric Approach

◎ Core

  • Representative entities
  • Spottable via frequency dimensions
  • High degree of cohesiveness

◎ Crust

  • Attached to the Core via semantic relations
  • Agnostic to the nature of relevancy: informativeness, interestingness, etc.

[Redondo_KCAP’15A]

SLIDE 27

Generating the NSS: (3) Core Creation

a) Spot representative entities: frequency dimension (NSS)
b) Cohesiveness (DBpedia)

SLIDE 28

Generating the NSS: (3) Crust Creation

The number of Web documents that mention both a particular entity e and the Core:

?
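In the experiments, S_WEB(e, Core) is computed from web search results (Google CSE, per Experiment 3). The sketch below makes two simplifications, and the function names are hypothetical. First, it replaces the search engine with a small local collection in which each document is reduced to its set of entities. Second, it assumes a document "talks about the Core" only if it mentions every Core entity.

```python
from __future__ import annotations


def s_web(entity: str, core: set[str], documents: list[set[str]]) -> int:
    """Count documents mentioning `entity` together with every Core entity.

    Hypothetical local stand-in for S_WEB(e, Core), which the slides compute
    with web search results.
    """
    return sum(1 for doc in documents if entity in doc and core <= doc)


def build_crust(candidates: list[str], core: set[str], documents: list[set[str]], top_n: int) -> list[str]:
    """Attach the top_n non-Core candidates that co-occur most often with the Core."""
    scored = [(s_web(e, core, documents), e) for e in candidates if e not in core]
    scored = [item for item in scored if item[0] > 0]  # drop entities never seen with the Core
    scored.sort(key=lambda item: (-item[0], item[1]))  # most co-occurrences first
    return [e for _, e in scored[:top_n]]
```

Requiring every Core entity is strict for large Cores; counting documents that mention any Core entity would be an equally plausible reading of "simultaneously".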

SLIDE 29

Experiment 3: Multidimensional vs Concentric

Concentric Core:

  • 1. Entity Frequency
    ○ Core1: Jaro-Winkler > 0.9
    ○ Core2: frequency based on exact string matching
  • 2. Cohesiveness:
    ○ Everything is Connected Engine, Skb(e1, e2) > 0.125

Everything is Connected Engine: https://github.com/mmlab/eice
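Core1 groups entity mentions whose Jaro-Winkler similarity exceeds 0.9 before counting them. The following self-contained sketch covers two parts. The first is the standard Jaro-Winkler measure, with prefix scale p = 0.1, the common prefix capped at 4 characters, and no boost threshold. The second is a greedy grouping step. The grouping strategy and the function names are hypothetical, not taken from the slides.

```python
from __future__ import annotations


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 means identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # how far apart matching characters may be
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler: boosts the Jaro score of strings sharing a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def merge_similar_labels(counts: dict[str, int], threshold: float = 0.9) -> dict[str, int]:
    """Greedily fold each label into the first kept label with similarity > threshold."""
    merged: dict[str, int] = {}
    for label, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        target = next((kept for kept in merged if jaro_winkler(label, kept) > threshold), label)
        merged[target] = merged.get(target, 0) + count
    return merged
```

Processing labels by descending count makes the most frequent spelling the representative of each group, so a misspelling such as "Edward Snowdon" adds its count to "Edward Snowden".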

SLIDE 30

Experiment 3: Multidimensional vs Concentric

Concentric Crust:

  • 1. Candidates for CRUST generation:
    ○ Ex1: 1° ICWE2015 by R*(50): L2+Google, F3 1W, Gauss + POP
    ○ Ex2: 2° ICWE2015 by R*(50): L2+Google, F3 1W, Freq + POP
  • 2. Function for attaching entities to the CORE:
    ○ SWEB(ei, Core) over Google CSE, default configuration

SLIDE 31

Experiment 3: Multidimensional vs Concentric

Combining CORE and CRUST: Core+Crust vs. CrustOnly

SLIDE 32

Experiment 3: Multidimensional vs Concentric

36.9% more compact than the multidimensional approach (decrease in NSS size)

IdealGT: size of the NSS according to the Gold Standard. Runs: 2 × 2 × 2 + 2 = 10

SLIDE 33

Experiment 3: Multidimensional vs Concentric

NSS Gold Standard

Fukushima Disaster 2013

n=22

SLIDE 34

(Panels: Multidimensional, Concentric)

Experiment 3: Multidimensional vs Concentric

SLIDE 35

NSS: Suitable model for news applications ?

SLIDE 36

NSS Consumption: News Prototypes

… short summaries, previews, hotspots …
… advanced graphs and diagrams, timelines, in-depth summaries …
… second screen apps, slideshows, info-boxes …

SLIDE 37

NSS Consumption: Consumption Phases

The Before, The During, The After

SLIDE 38

NSS Consumption: Phases VS Layers

[Redondo_KCAP’15B]

SLIDE 39

Conclusions

  • a. Proposed the NSS model and a Gold Standard
  • b. Showed the multidimensional nature of entity relevance
  • Gaussian function, popularity, expert rules…
  • c. The concentric model better reproduces the NSS:
  • Better compactness: 36.9% over BAS01 (similar recall, smaller size)
  • Core/Crust brings up relevant entities without having to deal with fuzzy dimensions
  • d. The NSS better supports the news consumption phases (Before, During, After)

SLIDE 40

Future Work

  • [S] Publish generated NSS on the Web (Linked Data)
  • [S] Extend the Gold Standard:
  • From 5 to 23 videos, concentric-based model for candidate selection
  • Submission to TOIS
  • [S] Not depending on “big players” for retrieving knowledge during the expansion phase (Terrier vs. Google experiments)

SLIDE 41

Future Work

  • [L] Spot not only the strength of the relationships between the Crust and the Core, but also the predicates

Example: "Editor in WikiLeaks". Generating explanations by analyzing the documents considered in SWEB

SLIDE 42

Credits


http://jluisred.github.io/ http://giusepperizzo.github.io/