Exploiting FrameNet for Content-Based Book Recommendation



SLIDE 1

Exploiting FrameNet for Content-Based Book Recommendation

Orphée De Clercq, Michael Schuhmacher, Simone Paolo Ponzetto and Véronique Hoste

LT3, Language and Translation Technology Team, Ghent University

orphee.declercq@ugent.be

CBRecSys, 6 October 2014

De Clercq et al. (LT3) CBRecSys 10-06-2014 1 / 26

SLIDE 2

Tastes differ

SLIDE 8

Content-Based Book Recommendation

Could techniques developed for NLP help?

Books

  • Chronological sequence
  • Actions and Events
  • Infer more based on actual content = plot

Semantic text processing

  • Semantically enriched text features
  • Text-external = Linked Open Data (LOD)
  • Text-internal = semantic frame parsing

→ Add and combine semantic information

SLIDE 9

Overview

1 Frame-Enhancement
2 Experiments
3 Results
4 Conclusion

SLIDE 12

Frame-Enhancement

Frames

Frame semantics (Fillmore, 1982)

Describe the meaning of a sentence by characterizing the background knowledge required to understand it.

Frame: KILLING
The KILLER or CAUSE causes the death of the VICTIM.

FEs

  • KILLER: John drowned Martha.
  • VICTIM: I saw heretics beheaded.
  • CAUSE: The rockslide killed nearly half of the climbers.
  • INSTRUMENT: It’s difficult to suicide with only a pocketknife.

LUs

..., kill.v, killer.n, killing.n, lethal.a, liquidate.v, liquidation.n, liquidator.n, lynch.v, massacre.n, massacre.v, matricide.n, murder.n, murder.v, murderer.n, ...

FrameNet (Fillmore et al., 2003)

  • Lexical units (LUs) evoking the frame
  • Semantic roles or frame elements (FEs)
  • Release 1.5 = 877 frames and 155K sentences
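As a rough illustration of how a frame ties together lexical units and frame elements, the KILLING entry above could be held in a small record like the one below. This is a minimal sketch: the `Frame` class and `build_lu_index` helper are hypothetical names chosen for this example, not part of FrameNet or of any FrameNet library, and the lexical unit list is a subset of the one on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frame:
    """Hypothetical container for one frame (not a FrameNet API)."""
    name: str
    definition: str
    frame_elements: List[str] = field(default_factory=list)
    lexical_units: List[str] = field(default_factory=list)


# Values copied from the KILLING example on the slide (LUs abbreviated).
KILLING = Frame(
    name="Killing",
    definition="The KILLER or CAUSE causes the death of the VICTIM.",
    frame_elements=["KILLER", "VICTIM", "CAUSE", "INSTRUMENT"],
    lexical_units=["kill.v", "killer.n", "killing.n", "lethal.a",
                   "lynch.v", "massacre.n", "murder.n", "murder.v"],
)


def build_lu_index(frames: List[Frame]) -> Dict[str, List[str]]:
    """Map each lexical unit to the names of the frames it can evoke."""
    index: Dict[str, List[str]] = {}
    for frame in frames:
        for lu in frame.lexical_units:
            index.setdefault(lu, []).append(frame.name)
    return index


lu_index = build_lu_index([KILLING])
print(lu_index["murder.v"])
```

Looking up a lexical unit in `lu_index` returns the frames it may evoke, which is the direction a frame parser works in: from words in the text to frames.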

SLIDE 13

Frame-Enhancement

Relations

Hierarchy (Ruppenhofer et al., 2005)

  • Inheritance or is-a like relation
  • Child frame is a subtype of the parent frame
  • Additional semantic properties

SLIDE 16

Frame-Enhancement

Book Dataset

ESWC challenge

  • LibraryThing dataset
  • Mapped to DBpedia URIs

Dataset

  • Downloaded all plots from Wikipedia
  • Uniform/unambiguous link
  • 5,063 books with plot

SLIDE 18

Frame-Enhancement

Book Dataset

[Figure: DBpedia mapped dataset entry and the corresponding Wikipedia page]

SLIDE 20

Frame-Enhancement

Frame parsing

SEMAFOR (Das et al., 2014)

  • state-of-the-art parser
  • sentence-per-sentence basis

PLOT

The [Prince], the protagonist, is [named] Alexander. His [father], [Prince] Baudouin, is [murdered] by the [King] of Cornwall, [King] [March]. [When] Alexander [comes] of [age], he [sets out] to Camelot to [seek] justice from [King] Arthur and to [avenge] his [father]’s [death]...

FRAMES

Leadership, Appointing, Kinship, Leadership, Killing, Leadership, Leadership, Calendric unit, Temporal collocation, Arriving, Calendric unit, Departing, Seeking to achieve, Leadership, Revenge, Death, Kinship.

Revenge story → also royalty, family, ...

         # Avg   Stdev   # Avg unique   Stdev
Frames   197     205     96             61
Events   42      45      22             15
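The per-plot counts in the table can be derived from the parser output along the following lines. This is only a sketch: `frame_count_stats` is a hypothetical helper, the second plot is invented toy data, and the use of the sample standard deviation is an assumption, since the slides do not say which variant was reported.

```python
from statistics import mean, stdev


def frame_count_stats(frames_per_plot):
    """Average and standard deviation of total and unique frame counts per plot."""
    totals = [len(frames) for frames in frames_per_plot]
    uniques = [len(set(frames)) for frames in frames_per_plot]
    return {
        "avg": mean(totals),
        "stdev": stdev(totals),          # sample stdev: an assumption
        "avg_unique": mean(uniques),
        "stdev_unique": stdev(uniques),
    }


# First plot: the frame sequence from the example above. Second plot: toy data.
plots = [
    ["Leadership", "Appointing", "Kinship", "Leadership", "Killing",
     "Leadership", "Leadership", "Calendric unit", "Temporal collocation",
     "Arriving", "Calendric unit", "Departing", "Seeking to achieve",
     "Leadership", "Revenge", "Death", "Kinship"],
    ["Killing", "Revenge", "Killing"],
]
stats = frame_count_stats(plots)
print(stats)
```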

SLIDE 23

Frame-Enhancement

Frames vs LOD

Manual genre subdivision (Schuhmacher and Meilicke, 2014)

  • Parsing the abstract, genre and subject (DBpedia)
  • 30 distinct genres

SLIDE 26

Frame-Enhancement

Frames vs LOD

Genre classification

  • Consider frames as features
  • Calculate gain ratio

Science Fiction           History                   Children                  Crime
Beyond compare            Representing              Memorization              Extradition
Becoming dry              Intentional traversing    Measure area              Go into shape
Containment relation      Dominate competitor       Estimated value           Exporting
Dunking                   Getting vehicle underway  Rope manipulation         Becoming dry
Exclude member            Cause to rot              Degree of processing      Arson
Representing              Beyond compare            Jury deliberation         Measure area
Jury deliberation         Probability               Bond maturation           Dominate competitor
Cause to rot              Jury deliberation         Intentional traversing    Containment relation
Medium                    Color qualities           Cause to be dry           Reading aloud
Cause change of phase     Get a job                 Drop in on                Extreme point
Change of consistency     Eventive affecting        Intentionally affect      Endangering
Immobilization            Historic event            Examination               Posing as
Execute plan              Extradition               Absorb heat               Experience bodily harm
Cause impact              Surrendering possession   Cause to experience       Enforcing
Reparation                Corroding caused          Fighting activity         Cause to be wet
Eventive affecting        Dodging                   Dodging                   Intentionally affect
Get a job                 Clemency                  Rope manipulation         Intercepting
Cause to be sharp         Intentional traversing    Intentional traversing    Change resistance
Cause to rot              Cause to rot              Drop in on                Go into shape
Cause change of phase     Get a job                 Cause to be dry           Extradition
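For readers unfamiliar with gain ratio, the following sketch computes it for a single frame treated as a binary feature (present or absent in a plot) against genre labels. It uses the standard definition, information gain divided by the entropy of the feature itself; the slides do not show the exact tool or settings used, and the toy data below is invented.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of the feature about the labels, divided by split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data: four books, two genres, one frame that separates them perfectly
# and one frame that carries no genre information.
genres = ["Crime", "Crime", "History", "History"]
has_informative_frame = [1, 1, 0, 0]
has_uninformative_frame = [1, 0, 1, 0]

print(gain_ratio(has_informative_frame, genres))
print(gain_ratio(has_uninformative_frame, genres))
```

Ranking all frames by this score per genre would yield lists like the ones in the table above.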

SLIDE 29

Experiments

Experimental Setup

System

  • ESWC System (Schuhmacher and Meilicke, 2014)
  • two Naive Bayes classifiers

1 Global classifier (background model)
2 Per-user-neighborhood classifier (individual preferences)

→ Vary the features for item representation

Datasets

  • Training data = 53665 user-item-rating triples
  • Evaluation data = 50654 triples

SLIDE 31

Experiments

Semantic Features

Baselines

  • Majority class (0)
  • Bag-of-words: plot’s token unigrams

Frame-related

1 Frames

  • Frames as such (e.g. Killing, Kinship) → 877 unique
  • Lexical units
  • Frame Elements, semantic roles
  • Combinations

2 Events

  • Events as such → 234 unique
  • Lexical units
  • Frame Elements
  • Combinations

SLIDE 33

Experiments

Semantic Features

3 Taxonomy

  • Frame one up (frame + direct parent)
  • Frames bottom (leaves, bottom children)
  • Frames with LCS (Resnik, 1995), with overly generic parents filtered out

Linked Open Data

  • DBpedia (author, genre, subject, type, wiki link)
  • Combination with best frames approach
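The three taxonomy features listed above can be sketched over a child-to-parent map. Everything here is an assumption made for illustration: the toy hierarchy is not taken from FrameNet (where a frame may also have several parents), "LCS" is read as the lowest common subsumer of two frames, "frames bottom" is read as keeping only leaf frames, and the function names are hypothetical.

```python
# Toy child -> parent links, invented for this example (not FrameNet data).
PARENT = {
    "Killing": "Transitive_action",
    "Cause_harm": "Transitive_action",
    "Transitive_action": "Event",
    "Death": "Event",
}


def ancestors(frame, parent=PARENT):
    """Return the frame followed by its ancestors, nearest first."""
    chain = [frame]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def one_up(frames, parent=PARENT):
    """Frame one up: each frame plus its direct parent, if it has one."""
    out = []
    for frame in frames:
        out.append(frame)
        if frame in parent:
            out.append(parent[frame])
    return out


def bottom(frames, parent=PARENT):
    """Frames bottom: keep only frames that have no children in the hierarchy."""
    inner_nodes = set(parent.values())
    return [frame for frame in frames if frame not in inner_nodes]


def lcs(a, b, parent=PARENT, too_generic=("Event",)):
    """Lowest common subsumer of two frames, or None if absent or too generic."""
    seen = set(ancestors(a, parent))
    for candidate in ancestors(b, parent):
        if candidate in seen:
            return None if candidate in too_generic else candidate
    return None


print(one_up(["Killing", "Death"]))
print(bottom(["Killing", "Transitive_action"]))
print(lcs("Killing", "Cause_harm"))
```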

SLIDE 36

Experiments

System

Preprocessing

  • Tokenization
  • Stemming
  • Stop word filtering

Feature weighting

  • Unsupervised attribute weighting (TF-IDF)
  • Attribute selection (Gain Ratio) → keep features with gain ratio > 0
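To make the weighting step concrete, here is a minimal TF-IDF sketch over bags of frames. The exact TF-IDF variant used in the system is not given on the slides; this one uses relative term frequency and an unsmoothed logarithmic inverse document frequency, which is an assumption, and `tf_idf` is a hypothetical helper name.

```python
from collections import Counter
from math import log


def tf_idf(docs):
    """Return one {feature: weight} dict per document (a list of features)."""
    n_docs = len(docs)
    # Number of documents in which each feature occurs at least once.
    doc_freq = Counter(feature for doc in docs for feature in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            feature: (count / len(doc)) * log(n_docs / doc_freq[feature])
            for feature, count in counts.items()
        })
    return weighted


# Toy data: two plots represented by the frames they evoke.
docs = [["Killing", "Revenge", "Killing"], ["Kinship", "Revenge"]]
weights = tf_idf(docs)
print(weights[0])
```

A frame that occurs in every plot, such as Revenge in this toy data, receives weight zero, while frames specific to one plot are weighted up.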

Evaluation

  • System outputs the positive class likelihood
  • RMSE (lower = better)
  • ROC curve and compute AUC
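Both evaluation measures can be computed directly from the positive-class likelihoods. The sketch below assumes binary gold labels (1 = positive, 0 = negative) and computes AUC through its rank interpretation, the probability that a randomly chosen positive item is scored above a randomly chosen negative one; the data is invented and the function names are hypothetical.

```python
from math import sqrt


def rmse(y_true, y_score):
    """Root mean squared error between gold labels and predicted likelihoods."""
    squared = [(t - s) ** 2 for t, s in zip(y_true, y_score)]
    return sqrt(sum(squared) / len(squared))


def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: four user-item pairs with gold labels and predicted likelihoods.
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.6]
print(rmse(y_true, y_score), auc(y_true, y_score))
```

The pairwise AUC is quadratic in the number of items, which is fine for a sketch but would be replaced by a sort-based computation on the full evaluation set.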

SLIDE 42

Results

[Figure: ROC curves (x-axis: false positive rate, y-axis: true positive rate) for Words, FEs, DBpedia, DBpedia_FEs and Ristoski]

SLIDE 43

Results

Features                    RMSE      AUC
Baselines
  Majority voting (0)       0.7705    n/a
  Words as such             0.6145    0.5431
Frames
  Frames as such            0.6272    0.5377
  Lexical units (LUs)       0.6266    0.5398
  Frame elements (FEs)      0.6036    0.5468
  Frames + LUs              0.6259    0.5389
  Frames + LUs + FEs        0.6036    0.5453
Events
  Events as such            0.6132    0.5148
  Events + LUs              0.6259    0.5310
  Events + LUs + FEs        0.6237    0.5296
Taxonomy
  Frames One up             0.6244    0.5297
  Frames Bottom             0.6253    0.5370
  Frames + LCS              0.6285    0.5376
LOD
  DBpedia features          0.6022    0.5588
  DBpedia + FEs             0.5982    0.5498
  (DBpedia + FEs hybrid)    (0.5664)  (0.5571)

SLIDE 46

Conclusion

Conclusion

  • Add semantic information to a content-based book recommender

  • Text-internal (semantic frame)
  • Text-external (LOD)
  • Frames add information; taxonomy and event features less so
  • Combining both information sources works best

Future Work

  • One parser: filtering, event extraction, other parsers
  • Wikipedia: other information (GoogleBooks, Amazon, GoodReads)

  • Hybrid design, collaborative-filtering approach
  • ...

SLIDE 47

Conclusion

Thank you

Questions?

orphee.declercq@ugent.be

SLIDE 48

References

  • C. J. Fillmore. Frame semantics. In Linguistics in the Morning Calm, pages 111–137, 1982.
  • C. J. Fillmore, C. R. Johnson, and M. R. L. Petruck. Background to FrameNet. International Journal of Lexicography, 16(3):235–250, 2003.
  • J. Ruppenhofer, M. Ellsworth, M. R. L. Petruck, and C. R. Johnson. FrameNet II: Extended theory and practice. Technical report, 2005.
  • D. Das, D. Chen, A. F. T. Martins, N. Schneider, and N. A. Smith. Frame-semantic parsing. Computational Linguistics, 40(1):9–56, 2014.
  • M. Schuhmacher and C. Meilicke. Popular Books and Linked Data: Some Results for the ESWC-14 RecSys Challenge. In LOD-enabled Recommender Systems Challenge (ESWC 2014), 2014.
  • P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI’95), 1995.