Active Objects: Entity Detection and Structured Recommendations (June 2012) — PowerPoint Presentation



SLIDE 1

Patrick Pantel

Joint work with: Tom Lin (UW), Michael Gamon (MSR), Anitha Kannan (MSR), Ariel Fuxman (MSR) June 2012

SLIDE 2

Structured Data Entity Detection Play trailer Structured Recommendations

SLIDE 3

Price prediction · Task completion · Aggregate ratings

SLIDE 4

Active Objects 4

Direct Answer Structured Data

SLIDE 5

Current Experience

Active Objects 21

SLIDE 6

Better experience

Active Objects 23

Recognize entity in query Actions easily accessible

SLIDE 7

Better experience

Active Objects 24

Recognize entity in query Actions easily accessible

SLIDE 8

Better experience

Active Objects 25

Recognize entity in query Actions easily accessible

SLIDE 9

Politicians

Active Objects 29

Recognize entity in query Actions easily accessible

SLIDE 10

Films

Active Objects 30

Recognize entity in query Actions easily accessible

SLIDE 11

Active Objects 32

[Plate diagram: full graphical model]

SLIDE 12

Actions vs Intents

Active Objects 36

User Intents and Goals: a coarse query intent like “plan vacation” vs. finer-grained intents in queries like “hilton orlando reviews” or “sea world location” — Actions on Entities.

Examples: read reviews(hotel), get address(landmark), add to Netflix queue(film), buy(camera)

Prior intent taxonomies: Navigational / Informational / Transactional [Broder, 2002]; Advice, Locate, Download, Obtain, Interact, … [Rose and Levinson, 2004]. Coarser goals: “get in shape”, “how to lose weight”

SLIDE 13

Do web queries contain entities?

Active Objects 38

43% entity

(e.g., “GoldenEye”, “Horne Auto”)

14% entity category

(e.g., “golf cart battery”, “global sim card”)

15% no entity

(e.g., “xxx”, “good reading quotes”)

28% website

(e.g., “yahoo mail”, “girlybox.com”)

Pie chart detail: entity 29%, entity + refiner 14%, category 4%, category + refiner 10%, website 28%, other 15%

Entity Distribution in Web Search Queries * From a query traffic-weighted sample

Pie chart: creativework 40%, organization 37%, product 9%, person 8%, event 3%, other 3%

Schema.org types for entity-bearing queries

SLIDE 14

Ontology of Actions

Active Objects 43

Navigational
  10x Login Action (on a Website entity)
  4x Search Action (on a Website entity)
Informational (need satisfied by reading content, or could be satisfied by written transcript of content)
  1x Find Location(s) (on an Organization entity)
  1x Find Lyrics (on a CreativeWork / MusicalTrack entity)
  2x Find Recipe For (on a food)
  1x Find Where to Buy (on a Product entity)
  2x Get Contact Information (on an Organization entity)
  1x Get Directions To (on an Organization / Location entity)
  2x Get Domain Information (on a Website entity)
  1x Get Event Details (on an Event entity)
  2x Get Event Results (on an Event entity)
  4x Product Detail (on a Product entity)
  29x Learn (on any entity)
  6x Learn / Educational (on a Person / Product / Organization entity)
  1x Learn / Trivia (on any entity)
  1x Operating Hours (on an Organization entity)
  3x Read Articles (on a News / Magazine entity)
  1x Read Guide (on a Product entity)
  1x Read Help (on a Product entity)
  8x Read News About (on any entity)
  3x Read Reviews (Shopping on a CreativeWork / Product / Service entity)
  1x Read Spoilers (on a CreativeWork)
  8x Research (focused information gathering, on any entity)
  8x Search Database of (e.g., obituaries, on an Organization / Website)
  See Menu (on a Restaurant)
  3x See Pictures (on a Person / Product / Organization entity)
  Side Effects / Safety (on a Product entity)
  Stock Price (on an Organization entity)
Transactional (navigating to a web-mediated action)
  1x Apply for Job (on a LocalBusiness / Organization entity)
  Buy (Shopping on a Product entity)
  Buy Tickets (on an Event / Product / Person entity)
  3x Content Creation (on a Website entity)
  Discuss Online (on any entity)
  5x Download (on a CreativeWork or Software entity)
  1x Listen to Music (on a CreativeWork or Website entity)
  Manage Account (on a LocalBusiness / Website / Organization entity)
  Pay Bill (on a Website / Organization entity)
  14x Play Game (on a Game entity)
  Rent (on a CreativeWork / Product entity)
  2x Reservation (on a Hotel entity)
  Schedule Appointment (on a LocalBusiness entity)
  Sell (Shopping on a Product entity)
  1x Use Service On (e.g., translate, on a Website)
  6x Watch Video About (on any entity)
  1x Web Chat
Other
  13x Shopping (category of actions including reviews and buying)
  19x Various/Unknown

Actions are tied to entity types. 47 actions in current list.

Note: Schema.org has no existing equivalent for Actions

SLIDE 15

How many Actions should there be?

Active Objects 44

[Line chart: New Actions per 10 Annotations (average) vs. number of Annotations (50–250)]

Discovery Rate of New Actions Rapidly decreasing discovery rate

SLIDE 16

Active Objects 45

[Plate diagram: full graphical model]

SLIDE 17

Learning Actions from Web Usage Logs

Active Objects 49

21 types · 58,123 hosts · 2,164,579 (query, host) pairs over 3 months · 129,088 contexts · 235,385 entities

  • Three months of us-en web logs
  • Annotate with Freebase entities
  • Keep queries with an entity in set of 21 types
  • Filter out navigational queries
  • Filter out clicked hosts that weren’t clicked at least 100 times

Examples: “Orlando hotel reviews” → get reviews · “Does Hope Solo have a boyfriend?” → read biography · “Free Winzip download” → download software · “watch family guy online” → watch shows online

SLIDE 18

Model 1

Active Objects 52

Action Context

Goal: Define a theory for how actionable queries are generated.

The story for p(actionable query), or more formally, the story for p(φ, θ, a, n | α, β):
  • For each action a: φ_a ~ Dirichlet(β)
  • For each query q: θ_q ~ Dirichlet(α)
  • For each context position in q (pre or post):
      – action a ~ Multinomial(θ_q)
      – ngram n ~ Multinomial(φ_a)

[Plate diagram: Model 1.01]

Example: “Star Wars” + context “ebert review” ← read reviews action; template “______ ______” (action → contexts die)
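This generative story is LDA-style sampling and can be sketched in a few lines of numpy. Everything below — the action list, the ngram vocabulary, the hyperparameter values — is an illustrative assumption, not the talk's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative vocabularies (assumptions, not the talk's data)
actions = ["read reviews", "download", "buy"]
ngrams = ["ebert review", "free", "cheap", "online", "trailer"]
K, V = len(actions), len(ngrams)
alpha, beta = 0.1, 0.01  # Dirichlet hyperparameters

# For each action a: phi_a ~ Dirichlet(beta), a distribution over context ngrams
phi = rng.dirichlet([beta] * V, size=K)

def generate_query_contexts():
    """Sample the pre and post context of one query under Model 1."""
    theta = rng.dirichlet([alpha] * K)   # theta_q ~ Dirichlet(alpha), over actions
    contexts = []
    for _position in ("pre", "post"):    # one draw per context position
        a = rng.choice(K, p=theta)       # action a ~ Multinomial(theta_q)
        n = rng.choice(V, p=phi[a])      # ngram n ~ Multinomial(phi_a)
        contexts.append((actions[a], ngrams[n]))
    return contexts

print(generate_query_contexts())
```

With small α, each sampled query concentrates on one action, which is what makes the contexts informative about the latent action.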

SLIDE 19

Active Objects 53

Model 2

Action Context

The story for p(φ, θ, a, n | α, β):
  • For each action a: φ_a ~ Dirichlet(β)
  • For each query q: θ_q ~ Dirichlet(α)
    – action a ~ Multinomial(θ_q)
    – ngram n1 ~ Multinomial(φ_a)
    – ngram n2 ~ Multinomial(φ_a)

(Change from Model 1: a single action per query generates both the pre and post context ngrams.)

[Plate diagram: Model 1.02]

Example: “Star Wars” + context “ebert review” ← read reviews action

SLIDE 20

Active Objects 54

Model 3

Clicked hosts matter…

Action Host Context: buy action → amazon.com, ebay.com, walmart.com · read reviews action → rottentomatoes.com, metacritic.com, efilmcritic.com

SLIDE 21

“______ ______” ← read reviews action

Active Objects 55

[Plate diagram: Model 1.13]

Model 3

The story for p(φ, θ, ω, a, n, c | α, β, ι):
  • For each action a: φ_a ~ Dirichlet(β); ω_a ~ Dirichlet(ι)
  • For each query q: θ_q ~ Dirichlet(α)
    – action a ~ Multinomial(θ_q)
    – ngram n1 ~ Multinomial(φ_a)
    – ngram n2 ~ Multinomial(φ_a)
    – click c ~ Multinomial(ω_a)

(New vs. Model 2: ω, ι, c — each action also generates the clicked host.)

Example: “Star Wars” + context “ebert review” ← read reviews action → clicked host www.rottentomatoes.com

SLIDE 22

Active Objects 57

Model 3

The type matters…

Action Host Context

SLIDE 23

Active Objects 58

[Plate diagram: Model 1.14]

Model 4

Action Host Context Type

The story for p(φ, θ, τ, ω, t, a, n, c | α, β, γ, ι):
  • For each action a: φ_a ~ Dirichlet(β); ω_a ~ Dirichlet(ι)
  • For each type t: τ_t ~ Dirichlet(γ)
  • For each query q: θ_q ~ Dirichlet(α)
    – type t ~ Multinomial(θ_q)
    – action a ~ Multinomial(τ_t)
    – ngram n1 ~ Multinomial(φ_a)
    – ngram n2 ~ Multinomial(φ_a)
    – click c ~ Multinomial(ω_a)

(New vs. Model 3: τ, t, γ — a type is drawn per query, and the action is conditioned on the type.)

Example: “Star Wars” + context “ebert review” · film type → read reviews action → www.rottentomatoes.com

SLIDE 24

Active Objects 60

Model 3

We also have entity data…

Action Host Context Type

SLIDE 25

Active Objects 61

[Plate diagram: Model 1.05b]

Model 5

Action Host Context Type Entity

The story for p(φ, θ, τ, ψ, ω, t, a, e, n, c | α, β, γ, η, ι):
  • For each action a: φ_a ~ Dirichlet(β); ω_a ~ Dirichlet(ι)
  • For each type t: τ_t ~ Dirichlet(γ); ψ_t ~ Dirichlet(η)
  • For each query q: θ_q ~ Dirichlet(α)
    – type t ~ Multinomial(θ_q)
    – action a ~ Multinomial(τ_t)
    – entity e ~ Multinomial(ψ_t)
    – ngram n1 ~ Multinomial(φ_a)
    – ngram n2 ~ Multinomial(φ_a)
    – click c ~ Multinomial(ω_a)

(New vs. Model 4: ψ, e, η — a per-type distribution over entities.)

SLIDE 26

Active Objects 63

Model 6

Action Host Context Type Entity Empty

The story for p(φ, θ, τ, ψ, ω, σ, t, a, e, s, n, c | α, β, γ, η, ι, ε):
  • For each action/type pair {a, t}: φ_a ~ Dirichlet(β); ω_a ~ Dirichlet(ι); σ_{a,t} ~ Beta(ε)
  • For each type t: τ_t ~ Dirichlet(γ); ψ_t ~ Dirichlet(η)
  • For each query q: θ_q ~ Dirichlet(α)
    – type t ~ Multinomial(θ_q)
    – action a ~ Multinomial(τ_t)
    – entity e ~ Multinomial(ψ_t)
    – switch s1 ~ Bernoulli(σ_{a,t}); switch s2 ~ Bernoulli(σ_{a,t})
    – if (s1): ngram n1 ~ Multinomial(φ_a)
    – if (s2): ngram n2 ~ Multinomial(φ_a)
    – click c ~ Multinomial(ω_a)

(New vs. Model 5: σ, s, ε — empty-context switches, so a query may lack pre/post context ngrams.)

[Plate diagram: Model 1.06b]

SLIDE 27

Apply Generative Model

Active Objects 66

Action (hidden) · Clicked Host (observed) · Query Context (observed) · Type (observed)
P(Action | Type) · P(Host | Action) · P(Context | Action)

A new query comes in (e.g., “New York City hotels”):
  • Entity Recognition(query) → entity (“New York City”)
  • entity → types (“city”, “employer”, “travel destination”)
  • (query, entity) → context (“Ø”, “hotels”)
  • Historical Data(query) → distribution over hosts
  • EM posterior probabilities give us the likelihood of each action cluster
  • action cluster → action phrase (“book hotel in”)
  • (action cluster, action phrase, historical data) → best hosts (“travel.bing.com”)

query: jetbeam rrt-0
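The scoring step can be sketched as follows. The parameter tables and phrases below are hypothetical stand-ins for the EM-trained parameters; only the combination P(action | type) · P(context | action) mirrors the pipeline above:

```python
# Hypothetical, hand-set parameter tables; the real model learns these via EM.
p_action_given_type = {
    ("city", "book hotel in"): 0.6, ("city", "see map of"): 0.4,
    ("employer", "apply for jobs at"): 1.0,
}
p_context_given_action = {
    ("book hotel in", "hotels"): 0.7, ("see map of", "hotels"): 0.1,
    ("apply for jobs at", "hotels"): 0.05,
}

def score_actions(types, context):
    """Score each action for a query via P(action | type) * P(context | action)."""
    scores = {}
    for (t, a), p_at in p_action_given_type.items():
        if t in types:
            p_ca = p_context_given_action.get((a, context), 1e-6)  # smoothing floor
            scores[a] = scores.get(a, 0.0) + p_at * p_ca
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()} if total else {}

# e.g., "New York City hotels" -> types {city, employer}, post-context "hotels"
best = max(score_actions({"city", "employer"}, "hotels").items(), key=lambda kv: kv[1])
print(best[0])  # → book hotel in
```

A production system would additionally fold in P(host | action) against the query's historical host distribution before ranking.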

SLIDE 28

Active Objects 69

SLIDE 29

Action words from Web Trigrams

Active Objects 70

  • Patterns (similar to Hearst patterns) on a Web Trigram corpus to get actions.

Pipeline: Web Trigrams → Pattern Match (“want to (x)”, “have to (x)”, “you can (x)”, “I can (x)”) → Filter adverbs (e.g., “honestly”, “quickly”) → Filter noise (the 25% with lowest frequency / unigram count, e.g., “a”, “boy”) → 13,417 action words (make, download, find, torrent, say, eBay, pay, login, buy, podcast, help, …)

The method scales to longer actions, e.g., 4-grams for 2-word actions (“read review”), and finds modern/web actions that older annotated corpora might miss.
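A minimal sketch of this mining pipeline, with a toy trigram corpus standing in for the Web trigram data (counts, vocabulary, and the noise-cutoff fraction are all illustrative):

```python
from collections import Counter

# Toy trigram counts standing in for the Web trigram corpus (illustrative only)
trigrams = {
    ("want", "to", "download"): 900,
    ("you", "can", "buy"): 700,
    ("have", "to", "honestly"): 300,   # adverb, filtered out
    ("i", "can", "a"): 5000,           # noise, filtered by frequency/unigram-count ratio
    ("want", "to", "torrent"): 400,
}
unigrams = {"download": 2000, "buy": 1500, "honestly": 800, "a": 9_000_000, "torrent": 900}
ADVERBS = {"honestly", "quickly"}
PATTERNS = {("want", "to"), ("have", "to"), ("you", "can"), ("i", "can")}

def mine_action_words(keep_fraction=0.75):
    """Match 'want to (x)'-style patterns, drop adverbs, then drop the
    lowest-scoring quartile by frequency / unigram count."""
    cands = Counter()
    for (w1, w2, x), count in trigrams.items():
        if (w1, w2) in PATTERNS and x not in ADVERBS:
            cands[x] += count
    ranked = sorted(cands, key=lambda w: cands[w] / unigrams[w], reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]

print(mine_action_words())
```

The frequency-to-unigram-count ratio is what demotes stopword-like candidates such as “a”, which are frequent everywhere, not just after the action patterns.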

SLIDE 30

Web Action words from Trigrams

Active Objects 71

  • Not all actions can be recommended over the Web (e.g., “shock” or “kill”). How do we find the ones that can?

Pipeline: 13,417 action words (make, download, find, torrent, say, eBay, pay, login, buy, podcast, help, …) → ngram pattern “(x) at (y)”, where y has the form of a web site URL (e.g., “buy at Amazon.com”, “download at cnet.com”) → 1,279 web actions (buy, review, shop, unsubscribe, book, download, …)
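The “(x) at (y)” filter can be sketched with a regular expression; the TLD list and sample ngrams here are assumptions for illustration:

```python
import re

# "(x) at (y)" where y looks like a web site URL (TLD list is an assumption)
WEB_ACTION = re.compile(r"^(\w+) at ([\w-]+(?:\.[\w-]+)*\.(?:com|org|net|edu))$", re.I)

def web_actions(ngrams):
    """Keep the action word x from ngrams matching '(x) at (url)'."""
    found = set()
    for ng in ngrams:
        m = WEB_ACTION.match(ng)
        if m:
            found.add(m.group(1).lower())
    return found

sample = ["buy at Amazon.com", "download at cnet.com", "kill at dawn", "shock at first"]
print(web_actions(sample))  # only the web-mediated actions survive
```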

SLIDE 31

Human Annotation of Action Phrases

Active Objects 73

From each model, we first automatically generate:

  19 — Read biography of, See pictures of, Read blog of, Contact, Read interview with, Watch video of
  12 — Download, Find reviews of, Update, Get help for
  24 — Apply for jobs at, View career options at, View map of, Read news about, Find locations of, Find address of, Get stock quote of

Use as training data

SLIDE 32

Evaluation Setup

Active Objects 79

[Plate diagram: Model 1.02]

Model 2: action → context
Model 3: + clicked host
Model 4: + entity types
Model 5: + entity
Model 6: + empty switch

[Plate diagrams: Models 1.13, 1.14, 1.05b, 1.06b]

Data: 21 types · 58,123 hosts · 2,164,579 (query, host) pairs over 3 months · 129,088 contexts · 235,385 entities · filter out navigational queries

50 clusters, 2-step learning over 100 total EM iterations, 2 folds per model

SLIDE 33

Evaluation Framework

Active Objects 82

Combined set of actions over all models × 7 UHRS annotations; fair κ agreement

SLIDE 34

Performance on HEAD vs. TAIL vs. Type-Balanced queries

Active Objects 87

[Bar chart: nDCG vs. Query Sets (HEAD, TAIL, Type-Balanced), with 95% confidence bounds, for Models 2–6]

Tail is dominated by “People” type

+ click + type + entity + switch
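For reference, nDCG — the metric plotted here — can be computed as below; the graded relevance scores in the example are made up:

```python
import math

def dcg(gains):
    # DCG with a log2(rank + 1) discount, ranks starting at 1
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    """nDCG: DCG of the ranking, normalized by the DCG of the ideal ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Graded relevance of a model's top-5 recommended actions (made-up grades)
print(round(ndcg([3, 2, 3, 0, 1]), 3))  # → 0.972
```

A ranking already in ideal order scores exactly 1.0, so the chart's gaps between models reflect how far each model's action ranking sits from the annotators' ideal ordering.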

SLIDE 35

Action Discovery, Diversity

Active Objects 89

[Line charts: Total P(action | type) vs. Cluster Rank (1–15), built up for Model 4, then Models 4–5, then Models 4–6]

SLIDE 36

Examples

Active Objects 92

Query: Webster University · Entity: Webster University · Context: (“”, “”) · Types: /business/employer, /education/university, /location/location

Model 2 (context):
  1. Torrent
  2. Read biography
  3. Find adult pictures of
  4. Watch videos
  5. See picture of
  6. Get quotes from
  7. Apply for jobs at
Model 3 (+click):
  1. Torrent
  2. Read biography
  3. Read news about
  4. See pictures of
  5. Apply for jobs at
  6. Get quotes from
  7. See videos with
Model 4 (+type):
  1. Read reviews of
  2. See map of
  3. Follow sports teams of
  4. Get weather in
  5. Apply for jobs at
  6. Find address of
  7. See rankings of
Model 5 (+entity):
  1. Read reviews of
  2. See map of
  3. Follow sports teams of
  4. Get weather in
  5. Apply for jobs at
  6. Find address of
  7. See tuition of
Model 6 (+switch):
  1. Find address
  2. See pictures of
  3. Find map of
  4. Read news about
  5. Apply for jobs at
  6. See cost of
  7. See ranking of

Models 4, 5, 6 automatically generate reasonable actions for this query


+ User Model?

SLIDE 37

Examples

Active Objects 93

Query: download Skype · Entity: Skype · Context: (“download”, “”) · Types: /computer/software, /business/employer, /business/business_operation

Model 2 (context):
  1. Download
  2. Login to
  3. Read celebrity gossip about
  4. Watch videos with
  5. Hack
  6. Find games with
  7. Watch movies with
Model 3 (+click):
  1. Download
  2. Play games
  3. Hack
  4. Chat at
  5. Create account at
  6. Torrent
  7. Read biography of
Model 4 (+type):
  1. Download
  2. Login to
  3. Hack
Model 5 (+entity):
  1. Find on social networks
  2. Download
  3. Hack
Model 6 (+switch):
  1. Download
  2. Find reviews of
  3. Update
  4. Get help for

Again, we can now automatically generate reasonable actions for queries!

SLIDE 38

Active Objects 144

[Plate diagram: full graphical model]

SLIDE 39

[Plate diagram: Active Objects (Model IM)]

Inferring Type Distributions

  • Entity types are modeled as latent variables, jointly with the intended actions.
  • Extensions:
    – Given admissible types from a KB such as Freebase, learn their priors and contextual disambiguation
    – Given a new term, induce the types of the term
    – Automatically induce type list and admissible types for arbitrary entities

Active Objects 145

SLIDE 40

Prior Art (Guo, Xu, Cheng, and Li, SIGIR’09)

Active Objects 146

Generative process for entity-bearing queries:
  • For each query q:
    – entity e ~ Multinomial(ψ)
    – type t ~ Multinomial(τ_e)
    – ngram n1 ~ Multinomial(φ_t)
    – ngram n2 ~ Multinomial(φ_t)

[Plate diagram: Guo’09]

SLIDE 41

Active Objects 147

[Plate diagram: Guo’09]

+ Empty Switch

[Plate diagram: Model M0]

SLIDE 42

Active Objects 148

[Plate diagram: Guo’09]

+ Empty Switch + Click

[Plate diagram: Model M1]

SLIDE 43

Active Objects 149

[Plate diagram: Guo’09]

+ Empty Switch + Click + Action

[Plate diagram: Model IM]

SLIDE 44

Experimental Setting

  • Training
    – Queries from 3 months of US Bing search usage logs
    – Entities from 73 Freebase types, accounting for 50% of query traffic in the US market
    – Model parameters trained using 2-step learning over 100 EM iterations, 2 folds per model
  • Testing
    – Query-weighted random sample of 500 HEAD and 500 TAIL entity-bearing queries
    – 7 paid independent annotators identified all applicable Freebase types for the entities in the queries
      • Two annotators per query
      • Fleiss’ κ was 0.445, moderate agreement

Active Objects 150
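Fleiss' κ for such a setup (n raters per subject, here 2 per query) can be computed as below; the small ratings table is illustrative, not the study's data:

```python
def fleiss_kappa(table):
    """Fleiss' kappa for an N-subjects x k-categories table of rating counts;
    every row must sum to n, the number of ratings per subject."""
    N, n = len(table), sum(table[0])
    k = len(table[0])
    # Overall proportion of assignments to each category
    p = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    # Mean per-subject agreement
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in table) / N
    # Chance agreement from the category proportions
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Two raters per query assigning one of three hypothetical types
ratings = [[2, 0, 0], [1, 1, 0], [0, 2, 0], [0, 1, 1], [2, 0, 0]]
print(round(fleiss_kappa(ratings), 4))  # → 0.3103
```

Values around 0.4–0.6 (like the reported 0.445) are conventionally read as moderate agreement.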

SLIDE 45

Performance Analysis

Active Objects 151

  • † indicates statistical significance over BFB, and ‡ over both BFB and Guo’09.
  • Bold indicates statistical significance over all non-bold models in the column.
  • M1 (empty context + click signal) significantly outperforms the baseline and Guo’09 on HEAD.
  • IM is significantly better than all other models across all metrics
    – Biggest gains in the first position of its ranking (Prec@1 metric).

SLIDE 46

Switch Parameter Analysis

Active Objects 153

  • Switch improves performance across all models
  • More expressive models benefit more from switch

[Bar chart: relative gain of switch vs. no switch (0.02–0.16) for M0, M1, and IM, by metric (nDCG, MAP, MAPW, Prec@1) — Effect of Empty Switch Parameter (σ) on HEAD]

SLIDE 47

Discussion

  • Why is performance on TAIL lower than expected?
    – TAIL is skewed towards the PEOPLE types
    – Latent actions are over-expressive and do not help in differentiating PEOPLE types
  • Inspection of the latent Action parameter in IM shows that most PEOPLE types have all their mass distributed over three generic and common intents (see pictures of, find biographical information about, and see video of)
  • Success case in the TAIL
    – “ymca” → {song, place, educational_institution}
      • Marginalizing out the context words gives the following intent priors: 0.63, 0.29, 0.08
    – q1 = “jamestown ymca ny” → IM correctly classified “ymca” as a place
    – q2 = “ymca palomar” → IM correctly classified “ymca” as an educational_institution

Active Objects 155
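The marginalization step can be sketched as follows; the joint distribution below is a hypothetical reconstruction chosen only so that its marginals reproduce the 0.63 / 0.29 / 0.08 priors quoted above:

```python
# Hypothetical joint P(type, context | entity="ymca"); constructed so the
# marginals match the priors reported on the slide.
joint = {
    ("song", "lyrics"): 0.40, ("song", "video"): 0.23,
    ("place", "near me"): 0.20, ("place", "hours"): 0.09,
    ("educational_institution", "classes"): 0.08,
}

def type_priors(joint):
    """Marginalize out the context words: P(t) = sum over contexts c of P(t, c)."""
    priors = {}
    for (t, _context), p in joint.items():
        priors[t] = priors.get(t, 0.0) + p
    return priors

print(type_priors(joint))
```

Given a concrete query, the context words then re-weight these priors, which is how “jamestown ymca ny” flips the dominant “song” reading to “place”.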

SLIDE 48

Wrap-Up

SLIDE 49

Active Objects 157

SLIDE 50

Active Objects 159

  • A hodgepodge of related strings
  • Only actionable through search

SLIDE 51

Active Objects 160

2009 2010

SLIDE 52

Active Objects 161

Big Wins

  • Typed relations

2009 2010

SLIDE 53

Active Objects 162

Big Wins

  • Typed relations
  • User Interface accesses structured data

2009 2010

SLIDE 54

Active Objects 163

Big Wins

  • Typed relations
  • User Interface accesses structured data
  • Click-through experience can now leverage strongly-typed identifiers

2009 2010

SLIDE 55

Active Objects 164

Big Wins

  • Typed relations
  • User Interface accesses structured data
  • Click-through experience can now leverage strongly-typed identifiers
  • Brokered Actions (one-click conversions)

2009 2010

Add to Amazon cart · Connect on Twitter · Play track

SLIDE 56

Active Objects 165

2009 2010

To Do

  • Fixed actions
  • Model tasks
  • Annotate URLs/Apps with actions
  • Active Objects as an entity-centric UI
SLIDE 57