Patrick Pantel
Joint work with: Tom Lin (UW), Michael Gamon (MSR), Anitha Kannan (MSR), Ariel Fuxman (MSR) June 2012
Outline: Entity Detection · Structured Data · Structured Recommendations (play trailer, price prediction, task completion, aggregate ratings)
Active Objects 4
Direct Answer Structured Data
Recognize entity in query → actions easily accessible
[Plate diagram: full generative model for actionable queries (roadmap)]
User Intents and Goals
- Queries → intents: "hilton orlando reviews", "sea world location" → plan vacation; "get in shape", "how to lose weight"
- Coarse intents: Navigational / Informational / Transactional [Broder, 2002]; Advice, Locate, Download, Obtain, Interact, … [Rose and Levinson, 2004]
- Finer-grained intents — Actions: read reviews(hotel), get address(landmark), add to Netflix queue(film), buy(camera)
Entity Distribution in Web Search Queries (* from a query traffic-weighted sample)
- 43% entity (e.g., “GoldenEye”, “Horne Auto”) — entity 29% + entity with refiner 14%
- 14% entity category (e.g., “golf cart battery”, “global sim card”) — category 4% + category with refiner 10%
- 28% website (e.g., “yahoo mail”, “girlybox.com”)
- 15% no entity (e.g., “xxx”, “good reading quotes”)

Schema.org types for entity-bearing queries:
creativework 40% · product 9% · person 8% · event 3%; two remaining slices (37%, 3%) unlabeled
Annotated action taxonomy (counts = occurrences in the annotated sample):

Navigational
- 10x Login (on a Website entity)
- 4x Search (on a Website entity)

Informational (need satisfied by reading content, or could be satisfied by a written transcript of content)
- 1x Find Location(s) (Organization)
- 1x Find Lyrics (CreativeWork / MusicalTrack)
- 2x Find Recipe For (food)
- 1x Find Where to Buy (Product)
- 2x Get Contact Information (Organization)
- 1x Get Directions To (Organization / Location)
- 2x Get Domain Information (Website)
- 1x Get Event Details (Event)
- 2x Get Event Results (Event)
- 4x Product Detail (Product)
- 29x Learn (any entity)
- 6x Learn / Educational (Person / Product / Organization)
- 1x Learn / Trivia (any entity)
- 1x Operating Hours (Organization)
- 3x Read Articles (News / Magazine)
- 1x Read Guide (Product)
- 1x Read Help (Product)
- 8x Read News About (any entity)
- 3x Read Reviews (Shopping; CreativeWork / Product / Service)
- 1x Read Spoilers (CreativeWork)
- 8x Research (focused information gathering; any entity)
- 8x Search Database Of (e.g., obituaries; Organization / Website)
- See Menu (Restaurant)
- 3x See Pictures (Person / Product / Organization)
- Side Effects / Safety (Product)
- Stock Price (Organization)

Transactional (navigating to a web-mediated action)
- 1x Apply for Job (LocalBusiness / Organization)
- Buy (Shopping; Product)
- Buy Tickets (Event / Product / Person)
- 3x Content Creation (Website)
- Discuss Online (any entity)
- 5x Download (CreativeWork / Software)
- 1x Listen to Music (CreativeWork / Website)
- Manage Account (LocalBusiness / Website / Organization)
- Pay Bill (Website / Organization)
- 14x Play Game (Game)
- Rent (CreativeWork / Product)
- 2x Reservation (Hotel)
- Schedule Appointment (LocalBusiness)
- Sell (Shopping; Product)
- 1x Use Service On (e.g., translate; Website)
- 6x Watch Video About (any entity)
- 1x Web Chat

Other
- 13x Shopping (category of actions including reviews and buying)
- 19x Various / Unknown
Note: Schema.org has no existing equivalent inventory of Actions.
[Chart: Discovery Rate of New Actions — new actions per 10 annotations (average, y-axis) vs. number of annotations (x-axis, 50–250); the discovery rate decreases rapidly]
[Plate diagram: full generative model for actionable queries]
Data (from web logs):
- 21 types · 235,385 entities · 129,088 contexts
- 58,123 hosts · 2,164,579 (query, host) pairs

Example queries → intents:
- “Orlando hotel reviews” → get reviews
- “Does Hope Solo have a boyfriend?” → read biography
- “Free Winzip download” → download software
- “watch family guy online” → watch shows online
Action Context
Goal: define a theory for how actionable queries are generated.

The story for p(φ, θ, a, n | α, β):
- For each action a: φ_a ~ Dirichlet(β)
- For each query q: θ_q ~ Dirichlet(α)
- For each context position in q (pre or post):
  - action a ~ Multinomial(θ_q)
  - ngram n ~ Multinomial(φ_a)
[Plate diagram: Model 1.01 — α → θ_q (Q query plate) → a₁, a₂; β → φ (K action plate) → n₁, n₂]
Example: query “Star Wars ebert review” — the context “ebert review” is generated from the read-reviews action’s context distribution (the action→contexts “die”).
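The Model 1.01 story can be sketched as an ancestral-sampling routine; the sizes, hyperparameters, and seed below are illustrative toy values, not the talk's settings:

```python
import numpy as np

# Ancestral sampling for the Model 1.01 story.
# K actions, V context n-grams, Q queries -- all toy sizes.
rng = np.random.default_rng(0)
K, V, Q = 5, 50, 10
alpha, beta = 0.1, 0.01

# For each action a: phi_a ~ Dirichlet(beta), the action's context "die".
phi = rng.dirichlet(np.full(V, beta), size=K)

queries = []
for _ in range(Q):
    theta = rng.dirichlet(np.full(K, alpha))   # theta_q ~ Dirichlet(alpha)
    contexts = []
    for _position in ("pre", "post"):          # each context position in q
        a = rng.choice(K, p=theta)             # action a ~ Multinomial(theta_q)
        n = rng.choice(V, p=phi[a])            # ngram n ~ Multinomial(phi_a)
        contexts.append((a, n))
    queries.append(contexts)
```

Note that each context position draws its own action here; the later models change exactly this step.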
Action Context
The story for p(φ, θ, a, n | α, β):
- For each action a: φ_a ~ Dirichlet(β)
- For each query q: θ_q ~ Dirichlet(α)
  - action a ~ Multinomial(θ_q)
  - ngram n₁ ~ Multinomial(φ_a)
  - ngram n₂ ~ Multinomial(φ_a)

(Change from Model 1.01: a single action per query now generates both context positions.)
[Plate diagram: Model 1.02 — one action a per query generates both n₁ and n₂]
Example: “Star Wars” + “ebert review” ← read-reviews action
Action · Host · Context
- buy action → clicked hosts: amazon.com, ebay.com, walmart.com
- read-reviews action → clicked hosts: rottentomatoes.com, metacritic.com, efilmcritic.com; contexts: “______ ______”
Model 1.13 — Action · Host · Context

The story for p(φ, θ, ω, a, n, c | α, β, ι):
- For each action a: φ_a ~ Dirichlet(β) (to contexts); ω_a ~ Dirichlet(ι) (to clicks)
- For each query q: θ_q ~ Dirichlet(α)
  - action a ~ Multinomial(θ_q)
  - ngram n₁ ~ Multinomial(φ_a); ngram n₂ ~ Multinomial(φ_a)
  - click c ~ Multinomial(ω_a)

[Plate diagram: Model 1.13 — adds the clicked host c, generated from ω_a]

Example: “Star Wars ebert review” with a click on www.rottentomatoes.com, both generated by the read-reviews action.
Model 1.14 — Action · Host · Context · Type

The story for p(φ, θ, τ, ω, t, a, n, c | α, β, γ, ι):
- For each action a: φ_a ~ Dirichlet(β) (to contexts); ω_a ~ Dirichlet(ι) (to clicks)
- For each type t: τ_t ~ Dirichlet(γ) (to actions)
- For each query q: θ_q ~ Dirichlet(α)
  - type t ~ Multinomial(θ_q)
  - action a ~ Multinomial(τ_t) (no longer drawn directly from θ_q)
  - ngram n₁ ~ Multinomial(φ_a); ngram n₂ ~ Multinomial(φ_a)
  - click c ~ Multinomial(ω_a)

[Plate diagram: Model 1.14 — adds entity type t, with γ → τ over the T type plate]

Example: “Star Wars ebert review” — type film → read-reviews action → context “ebert review”, click on www.rottentomatoes.com.
Model 1.05b — Action · Host · Context · Type · Entity

The story for p(φ, θ, τ, ψ, ω, t, a, e, n, c | α, β, γ, η, ι):
- For each action a: φ_a ~ Dirichlet(β); ω_a ~ Dirichlet(ι)
- For each type t: τ_t ~ Dirichlet(γ); ψ_t ~ Dirichlet(η)
- For each query q: θ_q ~ Dirichlet(α)
  - type t ~ Multinomial(θ_q)
  - action a ~ Multinomial(τ_t)
  - entity e ~ Multinomial(ψ_t)
  - ngram n₁ ~ Multinomial(φ_a); ngram n₂ ~ Multinomial(φ_a)
  - click c ~ Multinomial(ω_a)

[Plate diagram: Model 1.05b — adds entity e, with η → ψ over the T type plate]
Model 1.06b — Action · Host · Context · Type · Entity · Empty switch

The story for p(φ, θ, τ, ψ, ω, σ, t, a, e, s, n, c | α, β, γ, η, ι, ε):
- For each action/type pair {a, t}: φ_a ~ Dirichlet(β); ω_a ~ Dirichlet(ι); σ_{a,t} ~ Beta(ε)
- For each type t: τ_t ~ Dirichlet(γ); ψ_t ~ Dirichlet(η)
- For each query q: θ_q ~ Dirichlet(α)
  - type t ~ Multinomial(θ_q)
  - action a ~ Multinomial(τ_t)
  - entity e ~ Multinomial(ψ_t)
  - switch s₁ ~ Bernoulli(σ_{a,t}); switch s₂ ~ Bernoulli(σ_{a,t})
  - if s₁: ngram n₁ ~ Multinomial(φ_a)
  - if s₂: ngram n₂ ~ Multinomial(φ_a)
  - click c ~ Multinomial(ω_a)

The switches let the model explain queries whose pre- or post-entity context is empty.

[Plate diagram: Model 1.06b — adds the empty-context switches s₁, s₂, with σ over the K action plate]
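A minimal sampler for the Model 1.06b story, showing how the Bernoulli switches produce empty context positions; all sizes and hyperparameter values below are illustrative, not the trained settings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes: T types, K actions, E entities, V context n-grams.
T, K, E, V = 3, 4, 20, 30
alpha, beta, gamma, eta, eps = 0.1, 0.01, 0.1, 0.05, 1.0

tau = rng.dirichlet(np.full(K, gamma), size=T)   # type -> actions
psi = rng.dirichlet(np.full(E, eta), size=T)     # type -> entities
phi = rng.dirichlet(np.full(V, beta), size=K)    # action -> context n-grams
sigma = rng.beta(eps, eps, size=(K, T))          # per (action, type) switch prob

def generate_query():
    theta = rng.dirichlet(np.full(T, alpha))     # query's type distribution
    t = rng.choice(T, p=theta)                   # type t ~ Multinomial(theta_q)
    a = rng.choice(K, p=tau[t])                  # action a ~ Multinomial(tau_t)
    e = rng.choice(E, p=psi[t])                  # entity e ~ Multinomial(psi_t)
    # Each context position fires only if its switch is on, which is how the
    # model explains queries with an empty pre- or post-entity context.
    n1 = rng.choice(V, p=phi[a]) if rng.random() < sigma[a, t] else None
    n2 = rng.choice(V, p=phi[a]) if rng.random() < sigma[a, t] else None
    return t, a, e, n1, n2

samples = [generate_query() for _ in range(100)]
```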
Inference at query time — Action (hidden); Clicked Host, Query Context, Type (observed); learned conditionals P(Action | Type), P(Host | Action), P(Context | Action).

A new query comes in (e.g., “New York City hotels”):
1. Entity Recognition(query) → entity (“New York City”)
2. entity → types (“city”, “employer”, “travel destination”)
3. (query, entity) → context (“Ø”, “hotels”)
4. Historical Data(query) → distribution over hosts
5. Score each action cluster; surface its best hosts (e.g., “travel.bing.com”)

Another example query: “jetbeam rrt-0”
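The query-time step above can be sketched as combining the three learned conditionals; the probability tables below are hypothetical toy values, not the trained parameters:

```python
from collections import defaultdict

# Toy conditional tables; all numeric values are hypothetical.
p_action_given_type = {
    "city": {"book hotel": 0.6, "get map": 0.4},
}
p_context_given_action = {
    "book hotel": {"hotels": 0.7},
    "get map": {"hotels": 0.1},
}
p_host_given_action = {
    "book hotel": {"travel.bing.com": 0.5},
    "get map": {"travel.bing.com": 0.1},
}

def score_actions(types, context, host):
    """Rank actions by sum over types of P(a|t) * P(context|a) * P(host|a)."""
    scores = defaultdict(float)
    for t in types:
        for a, p_at in p_action_given_type.get(t, {}).items():
            p_n = p_context_given_action.get(a, {}).get(context, 1e-6)
            p_c = p_host_given_action.get(a, {}).get(host, 1e-6)
            scores[a] += p_at * p_n * p_c
    return sorted(scores.items(), key=lambda kv: -kv[1])

ranked = score_actions(["city"], "hotels", "travel.bing.com")
# "book hotel" (0.6 * 0.7 * 0.5) outranks "get map" (0.4 * 0.1 * 0.1)
```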
Mining action words from Web Trigrams
- Pattern match: “want to (x)”, “have to (x)”, “you can (x)”, “I can (x)”
- Filter adverbs (e.g., “honestly”, “quickly”)
- Filter noise: the 25% with the lowest frequency / unigram-count ratio (e.g., “a”, “boy”)
→ 13,417 action words: make, download, find, torrent, say, eBay, pay, login, buy, podcast, help, …
- Method scales to longer actions, e.g., 4-grams for 2-word actions (“read review”)
- Finds modern/web actions that older annotated corpora might miss
Mining web actions
- ngram pattern “(x) at (y)”, where y has the form of a web-site URL: “buy at Amazon.com”, “download at cnet.com”
- Applied to the 13,417 action words (make, download, find, torrent, say, eBay, pay, login, buy, podcast, help, …)
→ 1,279 web actions: buy, review, shop, unsubscribe, book, download, …
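The trigram-mining step can be sketched with simple regular expressions; the tiny trigram list below is a toy stand-in for real web n-gram data:

```python
import re

# Patterns from the slides; matched word (x) becomes a candidate action word.
PATTERNS = [re.compile(p) for p in (
    r"^want to (\w+)$", r"^have to (\w+)$", r"^you can (\w+)$", r"^I can (\w+)$",
)]
ADVERBS = {"honestly", "quickly"}  # adverb filter (toy list)

# Toy stand-in for a web trigram corpus.
trigrams = ["want to download", "you can buy", "have to honestly", "I can torrent"]

def extract_action_words(trigrams):
    words = set()
    for t in trigrams:
        for pat in PATTERNS:
            m = pat.match(t)
            if m and m.group(1) not in ADVERBS:  # drop adverb matches
                words.add(m.group(1))
    return words

extract_action_words(trigrams)  # -> {"download", "buy", "torrent"}
```

The real pipeline additionally drops the 25% of candidates with the lowest frequency / unigram-count ratio, which requires n-gram counts not shown here.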
Example learned action clusters (cluster ID → actions):
- Cluster 19: Read biography of, See pictures of, Read blog of, Contact, Read interview with, Watch video of
- Cluster 12: Download, Find reviews of, Update, Get help for
- Cluster 24: Apply for jobs at, View career options at, View map of, Read news about, Find locations of, Find address of, Get stock quote of
From each model, we first automatically generate:
Use as training data
Models evaluated (each extends the previous):
- Model 2 (= 1.02): action → context
- Model 3 (= 1.13): + clicked host
- Model 4 (= 1.14): + entity types
- Model 5 (= 1.05b): + entity
- Model 6 (= 1.06b): + empty switch

Data: 21 types · 58,123 hosts · 2,164,579 (query, host) pairs · 129,088 contexts · 235,385 entities; navigational queries filtered out.
Training: 50 clusters, 2-step learning over 100 total EM iterations, 2 folds per model.
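As a rough sketch of the EM training loop, here is EM on a plain mixture of multinomials over context n-grams (far simpler than the full models; all sizes and data are synthetic):

```python
import numpy as np

# Minimal EM for a mixture of multinomials -- a toy stand-in for the
# talk's 2-step learning. K clusters, V n-grams, N observations.
rng = np.random.default_rng(0)
K, V, N = 3, 10, 200

# Synthetic corpus: each observation is one context n-gram id.
true_phi = rng.dirichlet(np.ones(V), size=K)
z_true = rng.integers(K, size=N)
x = np.array([rng.choice(V, p=true_phi[z]) for z in z_true])

pi = np.full(K, 1.0 / K)                      # mixture weights
phi = rng.dirichlet(np.ones(V), size=K)       # per-cluster n-gram distributions

for _ in range(100):
    # E-step: responsibilities r[n, k] proportional to pi_k * phi_k[x_n]
    r = pi[None, :] * phi[:, x].T
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate pi and phi (add-one smoothing on counts)
    pi = r.sum(axis=0) / N
    counts = np.ones((K, V))
    for k in range(K):
        np.add.at(counts[k], x, r[:, k])
    phi = counts / counts.sum(axis=1, keepdims=True)
```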
Evaluation: the combined set of actions over all models was judged by 7 UHRS annotators, with fair κ inter-annotator agreement.
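Agreement among several annotators like this is typically measured with Fleiss' kappa; a minimal sketch ("fair" agreement conventionally means κ between 0.21 and 0.40; the example matrices are made up):

```python
import numpy as np

def fleiss_kappa(ratings):
    """Fleiss' kappa for an (items x categories) count matrix,
    where each row sums to the number of raters."""
    ratings = np.asarray(ratings, dtype=float)
    n_raters = ratings[0].sum()
    p_cat = ratings.sum(axis=0) / ratings.sum()          # category proportions
    p_item = ((ratings * (ratings - 1)).sum(axis=1)      # per-item agreement
              / (n_raters * (n_raters - 1)))
    p_bar, p_e = p_item.mean(), (p_cat ** 2).sum()       # observed vs. chance
    return (p_bar - p_e) / (1 - p_e)

fleiss_kappa([[7, 0], [0, 7]])  # perfect agreement among 7 raters -> 1.0
```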
[Chart: nDCG vs. query sets (HEAD, TAIL, Type-Balanced), with 95% confidence bounds, for Models 2–6 (+click, +type, +entity, +switch); y-axis 0.1–1.0]
The TAIL set is dominated by the “People” type.
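For reference, the nDCG metric used in these comparisons can be computed as follows (a standard formulation with exponential gain; the example relevance grades are made up):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with the 2^rel - 1 gain form."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(ranked_rels, k=None):
    """nDCG: DCG of the ranking divided by DCG of the ideal ordering."""
    rels = ranked_rels[:k] if k else ranked_rels
    ideal = sorted(ranked_rels, reverse=True)[:len(rels)]
    return dcg(rels) / dcg(ideal) if dcg(ideal) > 0 else 0.0

ndcg([3, 2, 0, 1])  # slightly below 1: items 3 and 4 are swapped vs. ideal
```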
[Chart: total P(action | type) vs. cluster rank (top 15 clusters), y-axis 0–800%, for Models 4, 5, and 6]
Query: “Webster University” · Entity: Webster University · Context: (“”, “”) · Types: /business/employer, /education/university, /location/location
Model 2 (context) → pictures of
Model 3 (+click) → about
Model 4 (+type) → teams of
Model 5 (+entity) → teams of
Model 6 (+switch) → about

Models 4, 5, and 6 automatically generate reasonable actions for this query.
+ User Model?
Query: “download Skype” · Entity: Skype · Context: (“download”, “”) · Types: /computer/software, /business/employer, /business/business_operation

Model 2 (context) → gossip about
Model 3 (+click) → with
Model 4 (+type) → with
Model 5 (+entity) → at
Model 6 (+switch) → networks
Again, we can now automatically generate reasonable actions for queries!
[Plate diagram: Active Objects (Model IM) — the full model, shown as a roadmap]
- Given admissible types from a KB such as Freebase, learn their priors and contextual disambiguation
- Given a new term, induce its types
- Automatically induce the type list and admissible types for arbitrary entities
Generative process for entity-bearing queries (Guo’09). For each query q:
- entity e ~ Multinomial(ψ)
- type t ~ Multinomial(τ_e)
- ngram n₁ ~ Multinomial(φ_t); ngram n₂ ~ Multinomial(φ_t)
[Plate diagram: Guo’09 — ψ → e; τ (E entity plate) → t; φ (T type plate) → n₁, n₂]
[Plate diagrams: Guo’09 vs. Model M0 — M0 adds an empty-context switch s, with σ over the T type plate]
[Plate diagrams: Guo’09 vs. Model M1 — M1 adds the clicked host c, with ω over the T type plate, plus the empty-context switch]
[Plate diagrams: Guo’09 vs. Model IM — IM further adds latent actions (θ_q, τ, φ over the K action plate), clicked hosts, and switches]
- Queries from 3 months of US Bing search usage logs
- Entities from 73 Freebase types, accounting for 50% of query traffic in the US market
- Model parameters trained using 2-step learning over 100 EM iterations, 2 folds per model
- Test set: a query-weighted random sample of 500 HEAD and 500 TAIL entity-bearing queries
- 7 paid independent annotators identified all applicable Freebase types for the entities in the queries
Results on HEAD: gains over Guo’09, with the biggest gains in the first position of the ranking (Prec@1 metric).
[Chart: effect of the empty-switch parameter (σ) on HEAD — relative gain of switch vs. no switch (0.02–0.16) for M0, M1, and IM, measured by nDCG, MAP, MAPW, and Prec@1]
- TAIL is skewed towards the PEOPLE types
- Latent actions are over-expressive and do not help differentiate PEOPLE types: these types have nearly all their mass on three generic, common intents (see pictures of, find biographical information about, and see video of)
- “ymca” → {song, place, educational_institution}, with learned priors 0.63, 0.29, 0.08
- q1 = “jamestown ymca ny”: IM correctly classified “ymca” as a place
- q2 = “ymca palomar”: IM correctly classified “ymca” as an educational_institution
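The contextual disambiguation above can be sketched as a Bayes update of the type prior; the 0.63/0.29/0.08 prior comes from the slide, while the context likelihoods below are hypothetical toy values:

```python
# Type prior for "ymca" (from the slide) and hypothetical
# type-conditional context likelihoods P(word | type).
prior = {"song": 0.63, "place": 0.29, "educational_institution": 0.08}
likelihood = {
    "song": {"lyrics": 0.5, "ny": 0.01, "palomar": 0.01},
    "place": {"lyrics": 0.01, "ny": 0.4, "palomar": 0.05},
    "educational_institution": {"lyrics": 0.01, "ny": 0.05, "palomar": 0.4},
}

def posterior(context_words, floor=1e-3):
    """P(type | context) proportional to P(type) * product of P(w | type)."""
    scores = {}
    for t, p in prior.items():
        for w in context_words:
            p *= likelihood[t].get(w, floor)
        scores[t] = p
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

post_q1 = posterior(["ny"])       # "place" wins, as for q1 on the slide
post_q2 = posterior(["palomar"])  # "educational_institution" wins, as for q2
```

A strong context likelihood can overturn even the dominant "song" prior, which is the effect the slide's two queries illustrate.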
related strings
through search
2009 → 2010: leverage strongly-typed identifiers
Actions: Add to Amazon cart · Connect on Twitter · Play track