Question Answering on Web Data Silei Xu CS294S April 9, 2020 Joint - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Question Answering on Web Data

Silei Xu CS294S April 9, 2020

Joint work with Giovanni Campagna, Sina Semnani, Jian Li, and Monica S. Lam

slide-4
SLIDE 4

Commercial Assistants

Alexa User hand-codes question/code 1 by 1

  • get me an upscale restaurants
  • What are the restaurants around here?
  • What is the best restaurant?
  • search for Chinese restaurants

100K Alexa skills (Sep 2019)

1.8 billion websites

slide-6
SLIDE 6

Genie: Synthesize Question/Code from a Schema

500 Domain-Independent Templates

  • What is the <prop> of <subject>?
  • What is the <subject>’s <prop>?

Schema and Property Annotations

  • Name
  • Price
  • Cuisine
  • …

Synthesized questions

  • Find me the best restaurant with 500 or more reviews
  • I’m looking for an Italian fine dining restaurant.
  • What is the phone number of Wendy’s?
  • Are there any restaurant with at least 4.5 stars?
  • Show me a cheap restaurant with 5-star review.
  • What is the best non-Chinese restaurant near here?
  • Find restaurants that serve Chinese or Japanese food
  • Give me the best Italian restaurant.
  • What is the best restaurant within 10 miles?
  • Show me some restaurant with less than 10 reviews
  • get me an upscale restaurants
  • What are the restaurants around here?
  • What is the best restaurant?
  • search for Chinese restaurants

User Genie

slide-10
SLIDE 10

The Web Has a Schema!

  • Schema.org
      • Structured data to mark up web pages
      • Mainly used by search engines
      • Covers many domains, including restaurants, hotels, people, recipes, products, news, …

Schema.org markup on Yelp:

<script type="application/ld+json">
{
  "@type": "Restaurant",
  "name": "The French Laundry",
  "servesCuisine": "French",
  "aggregateRating": {
    "@type": "AggregateRating",
    "reviewCount": 2527,
    "ratingValue": 4.5
  }
  ...
}
</script>

40% of the websites use it!
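
As a rough illustration of how such markup can be read back out of a page, the sketch below pulls `application/ld+json` blocks from HTML using only the Python standard library. The class name `JsonLdExtractor` and the `SAMPLE_HTML` fragment are made up for this example (the fragment mirrors the slide's markup with the elided fields dropped); it is not how Genie or any search engine actually crawls pages.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies arrive as raw text; keep them only inside a JSON-LD tag.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical page fragment modeled on the slide's example.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{
  "@type": "Restaurant",
  "name": "The French Laundry",
  "servesCuisine": "French",
  "aggregateRating": {
    "@type": "AggregateRating",
    "reviewCount": 2527,
    "ratingValue": 4.5
  }
}
</script>
</head><body></body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(SAMPLE_HTML)
restaurant = extractor.items[0]
print(restaurant["name"], restaurant["aggregateRating"]["ratingValue"])
```

Each extracted item is a plain dictionary whose keys are Schema.org property names, which is what makes the markup usable as a database for question answering.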

slide-11
SLIDE 11

Outline

  • Introduction to Schema.org
  • Represent Questions in ThingTalk
  • LUINet: NL to ThingTalk
  • Training data generation
  • Experimental results
  • Work in progress: automate everything!
slide-12
SLIDE 12

Introduction to Schema.org

slide-17
SLIDE 17

Graph Data Model of Schema.org

class: Organization

properties:
    legalName: Text
    slogan: Text
    aggregateRating: AggregateRating
    ...

types – primitive (Text) or class (AggregateRating)

slide-18
SLIDE 18

Graph Data Model of Schema.org

Organization
    legalName: Text
    slogan: Text
    aggregateRating: AggregateRating
    ...

AggregateRating
    ratingCount: Integer
    ratingValue: Integer
    ...

slide-20
SLIDE 20

Schema.org Hierarchy

Thing
    name: Text
    url: URL
    ...

Organization (Thing)
    legalName: Text
    slogan: Text
    aggregateRating: AggregateRating
    ...

LocalBusiness (Place, Organization)
    openingHours: Text
    priceRange: Text
    ...

AggregateRating
    ratingCount: Integer
    ratingValue: Integer
    ...

slide-21
SLIDE 21

Some useful tools

  • Google Structured Data Testing Tool
      • Shows the schema.org markup in a web page
  • Google Custom Search
      • Searches for pages that contain certain schema.org domains

slide-22
SLIDE 22

ThingTalk for Questions

slide-24
SLIDE 24

ThingTalk for QA

Show me restaurants in Stanford

now => @QA.restaurant(), geo == makeLocation("Stanford") => notify

slide-25
SLIDE 25

ThingTalk for QA

Show me Chinese restaurants in Stanford

now => @QA.restaurant(), geo == makeLocation("Stanford") && servesCuisine =~ "Chinese" => notify

slide-27
SLIDE 27

ThingTalk for QA

Show me top-rated Chinese restaurants in Stanford

now =>
    sort aggregateRating.ratingValue desc of (
        @QA.restaurant(),
        geo == makeLocation("Stanford") && servesCuisine =~ "Chinese"
    )
=> notify

slide-29
SLIDE 29

ThingTalk for QA

Show me top-rated Chinese restaurants in Stanford reviewed by Bob

now =>
    sort aggregateRating.ratingValue desc of (
        @QA.restaurant(),
        geo == makeLocation("Stanford") && servesCuisine =~ "Chinese"
    )
    join (
        @QA.Review(),
        in_array(id, review) && author == "bob"
    )
=> notify
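
To make the meaning of this query concrete, here is a small Python sketch of what it computes over in-memory tables. The data, the function name `top_rated_reviewed_by`, and the reading of `=~` as a case-insensitive substring match are assumptions for illustration; this is not the ThingTalk runtime.

```python
# Hypothetical tables; property names follow the Schema.org properties on the slides.
RESTAURANTS = [
    {"id": "r1", "name": "Golden Wok", "geo": "Stanford", "servesCuisine": "Chinese",
     "aggregateRating": {"ratingValue": 4.2}, "review": ["v1"]},
    {"id": "r2", "name": "Jade Garden", "geo": "Stanford", "servesCuisine": "Chinese",
     "aggregateRating": {"ratingValue": 4.7}, "review": ["v2", "v3"]},
    {"id": "r3", "name": "Trattoria", "geo": "Stanford", "servesCuisine": "Italian",
     "aggregateRating": {"ratingValue": 4.9}, "review": ["v4"]},
    {"id": "r4", "name": "Lotus", "geo": "Palo Alto", "servesCuisine": "Chinese",
     "aggregateRating": {"ratingValue": 4.8}, "review": ["v5"]},
]
REVIEWS = [
    {"id": "v1", "author": "bob"},
    {"id": "v2", "author": "bob"},
    {"id": "v3", "author": "alice"},
    {"id": "v4", "author": "bob"},
    {"id": "v5", "author": "bob"},
]


def top_rated_reviewed_by(restaurants, reviews, location, cuisine, author):
    # Filter: geo == makeLocation(location) && servesCuisine =~ cuisine
    matches = [
        r for r in restaurants
        if r["geo"] == location and cuisine.lower() in r["servesCuisine"].lower()
    ]
    # sort aggregateRating.ratingValue desc
    matches.sort(key=lambda r: r["aggregateRating"]["ratingValue"], reverse=True)
    # join: in_array(id, review) && author == author
    return [
        (r, v) for r in matches for v in reviews
        if v["id"] in r["review"] and v["author"] == author
    ]


results = top_rated_reviewed_by(RESTAURANTS, REVIEWS, "Stanford", "Chinese", "bob")
top_names = [restaurant["name"] for restaurant, _ in results]
print(top_names)
```

The three clauses of the query map onto three steps: a filter, a sort, and a join against the review table.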

slide-33
SLIDE 33

LUINet: NL to ThingTalk

slide-34
SLIDE 34

Natural Language Programming

Natural language → LUINet → ThingTalk

What is the top-rated Chinese restaurant in Palo Alto?

sort aggregateRating.ratingValue desc of (
    @QA.restaurant(),
    geo == makeLocation("Palo Alto") && servesCuisine =~ "Chinese"
)

slide-42
SLIDE 42

Genie Pipeline

  • Thingpedia Manifest
      • Schema: Name, Price, Cuisine, …
      • Natural Language Annotations: “cuisine of the restaurant”, “restaurant’s cuisine”, “cuisine served by the restaurant”
  • ThingTalk Grammar and Domain-independent Templates
      • What is the <prop> of <table>?
      • What is the <table>’s <prop>?
  • Training Data
      • Synthesize sentence/code pairs
      • Paraphrase
      • Parameter & data augmentation
  • LUINet: Natural language → ThingTalk
  • Evaluation & Test: Real User Input

slide-43
SLIDE 43

Automatically Turn Schema.org into Thingpedia Manifest

  • Tables with hierarchy
  • properties are inherited from parent tables
  • only keep classes & properties with data
  • decide types based on schema.org types and data

@org.schema {
    Restaurant extends FoodEstablishment {}
    FoodEstablishment extends LocalBusiness {
        acceptsReservation: Boolean,
        servesCuisine: String,
        ...
    }
    LocalBusiness extends Place, Organization {
        priceRange: String,
        openingHours: String,
        ...
    }
    Organization extends Thing {
        aggregateRating: {
            ratingCount: Number,
            ratingValue: Number,
        },
        review: Array(Review),
    }
    Thing {
        name: String,
        ...
    }
}
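
The first two steps above (inherit properties from parent tables, then keep only properties that have data) can be sketched in a few lines of Python. The `SCHEMA` table is a heavily abridged, hand-written stand-in for the real Schema.org hierarchy, and the function names are hypothetical; the actual conversion tool may work quite differently.

```python
# Hypothetical, abridged class table: name -> (parent classes, own properties).
SCHEMA = {
    "Thing": ([], {"name": "Text", "url": "URL"}),
    "Place": (["Thing"], {"geo": "GeoCoordinates"}),
    "Organization": (["Thing"], {"legalName": "Text", "slogan": "Text",
                                 "aggregateRating": "AggregateRating"}),
    "LocalBusiness": (["Place", "Organization"], {"openingHours": "Text",
                                                  "priceRange": "Text"}),
    "FoodEstablishment": (["LocalBusiness"], {"servesCuisine": "Text"}),
    "Restaurant": (["FoodEstablishment"], {}),
}


def all_properties(cls):
    """Return own and inherited properties of a class (multiple parents allowed)."""
    parents, own = SCHEMA[cls]
    props = {}
    for parent in parents:
        props.update(all_properties(parent))
    props.update(own)
    return props


def prune_to_data(cls, records):
    """Keep only the properties that appear in at least one crawled record."""
    used = {key for record in records for key in record}
    return {prop: typ for prop, typ in all_properties(cls).items() if prop in used}


# Hypothetical crawled records for the restaurant domain.
records = [
    {"name": "The French Laundry", "servesCuisine": "French"},
    {"name": "Golden Wok", "priceRange": "$$"},
]
restaurant_props = all_properties("Restaurant")
kept = prune_to_data("Restaurant", records)
print(sorted(kept))
```

Pruning matters because Schema.org defines far more properties than any one website fills in, and templates built on properties without data would only produce unanswerable questions.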

slide-46
SLIDE 46

Map properties to natural language

  • Long, non-word property names
      • E.g., ratingValue, servesCuisine
  • Variety in natural language usage

servesCuisine                               ratingValue
Chinese restaurant ✓                        4.5 restaurant ✘
Restaurant with Chinese cuisine ✓           Restaurant with 4.5 rating ✓
Restaurant served Chinese cuisine ✘         Restaurant rated 4.5 ✓
Restaurant that serves Chinese cuisine ✓    Restaurant rates 4.5 ✘
Restaurant with Chinese ✘                   Restaurant with 4.5 ✘
…                                           …

slide-47
SLIDE 47

NL Annotations by Part-Of-Speech Categories

  • “servesCuisine”
      • Noun phrase
          • “cuisine”: e.g., “the cuisine of the restaurant”, “restaurants with Chinese cuisine”
      • Verb phrase
          • “serves # cuisine”, “serves #”: e.g., “restaurant that serves Chinese cuisine”, “what does the restaurant serve”
      • Adjective-phrase value (with no property name)
          • E.g., “Chinese restaurants”
slide-53
SLIDE 53

NL Annotation Generation

  • Automatic: heuristics based on POS (part-of-speech) tags
      • servesCuisine → “serves cuisine” → VBP NN
      • NL annotations: servesCuisine: Verb: “serves # cuisine”; Noun: “# cuisine”
  • Manual: provides additional synonyms, and annotations in different POS categories
      • NL annotations: servesCuisine: Verb: “serves # cuisine”, “offers # food”; Noun: “# cuisine”, “# food”; Adjective: “#”
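
A minimal sketch of the automatic path, assuming a camelCase split followed by a tag lookup. The `TOY_TAGS` dictionary stands in for a real POS tagger and the fallback rule for non-verb names is a guess; only the verb-plus-noun case comes from the slide.

```python
import re

# Hand-written tag lookup standing in for a real POS tagger (assumption).
TOY_TAGS = {"serves": "VBP", "cuisine": "NN", "rating": "NN", "value": "NN"}


def split_camel_case(name):
    """Split a property name such as 'servesCuisine' into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", name)]


def annotate(prop):
    """Derive NL annotations from the POS tags of the property name's words."""
    words = split_camel_case(prop)
    tags = [TOY_TAGS.get(word, "NN") for word in words]
    if len(words) > 1 and tags[0].startswith("VB"):
        rest = " ".join(words[1:])
        # Verb followed by noun(s): the value slot '#' goes between them.
        return {"verb": [f"{words[0]} # {rest}"], "noun": [f"# {rest}"]}
    # Assumed fallback: treat the whole name as a plain noun phrase.
    return {"noun": [" ".join(words)]}


serves_cuisine = annotate("servesCuisine")
rating_value = annotate("ratingValue")
print(serves_cuisine)
print(rating_value)
```

The sketch reproduces the slide's automatic output for `servesCuisine` but, like the heuristic it imitates, it cannot invent synonyms such as “offers # food” or the bare adjective form, which is why manual annotations are added.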

slide-58
SLIDE 58

Domain-independent Templates

NL annotations

  • servesCuisine: Verb: “serves # cuisine”, “offers # food”; Noun: “# cuisine”, “# food”; Adjective: “#”

Templates and synthesized sentences

  • Show me <table> that <verb>. → Show me restaurants that serve Chinese cuisine.
  • Show me <table> with <noun>. → Show me restaurants with Chinese food.
  • Show me <adjective> <table>. → Show me Chinese restaurants.

ThingTalk for all three sentences

now => @QA.restaurant(), servesCuisine =~ "Chinese" => notify;

Type-specific templates

  • Show me <table> with <noun:NUMBER> greater than <value>. → Show me restaurants with rating greater than 4
  • Show me <table> with <noun:MEASURE(m)> longer than <value>. → Show me surfboard with length longer than 3m
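
The expansion step can be pictured as filling each template with each annotation of the matching part-of-speech category. The sketch below is a simplified stand-in for Genie's template language; the dictionaries and the `synthesize` function are hypothetical, and it does not fix verb agreement (it emits "that serves" where the slide shows "that serve").

```python
# Annotations copied from the slide; '#' marks the value slot.
ANNOTATIONS = {
    "servesCuisine": {
        "verb": ["serves # cuisine", "offers # food"],
        "noun": ["# cuisine", "# food"],
        "adjective": ["#"],
    }
}

# One domain-independent template per part-of-speech category.
TEMPLATES = {
    "verb": "Show me {table} that {phrase}.",
    "noun": "Show me {table} with {phrase}.",
    "adjective": "Show me {phrase} {table}.",
}


def synthesize(table, table_plural, prop, value):
    """Return (sentence, ThingTalk code) pairs for one property/value filter."""
    code = f'now => @QA.{table}(), {prop} =~ "{value}" => notify;'
    pairs = []
    for category, phrases in ANNOTATIONS[prop].items():
        for phrase in phrases:
            sentence = TEMPLATES[category].format(
                table=table_plural, phrase=phrase.replace("#", value)
            )
            pairs.append((sentence, code))
    return pairs


pairs = synthesize("restaurant", "restaurants", "servesCuisine", "Chinese")
sentences = [sentence for sentence, _ in pairs]
for sentence, code in pairs:
    print(sentence, "|", code)
```

Every sentence is paired with the same program, so adding one more annotation or template multiplies the training data without any extra labeling.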

slide-62
SLIDE 62

Domain-dependent Templates

  • Some natural sentences cannot be generated by domain-independent templates:
      • “the top-rated restaurant”, “the best restaurant”
  • We allow developers to improve the accuracy by providing domain-dependent templates

ThingTalk

sort aggregateRating.ratingValue desc of @QA.restaurant()

Sentences by domain-independent templates

  • restaurant with the highest rating
  • restaurant that have the highest rating
  • …

Domain-dependent templates

  • the top-rated restaurant
  • the best restaurant
  • …

slide-63
SLIDE 63

Experiments

slide-64
SLIDE 64

Experimental Results

  • Domains
      • Restaurants: data from Yelp
      • Person: data from LinkedIn
  • Training set

                         Restaurant    Person
    Synthetic            1,294,278     553,067
    Paraphrase           6,288         6,000
    Total (augmented)    1,809,109     930,564

  • Realistic evaluation set

                            Restaurant    Person
    Dev     1 property      134           6
            2 properties    47            144
            3+ properties   59
            Total           240           160
    Test    1 property      96            127
            2 properties    79            106
            3+ properties   40
            Total           215           233

slide-66
SLIDE 66

Comparison with Commercial Virtual Assistants

[Bar chart: Answer Accuracy on Restaurant Queries, comparing Alexa, Google, Siri, and Almond; y-axis from 0% to 80%]

Trained with no real data!

slide-67
SLIDE 67

Experimental Results

[Bar chart: Restaurant and Person results for 1 property, 2 properties, 3+ properties, and Overall; y-axis from 0% to 100%]

slide-68
SLIDE 68

Can We Do Better?

slide-70
SLIDE 70

Manual Effort in Genie Pipeline

  • Natural language annotations
      • The heuristics based on part-of-speech tags don’t provide good variety, and their output is sometimes unnatural
  • Paraphrase
      • We ask crowdworkers to manually paraphrase synthetic sentences
      • We can only do this for a small sample of the synthetic sentences because of cost
  • Can we replace them with something automatic?
slide-75
SLIDE 75

Automatic NL Annotation Generation

  • Generate context-aware synonyms by a language model
      • A sample sentence automatically constructed based on POS: Show me restaurants with Italian cuisine.
      • Generate context-aware synonyms with BERT (pretrained):
          • Show me restaurants with Italian dishes.
          • Show me restaurants with Italian food.
          • Show me restaurants with Italian menu.
          • …
      • Templatize: noun: “# cuisine | dishes | menu …”
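
The templatize step can be sketched without running a language model: given the sample sentence and the model's rewrites, keep the single word that changed and fold it into the annotation. The candidate sentences below are copied from the slide and stand in for real BERT output; `extract_synonyms` is a hypothetical helper, not part of Genie.

```python
def extract_synonyms(sample, candidates):
    """Return the replacement word from each candidate that differs in exactly one position."""
    sample_words = sample.rstrip(".").split()
    synonyms = []
    for candidate in candidates:
        words = candidate.rstrip(".").split()
        if len(words) != len(sample_words):
            continue  # only single-word substitutions are handled in this sketch
        changed = [new for old, new in zip(sample_words, words) if old != new]
        if len(changed) == 1:
            synonyms.append(changed[0])
    return synonyms


sample = "Show me restaurants with Italian cuisine."
# Stand-in for the language model's output, taken from the slide.
candidates = [
    "Show me restaurants with Italian dishes.",
    "Show me restaurants with Italian food.",
    "Show me restaurants with Italian menu.",
]

synonyms = extract_synonyms(sample, candidates)
noun_annotation = "# " + " | ".join(["cuisine"] + synonyms)
print(noun_annotation)
```

Because the value ("Italian") stays fixed and only the property word varies, each surviving substitution becomes another way to phrase the same property.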
slide-80
SLIDE 80

Automatic NL Annotation Generation (cont.)

  • Predict adjective properties by a language model
      • Construct a sample sentence with mask: Show me a [MASK] restaurant.
      • Predict [MASK] with BERT (pretrained): Show me a good restaurant. Show me a Chinese restaurant. …
      • Look up predicted words in property value sets
      • Add adjective annotation to found properties: servesCuisine – adjective: “#” …
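
The lookup step can be illustrated as a set-membership test. In the sketch below the predicted words are a stand-in for BERT's `[MASK]` predictions ("good" and "Chinese" appear on the slide, the others are invented), and the value sets are hypothetical samples of what a crawled database might contain.

```python
# Hypothetical value sets collected from crawled data, one per property.
VALUE_SETS = {
    "servesCuisine": {"chinese", "italian", "french"},
    "priceRange": {"cheap", "moderate", "expensive"},
}

# Stand-in for the words a masked language model predicts for
# "Show me a [MASK] restaurant."
predictions = ["good", "Chinese", "cheap", "new"]


def adjective_properties(predicted_words, value_sets):
    """Map each property to the predicted words found among its known values."""
    found = {}
    for word in predicted_words:
        for prop, values in value_sets.items():
            if word.lower() in values:
                found.setdefault(prop, []).append(word)
    return found


found = adjective_properties(predictions, VALUE_SETS)
# Every property with at least one hit gets the bare-value adjective form.
adjective_annotations = {prop: {"adjective": ["#"]} for prop in found}
print(found)
print(adjective_annotations)
```

Words such as "good" match no value set and are dropped, while a hit on "Chinese" is evidence that values of `servesCuisine` can be used directly as adjectives.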
slide-81
SLIDE 81

Preliminary Experimental Result

[Bar chart: Accuracy on Restaurant Queries for Auto NL Annotation Generation, comparing POS-based Heuristics, Automatic, and Manual annotations; y-axis from 0% to 60%]

slide-87
SLIDE 87

Automatic Paraphrasing

  • Fine-tune GPT-2 (pretrained) on a paraphrase dataset to obtain a GPT-2 paraphraser
  • Synthetic training examples
      • Show me restaurants with Chinese cuisine.
  • Inference with the GPT-2 paraphraser
      • What is a restaurant that is Chinese?
      • Give me Chinese dining places.
      • Show me top-rated Chinese restaurants.
      • …
  • LUINet trained w/ synthetic data: filter paraphrases that do not preserve meaning
  • Paraphrased examples
      • What is a restaurant that is Chinese?
      • Give me Chinese dining places.
      • …
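
The filtering idea is that a paraphrase is kept only if the parser maps it back to the program of the sentence it was generated from. The sketch below uses a keyword-based `toy_parser` as a stand-in for LUINet trained on synthetic data; the parser rules and function names are assumptions made purely for illustration.

```python
def toy_parser(sentence):
    """Keyword-rule stand-in for a trained semantic parser (assumption)."""
    text = sentence.lower()
    query = "@QA.restaurant()"
    if "chinese" in text:
        query += ', servesCuisine =~ "Chinese"'
    if "top-rated" in text or "best" in text:
        query = f"sort aggregateRating.ratingValue desc of ({query})"
    return f"now => {query} => notify;"


def filter_paraphrases(program, paraphrases, parse):
    """Keep paraphrases whose parse equals the original program."""
    return [p for p in paraphrases if parse(p) == program]


original_sentence = "Show me restaurants with Chinese cuisine."
original_program = 'now => @QA.restaurant(), servesCuisine =~ "Chinese" => notify;'

# Candidate paraphrases as shown on the slide.
generated = [
    "What is a restaurant that is Chinese?",
    "Give me Chinese dining places.",
    "Show me top-rated Chinese restaurants.",
]

kept = filter_paraphrases(original_program, generated, toy_parser)
print(kept)
```

The third candidate adds a ranking that the original sentence did not ask for, so its parse differs and it is discarded, matching the slide's filtered list.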

slide-88
SLIDE 88

Preliminary Experimental Result

[Bar chart: Accuracy on Restaurant Queries for Auto Paraphrasing, comparing Synthetic only, Auto Paraphrase, and Human Paraphrase; y-axis from 0% to 80%]

slide-89
SLIDE 89

Thank you!

Hope you will enjoy your homework ☺