SLIDE 1

VOICE SEARCH ON MOBILE DEVICES

Geoffrey Zweig

Microsoft Research ---- Lang Tech 2008

SLIDE 2

Outline

• What is Mobile Voice search?
• An example: Live Search for Windows Mobile
• Why is it important?
• The Competitive Landscape
• Basic Technology
• Advancing the State-of-the-Art
• Next generation Applications

SLIDE 3

What is Mobile Voice Search?

• Getting information when you are on the go
  - Business information: phone numbers, addresses, ratings, hours
  - Maps & directions
  - Entertainment: movie showtimes, restaurant recommendations

SLIDE 4

Live Search for Windows Mobile

SLIDE 5

Asking for Seattle

SLIDE 6

Confirming the Location

SLIDE 7

Now we’re in Seattle

SLIDE 8

Asking for Vietnamese Restaurants

SLIDE 9

Finding a Vietnamese Restaurant

SLIDE 10

The Details

SLIDE 11

Let’s Get Directions

SLIDE 12

Starting from 8350 159th PL NE Redmond, WA

SLIDE 13

Specifying a Starting Point

SLIDE 14

And Now we can Go!

SLIDE 15

You can even check the traffic

SLIDE 16

What People Ask For – By Type

[Chart: type of request, broken down into Business, City-State-Zip, Address, and Compound]

SLIDE 17

Frequent Requests

Businesses            | Cities
Pizza (1.5%)          | Dallas TX (0.80%)
Best Buy              | Seattle WA
Starbucks             | Chicago IL
Movies                | Redmond WA
McDonald’s            | Los Angeles CA
Wal-Mart              | Orlando FL
Mexican Restaurant    | Miami FL
Pizza Hut             | Bellevue WA
Target                | San Diego CA
Restaurants (0.73%)   | New York, NY (0.47%)
Perplexity = 8514     | Perplexity = 4741
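The slide reports a perplexity for each request type without saying how it was computed. As a reminder of the standard definition (not necessarily the exact procedure behind these numbers), perplexity is 2 raised to the average negative log2 probability per item, so a perplexity of 8514 means a request is, on average, about as hard to predict as a uniform choice among 8514 alternatives. The sketch below computes it for a whole-query unigram model; the function names and toy data are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def unigram_model(train_queries: List[str]) -> Dict[str, float]:
    """Maximum-likelihood probabilities over whole queries (no smoothing)."""
    counts = Counter(train_queries)
    total = sum(counts.values())
    return {q: c / total for q, c in counts.items()}


def perplexity(test_queries: List[str], model: Dict[str, float]) -> float:
    """2 ** (average negative log2 probability per test query).

    Assumes every test query has nonzero probability under the model;
    a real system would smooth the model to guarantee this.
    """
    neg_log2 = sum(-math.log2(model[q]) for q in test_queries)
    return 2 ** (neg_log2 / len(test_queries))


if __name__ == "__main__":
    # Four equally likely queries give perplexity 4.
    queries = ["pizza", "starbucks", "target", "best buy"]
    print(perplexity(queries, unigram_model(queries)))  # 4.0
```

Lower perplexity means the requests are easier to predict, which is consistent with the city list scoring lower than the business list.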

SLIDE 18

Outline

• What is Mobile Voice search?
• An example: Live Search for Windows Mobile
• Why is it important?
• The Competitive Landscape
• Basic Technology
• Advancing the State-of-the-Art
• Next generation Applications

SLIDE 19

Skyrocketing Cellphone Use

[Chart, 1990 to 2010: PCs in use, Internet users, and cellphone users per 1000 people. Source: Computer Industry Almanac]

SLIDE 20

It’s a Global Market

Number of Cellphones: ~2.2B in 2005

[Chart: cellphones by region: EU, China, US, Russia, Japan, Brazil, India, UK, Pakistan, Mexico, Indonesia, Turkey, Rest of World. Source: Infoplease.com]

SLIDE 21

Potentially Big Revenues

Will mobile search be like internet search?

SLIDE 22

Monetization

• Free 411 services create modest revenue streams
• But multimodal has advantages:
  - You are looking at a screen
  - You can be sent an SMS, and that sticks around
  - Voice provides demographic clues not present in web search: gender, race, age, education
• Many possibilities
  - Standard search-specific advertising: you say “Zales Jewelers”, the system suggests “Tiffany’s”
  - Demographically targeted ads: men get different results from women
  - Batched ads sent to the email account provided at registration

SLIDE 23

Outline

• What is Mobile Voice search?
• An example: Live Search for Windows Mobile
• Why is it important?
• The Competitive Landscape
• Basic Technology
• State-of-the-Art
• Next generation Applications

SLIDE 24

Competitive Landscape: Basic Search

• Live Search for Windows Mobile
  - http://wls.live.com from your phone
  - Businesses, directions, maps, traffic, movies, gas
  - Windows Mobile phones
• Tellme by Mobile
  - http://www.tellme.com/products/TellmeByMobile
  - Businesses, directions, maps
  - Java phones
• V-enable
  - http://www.v-enable.com/directory_assistance.html
  - Businesses, directions, maps, weather
  - Demo only, not currently available

SLIDE 25

Competitive Landscape: Beyond Search

• Vlingo
  - http://vlingo.com/
  - Businesses, directions, maps, music downloads
  - SMS by voice
  - Java phones
• Nuance Voice Control
  - http://www.nuance.com/voicecontrol/
  - Businesses, directions, maps, weather, stocks, sports, movies, web search
  - Send emails, update calendar, go to web pages
  - BlackBerry, Treo, Windows Mobile phones

SLIDE 26

Outline

• What is Mobile Voice search?
• An example: Live Search for Windows Mobile
• Why is it important? (Trends in cellphone use)
• The Competitive Landscape
• Basic Technology
• Advancing the State-of-the-Art
• Next generation Applications

SLIDE 27

Client-Server Architecture

SLIDE 28

Typical Grammar Setup

[Diagram: grammar setup by request type]
• Business: n-gram LMs (Local 1, Local 2, …, Local 600, and National)
• Address: n-gram LMs (Local 1, Local 2, …, Local 600, and National)
• City-State-Zip: enumerated grammar
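The slide does not say how a local LM and the national LM are combined at recognition time. One common option, shown here purely as an assumption, is linear interpolation with a weight lam, next to an enumerated grammar that accepts only listed city-state-zip strings. The weight, the toy probabilities, the grammar entries, and all names are hypothetical, and the models are over whole queries rather than n-grams to keep the sketch short.

```python
from typing import Dict, FrozenSet


def interpolate(local: Dict[str, float], national: Dict[str, float], lam: float) -> Dict[str, float]:
    """P(q) = lam * P_local(q) + (1 - lam) * P_national(q) over the union vocabulary."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    vocab = set(local) | set(national)
    return {q: lam * local.get(q, 0.0) + (1 - lam) * national.get(q, 0.0) for q in vocab}


# Hypothetical enumerated grammar: only these exact strings are recognizable.
CITY_STATE_ZIP: FrozenSet[str] = frozenset({"seattle washington", "redmond washington 98052"})


def in_enumerated_grammar(utterance: str) -> bool:
    """Exact-match membership test after normalizing case and extra spaces."""
    return " ".join(utterance.lower().split()) in CITY_STATE_ZIP


if __name__ == "__main__":
    seattle = {"pike place market": 0.6, "starbucks": 0.4}  # toy local LM
    national = {"starbucks": 0.5, "wal-mart": 0.5}           # toy national LM
    print(interpolate(seattle, national, lam=0.7))
    print(in_enumerated_grammar("Seattle  Washington"))      # True
```

If both input distributions sum to 1, the interpolated one does too, so no renormalization is needed.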

SLIDE 29

Sample Performance Levels

        | 1-best | N-best | N-best depth | Inter-annotator agreement
Overall | 42%    | 47%    | 3.6          | 67%
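As an illustration of how the first three columns could be computed from logs: 1-best accuracy counts how often the top hypothesis is correct, N-best accuracy counts how often the correct answer appears anywhere in the list, and N-best depth is taken here to be the average list length. The slide does not define these columns, so the definitions, names, and toy data below are assumptions.

```python
from typing import List, Sequence, Tuple


def nbest_metrics(results: Sequence[Tuple[List[str], str]]) -> Tuple[float, float, float]:
    """Return (1-best accuracy, N-best accuracy, average n-best list length).

    Each result is (n-best list, reference answer).
    """
    n = len(results)
    one_best = sum(1 for nbest, ref in results if nbest and nbest[0] == ref) / n
    n_best = sum(1 for nbest, ref in results if ref in nbest) / n
    depth = sum(len(nbest) for nbest, _ in results) / n
    return one_best, n_best, depth


if __name__ == "__main__":
    results = [
        (["starbucks", "star box"], "starbucks"),   # correct at the top
        (["copy", "coffee", "coffey"], "coffee"),   # correct, but not first
        (["mayo"], "maine mall"),                   # correct answer missing
        (["target", "targa", "tarjay", "tar"], "target"),
    ]
    print(nbest_metrics(results))  # (0.5, 0.75, 2.5)
```

The gap between the 1-best and N-best columns is the headroom that n-best reordering can recover.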

SLIDE 30

Outline

• What is Mobile Voice search?
• An example: Live Search for Windows Mobile
• Why is it important? (Trends in cellphone use)
• The Competitive Landscape
• Basic Technology
• Advancing the State-of-the-Art
• Next generation Applications

SLIDE 31

Click-Driven Automated Feedback

[Diagram: Language Model, Acoustic Model, Stupid Detector]

SLIDE 32

Automated Feedback Methods

• Data addition
  - What people click on, and the associated audio
  - Text searches from the web
• Discriminative LM training
  - Adjust the LM to maximize the posterior probability of the correct words
  - Need to know the competitors: take them from n-best lists
• Translation-based data generalization
• Maximum likelihood database cleaning
  - Learn an error model of the mistakes people make when entering data
  - Recover the likeliest intended entries
• Adaptive N-best postprocessing
  - Remove what history shows is obviously stupid
  - Reorder and augment the rest based on further analysis
• Personalization
  - Per-person / user-profile grammars
  - Per-person speaker-adaptive transforms

SLIDE 33

Sample Click Data

Clicked               | Competitor
McDonald’s            | Mc Donald
Coffee                | Coffey
Mexican Restaurant    | Mexican Restrant
Coffee                | Copy
Mexican Food          | Mexican Foods
Starbucks             | Star Box
Starbucks             | Starbuck’s
Sex 6 Burger King 13

Entries that frequently co-occur
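One way to mine pairs like these, sketched under the assumption that each log record holds the n-best list shown to the user and the entry that was clicked: count every (clicked, other n-best entry) pair and keep the most frequent. The record format, the toy logs, and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

LogRecord = Tuple[List[str], str]  # (n-best list shown, entry the user clicked)


def cooccurring_pairs(logs: Iterable[LogRecord], top: int = 5) -> List[Tuple[Tuple[str, str], int]]:
    """Count (clicked, competitor) pairs, where a competitor is any other n-best entry."""
    counts: Counter = Counter()
    for nbest, clicked in logs:
        for hyp in nbest:
            if hyp != clicked:
                counts[(clicked, hyp)] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    logs = [
        (["Coffey", "Coffee"], "Coffee"),
        (["Coffee", "Coffey", "Copy"], "Coffee"),
        (["Star Box", "Starbucks"], "Starbucks"),
    ]
    print(cooccurring_pairs(logs, top=1))  # [(('Coffee', 'Coffey'), 2)]
```

Pairs gathered this way feed the methods on the following slides: as truth/competitor pairs for discriminative training, and as aligned spelling variants for the error model.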

SLIDE 34

Discriminative LM Training (Xiao Li)

• Idea
  - Increase the n-gram probabilities of the true hypothesis
  - Decrease the n-gram probabilities of confusable competitors
  - The LM is estimated to maximize p(W|O), the posterior probability of the word sequence W given the acoustic observations O
• Leveraging click data
  - View the clicked item as “truth”
  - View the n-best alternatives as “competitors”

N-best alternatives:
1. Maine Home
2. Maine School
3. Maine Car
4. Maine
5. Maine Heart
6. Maine Mall
7. Maine Homes
8. Mayo
9. Maine Golf
10. Maine Home Care
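The method above estimates the LM to maximize p(W|O). The sketch below swaps in a simpler, well-known relative, a structured-perceptron update on bigram weights, to show the same idea concretely: when the top-scoring hypothesis is not the clicked item, the clicked item's n-grams are pushed up and the winner's n-grams are pushed down, and n-grams they share cancel out. It ignores acoustic scores and the baseline LM; all names and toy data are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]


def bigrams(text: str) -> Counter:
    """Bigram counts with sentence-boundary markers."""
    words = ["<s>"] + text.split() + ["</s>"]
    return Counter(zip(words, words[1:]))


def score(weights: Dict[Bigram, float], text: str) -> float:
    """Sum of learned bigram weights for one hypothesis."""
    return sum(weights.get(g, 0.0) * c for g, c in bigrams(text).items())


def perceptron_step(weights: Dict[Bigram, float], clicked: str, nbest: List[str], lr: float = 1.0) -> bool:
    """One update against the current best competitor; returns True if weights changed."""
    best = max(nbest, key=lambda h: score(weights, h))
    if best == clicked:
        return False
    for g, c in bigrams(clicked).items():  # raise n-grams of the "truth"
        weights[g] = weights.get(g, 0.0) + lr * c
    for g, c in bigrams(best).items():     # lower n-grams of the competitor
        weights[g] = weights.get(g, 0.0) - lr * c
    return True


if __name__ == "__main__":
    nbest = ["maine home", "maine school", "maine mall"]
    weights: Dict[Bigram, float] = {}
    for _ in range(10):  # cap iterations; the perceptron need not converge on every input
        if not perceptron_step(weights, clicked="maine mall", nbest=nbest):
            break
    print(max(nbest, key=lambda h: score(weights, h)))  # maine mall
```

In this sketch the shared bigram ("<s>", "maine") ends with weight 0: only the n-grams that distinguish the truth from its competitor move.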

SLIDE 35

Rescoring Results

• Experiments
  - Rescore the n-best alternatives using the baseline LM and the discriminatively trained LM
  - Check whether the rescored one-best is the item the user clicked

One-best accuracy        | Train set | Dev set | Test set
# utterances             | 150K      | 1.3K    | 1.4K
Baseline                 | 71.1%     | 71.5%   | 70.5%
Discriminative training  | -         | 74.8%   | 72.7%

One-best accuracy: fraction of time the clicked item is at the top of the n-best list.
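A minimal sketch of this experiment, assuming each n-best entry carries an acoustic log-probability and that the combined score is that value plus a weighted LM log-probability: rerank with each LM and count how often the new top entry is the clicked one. The score combination, the LM weight, and the toy numbers are assumptions, not values from the slide.

```python
from typing import Callable, List, Sequence, Tuple

Hyp = Tuple[str, float]  # (hypothesis text, acoustic log-probability)


def rescore(nbest: List[Hyp], lm_logprob: Callable[[str], float], lm_weight: float = 1.0) -> List[str]:
    """Sort hypotheses by acoustic log-prob + lm_weight * LM log-prob, best first."""
    ranked = sorted(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]), reverse=True)
    return [text for text, _ in ranked]


def one_best_accuracy(data: Sequence[Tuple[List[Hyp], str]],
                      lm_logprob: Callable[[str], float], lm_weight: float = 1.0) -> float:
    """Fraction of utterances whose rescored top hypothesis is the clicked item."""
    hits = sum(1 for nbest, clicked in data if rescore(nbest, lm_logprob, lm_weight)[0] == clicked)
    return hits / len(data)


if __name__ == "__main__":
    data = [([("maine home", -10.0), ("maine mall", -10.5)], "maine mall")]
    baseline = {"maine home": -2.0, "maine mall": -2.0}
    discriminative = {"maine home": -3.0, "maine mall": -1.0}
    print(one_best_accuracy(data, lambda t: baseline[t]))        # 0.0
    print(one_best_accuracy(data, lambda t: discriminative[t]))  # 1.0
```

Because only the LM changes between the two runs, any difference in accuracy is attributable to the LM.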

SLIDE 36

Translation LM (Xiao Li, ICASSP-08)

• Goal
  - “Translate” listing forms into query forms
  - Use the translated query forms to augment the training data for LM estimation
• Example: the listing “Kung Ho Cuisine Of China” can be requested as
  - “Kung Ho Chinese Restaurant”
  - “Kung Ho Restaurant”
  - “Kung Ho”
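The cited work learns the listing-to-query mapping; the sketch below does not. It substitutes a hand-written, hypothetical rewrite table, only to show how generated query forms would be added to the LM training text next to the original listings. The table, names, and examples are assumptions.

```python
from typing import Dict, List

# Hypothetical rewrite table: listing phrase -> query-form replacements ("" drops the phrase).
REWRITES: Dict[str, List[str]] = {
    "Cuisine Of China": ["Chinese Restaurant", "Restaurant", ""],
}


def query_forms(listing: str, rewrites: Dict[str, List[str]] = REWRITES) -> List[str]:
    """Generate alternative query forms for one listing, without duplicates."""
    forms: List[str] = []
    for phrase, replacements in rewrites.items():
        if phrase not in listing:
            continue
        for rep in replacements:
            form = " ".join(listing.replace(phrase, rep).split())  # collapse spaces
            if form and form != listing and form not in forms:
                forms.append(form)
    return forms


def augmented_training_text(listings: List[str]) -> List[str]:
    """Original listings plus their generated query forms, for LM estimation."""
    text: List[str] = []
    for listing in listings:
        text.append(listing)
        text.extend(query_forms(listing))
    return text


if __name__ == "__main__":
    print(query_forms("Kung Ho Cuisine Of China"))
    # ['Kung Ho Chinese Restaurant', 'Kung Ho Restaurant', 'Kung Ho']
```

Listings that match no rule pass through unchanged, so the augmented text is always a superset of the original listings.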

SLIDE 37

Recognition Results

• Experiments
  - Test set: 3K directory-assistance utterances
  - Different LM training sets:

Sentence accuracy                       | One-best | N-best
Listings                                | 38.6%    | 48.3%
Listings + transcription                | 41.5%    | 51.4%
Listings + transcription + translation  | 43.1%    | 52.5%

SLIDE 38

Maximum Likelihood Database Recovery

$$\arg\max_{w_i} P(w_i \mid w_c) = \arg\max_{w_i} \frac{P(w_c \mid w_i)\,P(w_i)}{P(w_c)} = \arg\max_{w_i} P(w_i)\,P(w_c \mid w_i)$$

(P(w_c) does not depend on w_i, so it drops out of the maximization.)

w_i: intended words (unknown, e.g. “Starbucks” or “Al’s Quick Mart”)
w_c: corrupted words in the data (observed, e.g. “Starbuck’s” or “Al’s Kwik Mart”)
Goal: find the likeliest intended word sequence.

P(w_i): LM built on clean data
P(w_c | w_i): error model, for example
  Starbucks → Starbuck’s: 0.5
  Starbucks → Starbucks: 0.5
  Quick → Quick: 0.3
  Quick → Kwik: 0.3
  Quick → Quik: 0.3

A transductive apparatus is used to recover the likeliest words.
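A minimal sketch of the decision rule above, applied word by word with a unigram LM. That is a simplification of the slide's word-sequence formulation and its transducer-based search. The error-model entries are the ones listed above; the LM probabilities are made-up placeholders, and words with no error-model entry pass through unchanged.

```python
from typing import Dict, Tuple

# Error model P(w_c | w_i), keyed by (intended, corrupted); entries from the slide.
ERROR_MODEL: Dict[Tuple[str, str], float] = {
    ("Starbucks", "Starbuck's"): 0.5,
    ("Starbucks", "Starbucks"): 0.5,
    ("Quick", "Quick"): 0.3,
    ("Quick", "Kwik"): 0.3,
    ("Quick", "Quik"): 0.3,
}

# Hypothetical unigram LM P(w_i) "built on clean data"; values are placeholders.
CLEAN_LM: Dict[str, float] = {"Starbucks": 0.02, "Quick": 0.01}


def recover_word(w_c: str) -> str:
    """argmax over w_i of P(w_i) * P(w_c | w_i); words with no candidate pass through."""
    best, best_score = w_c, 0.0
    for (w_i, corrupted), p_err in ERROR_MODEL.items():
        if corrupted == w_c:
            s = CLEAN_LM.get(w_i, 0.0) * p_err
            if s > best_score:
                best, best_score = w_i, s
    return best


def recover(entry: str) -> str:
    """Recover a database entry one word at a time."""
    return " ".join(recover_word(w) for w in entry.split())


if __name__ == "__main__":
    print(recover("Al's Kwik Mart"))  # Al's Quick Mart
    print(recover("Starbuck's"))      # Starbucks
```

A full system would search over whole word sequences with a higher-order LM, which is where a transducer-based decoder becomes useful.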

SLIDE 39

Maximum Likelihood Database Recovery++ (G. Zweig, ICASSP 2008)

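$$
\begin{aligned}
\arg\max_{w,\,l_i} P(w, l_i \mid l_c)
  &= \arg\max_{w,\,l_i} \frac{P(l_c \mid w, l_i)\,P(w, l_i)}{P(l_c)} \\
  &= \arg\max_{w,\,l_i} P(w)\,P(l_i \mid w)\,P(l_c \mid w, l_i) \\
  &= \arg\max_{w,\,l_i} P(w)\,P(l_i \mid w)\,P(l_c \mid l_i)
\end{aligned}
$$

(The second step drops the constant P(l_c) and factors P(w, l_i) = P(w) P(l_i | w); the last step assumes the corrupted letters depend only on the intended letters.)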

w: intended words (unknown)
l_i: intended letters (unknown)
l_c: corrupted letters (observed)
Goal: find the likeliest word and letter sequence underlying the observations.

P(w): LM built on clean data
P(l_i | w): 1:1 spelling probabilities
P(l_c | l_i): error model for typos

SLIDE 40

Database Recovery Steps

• Learn the error model by aligning the letters of click pairs
  - Coffey vs. Coffee
  - Starbuck’s vs. Starbucks
• Learn the language model from the current version of the database
• Build the letter-to-word model from a list of in-language words
• Run the database letters through the transductive apparatus to recover words
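The first step can be sketched with a standard Levenshtein alignment: align each click pair letter by letter, count how often each intended letter surfaces as each observed letter, and normalize the counts into P(observed letter | intended letter). The slide does not name an alignment method, so this choice, treating the clicked spelling as the intended one, pooling insertions under an empty intended symbol, and the function names are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def align(intended: str, corrupted: str) -> List[Tuple[str, str]]:
    """Minimum-edit-distance letter alignment; '' marks an inserted or deleted letter."""
    n, m = len(intended), len(corrupted)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = int(intended[i - 1] != corrupted[j - 1])
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + int(intended[i - 1] != corrupted[j - 1]):
            pairs.append((intended[i - 1], corrupted[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((intended[i - 1], ""))   # letter deleted when typed
            i -= 1
        else:
            pairs.append(("", corrupted[j - 1]))  # letter inserted when typed
            j -= 1
    return pairs[::-1]


def letter_error_model(click_pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(corrupted letter | intended letter) from (intended, corrupted) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for intended, corrupted in click_pairs:
        for a, b in align(intended, corrupted):
            counts[a][b] += 1
    return {a: {b: c / sum(cs.values()) for b, c in cs.items()} for a, cs in counts.items()}


if __name__ == "__main__":
    print(align("Starbucks", "Starbuck's")[-2:])  # [('', "'"), ('s', 's')]
    model = letter_error_model([("Coffee", "Coffey"), ("Starbucks", "Starbuck's")])
    print(model["e"])  # {'e': 0.5, 'y': 0.5}
```

With real click logs the counts would be much larger, and some smoothing would be needed so that unseen letter confusions keep a small nonzero probability.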

SLIDE 41

Feedback-Driven N-best Postprocessing (Dan Bohus)

• Approach: click prediction model
• Features
  - Recognized words
  - Historical click-through rates
  - Intra n-best comparisons
  - User-specific features
  - Text query log features

Example n-best list, before and after reordering:
  Before: Brooks Brothers College, Roach Brothers College, Rhodes College, Rose, Rose Cottage
  After:  Rhodes College, Brooks Brothers College, Roach Brothers College, Rose, Rose Cottage

• Preliminary results
  - 23% improvement in the average position of the clicked item
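A minimal sketch of feedback-driven reordering, assuming a logistic click-prediction model over per-hypothesis features and evaluation by the average 1-based position of the clicked item. The two feature names, their values, and the weights are hypothetical stand-ins for the feature families listed above, not learned parameters.

```python
import math
from typing import Dict, List, Sequence

Features = Dict[str, float]


def click_probability(features: Features, weights: Dict[str, float], bias: float = 0.0) -> float:
    """Logistic click model: sigmoid(bias + sum_k weight_k * feature_k)."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rerank(nbest: Dict[str, Features], weights: Dict[str, float]) -> List[str]:
    """Order hypotheses by predicted click probability, highest first."""
    return sorted(nbest, key=lambda h: click_probability(nbest[h], weights), reverse=True)


def average_position(lists: Sequence[List[str]], clicked: Sequence[str]) -> float:
    """Mean 1-based position of the clicked item; lower is better."""
    return sum(lst.index(c) + 1 for lst, c in zip(lists, clicked)) / len(lists)


if __name__ == "__main__":
    # Hypothetical features: original ASR rank and historical click-through rate.
    nbest = {
        "Brooks Brothers College": {"asr_rank": 1.0, "historical_ctr": 0.01},
        "Roach Brothers College": {"asr_rank": 2.0, "historical_ctr": 0.00},
        "Rhodes College": {"asr_rank": 3.0, "historical_ctr": 0.30},
        "Rose": {"asr_rank": 4.0, "historical_ctr": 0.05},
        "Rose Cottage": {"asr_rank": 5.0, "historical_ctr": 0.02},
    }
    weights = {"asr_rank": -0.5, "historical_ctr": 10.0}  # hypothetical, not learned
    reranked = rerank(nbest, weights)
    print(reranked)
    print(average_position([list(nbest)], ["Rhodes College"]),
          average_position([reranked], ["Rhodes College"]))  # 3.0 1.0
```

In practice the weights would be fit on logged clicks, and the improvement would be measured on held-out data.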

SLIDE 42

Outline

• What is Mobile Voice search?
• An example: Live Search for Windows Mobile
• Why is it important?
• The Competitive Landscape
• The Technology
• Advancing the State-of-the-Art
• Next generation Applications

SLIDE 43

Next Generation Applications

• Better integration with information sources
  - Unstructured information: the web (“www dot langtech dot org”)
  - New kinds of structured information: product information, movie reviews, nutrition information (“Do apples have vitamin D?”)
  - Access to private information: “Show me my benefits information on the company website”, “Show me the email from Langtech about the banquet”
• Two-way interaction
  - Rating products and businesses

SLIDE 44

VoiceRate – A Sample NextGen Application

Getting a rating:
  System: Get a rating or leave a rating?
  User: Get a rating.
  System: Local business, national business, or product?
  User: A product.
  System: Which product?
  User: Fisher Price Kick-Play Bouncer
  System: This is rated 4.3 out of 5. Here are some of the things people had to say about this product: …

Leaving a rating:
  System: Get a rating or leave a rating?
  User: Leave a rating.
  System: Local business, national business, or product?
  User: A product.
  System: Which product?
  User: Stanley 9 piece screwdriver set
  System: Please rate it.
  User: Excellent. It works great.
  System: For comparison with others, please rate it on a scale of one to five.
  User: Five.
  System: Thank you for using Voice Rate.
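Both dialogs follow a small fixed flow. The sketch below replays scripted user turns through a hypothetical prompt sequence modeled on the two transcripts. It does no speech recognition, the rating store and all names are assumptions, and the category step only distinguishes products from businesses.

```python
from typing import Dict, List

# Hypothetical rating store keyed by normalized item name.
RATINGS: Dict[str, float] = {"fisher price kick-play bouncer": 4.3}


def run_dialog(user_turns: List[str], ratings: Dict[str, float] = RATINGS) -> List[str]:
    """Replay scripted user turns through a VoiceRate-style flow; return the system prompts."""
    turns = iter(user_turns)
    prompts = ["Get a rating or leave a rating?"]
    wants_rating = next(turns).lower().startswith("get")

    prompts.append("Local business, national business, or product?")
    noun = "product" if "product" in next(turns).lower() else "business"
    prompts.append(f"Which {noun}?")
    item = next(turns).strip().lower()

    if wants_rating:
        score = ratings.get(item)
        prompts.append(f"This is rated {score} out of 5." if score is not None
                       else "Sorry, there are no ratings for that yet.")
        return prompts

    prompts.append("Please rate it.")
    next(turns)  # free-form verbal review; a real system would store and analyze it
    prompts.append("For comparison with others, please rate it on a scale of one to five.")
    next(turns)  # numeric score; not parsed in this sketch
    prompts.append("Thank you for using Voice Rate.")
    return prompts


if __name__ == "__main__":
    for line in run_dialog(["Get a rating.", "A product.", "Fisher Price Kick-Play Bouncer"]):
        print(line)
```

Asking for a numeric score after the free-form review, as the transcript does, gives a value that can be compared across users even before the verbal review is understood.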

SLIDE 45

VoiceRate Benefits

• User benefits
  - Facilitates informed impulse purchases
  - Lets you provide immediate feedback
  - Access to ratings for:
    1.1M products (electronics, toys, books, DVDs, etc.)
    270K restaurants (local businesses) in 1600 metros
    3K national businesses (airlines, car rental companies, etc.)
• Researcher benefits
  - Fertile test-bed for many technologies: understanding verbal reviews, summarizing across multiple reviews, making pair-wise comparisons, explaining why people like X better than Y, core ASR
  - Data collection

SLIDE 46

Provider Benefits

• Sales of targeted ads
  - You ask about a Toro snowblower; Snapper Snowblowers pays to suggest its product
  - Determine caller demographics by voice and tailor ads
• Sale of market research services
  - When a person leaves a review: for example, if you call to review a lawnmower, Honda can pay to ask “Did the mower cut the grass evenly?”
  - When a person gets a review: if I call and ask about the Toro Power Curve snowblower, Toro can pay to ask “To help determine if there are any better products, how important is noise to you in a snowblower?”
• Location-specific ads
  - If you are in a Target store and call about X, that Target can pay to offer you a deal.

SLIDE 47

Conclusions

• Mobile voice search is a key technology area
  - Impact on a large fraction of the world’s population
  - Global in scope
• Multimodal interfaces are key
• Speech recognition is necessary because data entry is just too hard otherwise
• Click-driven feedback will drive system improvements
• Current applications are just scratching the surface

SLIDE 48

Thanks to VoiceSearch Collaborators!

• Xiao Li
• Dan Bohus
• Patrick Nguyen
• Julian Odell
• Oliver Scholz
• Alex Acero