Improving Web Search with Language Technologies - Thomas Hofmann



SLIDE 1

Improving Web Search with Language Technologies

Thomas Hofmann

Director of Engineering - Zurich

SLIDE 2

Improving Web Search with Language Technologies

1 Lexical Semantics
2 Machine Translation
3 Information Extraction
4 Automatic Speech Recognition

SLIDE 3

1 Lexical Semantics

Improving Ads Targeting & Search Quality

SLIDE 4

Natural Language Processing for Search Quality

Two main ingredients: stemming and synonyms

Challenges for synonym expansion:

  • Learning of lexical semantics from data
  • High precision in order to avoid loss of topicality
  • Use context cues to trigger synonyms

SLIDE 5

Natural Language Processing for Search Quality

Synonym expansion depends on context:

  • ab = Alberta
  • ab = Allen Bradley
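
The context dependence of the "ab" example can be sketched in a few lines of Python. This is only an illustration, not the method used in production: the rule table, the cue words, and the function name `expand_query` are all hypothetical, and a real system would learn such rules from data rather than hard-code them.

```python
# Hypothetical rule table: (token, context cue words, expansion).
# The cue words are illustrative assumptions, not taken from the slides.
SYNONYM_RULES = [
    ("ab", {"canada", "calgary", "edmonton"}, "alberta"),
    ("ab", {"plc", "automation", "controller"}, "allen bradley"),
]


def expand_query(query):
    """Return the query tokens plus any expansions triggered by context cues."""
    tokens = query.lower().split()
    expansions = []
    for i, token in enumerate(tokens):
        # The context is every other token in the query.
        context = set(tokens[:i] + tokens[i + 1:])
        for source, cues, expansion in SYNONYM_RULES:
            # Expand only when a cue word co-occurs with the token;
            # without a cue, do nothing, which favors precision.
            if token == source and context & cues:
                expansions.append(expansion)
    return tokens, expansions


print(expand_query("ab plc programming"))
print(expand_query("hotels calgary ab"))
print(expand_query("ab"))
```

Declining to expand a bare "ab" reflects the high-precision requirement: a wrong expansion costs topicality, so the sketch stays silent when no cue is present.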

SLIDE 6

Expanded Matching in On-line Ads Targeting

Targeting mechanism for AdWords: match user queries with advertiser (bidded) keywords

Types of matches:

  • Phrase match: all tokens from a keyword appear consecutively in the query, and in the same order
    (keyword) used cars -> (query) cheap used cars
  • Broad match: all tokens from a keyword appear somewhere in the query, regardless of order
    (keyword) used cars -> (query) used toyota cars
  • Expanded broad match: some tokens from a keyword or its related words appear in the query
    (keyword) used cars -> (query) used automobiles, automobiles
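
The three match types can be written as simple token predicates. The sketch below follows the definitions on this slide literally and is not AdWords' actual implementation; the `RELATED` table and the function names are hypothetical, and whitespace tokenization is an assumption.

```python
# Hypothetical related-word table; a real system would learn this from data.
RELATED = {"cars": {"automobiles", "autos"}}


def phrase_match(keyword, query):
    """All keyword tokens appear consecutively in the query, in the same order."""
    k, q = keyword.lower().split(), query.lower().split()
    return any(q[i:i + len(k)] == k for i in range(len(q) - len(k) + 1))


def broad_match(keyword, query):
    """All keyword tokens appear somewhere in the query, in any order."""
    return set(keyword.lower().split()) <= set(query.lower().split())


def expanded_broad_match(keyword, query):
    """Some keyword token, or one of its related words, appears in the query."""
    q = set(query.lower().split())
    for token in keyword.lower().split():
        if token in q or RELATED.get(token, set()) & q:
            return True
    return False


print(phrase_match("used cars", "cheap used cars"))
print(broad_match("used cars", "used toyota cars"))
print(expanded_broad_match("used cars", "automobiles"))
```

Each type is looser than the one before it: the query "used toyota cars" fails the phrase match but passes the broad match, and "automobiles" passes only the expanded broad match.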

SLIDE 7

Expanded Matching in On-line Ads Targeting

SLIDE 8

2 Machine Translation

Enriching Web Content

SLIDE 9

Machine Translation for Web Search

Machine translation system developed in-house at Google (Franz Och)

Goals: enrich Web content in languages with limited content

Usage: Web page translation, “Translate this page” link on the result page, cross-language retrieval (Russian, Arabic)

Challenges in machine translation:

  • MT from English into other target languages
  • MT for any text types & topics
  • Model size optimization & efficient search
  • Interface, usability, user feedback

SLIDE 10

translate.google.com

SLIDE 11

translate.google.com

SLIDE 12

Search Results – “Translate this page” link

SLIDE 13

Translation in Google Toolbar

SLIDE 14

Translation Feedback -- Launched in Feb ‘07

SLIDE 15

3 Information Extraction

Supporting Question Answer Retrieval

SLIDE 16

Information Extraction for Question-Answer Retrieval

Open-domain extraction of facts from the Web

Goals: provide succinct answers to queries that are questions

Usage: currently triggers a special “search onebox” to deliver a fact

Challenges in information extraction:

  • Reliability of extracted facts
  • Coverage of relevant facts from all domains
  • Reputation of sources and combination thereof
  • Triggering of Q&A retrieval
  • Combination of evidence and inference

SLIDE 17

Question Answering Retrieval: Example

Compile fact with source reference for simple question-like queries:

SLIDE 18

4 Automatic Speech Recognition

1-800-GOOG-411

SLIDE 19

Automatic Speech Recognition

1-800-GOOG-411 service from mobile phones

Goals: local business information completely free, directly from your phone

Usage: easy-to-use speech interface for mobile devices

Challenges:

  • Speaker variability
  • Background noise
  • Navigation & usability

SLIDE 20