Automatic Query Type Identification - PowerPoint Presentation

Automatic Query Type Identification Based on Click-Through Information
Yiqun Liu, Min Zhang, Liyun Ru and Shaoping Ma
State Key Lab of Intelligent Tech. & Sys., Tsinghua University

Automatic Query Type Identification
• Research Background
• User analysis for query type identification
• A Query Type Identification Algorithm
• Experimental Results and Discussion

Research Background
• Observing users from the search engine's perspective
  – Query stream and click-through information
  – Query stream
    • Made up of queries of 3-4 words in English, or fewer than 2 words in Chinese
    • Ambiguous: the same query can carry different user requests
    • Click-through information helps us identify users' information needs

Research Background
• Example: 魔獸爭霸 (Warcraft)
  – User type 1: users want to visit a particular website related to the game
  – User type 2: users want to download the computer game
  – User type 3: users want an overview of the computer game
  – Without click-through information, we cannot tell which of these needs a user has

Research Background
• Categories of users' information needs
  – Proposed separately by Broder (IBM, 2002) and Rose (Yahoo!, 2004), based on analyses of search engine user behavior
  – Navigational
    • A specific target page
    • Users want to find the URL of a particular web page
    • e.g. "Yahoo HK", "SIGIR 04 home"
  – Informational / Transactional
    • No specific target page
    • Users want to learn something about a topic
    • e.g. "bird flu", "American civil war"

Research Background
• Why should we identify users' query types?
  – Different ranking models
    • Navigational search: anchor text, URL information…
    • Informational search: hyperlink analysis, traditional IR models
  – Different performance
    • Navigational search: MRR > 80%; systems return the correct answer at rank 1 for most queries
    • Informational search: P@10 < 30%; systems return fewer than 3 relevant results in the top 10
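The two evaluation measures quoted for navigational and informational search can be computed as in this minimal sketch. It assumes each query's results are given as a ranked list of binary relevance judgments; the function names are illustrative, not taken from the slides.

```python
from typing import List, Sequence


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """Return 1/rank of the first relevant result (ranks start at 1), or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Sequence[bool]]) -> float:
    """MRR: the reciprocal rank averaged over all queries."""
    return sum(reciprocal_rank(run) for run in runs) / len(runs)


def precision_at_k(relevance: Sequence[bool], k: int = 10) -> float:
    """P@k: the fraction of the top k positions that hold a relevant result."""
    return sum(1 for is_relevant in relevance[:k] if is_relevant) / k
```

A navigational query answered correctly at rank 1 contributes 1.0 to MRR, while an informational query with only 2 relevant results in its top 10 has a P@10 of 0.2.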

Research Background
• Features used in query type identification
  – Query content features
    • Length, POS information, presence of abbreviations, etc.
    • Whether the query is the anchor text of a particular page
  – Result feedback from the IR system
    • Similarity between the query and the top-ranked documents
  – Past click-through information
    • Past click behavior

Research Background
• Related work
  – TREC 2004: query content and result feedback; best result: 61.3% of queries correctly classified

Research Background
• Related work
  – Kang et al.
    • Mutual information, POS and anchor-text evidence
    • TREC data
    • Achieved better retrieval performance with their classification algorithm
  – Lee et al.
    • Anchor text and click-through information
    • UCLA campus search service data
    • 90% of queries correctly classified

Research Background
• Major problems
  – Lack of analysis of real search engine users
    • The behavior of TREC or small-scale campus users differs significantly from that of ordinary Web users
  – Lack of reliability evaluation
    • Small numbers of specially designed queries
    • What percentage of real-world queries can actually be classified?


User analysis for query type identification
• Review of features proposed for query type identification
  – Real query logs obtained from Sogou.com
    • All user queries and the corresponding click-through data for February 2006
    • 86,538,613 clicks
    • 26,255,952 user sessions
    • 4,345,557 unique user queries
  – About 200 queries were annotated by 3 assessors, using voting, to form the training set
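The slides say only that the training labels were obtained "using voting" among 3 assessors. A minimal sketch of majority voting, assuming each assessor gives one label per query and that a query without a strict majority is left unlabeled (a hypothetical policy, not stated in the source):

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of assessors, or None if there is none."""
    label, count = Counter(votes).most_common(1)[0]
    # With 3 assessors, a strict majority means at least 2 agreeing votes.
    return label if count > len(votes) / 2 else None
```

With only two classes (navigational vs. informational/transactional), three assessors always produce a majority, so the `None` case matters only if a finer label set is used.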

User analysis for query type identification
• Query length
  – Distribution of query length for different query types

User analysis for query type identification
• Part-of-speech tagging
  – POS features of different types of queries

User analysis for query type identification
• In-link anchor information
  – Assumption: if a query Q has the same content as an anchor text linking to page A, Q is likely a navigational query whose target page is A
  – If A has many in-link anchors whose text is Q, then Q is a navigational query
  – Adopted by Kang (2004) and Lee (2005)
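The anchor-text assumption can be sketched as a lookup from anchor text to target pages. This is a minimal illustration, assuming anchors are available as (anchor text, target URL) pairs, exact matching after lowercasing, and a hypothetical threshold `min_anchors`; the scoring actually used by Kang and Lee may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def build_anchor_index(anchors: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Map each normalized anchor text to a Counter of the target URLs it links to."""
    index: Dict[str, Counter] = defaultdict(Counter)
    for text, target_url in anchors:
        index[text.strip().lower()][target_url] += 1
    return index


def anchor_target(query: str, index: Dict[str, Counter], min_anchors: int = 3) -> Optional[str]:
    """Return page A if at least min_anchors anchors with text == query point to it, else None.

    A non-None result marks the query as (likely) navigational with target page A.
    """
    targets = index.get(query.strip().lower())  # .get does not create empty entries
    if not targets:
        return None
    url, count = targets.most_common(1)[0]
    return url if count >= min_anchors else None
```

A query that matches no anchor text at all gets `None`, which is exactly the coverage limitation the next slide measures.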

User analysis for query type identification
• How many queries can be identified using anchor-text information?
  – Not every query matches the anchor text of some page
  [Figure: daily percentage of queries that match an anchor text (y-axis, 0% to 40%) by date (x-axis, days 1 to 28)]

User analysis for query type identification
• How many queries can be identified using past click-through information?
  – Each day, about 90% of queries have already been submitted and clicked in the past
  [Figure: daily percentage of queries with past click-through data (y-axis, 0% to 100%) by date (x-axis, days 1 to 28)]
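One way to compute the daily coverage in the figure, under the reading that a query is covered on a given day if it was submitted and clicked on some earlier day of the log. This is a sketch under that assumption; the function and argument names are hypothetical.

```python
from typing import List, Set


def click_history_coverage(queries_by_day: List[Set[str]],
                           clicked_by_day: List[Set[str]]) -> List[float]:
    """For each day after the first, return the share of that day's queries
    that were already submitted and clicked on an earlier day.

    queries_by_day[d] holds the distinct queries submitted on day d;
    clicked_by_day[d] holds the queries that received at least one click on day d.
    """
    seen_clicked: Set[str] = set(clicked_by_day[0])
    coverage: List[float] = []
    for queries, clicked in zip(queries_by_day[1:], clicked_by_day[1:]):
        covered = sum(1 for q in queries if q in seen_clicked)
        coverage.append(covered / len(queries) if queries else 0.0)
        seen_clicked |= clicked  # today's clicked queries become history for later days
    return coverage
```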


Query Type Identification Algorithm
• N-click satisfied rate (nCS)
  – Assumption 1 (the "lazy user" assumption): when a user submits a navigational query, he clicks only a small number of result URLs
    • The user has a specific search target in navigational searches
    • He intends to click only the highly relevant results
  – N-click satisfied rate
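The nCS formula appears only as an image in the original slide. A minimal sketch, assuming nCS is the share of a query's sessions that contain fewer than n clicks (consistent with Assumption 1), and assuming sessions are given as lists of clicked result ranks; the default `n=2` is a hypothetical choice.

```python
from typing import List, Sequence


def n_click_satisfied_rate(sessions: List[Sequence[int]], n: int = 2) -> float:
    """nCS under an assumed definition: the share of a query's sessions with fewer than n clicks.

    Each session is the list of result ranks (1-based) the user clicked for this query;
    sessions without any click are assumed to have been filtered out beforehand.
    """
    if not sessions:
        raise ValueError("the query has no sessions")
    satisfied = sum(1 for clicks in sessions if len(clicks) < n)
    return satisfied / len(sessions)
```

Under Assumption 1, navigational queries should score high, since most of their sessions end after one or two clicks.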

Query Type Identification Algorithm
• Distribution of nCS for search engine queries

Query Type Identification Algorithm
• Top-n-result satisfied rate (nRS)
  – Assumption 2 (the "front page" assumption): when a user submits a navigational query, he clicks only the top-ranked result URLs
    • Navigational search performs well (usually, over 80% of correct answers are returned as the top-ranked result)
    • So the user does not need to click other results
  – Top-n-result satisfied rate
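As with nCS, the nRS formula is not in the extracted text. A minimal sketch, assuming nRS is the share of a query's sessions in which every click falls within the top n results (following Assumption 2); the session format and the default `n=3` are hypothetical.

```python
from typing import List, Sequence


def top_n_result_satisfied_rate(sessions: List[Sequence[int]], n: int = 3) -> float:
    """nRS under an assumed definition: the share of a query's sessions
    whose clicks all land on results ranked n or better.

    Each session is the list of result ranks (1-based) the user clicked.
    """
    if not sessions:
        raise ValueError("the query has no sessions")
    satisfied = sum(1 for clicks in sessions if all(rank <= n for rank in clicks))
    return satisfied / len(sessions)
```

A single click deep in the ranking (for example at rank 8) disqualifies the whole session, which separates queries whose users browse further down the list.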

Query Type Identification Algorithm
• Distribution of nRS for search engine queries

Query Type Identification Algorithm
• Click distribution (CD)
  – Assumption 3 (the "focus" assumption): when different users submit the same navigational query, they tend to click the same result URL
    • Navigational queries have specific search targets
    • If the target appears in the result list, users focus on it
  – Click distribution
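Assumption 3 suggests measuring how concentrated a query's clicks are on a single "focus URL". A minimal sketch, assuming CD is the share of all clicks for the query that land on its most-clicked URL; the exact formula on the slide is not recoverable, so this definition is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def click_distribution(sessions: List[Iterable[str]]) -> Tuple[str, float]:
    """CD under an assumed definition: the share of a query's clicks on its most-clicked URL.

    Each session is the list of URLs the user clicked. Returns (focus_url, cd).
    """
    counts = Counter(url for clicks in sessions for url in clicks)
    if not counts:
        raise ValueError("the query has no clicks")
    focus_url, focus_clicks = counts.most_common(1)[0]
    return focus_url, focus_clicks / sum(counts.values())
```

For a navigational query such as 卓越網, most clicks would be expected to concentrate on one URL (www.joyo.com/), giving a CD close to 1.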

Query Type Identification Algorithm
• Distribution of CD for search engine queries
  [Figure: percentage of navigational and informational queries (y-axis, 0% to 30%) by CD value (x-axis, 0.9 down to 0)]
  – Example queries and their focus URLs:
    • 讀寫網 (Duxie): www.duxie.net/
    • 南方都市報 (Southern Metropolis Daily): www.nanfangdaily.com.cn/
    • 卓越網 (Joyo): www.joyo.com/

Query Type Identification Algorithm
• A query type identification decision tree
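The decision tree itself is shown only as an image, so its structure and split values are not recoverable here. The sketch below is a hypothetical toy tree, not the authors' tree, that only illustrates how the three click features could be combined; in practice the thresholds would be learned from the annotated training queries.

```python
from dataclasses import dataclass


@dataclass
class ClickFeatures:
    ncs: float  # N-click satisfied rate
    nrs: float  # top-n-result satisfied rate
    cd: float   # click distribution


def classify_query(f: ClickFeatures,
                   cd_threshold: float = 0.6,
                   ncs_threshold: float = 0.6,
                   nrs_threshold: float = 0.7) -> str:
    """Toy decision tree over the three click features; all thresholds are hypothetical."""
    # Strongly focused clicks suggest a single target page (Assumption 3).
    if f.cd >= cd_threshold:
        return "navigational"
    # Few clicks, all near the top, also suggest a navigational need (Assumptions 1 and 2).
    if f.ncs >= ncs_threshold and f.nrs >= nrs_threshold:
        return "navigational"
    return "informational/transactional"
```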


Experimental Results and Discussion of Applications
• Test set
  – Completely different from the training set
  – Built with different annotation methods:
    • Informational queries taken from a Chinese search engine performance contest organized by TianWang.com
    • Navigational queries taken from a well-known Chinese Web directory (Hao123.com)
  – 200+ test queries

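Experimental Results and Discussion of Applications
• Experimental results
  – Our method (decision tree) outperforms the previous click-distribution-based method (+30% on the training set, +19% on the test set)
  [Figure: F-measure (y-axis, 0.65 to 0.90) on the training and test sets for the decision tree (Dtree) and the click-distribution (CD) method]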
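The comparison is reported as F-measure, but the slides do not say which class it is computed for or how classes are averaged. A minimal sketch of the standard F-measure for one class from confusion counts, assuming the balanced form (beta = 1) unless stated otherwise:

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score for one class from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall; beta = 1 weights them equally.
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, a classifier that labels 8 navigational queries correctly, with 2 false positives and 2 misses, has precision and recall of 0.8 and therefore an F-measure of 0.8.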

Experimental Results and Discussion of Applications
• Experimental results
  – Over 80% of queries are correctly classified in both the training and test sets

Thank you! Questions or comments?
