Improving accuracy of SMS based FAQ retrieval

SLIDE 1

Improving accuracy of SMS based FAQ retrieval

Anwar Shaikh, Mukul Jain, Mukul Rawat, Rajiv Ratn

FIRE 2011

From: Delhi Technological University (DTU), formerly known as Delhi College of Engineering (DCE)

SLIDE 2

Outline

  • Abstract
  • Introduction
  • Prior Work
  • Our Contribution
  • Proposed System
  • Problem Formulation
  • Tools Used
  • Results
  • Future Work
  • References

12/3/2011 FIRE 2011: Improving Accuracy of SMS based FAQ Retrieval

SLIDE 3

Abstract

We implemented an automatic SMS-based question answering system for SMS users, as proposed by L. Venkata Subramaniam and team in their paper “SMS based Interface for FAQ Retrieval” (2009).

We present three techniques that improve the accuracy of SMS-based FAQ retrieval without requiring any training data or SMS normalization.

The system handles syntactic and semantic variations in question formulation with higher accuracy.

SLIDE 4

Introduction

The number of mobile users is growing rapidly.

Mobile networks provide anytime, anywhere access.

This has encouraged service providers to build information services based on SMS technology.

Existing systems require either SMSes in a particular format or human intervention in the query response.

The proposed automatic system lets users write queries without any fixed format and produces responses without human intervention.

SLIDE 5

Prior Work

SMS questions pose significant challenges because of their inherent noise.

Prior work handles the noise in an SMS query by formulating query similarity over FAQ questions as a combinatorial search problem.

The search space consists of all combinations of the possible dictionary variations of the tokens in the noisy query.

Candidates are ranked by a scoring function based on similarity.

SLIDE 6

Calculation of Similarity Score
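
The similarity formula shown on this slide is not reproduced in the text. As a rough sketch only, the snippet below loosely follows the kind of token weight described in the cited Kothari et al. (2009) paper: a dictionary term is compared with a noisy SMS token through the ratio of their longest common subsequence to the term length, divided by an edit distance over consonant skeletons, and scaled by inverse document frequency. The function names, the skeleton rule, the first-letter check and the +1 in the denominator are assumptions made for illustration, not the authors' code.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_distance(a, b):
    """Classic Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def consonant_skeleton(word):
    """Drop vowels and collapse repeated letters (assumed skeleton rule)."""
    out = []
    for ch in word.lower():
        if ch in "aeiou" or (out and out[-1] == ch):
            continue
        out.append(ch)
    return "".join(out)


def similarity(term, sms_token):
    """Similarity of a dictionary term to a noisy SMS token, in [0, 1]."""
    term, sms_token = term.lower(), sms_token.lower()
    # Assumption: tokens that start with different letters never match.
    if not term or not sms_token or term[0] != sms_token[0]:
        return 0.0
    lcs_ratio = lcs_length(term, sms_token) / len(term)
    distance = edit_distance(consonant_skeleton(term), consonant_skeleton(sms_token))
    return lcs_ratio / (distance + 1)


def idf(term, faq_questions):
    """Inverse document frequency of term over tokenized FAQ questions."""
    containing = sum(1 for question in faq_questions if term in question)
    return math.log(len(faq_questions) / containing) if containing else 0.0


def weight(term, sms_token, faq_questions):
    """Similarity scaled by how rare the term is in the FAQ corpus."""
    return similarity(term, sms_token) * idf(term, faq_questions)
```

With these definitions, `similarity("capital", "captl")` is 5/7, because the two words share the skeleton "cptl" and five of the seven letters of "capital" appear in order in "captl".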

SLIDE 7

Our Contribution

  • New scoring function
  • Proximity Score based on proximity search
  • Length Score based on the lengths of the FAQ and the SMS query
  • FAQ answers are also considered when finding the question closest to the SMS query

SLIDE 8

Proposed System

Step 1: Preprocessing
  • Index the FAQ questions and answers
  • Create domain and synonym dictionaries
  • Remove stop words
  • Remove punctuation symbols
  • Convert numbers to words (e.g. 4get to fourget)

Step 2: Calculate the Similarity Score
  • Calculation of the similarity measure
  • Calculation of inverse document frequency

Step 3: Calculate the Proximity Score

Step 4: Calculate the Length Score

Step 5: Result. If a match is found, return it; otherwise look for a match in the FAQ answers.

Step 6: If there is still no match, apply the out-of-domain logic to the query.
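
The preprocessing in Step 1 can be sketched as follows. This is a minimal illustration for ASCII English text only; the stop-word list, the digit mapping and the function name `preprocess` are assumptions, and indexing and dictionary creation are left out.

```python
import re

# Tiny illustrative stop list; a real system would use a fuller one (assumption).
STOP_WORDS = {"is", "the", "of", "a", "an", "in", "it"}

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def preprocess(text):
    """Lowercase, spell out digits, strip punctuation, drop stop words."""
    text = text.lower()
    # Spell out each ASCII digit in place, so "4get" becomes "fourget".
    text = re.sub(r"[0-9]", lambda m: DIGIT_WORDS[m.group()], text)
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [token for token in text.split() if token not in STOP_WORDS]
```

For example, `preprocess("4get the capital of India?")` yields the tokens `fourget`, `capital` and `india`.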

SLIDE 9

Problem Formulation

Score(Q) = W1 * Similarity_Score(Q, S) + W2 * Proximity_Score(Q, S) - W3 * Length_Score(Q, S)

where Q is the FAQ question under consideration and S = {s1, s2, …, sn} is the set of tokens in the SMS query.

W1 + W2 = 1.0 (or 100%). W3 is assigned a comparatively small value.
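
A small sketch of how the three component scores could be combined and used to rank FAQ questions. The weight values below are placeholders chosen only to satisfy W1 + W2 = 1.0 with a small W3; the slides do not state the values actually used, and the function names are hypothetical.

```python
def overall_score(similarity_score, proximity_score, length_score,
                  w1=0.7, w2=0.3, w3=0.1):
    """Weighted combination from the formula above (weights are assumed)."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("w1 and w2 must sum to 1.0")
    return w1 * similarity_score + w2 * proximity_score - w3 * length_score


def best_faq(candidates):
    """Return the FAQ id with the highest overall score.

    candidates maps an FAQ id to a (similarity, proximity, length) tuple.
    """
    return max(candidates, key=lambda faq_id: overall_score(*candidates[faq_id]))
```

Two FAQs with the same similarity score are then separated by the other terms: the one whose matched tokens sit closer together and that has fewer unmatched tokens ranks higher.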

SLIDE 10

E.g. of Proximity Search

SMS: “wt is captl f india?”

FAQ 1: “What is the capital of UP? It is situated which part of India?”
Answer: “Lucknow is the capital of UP and It is in the north part of India.”

FAQ 2: “What is the capital of India?”
Answer: “Delhi is the capital of India.”
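
Both FAQs in this example contain "capital" and "India", but only in FAQ 2 do the two words sit close together. One possible way to capture that, which is not necessarily the formula used on the slides, is to measure the smallest window of FAQ tokens that covers every matched query term. The sketch assumes the noisy SMS tokens have already been mapped to dictionary terms; all names are hypothetical.

```python
def smallest_window(faq_tokens, query_terms):
    """Length of the shortest run of faq_tokens containing every query term.

    Returns None when there are no query terms or one of them is absent.
    """
    needed = set(query_terms)
    if not needed:
        return None
    best = None
    counts = {}
    left = 0
    for right, token in enumerate(faq_tokens):
        if token in needed:
            counts[token] = counts.get(token, 0) + 1
        # Shrink from the left while the window still covers all terms.
        while len(counts) == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            left_token = faq_tokens[left]
            if left_token in counts:
                counts[left_token] -= 1
                if counts[left_token] == 0:
                    del counts[left_token]
            left += 1
    return best


def proximity_score(faq_tokens, query_terms):
    """1.0 when the matched terms are adjacent, lower as they spread out."""
    window = smallest_window(faq_tokens, query_terms)
    if window is None:
        return 0.0
    return len(set(query_terms)) / window


faq1 = "what is the capital of up it is situated which part of india".split()
faq2 = "what is the capital of india".split()
query = ["capital", "india"]
print(proximity_score(faq1, query), proximity_score(faq2, query))
```

Here FAQ 1 needs a window of ten tokens to cover both terms while FAQ 2 needs only three, so FAQ 2 receives the higher proximity score.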

SLIDE 11

Formulation of Proximity Score

SLIDE 12

Calculating Proximity Score

SLIDE 13

E.g. of Length Search

SMS: “wt is captl f india?”

FAQ 1: “What is the capital of UP? It is situated which part of India? What is the culture of UP?”
Answer: “Lucknow is the capital of UP and It is in the north part of India. People are very friendly by nature.”

FAQ 2: “What is the capital of India?”
Answer: “Delhi is the capital of India.”
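
The length formulas from the slides are not available in this text. As an assumed stand-in, the sketch below penalizes an FAQ by the fraction of its tokens that no query term matched, which is enough to separate the two FAQs in this example. The token lists are written by hand with stop words already removed, and the function name is hypothetical.

```python
def length_penalty(faq_tokens, matched_terms):
    """Fraction of FAQ tokens that no query term matched (assumed definition)."""
    if not faq_tokens:
        return 0.0
    matched = set(matched_terms)
    unmatched = sum(1 for token in faq_tokens if token not in matched)
    return unmatched / len(faq_tokens)


# Hand-written token lists for the two FAQ questions, stop words removed.
faq1_tokens = ["capital", "up", "situated", "part", "india", "culture", "up"]
faq2_tokens = ["capital", "india"]
matched_terms = ["capital", "india"]

print(length_penalty(faq1_tokens, matched_terms))
print(length_penalty(faq2_tokens, matched_terms))
```

FAQ 1 leaves five of its seven tokens unmatched, while FAQ 2 leaves none, so subtracting this penalty favors FAQ 2.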

SLIDE 14

Length Score (Negative)

SLIDE 15

Limitations of Length Score

The drawback of the Length Score is that a question with more tokens always receives a lower overall score, because it has more unmatched FAQ tokens.

SLIDE 16

Solution 1: Rewrite FAQ

FAQ Question: “DTU offers various M Tech courses. What are the Internship opportunities for M Tech students at DTU? There are many M Tech students in DTU. Do all M Tech students get the Internship offer?”

Corresponding Small Question: “What are Internship opportunities for M Tech students at DTU?”

SLIDE 17

Solution 2: Length Score (Positive)

SLIDE 18

Matching with Answers

Answers are matched only when:
  • more than one FAQ question ties as the closest match to the SMS query, or
  • no matching FAQ question is found.

FAQ: “What are the different insurance schemes?”
Answer: “LIC, LIC JivanSaral, LIC JivanTarang, LIC Plus, Bajaj Allianz, ICICI Lombard etc. are different insurance schemes.”
SMS: “wht r difrnt LIC scems?”
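
The fallback rule above can be sketched with a simple token-overlap score standing in for the full scoring function. The query terms are assumed to be already normalized, the second FAQ (about loans) is invented purely to create a tie, and all names are hypothetical.

```python
def token_overlap(query_terms, text_tokens):
    """Number of distinct query terms that occur in the text."""
    return len(set(query_terms) & set(text_tokens))


def select_faq(query_terms, faqs):
    """Pick an FAQ id, consulting answers only on a tie or when no question matches.

    faqs is a list of dicts with 'id', 'question' and 'answer' token lists.
    Returns None when neither questions nor answers match.
    """
    q_scores = {f["id"]: token_overlap(query_terms, f["question"]) for f in faqs}
    best = max(q_scores.values(), default=0)
    tied = [f for f in faqs if q_scores[f["id"]] == best]
    if best > 0 and len(tied) == 1:
        return tied[0]["id"]
    # Tie, or no question matched: compare against the candidates' answers.
    candidates = tied if best > 0 else faqs
    a_scores = {f["id"]: token_overlap(query_terms, f["answer"]) for f in candidates}
    winner = max(a_scores, key=a_scores.get, default=None)
    if winner is None or (best == 0 and a_scores[winner] == 0):
        return None
    return winner


faqs = [
    {"id": "insurance",
     "question": ["different", "insurance", "schemes"],
     "answer": ["lic", "jivansaral", "jivantarang", "plus", "bajaj", "allianz",
                "icici", "lombard", "different", "insurance", "schemes"]},
    {"id": "loan",  # invented FAQ, used only to force a tie on the questions
     "question": ["different", "loan", "schemes"],
     "answer": ["home", "car", "education", "loans"]},
]
print(select_faq(["different", "lic", "schemes"], faqs))
```

Both questions match "different" and "schemes", so the tie is broken by the answers, and only the insurance answer mentions LIC.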

SLIDE 19

Results

Statistics for Hindi task:
***** FIRE 2011 SMS TASK EVALUATION REPORT *****

  • No. of In-domain Queries: 200
  • No. of Out of Domain Queries: 124
  • In Domain correct: 198/200 (0.99)
  • Out of Domain correct: 3/124 (0.024193548)
  • Mean Reciprocal Rank (MRR): 0.99

Statistics for FAQ database in English:
***** FIRE 2011 SMS TASK EVALUATION REPORT *****

  • No. of In-domain Queries: 704
  • No. of Out of Domain Queries: 2701
  • In Domain correct: 539/704 (0.765625)
  • Out of Domain correct: 871/2701 (0.32247317)
  • Mean Reciprocal Rank (MRR): 0.8309513
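
For readers unfamiliar with the metrics in these reports, the sketch below shows how accuracy and Mean Reciprocal Rank are conventionally computed. The FIRE evaluation script itself is not shown in the slides, so details such as how unanswered queries are counted are assumptions here.

```python
def accuracy(correct, total):
    """Fraction of queries answered correctly."""
    return correct / total if total else 0.0


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all queries.

    ranks holds, per query, the 1-based rank of the first correct FAQ,
    or None when no correct FAQ was returned (counted as 0).
    """
    if not ranks:
        return 0.0
    return sum(1.0 / rank for rank in ranks if rank) / len(ranks)


print(accuracy(539, 704))                       # in-domain accuracy, English task
print(mean_reciprocal_rank([1, 2, None, 1]))    # toy ranks, not FIRE data
```

An MRR above the top-1 accuracy, as in the English results, indicates that some correct FAQs were returned below the first position.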

SLIDE 20

Tools Used

  • Lucene [1]
  • Wordnet English [2]
  • Wordnet Hindi [3]

SLIDE 21

Future Work

  • Stemming
  • Automatic spelling checker
  • Rewriting FAQ
  • Improve proximity search

SLIDE 22

References

Govind Kothari, Sumit Negi, Tanveer A. Faruquie, Venkatesan T. Chakaravarthy, L. Venkata Subramaniam. 2009. SMS based Interface for FAQ Retrieval. ACL and AFNLP, Suntec, Singapore.

Danish Contractor, Govind Kothari, Tanveer A. Faruquie, L. Venkata Subramaniam, Sumit Negi. 2010. Handling Noisy Queries in Cross Language FAQ Retrieval. ACL, MIT, Massachusetts, USA.

[1] http://lucene.apache.org/
[2] http://wordnet.princeton.edu/
[3] http://www.cfilt.iitb.ac.in/wordnet/webhwn/

SLIDE 23

Thanks!