Improving accuracy of SMS based FAQ retrieval
Anwar Shaikh, Mukul Jain, Mukul Rawat, Rajiv Ratn
FIRE 2011
From: Delhi Technological University (DTU), formerly known as Delhi College of Engineering (DCE)
12/3/2011 FIRE 2011: Improving Accuracy of SMS based FAQ Retrieval
The number of mobile users is growing rapidly.
Mobile networks provide anytime, anywhere access.
This has encouraged service providers to build information services based
Existing systems require either SMSes in a particular format or human intervention in the query response.
The proposed automatic system lets the user write a query without any fixed format and produces the response without human intervention.
SMS questions pose significant challenges due to their inherent noise.
The noise in an SMS query is handled by formulating query similarity over FAQ questions as a combinatorial search problem.
The search space consists of combinations of all possible dictionary variations of the tokens in the noisy query.
The scoring function is based on similarity.
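The search space described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `DICTIONARY` contents, the helper names `candidates` and `search_space`, and the use of Python's `difflib` as the string matcher are all assumptions made for the example.

```python
import difflib
from itertools import product

# Hypothetical domain dictionary; a real system would build it from the FAQ corpus.
DICTIONARY = ["what", "is", "the", "capital", "of", "india", "up"]

def candidates(token, dictionary=DICTIONARY, n=3, cutoff=0.5):
    """Return dictionary terms that resemble a noisy SMS token."""
    matches = difflib.get_close_matches(token, dictionary, n=n, cutoff=cutoff)
    return matches or [token]  # keep the raw token if nothing is close enough

def search_space(sms_tokens):
    """Enumerate every combination of candidate terms, one per SMS token."""
    return list(product(*(candidates(tok) for tok in sms_tokens)))

space = search_space(["wt", "is", "captl"])
```

Each tuple in `space` is one possible clean reading of the noisy query. The number of tuples grows multiplicatively with the number of tokens, which is why a scoring function is needed to search it efficiently.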
New scoring function
Proximity Score, based on proximity search
Length Score, based on the lengths of the FAQ and the SMS query
FAQ answers are also considered when finding the question closest to the SMS query
Step 1: Preprocessing
  Index the FAQ questions and answers
  Create the domain and synonym dictionaries
  Remove stop words
  Remove punctuation symbols
  Convert numbers to words (e.g. 4get to fourget)
Step 2: Calculate the Similarity Score
  Calculate the similarity measure
  Calculate the inverse document frequency
Step 3: Calculate the Proximity Score
Step 4: Calculate the Length Score
Step 5: Result
  If a match is found, return the result
  Otherwise, look for a match in the FAQ answers
Step 6: If there is still no match, apply the out-of-domain logic to the query
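The text-cleaning part of Step 1 might look like the sketch below. The stop-word list and the digit-to-word map are placeholders (the slides do not list the actual ones), and `preprocess` is a hypothetical helper name.

```python
import re
import string

# Placeholder stop-word list and digit map, assumed for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of"}
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def preprocess(text):
    """Lowercase, strip ASCII punctuation, spell out digits, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Replace each ASCII digit with its word, so "4get" becomes "fourget".
    text = re.sub(r"[0-9]", lambda m: DIGIT_WORDS[m.group()], text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]

tokens = preprocess("4get it?")
```

Applying the same function to both the FAQ text and the SMS query keeps the two sides comparable before any scores are computed.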
Score (Q) = W1* Similarity_Score(Q, S) + W2* Proximity_Score(Q, S) – W3* Length_Score(Q, S)
Where Q is the FAQ question under consideration and S = {s1, s2, …,sn} is the SMS query
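The weighted combination above can be written directly as code. The slides do not give values for W1, W2 and W3, so the defaults of 1.0 below are assumptions, and `overall_score` and `rank_faqs` are hypothetical names.

```python
def overall_score(similarity, proximity, length, w1=1.0, w2=1.0, w3=1.0):
    """Score(Q) = W1*Similarity + W2*Proximity - W3*Length for one FAQ question."""
    return w1 * similarity + w2 * proximity - w3 * length

def rank_faqs(component_scores, **weights):
    """Sort FAQ ids by overall score, best first.

    component_scores maps an FAQ id to a (similarity, proximity, length) tuple.
    """
    return sorted(component_scores,
                  key=lambda q: overall_score(*component_scores[q], **weights),
                  reverse=True)

ranking = rank_faqs({"faq1": (2.0, 0.5, 1.0), "faq2": (2.0, 1.0, 0.2)})
```

Note that the Length Score enters with a negative sign, so it acts as a penalty rather than a reward.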
SMS: “wt is captl f india?” FAQ 1: “What is the capital of UP? It is situated which
Answer: “Lucknow is the capital of UP and It is in the
FAQ 2: “What is the capital of India?” Answer: “Delhi is the capital of India.”
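A rough sketch of how a similarity score could separate the two FAQs in this example. The slides do not define the similarity measure, so `difflib.SequenceMatcher` is used here as a stand-in, the smoothed IDF formula and the 0.6 threshold are assumptions, and only the first sentence of FAQ 1 is used because the rest is cut off in the slide.

```python
import math
from difflib import SequenceMatcher

# Token lists taken from the example; FAQ 1 uses only its first sentence.
FAQS = {
    "FAQ 1": ["what", "is", "the", "capital", "of", "up"],
    "FAQ 2": ["what", "is", "the", "capital", "of", "india"],
}

def idf(term, faqs=FAQS):
    """Smoothed inverse document frequency over the FAQ questions (assumed form)."""
    df = sum(term in tokens for tokens in faqs.values())
    return math.log((1 + len(faqs)) / (1 + df)) + 1.0

def token_similarity(sms_token, faq_token):
    """Stand-in string similarity in [0, 1]."""
    return SequenceMatcher(None, sms_token, faq_token).ratio()

def similarity_score(sms_tokens, faq_tokens, threshold=0.6):
    """Sum similarity * IDF over the best FAQ match of each SMS token."""
    score = 0.0
    for s in sms_tokens:
        best = max(faq_tokens, key=lambda t: token_similarity(s, t))
        sim = token_similarity(s, best)
        if sim >= threshold:
            score += sim * idf(best)
    return score

sms = ["wt", "is", "captl", "f", "india"]
scores = {name: similarity_score(sms, toks) for name, toks in FAQS.items()}
```

Under these assumptions FAQ 2 scores higher, because "india" matches one of its tokens and is rarer across the FAQ set than the terms the two questions share.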
A drawback of the Length Score is that a question with more tokens always gets a lower overall score, because it has more unmatched FAQ tokens.
FAQ Question: “DTU offers various M Tech courses.
Corresponding Small Question: “What are Internship
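The drawback can be made concrete with a small sketch. It assumes the length penalty is simply the number of unmatched FAQ tokens, which the slides suggest but do not state exactly; the token lists and the name `length_penalty` are invented for illustration.

```python
def length_penalty(faq_tokens, matched_tokens):
    """Count FAQ tokens that no SMS token matched (assumed form of the penalty)."""
    matched = set(matched_tokens)
    return sum(1 for tok in faq_tokens if tok not in matched)

# Hypothetical data: the same two terms match in both questions.
matched = {"mtech", "courses"}
short_faq = ["what", "mtech", "courses", "offered"]
long_faq = short_faq + ["which", "branches", "have", "internship", "options"]

short_penalty = length_penalty(short_faq, matched)
long_penalty = length_penalty(long_faq, matched)
```

Both questions match the query equally well, yet the longer one carries a larger penalty, which is the motivation for pairing a long FAQ question with a shorter equivalent.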
FAQ-answer matching is used only when:
  More than one FAQ question is the closest match to the SMS query.
  No matching FAQ question is found.
FAQ: “What are the different insurance schemes?”
Answer: “LIC, LIC JivanSaral, LIC JivanTarang, LIC Plus, Bajaj Allianz, ICICI Lombard etc are different insurance schemes.”
SMS: “wht r difrnt LIC scems?”
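A minimal sketch of the answer-based fallback, assuming the SMS has already been normalized to dictionary terms. The "loans" entry is a made-up second FAQ added for contrast, and `pick_by_answer` is a hypothetical helper that simply counts shared terms.

```python
import re

FAQ_ANSWERS = {
    "insurance": "LIC, LIC JivanSaral, LIC JivanTarang, LIC Plus, Bajaj Allianz, "
                 "ICICI Lombard etc are different insurance schemes.",
    # Hypothetical competing FAQ answer, not from the slides.
    "loans": "Home loans and car loans are the different loan schemes.",
}

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def pick_by_answer(sms_terms, candidate_ids, answers=FAQ_ANSWERS):
    """Among tied (or all) FAQs, pick the one whose answer shares most SMS terms."""
    best_id, best_overlap = None, 0
    for faq_id in candidate_ids:
        overlap = len(set(sms_terms) & tokenize(answers[faq_id]))
        if overlap > best_overlap:
            best_id, best_overlap = faq_id, overlap
    return best_id  # None when no answer shares any term

# "wht r difrnt LIC scems?" after normalization (assumed output of earlier steps).
sms_terms = ["what", "are", "different", "lic", "schemes"]
choice = pick_by_answer(sms_terms, ["insurance", "loans"])
```

The term "lic" never appears in the FAQ question itself, so only the answer text can break the tie in favour of the insurance FAQ.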
Statistics for Hindi task:
***** FIRE 2011 SMS TASK EVALUATION REPORT *****
In Domain correct: 198/200 (0.99)
Out of Domain correct: 3/124 (0.024193548)
Mean Reciprocal Rank (MRR): 0.99

Statistics for FAQ database in English:
***** FIRE 2011 SMS TASK EVALUATION REPORT *****
In Domain correct: 539/704 (0.765625)
Out of Domain correct: 871/2701 (0.32247317)
Mean Reciprocal Rank (MRR): 0.8309513
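For reference, MRR averages the reciprocal of the rank at which the first correct FAQ appears for each query. The sketch below shows the standard definition only; it is not the FIRE evaluation script, and the function name and the use of `None` for "no correct FAQ returned" are assumptions.

```python
def mean_reciprocal_rank(first_correct_ranks):
    """Average of 1/rank over queries; a rank of None contributes 0."""
    total = sum(1.0 / r for r in first_correct_ranks if r)
    return total / len(first_correct_ranks)

# Three hypothetical queries: correct FAQ at rank 1, at rank 2, and not returned.
mrr = mean_reciprocal_rank([1, 2, None])

# The in-domain ratios in the report can be recomputed from the raw counts.
hindi_in_domain = 198 / 200
english_in_domain = 539 / 704
```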
Lucene [1]
WordNet English [2]
WordNet Hindi [3]
Stemming
Automatic spelling checker
Rewriting FAQ
Improve proximity search
Govind Kothari, Sumit Negi, Tanveer A. Faruquie, Venkatesan T. Chakaravarthy, L. Venkata Subramaniam. 2009. SMS based Interface for FAQ Retrieval. ACL and AFNLP, Suntec, Singapore.
Danish Contractor, Govind Kothari, Tanveer A. Faruquie, L. Venkata Subramaniam, Sumit Negi. 2010. Handling Noisy Queries in Cross Language FAQ Retrieval. ACL, MIT, Massachusetts, USA.
[1] http://lucene.apache.org/
[2] http://wordnet.princeton.edu/
[3] http://www.cfilt.iitb.ac.in/wordnet/webhwn/