Quora Question Pairs: Identify if two questions have the same intent (PowerPoint presentation)

Oct 14, 2022


Quora Question Pairs: Identify if two questions have the same intent
Agenda
1. Problem
2. Train & test data
3. Analyzing the data
4. Vectorizing the data
5. Extra feature selection
6. AI Models
   a. XGBoost
   b. Neural Network
7. Results
Problem
Given a pair of questions q1 and q2, we need to determine if they are duplicates of each other.
More formally: build a model that learns the function f(q1, q2) = 1 or 0.
Train data (labeled):
Question 1 - Question 2 - Answer
Question 3 - Question 4 - Answer
…
Question 400,904 - Question 400,905 - Answer
Test data (unlabeled):
Question 1 - Question 2
Question 3 - Question 4
…
Question 2,000,108 - Question 2,000,109
Example (train data):
Could time travel ever be possible? - Will time travel ever be possible? - 1
Why aren’t blueberries blue? - Do rubber ducks quack? - 0
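To make the data layout concrete, here is a minimal sketch of reading question pairs with Python's standard `csv` module. It assumes a CSV with the columns of the Kaggle `train.csv` file (`id`, `qid1`, `qid2`, `question1`, `question2`, `is_duplicate`); the two sample rows are the examples from the slide, and `load_pairs` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample in the assumed train.csv layout; rows are the slide's examples.
SAMPLE = """id,qid1,qid2,question1,question2,is_duplicate
0,1,2,Could time travel ever be possible?,Will time travel ever be possible?,1
1,3,4,Why aren't blueberries blue?,Do rubber ducks quack?,0
"""

def load_pairs(text):
    """Return (question1, question2, label) tuples.

    The label is None when there is no is_duplicate column (unlabeled test data).
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(text)):
        label = row.get("is_duplicate")
        pairs.append((row["question1"], row["question2"],
                      int(label) if label is not None else None))
    return pairs

pairs = load_pairs(SAMPLE)
for q1, q2, label in pairs:
    print(f"{q1} - {q2} - {label}")
```

The same function handles unlabeled test files, which is why the label falls back to `None` instead of raising an error.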
Analyzing the data
We needed to answer the question: how can a computer determine if two questions are duplicates? What features make a pair of questions more likely to be duplicates?
Vectorizing
How do we perform calculations on strings? Answer: by vectorizing them!
GloVe
Pre-trained vectors for English words. Similar words are placed closer together in vector space, which captures a sense of context.
● GloVe 50d
● GloVe 100d
● GloVe 200d
● GloVe 300d
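GloVe vectors are distributed as plain text: one word per line, followed by its vector components separated by spaces. A minimal sketch of loading that format and comparing words by cosine similarity follows. The three 3-dimensional vectors are made up for illustration, and `load_glove` is a hypothetical helper, not part of any GloVe tooling.

```python
import io
import math

def load_glove(lines):
    """Parse GloVe's plain-text format into a {word: [floats]} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Made-up 3-d vectors; real GloVe files have 50 to 300 dimensions per word.
TOY = io.StringIO(
    "time 0.9 0.1 0.0\n"
    "travel 0.8 0.2 0.1\n"
    "duck 0.0 0.1 0.9\n"
)
glove = load_glove(TOY)
print(cosine_similarity(glove["time"], glove["travel"]))  # high: related words
print(cosine_similarity(glove["time"], glove["duck"]))    # low: unrelated words
```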
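GloVe
Vector arithmetic reflects relationships between words. The classic example is King - Man + Woman ≈ Queen:
glove(“King”) - glove(“Man”) + glove(“Woman”) ≈ glove(“Queen”)
The result is not exactly glove(“Queen”), but Queen's vector is the closest one to it. The arithmetic itself is component-wise, for example:
[0.126, 0.043, …, 0.321] + [0.421, 0.203, …, 0.366] = [0.547, 0.246, …, 0.687]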
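The analogy is usually evaluated as a nearest-neighbour search: compute vec(King) - vec(Man) + vec(Woman) and return the word whose vector has the highest cosine similarity to the result, excluding the three input words. A sketch with made-up 2-dimensional vectors follows; real GloVe vectors would not line up this cleanly, and `analogy` is a hypothetical helper.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Made-up vectors: dimension 0 ~ "royalty", dimension 1 ~ "masculinity".
toy = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}
print(analogy("king", "man", "woman", toy))  # queen
```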
Extra Features
Basic features:
● Length of question 1
● Length of question 2
● Length difference
● Number of words in question 1
● Number of words in question 2
● Number of common words
● ...
Distance features (using the GloVe vector space):
● Euclidean distance
● Manhattan distance
● Cosine distance
● Correlation distance
● Jaccard distance
● Chebyshev distance
● Hamming distance
● Canberra distance
● Bray-Curtis distance
● ...
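A sketch of how some of these features could be computed for one pair. It assumes lengths are counted in characters and words are split on whitespace after lowercasing; the slides do not say how the originals were computed. The distances follow the definitions in `scipy.spatial.distance` (`euclidean`, `cityblock`, `cosine`, `chebyshev`, `canberra`, `braycurtis`), reimplemented in plain Python so the sketch has no dependencies; calling SciPy directly would be simpler in practice.

```python
import math

def basic_features(q1, q2):
    """Length and word-overlap features for a question pair (assumed definitions)."""
    w1, w2 = q1.lower().split(), q2.lower().split()
    return {
        "len_q1": len(q1),                      # characters
        "len_q2": len(q2),
        "len_diff": abs(len(q1) - len(q2)),
        "words_q1": len(w1),
        "words_q2": len(w2),
        "common_words": len(set(w1) & set(w2)),
    }

def distance_features(u, v):
    """A subset of the listed distances between two question vectors."""
    diffs = [a - b for a, b in zip(u, v)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    manhattan = sum(abs(d) for d in diffs)
    sum_abs = sum(abs(a + b) for a, b in zip(u, v))
    return {
        "euclidean": math.sqrt(sum(d * d for d in diffs)),
        "manhattan": manhattan,
        # 1 - cosine similarity; 0.0 for zero vectors (SciPy would give nan).
        "cosine": 1.0 - dot / (norm_u * norm_v) if norm_u and norm_v else 0.0,
        "chebyshev": max(abs(d) for d in diffs),
        # Terms where both components are 0 contribute nothing.
        "canberra": sum(abs(a - b) / (abs(a) + abs(b))
                        for a, b in zip(u, v) if abs(a) + abs(b) > 0),
        "braycurtis": manhattan / sum_abs if sum_abs else 0.0,
    }

f = basic_features("Could time travel ever be possible?",
                   "Will time travel ever be possible?")
d = distance_features([1.0, 0.0], [0.0, 1.0])
```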
Final vector
Concatenating everything gives a vector of the following form:
[glove(Question 1), glove(Question 2), extra features] = 115 dimensions
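The slides do not say how a whole question becomes one GloVe vector. A common approach, assumed here, is to average the vectors of the question's words. With 50-dimensional GloVe vectors and 15 extra features (also an assumption, though it matches the stated total), the final vector has 50 + 50 + 15 = 115 dimensions. `question_vector` and `final_vector` are hypothetical helpers.

```python
def question_vector(question, glove, dim):
    """Average the GloVe vectors of the question's words; unknown words are skipped.

    No punctuation handling: "possible?" is simply not found in the vocabulary.
    """
    words = [w for w in question.lower().split() if w in glove]
    if not words:
        return [0.0] * dim
    return [sum(glove[w][i] for w in words) / len(words) for i in range(dim)]

def final_vector(q1, q2, glove, dim, extra_features):
    """[glove(Question 1), glove(Question 2), extra features] as one flat list."""
    return (question_vector(q1, glove, dim)
            + question_vector(q2, glove, dim)
            + list(extra_features))

# Assumed sizes: 50-d GloVe plus 15 extra features.
DIM = 50
toy_glove = {"time": [0.1] * DIM, "travel": [0.3] * DIM}  # made-up vectors
vec = final_vector("Could time travel ever be possible?",
                   "Will time travel ever be possible?",
                   toy_glove, DIM, extra_features=[0.0] * 15)
print(len(vec))  # 115
```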
XGBoost
Stands for eXtreme Gradient Boosting.
Gradient boosting builds an ensemble step by step: each new model is trained to predict the errors made by the existing models, and models are added until no further improvement can be made.
There are two main reasons for using XGBoost:
● Execution speed
● Model performance
It has been the go-to algorithm for many Kaggle competition winners.
Result?
0.35660 Logarithmic loss
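To illustrate the boosting idea on the XGBoost slide (each new model is fitted to the errors of the ensemble so far), here is plain gradient boosting for squared error, using one-split decision stumps on a single numeric feature. This is a teaching sketch, not XGBoost: XGBoost additionally uses regularization, second-order gradient information and heavily optimized tree construction, and with a log-loss objective the "errors" are gradients of the log loss rather than plain residuals. `fit_stump` and `gradient_boost` are hypothetical names.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)  # never empty: threshold is one of xs
        right_mean = sum(right) / len(right) if right else 0.0
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    """Each round fits a new stump to the current residuals and adds it to the ensemble."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + sum(learning_rate * s(x) for s in stumps)

    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

xs, ys = [1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]
model = gradient_boost(xs, ys, rounds=20)
```

With `learning_rate` below 1, each stump corrects only part of the remaining error, which is why several rounds are needed.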
Neural Network
● TensorFlow: open-source machine learning library for Python, by Google
● Keras: TensorFlow API that adds an extra abstraction layer
● GPU acceleration support
Neural Network
Feed-Forward Neural Network
Input: the 115-dimensional final vector (GloVe vectors plus extra features), so the input layer is 115 neurons wide.
Weights: the edge weights between neurons are updated automatically during the training phase.
Output: 1 neuron with a value between 0 and 1.
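A sketch of the forward pass of such a network in plain Python: 115 inputs, one hidden layer with ReLU activations and one sigmoid output neuron, whose value is turned into f(q1, q2) = 1 or 0 by thresholding. The hidden layer size (32), the activations and the 0.5 threshold are assumptions, since the slides do not specify them. The actual model was built with Keras/TensorFlow, which update the weights by backpropagation during training; here they are only randomly initialized.

```python
import math
import random

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b) for each output neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    """Numerically stable logistic function; output lies in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_layer(n_in, n_out, rng):
    # Small random weights; training would adjust these.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
hidden_w, hidden_b = init_layer(115, 32, rng)  # hypothetical hidden size of 32
out_w, out_b = init_layer(32, 1, rng)

def predict_duplicate(features):
    hidden = dense(features, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)[0]

p = predict_duplicate([0.5] * 115)
label = 1 if p >= 0.5 else 0  # assumed threshold for f(q1, q2) = 1 or 0
```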
Results (logarithmic loss, lower is better)
XGBoost: 0.35660
Feed-Forward Neural Network: 0.35354
1,257th place of 2,847 in the Kaggle competition
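Both scores are logarithmic loss (binary cross-entropy). For N pairs with labels y_i in {0, 1} and predicted probabilities p_i, logloss = -(1/N) Σ [y_i log p_i + (1 - y_i) log(1 - p_i)], so confident wrong predictions are penalized heavily. A minimal sketch follows; clipping probabilities to [eps, 1 - eps] is a common safeguard against log(0), assumed here rather than taken from the slides.

```python
import math

def log_loss(labels, probabilities, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)

print(log_loss([1, 0], [0.5, 0.5]))  # about 0.6931 (ln 2): an uninformed guess
print(log_loss([1, 0], [0.9, 0.1]))  # about 0.1054 (-ln 0.9): confident and correct
```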
Demonstration
Questions?
