NPFL103: Information Retrieval (9) Vector Space Classification
SLIDE 1


NPFL103: Information Retrieval (9)

Vector Space Classification

Pavel Pecina

pecina@ufal.mff.cuni.cz Institute of Formal and Applied Linguistics Faculty of Mathematics and Physics Charles University

Original slides are courtesy of Hinrich Schütze, University of Stuttgart.

SLIDE 2

Contents

▶ Vector space classification
▶ k nearest neighbors
▶ Linear classifiers
▶ Support vector machines

SLIDE 3

Vector space classification

SLIDE 4

Recall vector space representation

▶ Each document is a vector, one component for each term.
▶ Terms are axes.
▶ High dimensionality: 100,000s of dimensions
▶ Normalize vectors (documents) to unit length
▶ How can we do classification in this space?

SLIDE 5

Vector space classification

▶ The training set is a set of documents, each labeled with its class.
▶ In vector space classification, this set corresponds to a labeled set of points or vectors in the vector space.
▶ Premise 1: Documents in the same class form a contiguous region.
▶ Premise 2: Documents from different classes don't overlap.
▶ We define lines, surfaces, hypersurfaces to divide regions.

SLIDE 6

Classes in the vector space

[Figure: documents from the classes China, Kenya, and UK as points in the vector space, plus a new document ⋆]

Should the document ⋆ be assigned to China, UK or Kenya? Find separators between the classes. Based on these separators, ⋆ should be assigned to China. How do we find separators that do a good job at classifying new documents like ⋆?

SLIDE 7

k nearest neighbors

SLIDE 8

kNN classification

▶ kNN classification is another vector space classification method.
▶ It also is very simple and easy to implement.
▶ kNN is more accurate (in most cases) than Naive Bayes.
▶ If you need to get a pretty accurate classifier up and running in a short time, and you don't care about efficiency that much, use kNN.

SLIDE 9

kNN classification

▶ kNN classification rule for k = 1 (1NN): Assign each test document to the class of its nearest neighbor in the training set.
▶ 1NN is not very robust, one document can be mislabeled or atypical.
▶ kNN classification rule for k > 1 (kNN): Assign each test document to the majority class of its k nearest neighbors in the training set.
▶ This amounts to locally defined decision boundaries between classes – far away points do not influence the classification decision.
▶ Rationale of kNN: We expect a test document d to have the same label as the training documents located in the local region surrounding d (contiguity hypothesis).

SLIDE 10

Probabilistic kNN

▶ Probabilistic version of kNN:

P(c|d) = fraction of k neighbors of d that are in c

▶ kNN classification rule for probabilistic kNN:

Assign d to class c with highest P(c|d)

SLIDE 11

kNN is based on Voronoi tessellation

[Figure: Voronoi tessellation induced by the training documents of two classes; 1NN decision boundaries follow the cell borders]

SLIDE 12

kNN algorithm

Train-kNN(C, D)
  D′ ← Preprocess(D)
  k ← Select-k(C, D′)
  return D′, k

Apply-kNN(D′, k, d)
  Sk ← ComputeNearestNeighbors(D′, k, d)
  for each cj ∈ C(D′)
    do pj ← |Sk ∩ cj| / k
  return arg maxj pj

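The pseudocode above maps directly onto a few lines of Python. The sketch below is only an illustration (not the lecture's reference implementation); it assumes documents are already unit-length tf-idf vectors, so cosine similarity reduces to a dot product, and it also returns the P(c|d) estimates of probabilistic kNN from the previous slide.

```python
import numpy as np
from collections import Counter

def apply_knn(train_vectors, train_labels, k, doc_vector):
    """Classify one test document by majority vote of its k nearest neighbors."""
    sims = train_vectors @ doc_vector             # cosine similarities (unit vectors)
    nearest = np.argsort(-sims)[:k]               # indices of the k most similar docs
    votes = Counter(train_labels[i] for i in nearest)
    probs = {c: n / k for c, n in votes.items()}  # probabilistic kNN: P(c|d)
    return max(probs, key=probs.get), probs
```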
SLIDE 13

Exercise

[Figure: a test document ⋆ and the surrounding training documents]

How is ⋆ classified by: (i) 1-NN, (ii) 3-NN, (iii) 9-NN, (iv) 15-NN?

SLIDE 14

Time complexity of kNN

With preprocessing of training set:
  training: Θ(|D| Lave)
  testing: Θ(La + |D| Mave Ma) = Θ(|D| Mave Ma)

Without preprocessing of training set:
  training: Θ(1)
  testing: Θ(La + |D| Lave Ma) = Θ(|D| Lave Ma)

▶ Mave, Ma: size of the vocabulary of a document (average, test document)
▶ Lave, La: length of a document (average, test document)
▶ kNN test time is proportional to the size of the training set!
▶ The larger the training set, the longer it takes to classify a test document.
▶ kNN is inefficient for very large training sets.

SLIDE 15

kNN with inverted index

▶ Naively finding nearest neighbors requires a linear search through the |D| documents in the collection.
▶ Finding the k nearest neighbors is the same as determining the k best retrievals using the test document as a query to a database of training documents.
▶ Use standard vector space inverted index methods to find the k nearest neighbors.
▶ Testing time: O(|D|), that is, still linear in the number of documents. (The length of the postings lists is approximately linear in the number of documents.)
▶ But the constant factor is much smaller for an inverted index than for a linear scan.

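As a rough illustration of this idea, the sketch below scores only the training documents that share at least one term with the test document, accumulating dot products from postings lists; the index layout and names are assumptions, not a prescribed implementation.

```python
from collections import defaultdict
import heapq

def build_index(docs):
    """docs: {doc_id: {term: weight}} -> inverted index {term: [(doc_id, weight), ...]}."""
    index = defaultdict(list)
    for doc_id, vector in docs.items():
        for term, weight in vector.items():
            index[term].append((doc_id, weight))
    return index

def k_nearest(index, query_vector, k):
    """Accumulate dot-product scores from postings lists and keep the top k documents."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])
```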
SLIDE 16

kNN: Discussion

▶ No training necessary.
▶ But linear preprocessing of documents is as expensive as training Naive Bayes.
▶ We always preprocess the training set, so in reality the training time of kNN is linear.
▶ kNN is very accurate if the training set is large.
▶ Optimality result: asymptotically zero error if the Bayes rate is zero.
▶ But kNN can be very inaccurate if the training set is small.

SLIDE 17

Linear classifiers

SLIDE 18

Linear classifiers

▶ Definition:
  ▶ A linear classifier computes a linear combination or weighted sum ∑i wi xi of the feature values.
  ▶ Classification decision: ∑i wi xi > θ? …where θ (the threshold) is a parameter.
▶ (First, we only consider binary classifiers.)
▶ Geometrically, this corresponds to a line (2D), a plane (3D) or a hyperplane (higher dimensionalities), the separator.
▶ We find this separator based on the training set.
▶ Methods for finding the separator: Perceptron, Naive Bayes – as we will explain on the next slides.
▶ Assumption: The classes are linearly separable.

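For concreteness, the decision rule ∑i wi xi > θ is a one-liner; the weights and threshold below are made-up values for illustration, not a trained classifier.

```python
import numpy as np

def linear_classify(w, theta, x):
    """Return True if document vector x is assigned to class c, i.e. sum_i w_i x_i > theta."""
    return w @ x > theta

w = np.array([0.6, -0.2, 1.1])    # illustrative per-term weights
x = np.array([1.0, 2.0, 0.0])     # illustrative document vector
print(linear_classify(w, theta=0.5, x=x))   # 0.2 > 0.5 -> False
```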
SLIDE 19

A linear classifier in 1D

▶ A linear classifier in 1D is a point described by the equation w1 d1 = θ.
▶ The point is at θ/w1.
▶ Points (d1) with w1 d1 ≥ θ are in the class c.
▶ Points (d1) with w1 d1 < θ are in the complement class c̄.

SLIDE 20

A linear classifier in 2D

▶ A linear classifier in 2D is a line described by the equation w1 d1 + w2 d2 = θ.
▶ Example for a 2D linear classifier.
▶ Points (d1, d2) with w1 d1 + w2 d2 ≥ θ are in the class c.
▶ Points (d1, d2) with w1 d1 + w2 d2 < θ are in the complement class c̄.

SLIDE 21

A linear classifier in 3D

▶ A linear classifier in 3D is a plane described by the equation w1 d1 + w2 d2 + w3 d3 = θ.
▶ Example for a 3D linear classifier.
▶ Points (d1, d2, d3) with w1 d1 + w2 d2 + w3 d3 ≥ θ are in the class c.
▶ Points (d1, d2, d3) with w1 d1 + w2 d2 + w3 d3 < θ are in the complement class c̄.

SLIDE 22

Naive Bayes as a linear classifier

▶ Multinomial Naive Bayes is a linear classifier (in log space) defined by ∑_{i=1}^{M} wi di = θ
▶ where
  ▶ wi = log[P̂(ti|c) / P̂(ti|c̄)],
  ▶ di = number of occurrences of ti in d, and
  ▶ θ = −log[P̂(c) / P̂(c̄)].
▶ Here, the index i, 1 ≤ i ≤ M, refers to terms of the vocabulary (not to positions in d as k did in our original definition of Naive Bayes).

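To make the correspondence concrete, the hypothetical sketch below reads off the weights wi and the threshold θ from already-estimated (e.g. smoothed) Naive Bayes probabilities and applies the linear decision rule; all variable names are illustrative.

```python
import numpy as np

def nb_as_linear_decision(p_t_given_c, p_t_given_cbar, p_c, p_cbar, term_counts):
    """Naive Bayes decision expressed as sum_i w_i d_i > theta (assign to c if True)."""
    w = np.log(p_t_given_c) - np.log(p_t_given_cbar)   # w_i = log[P(t_i|c) / P(t_i|c_bar)]
    theta = -(np.log(p_c) - np.log(p_cbar))            # theta = -log[P(c) / P(c_bar)]
    return w @ term_counts > theta
```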
SLIDE 23

kNN is not a linear classifier

[Figure: two classes of training documents with the piecewise-linear kNN decision boundary between them]

▶ Classification decision based on majority of k nearest neighbors.
▶ The decision boundaries between classes are piecewise linear …
▶ …but they are in general not linear classifiers that can be described as ∑_{i=1}^{M} wi di = θ.

SLIDE 24

Which hyperplane?

SLIDE 25

Learning algorithms for vector space classification

▶ In terms of actual computation, there are two types of learning algorithms.
▶ (i) Simple learning algorithms that estimate the parameters of the classifier directly from the training data, often in one linear pass.
  ▶ Naive Bayes and kNN are examples of this.
▶ (ii) Iterative algorithms
  ▶ Support vector machines
  ▶ Perceptron
▶ The best performing learning algorithms usually require iterative learning.

SLIDE 26

Which hyperplane?

▶ For linearly separable training sets: there are infinitely many separating hyperplanes.
▶ They all separate the training set perfectly …
▶ …but they behave differently on test data.
▶ Error rates on new data are low for some, high for others.
▶ How do we find a low-error separator?
▶ Perceptron: generally bad; Naive Bayes: ok; linear SVM: good.

SLIDE 27

Linear classifiers: Discussion

▶ Many common text classifiers are linear classifiers.
▶ Methods differ in the way of selecting the separating hyperplane.
▶ Huge differences in performance on test documents.
▶ Can we get better performance with more powerful nonlinear classifiers?
▶ Not in general: A given amount of training data may suffice for estimating a linear boundary, but not for estimating a more complex nonlinear boundary.

SLIDE 28

A nonlinear problem

[Figure: a 2D data set whose two classes cannot be separated by a single line]

▶ A linear classifier like Naive Bayes does badly on this task.
▶ kNN will do well (assuming enough training data).

SLIDE 29

Which classifier do I use for a given TC problem?

▶ Is there a learning method optimal for all text classification problems?
▶ No, because there is a tradeoff between bias and variance.
▶ Factors to take into account:
  ▶ How much training data is available?
  ▶ How simple/complex is the problem?
  ▶ How noisy is the problem?
  ▶ How stable is the problem over time? (If unstable, it's better to use a simple and robust classifier.)

SLIDE 30

How to combine hyperplanes for > 2 classes?

SLIDE 31

One-of classification

▶ One-of or multiclass classification
  ▶ Classes are mutually exclusive.
  ▶ Each document belongs to exactly one class.
  ▶ Example: language of a document (assumption: no document contains multiple languages)
▶ Combine two-class linear classifiers as follows:
  ▶ Run each classifier separately
  ▶ Rank classifiers (e.g., according to score)
  ▶ Pick the class with the highest score

SLIDE 32

Any-of classification

▶ Any-of or multilabel classification
  ▶ A document can be a member of 0, 1, or many classes.
  ▶ A decision on one class leaves decisions open on all other classes.
  ▶ A type of “independence” (but not statistical independence)
  ▶ Example: topic classification
  ▶ Usually: make decisions on the region, on the subject area, on the industry, and so on “independently”
▶ Combine two-class linear classifiers as follows:
  ▶ Simply run each two-class classifier separately on the test document and assign the document accordingly (see the sketch below)

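The two combination schemes of the last two slides differ only in the final step. In the hypothetical sketch below, each binary classifier has produced a score (margin or probability) for the test document; any-of thresholds each score independently, while one-of picks the single highest score.

```python
def any_of_classify(scores, threshold=0.0):
    """scores: {class_name: score of that class's binary classifier for the test doc}."""
    return [c for c, s in scores.items() if s > threshold]   # zero, one, or many classes

def one_of_classify(scores):
    return max(scores, key=scores.get)                       # exactly one class

scores = {"UK": -0.3, "China": 1.2, "Kenya": 0.4}             # illustrative scores
print(any_of_classify(scores))   # ['China', 'Kenya']
print(one_of_classify(scores))   # 'China'
```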
SLIDE 33

Support vector machines

SLIDE 34

Support vector machines

▶ Machine-learning research in the last two decades has improved classifier effectiveness.
▶ New generation of state-of-the-art classifiers: support vector machines (SVMs), boosted decision trees, regularized logistic regression, neural networks, and random forests.
▶ Applications to IR problems, particularly text classification.

SVMs: A kind of large-margin classifier
A vector space based machine-learning method aiming to find a decision boundary between two classes that is maximally far from any point in the training data (possibly discounting some points as outliers or noise).

SLIDE 35

Support Vector Machines

▶ 2-class training data
▶ decision boundary → linear separator
▶ criterion: being maximally far away from any data point → determines classifier margin
▶ linear separator position defined by support vectors

[Figure: two classes of training points, the maximum margin decision hyperplane, the maximized margin, and the support vectors lying on the margin boundaries]

SLIDE 36

Why maximize the margin?

▶ Points near the decision surface → uncertain classification decisions (50% either way).
▶ A classifier with a large margin makes no low-certainty classification decisions.
▶ Gives a classification safety margin w.r.t. slight errors in measurement or document variation.

[Figure: maximum margin decision hyperplane with support vectors; the margin is maximized]

SLIDE 37

Why maximize the margin?

SVM classifier: large margin around decision boundary

▶ compare to decision hyperplane: place a fat separator between classes
▶ unique solution
▶ decreased memory capacity
▶ increased ability to correctly generalize to test data

SLIDE 38

Separating hyperplane: Recap

Hyperplane
An n-dimensional generalization of a plane (a point in 1-D space, a line in 2-D space, an ordinary plane in 3-D space).

Decision hyperplane
Can be defined by:
▶ intercept term b
▶ normal vector w⃗ (weight vector), which is perpendicular to the hyperplane

All points x⃗ on the hyperplane satisfy: w⃗ᵀx⃗ = −b

SLIDE 39

Formalization of SVMs

Training set
Consider a binary classification problem:
▶ x⃗i are the input vectors
▶ yi are the labels

For SVMs, the two data classes are yi = +1 and yi = −1, and the intercept term is explicitly represented as b.

The linear classifier is then: f(x⃗) = sign(w⃗ᵀx⃗ + b)

A value of −1 indicates one class, and a value of +1 the other class.

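In code, this classifier is a single expression; the weight vector and intercept below are placeholders that would normally come from training.

```python
import numpy as np

def svm_classify(w, b, x):
    """Return +1 or -1 for point x under the linear classifier f(x) = sign(w^T x + b)."""
    return int(np.sign(w @ x + b))

w, b = np.array([2.0, -1.0]), -0.5                 # illustrative trained parameters
print(svm_classify(w, b, np.array([1.0, 0.2])))    # 2.0 - 0.2 - 0.5 = 1.3 -> +1
```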
SLIDE 40

Functional margin of a point

We are confident in the classification of a point if it is far away from the decision boundary.

Functional margin
The functional margin of the vector x⃗i w.r.t. the hyperplane ⟨w⃗, b⟩ is: yi(w⃗ᵀx⃗i + b).
The functional margin of a data set w.r.t. a decision surface is twice the functional margin of any of the points in the data set with minimal functional margin.
▶ the factor 2 comes from measuring across the whole width of the margin

But we can increase the functional margin by scaling w⃗ and b. We need to place some constraint on the size of the w⃗ vector.

SLIDE 41

Geometric margin

Geometric margin of the classifier: the maximum width of the band that can be drawn separating the support vectors of the two classes:

r = y (w⃗ᵀx⃗ + b) / |w⃗|

The geometric margin is clearly invariant to scaling of parameters: if we replace w⃗ by 5w⃗ and b by 5b, then the geometric margin is the same, because it is normalized by the length of w⃗.

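A small numeric check of the two margin definitions with made-up parameters: scaling w⃗ and b inflates the functional margin but leaves the geometric margin unchanged.

```python
import numpy as np

def functional_margin(w, b, x, y):
    return y * (w @ x + b)

def geometric_margin(w, b, x, y):
    return functional_margin(w, b, x, y) / np.linalg.norm(w)

w, b = np.array([3.0, 4.0]), -2.0
x, y = np.array([1.0, 1.0]), 1
print(functional_margin(w, b, x, y), geometric_margin(w, b, x, y))                   # 5.0 1.0
print(functional_margin(5 * w, 5 * b, x, y), geometric_margin(5 * w, 5 * b, x, y))   # 25.0 1.0
```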
SLIDE 42

Optimization problem solved by SVMs

Assume canonical distance
Assume that all data is at least distance 1 from the hyperplane; then: yi(w⃗ᵀx⃗i + b) ≥ 1.
Since each example's distance from the hyperplane is ri = yi(w⃗ᵀx⃗i + b) / |w⃗|, the geometric margin is ρ = 2/|w⃗|.

We want to maximize this geometric margin. That is, we want to find w⃗ and b such that:
▶ ρ = 2/|w⃗| is maximized
▶ for all (x⃗i, yi) ∈ D, yi(w⃗ᵀx⃗i + b) ≥ 1

SLIDE 43

Optimization problem solved by SVMs (2)

Maximizing 2/|w⃗| is the same as minimizing |w⃗|/2. This gives the final standard formulation of an SVM as a minimization problem:

Example
Find w⃗ and b such that:
▶ (1/2) w⃗ᵀw⃗ is minimized (because |w⃗| = √(w⃗ᵀw⃗)), and
▶ for all {(x⃗i, yi)}, yi(w⃗ᵀx⃗i + b) ≥ 1

We are now optimizing a quadratic function subject to linear constraints. Quadratic optimization problems are standard mathematical optimization problems, and many algorithms exist for solving them (e.g. Quadratic Programming libraries).

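A dedicated QP or SMO solver would normally be used here; purely as an illustration of the objective and constraints above, the sketch below feeds them to SciPy's generic SLSQP optimizer on a tiny, linearly separable toy data set (all data and names are illustrative).

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]])   # toy documents
y = np.array([1, 1, -1, -1])                                          # their classes

def objective(params):
    w = params[:-1]
    return 0.5 * w @ w                      # minimize (1/2) w^T w

def margin_constraints(params):
    w, b = params[:-1], params[-1]
    return y * (X @ w + b) - 1.0            # y_i (w^T x_i + b) - 1 >= 0 for all i

result = minimize(objective, x0=np.zeros(X.shape[1] + 1),
                  constraints=[{"type": "ineq", "fun": margin_constraints}],
                  method="SLSQP")
w, b = result.x[:-1], result.x[-1]
print("w =", w, "b =", b, "geometric margin =", 2 / np.linalg.norm(w))
```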
SLIDE 44

Recap

▶ We start with a training set.
▶ The data set defines the maximum-margin separating hyperplane (if it is separable).
▶ We use quadratic optimization to find this plane.
▶ Given a new point x⃗ to classify, the classification function f(x⃗) computes the projection of the point onto the hyperplane normal.
▶ The sign of this function determines the class to assign to the point.
▶ If the point is within the margin of the classifier, the classifier can return “don't know” rather than one of the two classes.
▶ The value of f(x⃗) may also be transformed into a probability of classification.

SLIDE 45

Soft margin classification

What happens if data is not linearly separable?

▶ Standard approach: allow the fat decision margin to make a few mistakes
  ▶ some points, outliers, noisy examples are inside or on the wrong side of the margin
▶ Pay a cost for each misclassified example, depending on how far it is from meeting the margin requirement

Slack variable ξi: A non-zero value for ξi allows x⃗i to not meet the margin requirement at a cost proportional to the value of ξi.

SLIDE 46

SVM with slack variables

Slack variable ξi: a non-zero value for ξi allows x⃗i to not meet the margin requirement at a cost proportional to the value of ξi.

Example
Find w⃗ and b such that:
▶ (1/2) w⃗ᵀw⃗ + C ∑_{i=1}^{n} ξi is minimized (because |w⃗| = √(w⃗ᵀw⃗)), and
▶ for all {(x⃗i, yi)}, yi(w⃗ᵀx⃗i + b) ≥ 1 − ξi, with ξi ≥ 0

Optimization problem: trading off how fat it can make the margin vs. how many points have to be moved around to allow this margin. The sum of the ξi gives an upper bound on the number of training errors. Soft-margin SVMs minimize training error traded off against margin.

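In practice one rarely codes this optimization by hand. A minimal usage sketch, assuming scikit-learn is available: SVC with a linear kernel solves the soft-margin problem, with C playing the role of the slack penalty above; the data here is an illustrative placeholder.

```python
import numpy as np
from sklearn.svm import SVC

X_train = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -0.5]])  # toy data
y_train = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1.0)   # smaller C -> wider margin, more slack allowed
clf.fit(X_train, y_train)

print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("support vectors:", clf.support_vectors_)
print("prediction for [0.5, 0.5]:", clf.predict([[0.5, 0.5]]))
```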
SLIDE 47

Binary classification → One-of multiclass classification

▶ Many classification algorithms are binary.
▶ What do we do for one-of multiclass classification, where we have k > 2 classes and the k classes are mutually exclusive?
▶ Common technique: build |C| one-versus-rest classifiers (commonly referred to as “one-versus-all” or OVA classification), and choose the class which classifies the test data with highest probability (probabilistic classifier) or greatest margin (SVM); see the sketch after this list.
▶ Another strategy: build a set of one-versus-one classifiers, and choose the class that is selected by the most classifiers. While this involves building |C|(|C| − 1)/2 classifiers, the time for training classifiers may actually decrease, since the training data set for each classifier is much smaller.

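A hypothetical sketch of the one-versus-rest scheme built from binary linear SVMs (scikit-learn can also do this for you, but spelling it out mirrors the description above); names and data handling are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def train_ova(X, labels):
    """Train one binary classifier per class: that class vs. the rest."""
    classifiers = {}
    for c in set(labels):
        y_binary = np.where(np.array(labels) == c, 1, -1)
        clf = SVC(kernel="linear", C=1.0)
        clf.fit(X, y_binary)
        classifiers[c] = clf
    return classifiers

def classify_ova(classifiers, x):
    """Assign the class whose classifier gives the greatest margin for x."""
    return max(classifiers, key=lambda c: classifiers[c].decision_function([x])[0])
```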
SLIDE 48

Text classification

▶ Many commercial applications.
▶ There are many applications of text classification for corporate intranets, government departments, and Internet publishers.
▶ Often there are greater performance gains from exploiting domain-specific text features than from changing from one machine learning method to another.
▶ Understanding the data is one of the keys to successful categorization, yet this is an area in which many categorization tool vendors are weak.

SLIDE 49

Choosing what kind of classifier to use

When building a text classifier, the first question is: How much training data is there currently available?

Practical challenge: creating or obtaining enough training data. Hundreds or thousands of examples from each class are required to produce a high-performance classifier, and many real-world contexts involve large sets of categories.

▶ None?
▶ Very little?
▶ Quite a lot?
▶ A huge amount, growing every day?

SLIDE 50

No labeled training data

Use hand-written rules.

Example
IF (wheat OR grain) AND NOT (whole OR bread) THEN c = grain

In practice, rules get a lot bigger than this, and can be phrased using more sophisticated query languages than just Boolean expressions, including the use of numeric scores. With careful crafting, the accuracy of such rules can become very high (high 90% precision, high 80% recall). Nevertheless, the amount of work to create such well-tuned rules is very large. A reasonable estimate is 2 days per class, and extra time has to go into maintenance of rules, as the content of documents in classes drifts over time.

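The example rule above translates directly into code; this tiny sketch uses naive whitespace tokenization purely for illustration.

```python
def matches_grain_rule(text):
    """IF (wheat OR grain) AND NOT (whole OR bread) THEN c = grain."""
    tokens = set(text.lower().split())
    return (("wheat" in tokens or "grain" in tokens)
            and not ("whole" in tokens or "bread" in tokens))

print(matches_grain_rule("Grain exports rose sharply"))   # True  -> class grain
print(matches_grain_rule("Whole grain bread recipes"))    # False -> not grain
```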
SLIDE 51

Fairly little data and training a supervised classifier

Work out how to get more labeled data as quickly as you can.

▶ Best way: insert yourself into a process where humans will be willing to label data for you as part of their natural tasks.

Example
Often humans will sort or route email for their own purposes, and these actions give information about classes.

Active Learning
A system is built which decides which documents a human should label. Usually these are the ones on which a classifier is uncertain of the correct classification.

SLIDE 52

Fair amount of labeled data

Good amount of labeled data, but not huge
▶ Use everything that we have presented about text classification.
▶ Consider a hybrid approach (overlay a Boolean classifier).

Huge amount of labeled data
▶ Choice of classifier probably has little effect on your results.
▶ Choose the classifier based on the scalability of training or runtime efficiency.

Rule of thumb: each doubling of the training data size produces a linear increase in classifier performance, but with very large amounts of data, the improvement becomes sub-linear.
