Natural Language Processing CSCI 4152/6509 — Lecture 11 IR Measures and Text Mining
Instructor: Vlado Keselj
Time and date: 09:35–10:25, 30-Jan-2020
Location: Dunn 135
CSCI 4152/6509, Vlado Keselj Lecture 11 1 / 18
Previous Lecture
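$$\mathrm{sim}(d,q) = \cos\alpha = \frac{\sum_{i=1}^{n} w_{i,q}\, w_{i,d}}{\sqrt{\sum_{i=1}^{n} w_{i,q}^{2}} \cdot \sqrt{\sum_{i=1}^{n} w_{i,d}^{2}}}$$
where $w_{i,q}$ and $w_{i,d}$ are the weights of term $i$ in query $q$ and document $d$, and $n$ is the number of terms.
Figure: document vector d and query vector q in a term space with axes x, y, z; α is the angle between them, and cos α = sim(d,q).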
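A minimal Python sketch of the cosine similarity above, assuming the document and the query are already given as equal-length lists of term weights (how the weights are computed, e.g. tf-idf, is not shown). The function name `cosine_sim` and the example weights are hypothetical.

```python
import math

def cosine_sim(w_d, w_q):
    """Cosine of the angle between document weights w_d and query weights w_q."""
    if len(w_d) != len(w_q):
        raise ValueError("vectors must have the same number of terms")
    # Numerator: sum over terms i of w_{i,q} * w_{i,d}
    dot = sum(wd * wq for wd, wq in zip(w_d, w_q))
    # Denominator: product of the two Euclidean vector lengths
    norm_d = math.sqrt(sum(wd * wd for wd in w_d))
    norm_q = math.sqrt(sum(wq * wq for wq in w_q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # assumed convention: a zero vector matches nothing
    return dot / (norm_d * norm_q)

# Hypothetical three-term example (axes x, y, z)
d = [1, 2, 0]
q = [1, 0, 1]
print(cosine_sim(d, q))
```

Here the dot product is 1 and the lengths are √5 and √2, so the result is 1/√10 ≈ 0.316. Returning 0.0 for a zero vector is an assumption of this sketch; the formula itself is undefined in that case.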
http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html
Figure: precision (P) plotted against recall (R), with both axes marked up to 1.
◮ Set 1: {R} (R = 0.125, P = 1)
◮ Set 2: {R, R} (R = 0.25, P = 1)
◮ Set 3: {R, R, R} (R = 0.375, P = 1)
◮ Set 4: {R, R, R, NR} (R = 0.375, P = 0.75)
◮ Set 5: {R, R, R, NR, R} (R = 0.5, P = 0.8)
◮ . . . etc.
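The sets above are growing prefixes of one ranked result list. Since Set 1 gives R = 0.125 = 1/8, the collection contains 8 relevant documents in total. The following sketch (the function name `pr_at_ranks` is hypothetical) recomputes recall and precision after each rank k as recall = (relevant retrieved) / 8 and precision = (relevant retrieved) / k. `True` marks a relevant (R) document and `False` a non-relevant (NR) one.

```python
def pr_at_ranks(ranking, total_relevant):
    """Return a list of (recall, precision) pairs, one per rank k.

    ranking: booleans in rank order, True = relevant, False = not relevant.
    total_relevant: number of relevant documents in the whole collection.
    """
    points = []
    hits = 0  # relevant documents retrieved so far
    for k, is_rel in enumerate(ranking, start=1):
        if is_rel:
            hits += 1
        points.append((hits / total_relevant, hits / k))
    return points

# First five ranks of the example: R, R, R, NR, R (8 relevant in total)
ranking = [True, True, True, False, True]
for k, (r, p) in enumerate(pr_at_ranks(ranking, 8), start=1):
    print(f"Set {k}: R = {r}, P = {p}")
```

Note that recall never decreases as k grows, while precision can drop (Set 4) and rise again (Set 5).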
$$\mathrm{IntPrec}(r) = \max_{k:\,R(k) \ge r} P(k)$$
r = 3/8 = 0.375 ⇒ IntPrec(r) = 1
r = 4/8 = 0.5 ⇒ IntPrec(r) = 0.8
r = 5/8 = 0.625 ⇒ IntPrec(r) = 5/7 ≈ 0.714
r = 6/8 = 0.75 ⇒ IntPrec(r) = 0.6
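Interpolated precision can be sketched directly from its definition: take the largest precision P(k) among all ranks k whose recall R(k) is at least r. This is a minimal sketch with a hypothetical function name; its input is the list of (R, P) pairs for Sets 1 to 5 of the earlier example. Only those five ranks are listed on the slides, so the sketch reproduces IntPrec(0.375) = 1 and IntPrec(0.5) = 0.8; the values for r = 0.625 and r = 0.75 depend on ranks beyond the fifth.

```python
def interpolated_precision(points, r):
    """IntPrec(r) = max P(k) over all ranks k with R(k) >= r.

    points: list of (recall, precision) pairs, one per rank k.
    Returns 0.0 when no rank reaches recall r (a convention assumed here).
    """
    candidates = [p for (rec, p) in points if rec >= r]
    return max(candidates, default=0.0)

# (R, P) after ranks 1 to 5 of the example ranking R, R, R, NR, R (8 relevant in total)
points = [(0.125, 1.0), (0.25, 1.0), (0.375, 1.0), (0.375, 0.75), (0.5, 0.8)]
print(interpolated_precision(points, 0.375))  # 1.0
print(interpolated_precision(points, 0.5))    # 0.8
```

Taking the maximum over all later ranks makes IntPrec a non-increasing function of r, which smooths out the "sawtooth" shape of the raw precision values (e.g. the dip to 0.75 at Set 4 does not appear at r = 0.375).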
◮ Text Visualization
◮ Filtering tasks, Event Detection
◮ Terminology Extraction
◮ typically rule-based classifier
◮ example: detect or count occurrences of some
◮ In other words, classifiers are generated based
◮ supervised learning
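The rule-based approach can be illustrated with a classifier that counts occurrences of hand-picked keywords. Everything specific here is hypothetical: the keyword set, the threshold of 2, and the labels `sports`/`other` are invented for illustration, and no training data is involved. In the supervised-learning approach, the rule (or its weights) would instead be learned from labeled examples.

```python
import re

# Hypothetical, hand-written rule parameters (illustration only)
KEYWORDS = {"goal", "match", "score", "team"}
THRESHOLD = 2

def rule_based_classify(text, keywords=KEYWORDS, threshold=THRESHOLD):
    """Label text 'sports' if it contains at least `threshold` keyword occurrences."""
    # Simple tokenization: lowercase alphabetic runs
    tokens = re.findall(r"[a-z]+", text.lower())
    count = sum(1 for t in tokens if t in keywords)
    return "sports" if count >= threshold else "other"

print(rule_based_classify("The team scored a late goal to win the match."))
print(rule_based_classify("Stock prices fell sharply today."))
```

The first call finds three keywords (team, goal, match) and returns `"sports"`; the second finds none and returns `"other"`. Note that exact token matching misses inflected forms such as "scored", one reason hand-written rules tend to grow long.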