

slide-1
SLIDE 1

INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze’s, linked from http://informationretrieval.org/

IR 25/25: Text Classification and Exam Overview

Paul Ginsparg

Cornell University, Ithaca, NY

2 Dec 2009

1 / 59

slide-2
SLIDE 2

Administrativa

Assignment 4 due Fri 3 Dec (extended to Sun 5 Dec).
Mon 13 Dec: early final examination, 2:00-4:30 p.m., Upson B17 (by prior notification of intent via CMS).
Fri 17 Dec: final examination, 2:00-4:30 p.m., Hollister Hall B14.
Office hours: Wed 8 Dec, Fri 10 Dec, Wed 15 Dec.
No office hour Fri 3 Dec (due to a conflict with a talk I’m giving that afternoon).

2 / 59

slide-3
SLIDE 3

Overview

1. Recap
2. Discussion
3. More Statistical Learning
4. Naive Bayes, cont’d
5. Evaluation of TC
6. NB independence assumptions
7. Structured Retrieval
8. Exam Overview

3 / 59

slide-4
SLIDE 4

Outline

1. Recap
2. Discussion
3. More Statistical Learning
4. Naive Bayes, cont’d
5. Evaluation of TC
6. NB independence assumptions
7. Structured Retrieval
8. Exam Overview

4 / 59

slide-5
SLIDE 5

Formal definition of TC — summary

Training. Given:

A document space X: documents are represented in some high-dimensional space.
A fixed set of classes C = {c1, c2, . . . , cJ}, human-defined for the needs of the application (e.g., relevant vs. non-relevant).
A training set D of labeled documents ⟨d, c⟩ ∈ X × C.

Using a learning method or learning algorithm, we then wish to learn a classifier γ that maps documents to classes: γ : X → C.

Application/Testing. Given a description d ∈ X of a document, determine γ(d) ∈ C, i.e., the class most appropriate for d.

5 / 59

slide-6
SLIDE 6

Classification methods — summary

1. Manual: accurate if done by experts, and consistent if the problem size and team are small; difficult and expensive to scale.

2. Rule-based: accuracy can be very high if a rule has been carefully refined over time by a subject expert; building and maintaining rules is expensive.

3. Statistical/Probabilistic: as per our definition of the classification problem, text classification as a learning problem: supervised learning of the classification function γ and its application to classifying new documents. We have looked at a couple of methods for doing this (Rocchio, kNN); now Naive Bayes. No free lunch: this requires hand-classified training data, but the manual classification can be done by non-experts.

6 / 59

slide-7
SLIDE 7

The Naive Bayes classifier

The Naive Bayes classifier is a probabilistic classifier. We compute the probability of a document d being in a class c as follows:

P(c|d) ∝ P(c) ∏_{1≤k≤nd} P(tk|c)

nd is the length of the document (number of tokens).
P(tk|c) is the conditional probability of term tk occurring in a document of class c; think of it as a measure of how much evidence tk contributes that c is the correct class.
P(c) is the prior probability of c: if a document’s terms do not provide clear evidence for one class vs. another, we choose the c with the higher P(c).

7 / 59

slide-8
SLIDE 8

To avoid zeros: Add-one smoothing

Add one to each count to avoid zeros:

P̂(t|c) = (Tct + 1) / ∑_{t′∈V} (Tct′ + 1) = (Tct + 1) / ((∑_{t′∈V} Tct′) + B)

B is the number of different words (in this case the size of the vocabulary: |V| = M)
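The smoothed estimate is easy to state in code. A minimal sketch (the function and variable names are my own, not from the lecture):

```python
from collections import Counter

def smoothed_cond_probs(docs_in_class, vocab):
    """Add-one-smoothed P(t|c) from the documents of class c."""
    counts = Counter(t for doc in docs_in_class for t in doc)  # T_ct
    total = sum(counts.values())                               # sum over t' of T_ct'
    B = len(vocab)                                             # number of different words
    return {t: (counts[t] + 1) / (total + B) for t in vocab}

# Toy check: the smoothed probabilities over the vocabulary sum to 1.
vocab = {"chinese", "beijing", "tokyo"}
probs = smoothed_cond_probs([["chinese", "chinese", "beijing"]], vocab)
```

Note that smoothing gives every vocabulary term nonzero probability, including "tokyo", which never occurs in the class.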

8 / 59

slide-9
SLIDE 9

Exercise

docID   words in document                      in c = China?
training set
1       Chinese Beijing Chinese                yes
2       Chinese Chinese Shanghai               yes
3       Chinese Macao                          yes
4       Tokyo Japan Chinese                    no
test set
5       Chinese Chinese Chinese Tokyo Japan    ?

Estimate the parameters of a Naive Bayes classifier and classify the test document.

9 / 59

slide-10
SLIDE 10

Example: Parameter estimates

Priors: P̂(c) = 3/4 and P̂(c̄) = 1/4 (c = China, c̄ = not-China)

Conditional probabilities:
P̂(Chinese|c) = (5 + 1)/(8 + 6) = 6/14 = 3/7
P̂(Tokyo|c) = P̂(Japan|c) = (0 + 1)/(8 + 6) = 1/14
P̂(Chinese|c̄) = P̂(Tokyo|c̄) = P̂(Japan|c̄) = (1 + 1)/(3 + 6) = 2/9

The denominators are (8 + 6) and (3 + 6) because the lengths of text_c and text_c̄ are 8 and 3, respectively, and because the constant B is 6, since the vocabulary consists of six terms.

Exercise: verify that P̂(Chinese|c) + P̂(Beijing|c) + P̂(Shanghai|c) + P̂(Macao|c) + P̂(Tokyo|c) + P̂(Japan|c) = 1, and likewise for c̄.

10 / 59

slide-11
SLIDE 11

Example: Classification

d5 = (Chinese Chinese Chinese Tokyo Japan)

P̂(c|d5) ∝ 3/4 · (3/7)^3 · 1/14 · 1/14 ≈ 0.0003
P̂(c̄|d5) ∝ 1/4 · (2/9)^3 · 2/9 · 2/9 ≈ 0.0001

Thus, the classifier assigns the test document to c = China: the three occurrences of the positive indicator Chinese in d5 outweigh the occurrences of the two negative indicators Japan and Tokyo.

Exercise: evaluate P̂(c|d) and P̂(c̄|d) for d6 = (Chinese Chinese Tokyo Japan) and d7 = (Chinese Tokyo Japan)
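The numbers on these two slides can be reproduced directly. A minimal sketch (class 1 = China, class 0 = not-China; the names are my own):

```python
from collections import Counter
from functools import reduce

training = [("Chinese Beijing Chinese", 1), ("Chinese Chinese Shanghai", 1),
            ("Chinese Macao", 1), ("Tokyo Japan Chinese", 0)]
vocab = {w for doc, _ in training for w in doc.split()}
B = len(vocab)  # 6 terms

def params(cls):
    """Prior and add-one-smoothed conditional probabilities for one class."""
    tokens = [w for doc, c in training for w in doc.split() if c == cls]
    prior = sum(1 for _, c in training if c == cls) / len(training)
    counts = Counter(tokens)
    cond = {t: (counts[t] + 1) / (len(tokens) + B) for t in vocab}
    return prior, cond

def score(doc, cls):
    """Unnormalized P(c|d) = P(c) times the product of P(t_k|c)."""
    prior, cond = params(cls)
    return reduce(lambda p, w: p * cond[w], doc.split(), prior)

d5 = "Chinese Chinese Chinese Tokyo Japan"
print(round(score(d5, 1), 4), round(score(d5, 0), 4))  # → 0.0003 0.0001
```

The same `score` function answers the exercise for d6 and d7.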

11 / 59

slide-12
SLIDE 12

Outline

1. Recap
2. Discussion
3. More Statistical Learning
4. Naive Bayes, cont’d
5. Evaluation of TC
6. NB independence assumptions
7. Structured Retrieval
8. Exam Overview

12 / 59

slide-13
SLIDE 13

Discussion 6

More statistical methods: Peter Norvig, “How to Write a Spelling Corrector”, http://norvig.com/spell-correct.html

See also http://yehha.net/20794/facebook.com/peter-norvig.html, “Engineering@Facebook: Tech Talk with Peter Norvig” (roughly 00:11:00 to 00:19:15 of a one-hour video, but the whole first half or more if you have time), as well as http://videolectures.net/cikm08_norvig_slatuad/, “Statistical Learning as the Ultimate Agile Development Tool”.

Additional related reference: A. Halevy, P. Norvig, F. Pereira, “The Unreasonable Effectiveness of Data”, IEEE Intelligent Systems, Mar/Apr 2009, http://doi.ieeecomputersociety.org/10.1109/MIS.2009.36 (copy at readings/unrealdata.pdf)

13 / 59

slide-14
SLIDE 14

A little theory

Find the correction c that maximizes the probability of c given the original word w:

argmax_c P(c|w)

By Bayes’ Theorem, this is equivalent to argmax_c P(w|c) P(c) / P(w). P(w) is the same for every possible c, so we can ignore it and consider:

argmax_c P(w|c) P(c)

Three parts:
P(c), the probability that a proposed correction c stands on its own. The language model: “how likely is c to appear in an English text?” (P(“the”) is high; P(“zxzxzxzyyy”) is near zero.)
P(w|c), the probability that w would be typed when the author meant c. The error model: “how likely is the author to type w by mistake instead of c?”
argmax_c, the control mechanism: choose the c that gives the best combined probability score.

14 / 59

slide-15
SLIDE 15

Example

w = “thew”; two candidate corrections, c = “the” and c = “thaw”. Which has higher P(c|w)? “thaw” requires only the small change of “a” to “e”, but “the” is a very common word, and perhaps the typist’s finger slipped off the “e” onto the “w”. To estimate P(c|w), we have to consider both the probability of c and the probability of the change from c to w.

15 / 59

slide-16
SLIDE 16

Complete Spelling Corrector

import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

NWORDS = train(words(open('big.txt').read()))
alphabet = 'abcdefghijklmnopqrstuvwxyz'

16 / 59

slide-17
SLIDE 17

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts    = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)

17 / 59

slide-18
SLIDE 18

Outline

1. Recap
2. Discussion
3. More Statistical Learning
4. Naive Bayes, cont’d
5. Evaluation of TC
6. NB independence assumptions
7. Structured Retrieval
8. Exam Overview

18 / 59

slide-19
SLIDE 19

More Data

Figure 1, “Learning Curves for Confusion Set Disambiguation”, from M. Banko and E. Brill (2001), “Scaling to Very Very Large Corpora for Natural Language Disambiguation”, http://acl.ldc.upenn.edu/P/P01/P01-1005.pdf

19 / 59

slide-20
SLIDE 20

More Data for this Task

M. Banko and E. Brill (2001), “Scaling to Very Very Large Corpora for Natural Language Disambiguation”, http://acl.ldc.upenn.edu/P/P01/P01-1005.pdf

“The amount of readily available on-line text has reached hundreds of billions of words and continues to grow. Yet for most core natural language tasks, algorithms continue to be optimized, tested and compared after training on corpora consisting of only one million words or less. In this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, when trained on orders of magnitude more labeled data than has previously been used. We are fortunate that for this particular application, correctly labeled training data is free. Since this will often not be the case, we examine methods for effectively exploiting very large corpora when labeled data comes at a cost.”

(Confusion set disambiguation is the problem of choosing the correct use of a word, given a set of words with which it is commonly confused. Example confusion sets include: {principle, principal}, {then, than}, {to, two, too}, and {weather, whether}.)

20 / 59

slide-21
SLIDE 21

Segmentation

nowisthetimeforallgoodmentocometothe

Probability of a segmentation = P(first word) × P(rest)
Best segmentation = the one with the highest probability
P(word) estimated by counting
Trained on 1.7B words of English; 98% word accuracy
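This recurrence can be sketched in a few lines. A toy illustration with a hand-made unigram table standing in for corpus counts (the probabilities below are invented, not the 1.7B-word estimates):

```python
from functools import lru_cache

# Invented unigram probabilities; a real model estimates these by counting.
P = {"now": 0.1, "is": 0.1, "the": 0.2, "time": 0.05}

def Pw(w):
    return P.get(w, 1e-10)  # tiny floor probability for unseen words

@lru_cache(maxsize=None)
def segment(text):
    """Best segmentation = argmax over first words of P(first word) * P(rest)."""
    if not text:
        return 1.0, []
    candidates = ((Pw(text[:i]) * segment(text[i:])[0],
                   [text[:i]] + segment(text[i:])[1])
                  for i in range(1, len(text) + 1))
    return max(candidates, key=lambda c: c[0])

print(segment("nowisthetime")[1])  # → ['now', 'is', 'the', 'time']
```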

21 / 59

slide-22
SLIDE 22

Spelling with Statistical Learning

Probability of a spelling correction, c = P(c as a word) × P(original is a typo for c) Best correction = one with highest probability P(c as a word) = estimated by counting P(original is a typo for c) = proportional to number of changes Similarly for speech recognition, using language model p(c) and acoustic model p(s|c)

22 / 59

slide-23
SLIDE 23

Google Sets

Given “lion, tiger, bear” find: bear, tiger, lion, elephant, monkey, giraffe, dog, cat, snake, horse, zebra, rabbit, wolf, dolphin, dragon, pig, frog, duck, cheetah, bird, cow, cotton, hippo, turtle, penguin, rat, gorilla, leopard, sheep, mouse, puppy, ox, rooster, fish, lamb, panda, wood, musical, toddler, fox, goat, deer, squirrel, koala, crocodile, hamster (using co-occurrence in pages)

23 / 59

slide-24
SLIDE 24

And others

Statistical machine translation: collect parallel texts (“Rosetta stones”), align.

Canonical image selection from the web (Y. Jing, S. Baluja, H. Rowley, 2007).

Learning people annotation from the web via consistency learning (J. Yagnik, A. Islam, 2007): results on learning from a very large dataset of 37 million images, reaching a validation accuracy of 92.68%.

Filling in occluded portions of photos.

24 / 59

slide-25
SLIDE 25

Outline

1. Recap
2. Discussion
3. More Statistical Learning
4. Naive Bayes, cont’d
5. Evaluation of TC
6. NB independence assumptions
7. Structured Retrieval
8. Exam Overview

25 / 59

slide-26
SLIDE 26

Naive Bayes: Analysis

Now we want to gain a better understanding of the properties of Naive Bayes.

We will formally derive the classification rule . . . and explicitly state the assumptions we make in that derivation.

26 / 59

slide-27
SLIDE 27

Derivation of Naive Bayes rule

We want to find the class that is most likely given the document:

cmap = argmax_{c∈C} P(c|d)

Apply Bayes’ rule, P(A|B) = P(B|A) P(A) / P(B):

cmap = argmax_{c∈C} P(d|c) P(c) / P(d)

Drop the denominator, since P(d) is the same for all classes:

cmap = argmax_{c∈C} P(d|c) P(c)

27 / 59

slide-28
SLIDE 28

Too many parameters / sparseness

cmap = argmax_{c∈C} P(d|c) P(c) = argmax_{c∈C} P(⟨t1, . . . , tk, . . . , tnd⟩|c) P(c)

There are too many parameters P(⟨t1, . . . , tk, . . . , tnd⟩|c), one for each unique combination of a class and a sequence of words. We would need a very, very large number of training examples to estimate that many parameters. This is the problem of data sparseness.

28 / 59

slide-29
SLIDE 29

Naive Bayes conditional independence assumption

To reduce the number of parameters to a manageable size, we make the Naive Bayes conditional independence assumption:

P(d|c) = P(⟨t1, . . . , tnd⟩|c) = ∏_{1≤k≤nd} P(Xk = tk|c)

We assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities P(Xk = tk|c). Recall from earlier the estimates for these priors and conditional probabilities:

P̂(c) = Nc/N and P̂(t|c) = (Tct + 1) / ((∑_{t′∈V} Tct′) + B)

29 / 59

slide-30
SLIDE 30

Generative model

C=China → X1=Beijing, X2=and, X3=Taipei, X4=join, X5=WTO

P(c|d) ∝ P(c) ∏_{1≤k≤nd} P(tk|c)

Generate a class with probability P(c). Generate each of the words (in their respective positions), conditional on the class but independent of each other, with probability P(tk|c). To classify docs, we “reengineer” this process and find the class that is most likely to have generated the doc.

30 / 59

slide-31
SLIDE 31

Second independence assumption

P̂(tk1|c) = P̂(tk2|c)

For example, for a document in the class UK, the probability of generating “queen” in the first position of the document is the same as generating it in the last position. The two independence assumptions together amount to the bag of words model.

31 / 59

slide-32
SLIDE 32

A different Naive Bayes model: Bernoulli model

C=China → UAlaska=0, UBeijing=1, UIndia=0, Ujoin=1, UTaipei=1, UWTO=1

32 / 59

slide-33
SLIDE 33

Outline

1. Recap
2. Discussion
3. More Statistical Learning
4. Naive Bayes, cont’d
5. Evaluation of TC
6. NB independence assumptions
7. Structured Retrieval
8. Exam Overview

33 / 59

slide-34
SLIDE 34

Evaluation on Reuters

[Figure: classification example on the Reuters collection. Classes come in three groups: regions (UK, China), industries (poultry, coffee), and subject areas (elections, sports). Training-set terms per class include, e.g., UK: London, congestion, Big Ben, Parliament, the Queen, Windsor; China: Beijing, Olympics, Great Wall, tourism, communist, Mao; poultry: chicken, feed, ducks, pate, turkey, bird flu; coffee: beans, roasting, robusta, arabica, harvest, Kenya; elections: votes, recount, run-off, seat, campaign, TV ads; sports: baseball, diamond, soccer, forward, captain, team. The test document d′ (“first private Chinese airline”) is classified as γ(d′) = China.]
34 / 59

slide-35
SLIDE 35

Example: The Reuters collection

symbol   statistic                                             value
N        documents                                             800,000
L        avg. # word tokens per document                       200
M        word types                                            400,000
         avg. # bytes per word token (incl. spaces/punct.)     6
         avg. # bytes per word token (without spaces/punct.)   4.5
         avg. # bytes per word type                            7.5
         non-positional postings                               100,000,000

type of class   number   examples
region          366      UK, China
industry        870      poultry, coffee
subject area    126      elections, sports

35 / 59

slide-36
SLIDE 36

Evaluating classification

Evaluation must be done on test data that are independent of the training data (usually a disjoint set of instances). It’s easy to get good performance on a test set that was available to the learner during training (e.g., just memorize the test set). Measures: Precision, recall, F1, classification accuracy

36 / 59

slide-37
SLIDE 37

Naive Bayes vs. other methods

(a)                        NB   Rocchio   kNN   SVM
micro-avg-L (90 classes)   80   85        86    89
macro-avg (90 classes)     47   59        60    60

(b)                        NB   Rocchio   kNN   trees   SVM
earn                       96   93        97    98      98
acq                        88   65        92    90      94
money-fx                   57   47        78    66      75
grain                      79   68        82    85      95
crude                      80   70        86    85      89
trade                      64   65        77    73      76
interest                   65   63        74    67      78
ship                       85   49        79    74      86
wheat                      70   69        77    93      92
corn                       65   48        78    92      90
micro-avg (top 10)         82   65        82    88      92
micro-avg-D (118 classes)  75   62        n/a   n/a     87

Evaluation measure: F1. Naive Bayes does pretty well, but some methods beat it consistently (e.g., SVM).

See Section 13.6

37 / 59

slide-38
SLIDE 38

Outline

1. Recap
2. Discussion
3. More Statistical Learning
4. Naive Bayes, cont’d
5. Evaluation of TC
6. NB independence assumptions
7. Structured Retrieval
8. Exam Overview

38 / 59

slide-39
SLIDE 39

Violation of Naive Bayes independence assumptions

The independence assumptions do not really hold for documents written in natural language.

Conditional independence: P(⟨t1, . . . , tnd⟩|c) = ∏_{1≤k≤nd} P(Xk = tk|c)

Positional independence: P̂(tk1|c) = P̂(tk2|c)

Exercise

Give examples of why the conditional independence assumption is not really true, and of why the positional independence assumption is not really true. How can Naive Bayes work if it makes such inappropriate assumptions?

39 / 59

slide-40
SLIDE 40

Why does Naive Bayes work?

Naive Bayes can work well even though the conditional independence assumptions are badly violated. Example:

                                  c1        c2        class selected
true probability P(c|d)           0.6       0.4       c1
P̂(c) ∏_{1≤k≤nd} P̂(tk|c)         0.00099   0.00001
NB estimate P̂(c|d)               0.99      0.01      c1

Double counting of evidence causes underestimation (0.01) and overestimation (0.99). Classification is about predicting the correct class, not about accurately estimating probabilities. Correct estimation implies accurate prediction, but not vice versa!
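The 0.99 on this slide is just the normalized score; a one-line check (not part of the original slides):

```python
# Unnormalized NB scores for c1 and c2, as on the slide.
score_c1, score_c2 = 0.00099, 0.00001

# Normalizing pushes the estimate to the extremes, far from the true 0.6,
# but the argmax (class c1) is unchanged.
estimate_c1 = score_c1 / (score_c1 + score_c2)
print(round(estimate_c1, 2))  # → 0.99
```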

40 / 59

slide-41
SLIDE 41

Naive Bayes is not so naive

Naive Bayes has won some bakeoffs (e.g., KDD-CUP 97).
More robust to nonrelevant features than some more complex learning methods.
More robust to concept drift (the definition of a class changing over time) than some more complex learning methods.
Better than methods like decision trees when we have many equally important features.
A good dependable baseline for text classification (but not the best).
Optimal if the independence assumptions hold (never true for text, but true for some domains).
Very fast; low storage requirements.

41 / 59

slide-42
SLIDE 42

Naive Bayes: Effect of feature selection

Feature selection improves the performance of text classifiers.

[Figure: F1 measure (0.0 to 0.8) vs. number of features selected (1 to 10,000, log scale), with curves for multinomial Naive Bayes with MI, chi-square, and frequency-based feature selection, and for binomial Naive Bayes with MI.]

42 / 59

slide-43
SLIDE 43

Feature selection for Naive Bayes

In general, feature selection is necessary for Naive Bayes to get decent performance. Also true for most other learning methods in text classification: you need feature selection for optimal performance.

43 / 59

slide-44
SLIDE 44

Outline

1. Recap
2. Discussion
3. More Statistical Learning
4. Naive Bayes, cont’d
5. Evaluation of TC
6. NB independence assumptions
7. Structured Retrieval
8. Exam Overview

44 / 59

slide-45
SLIDE 45

XML markup

<play>
  <author>Shakespeare</author>
  <title>Macbeth</title>
  <act number="I">
    <scene number="vii">
      <title>Macbeth’s castle</title>
      <verse>Will I with wine and wassail ...</verse>
    </scene>
  </act>
</play>

45 / 59

slide-46
SLIDE 46

XML Doc as DOM object

46 / 59

slide-47
SLIDE 47

Outline

1

Recap

2

Discussion

3

More Statistical Learning

4

Naive Bayes, cont’d

5

Evaluation of TC

6

NB independence assumptions

7

Structured Retrieval

8

Exam Overview

47 / 59

slide-48
SLIDE 48

Definition of information retrieval (from Lecture 1)

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). Three scales (web, enterprise/inst/domain, personal)

48 / 59

slide-49
SLIDE 49

“Plan” (from Lecture 1)

Search full text: basic concepts; Web search; Probabilistic retrieval; Interfaces; Metadata / semantics; IR ⇔ NLP ⇔ ML
Prereqs: introductory courses in data structures and algorithms, in linear algebra, and in probability theory

49 / 59

slide-50
SLIDE 50

1st Half

Searching full text: dictionaries, inverted files, postings, implementation and algorithms, term weighting, Vector Space Model, similarity, ranking; word statistics
MRS 1: Boolean retrieval
MRS 2: The term vocabulary and postings lists
MRS 5: Index compression
MRS 6: Scoring, term weighting, and the vector space model
MRS 7: Computing scores in a complete search system

50 / 59

slide-51
SLIDE 51

1st Half, cont’d

Evaluation of retrieval effectiveness: MRS 8, Evaluation in information retrieval
Latent semantic indexing: MRS 18, Matrix decompositions and latent semantic indexing
Discussion 2: IDF
Discussion 3: Latent semantic indexing

51 / 59

slide-52
SLIDE 52

Midterm

1) Term-document matrix, VSM, tf.idf
2) Recall/precision
3) LSI
4) Word statistics (Heaps, Zipf)

52 / 59

slide-53
SLIDE 53

2nd Half

MRS 9: Relevance feedback and query expansion
MRS 11: Probabilistic information retrieval
Web search: anchor text and links, citation and link analysis, web crawling
MRS 19: Web search basics
MRS 21: Link analysis

53 / 59

slide-54
SLIDE 54

2nd Half, cont’d

Classification, categorization, clustering
MRS 13: Text classification and Naive Bayes
MRS 14: Vector space classification
MRS 16: Flat clustering
MRS 17: Hierarchical clustering
(Structured retrieval: MRS 10, XML retrieval)
Discussion 4: Google
Discussion 5: MapReduce
Discussion 6: Statistical spell correction

54 / 59

slide-55
SLIDE 55

Assignment 3

The PageRank rj of page j is determined self-consistently by the equation

rj = α/n + (1 − α) ∑_{i|i→j} ri/di

where α is a number between 0 and 1 (originally taken to be 0.15), the sum on i runs over pages i pointing to j, and di is the outgoing degree of page i.

Incidence matrix: Aij = 1 if i points to j, otherwise Aij = 0. The transition probability from page i to page j is

Pij = (α/n) Oij + (1 − α) (1/di) Aij

where n is the total number of pages, di is the outdegree of node i, and Oij = 1 (for all i, j). The matrix eigenvector relation rP = r (or r = P^T r) is equivalent to the equation above, with r normalized as a probability, so that ∑_i ri Oij = ∑_i ri = 1.

Power method: r = w P^m for large m, starting from any probability vector w.
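The power method above can be sketched directly from the transition probabilities. A minimal illustration (toy 3-page graph; it assumes every page has at least one outlink, so the Oij teleportation term reduces to α/n):

```python
def pagerank(adj, alpha=0.15, iters=100):
    """Power-method iteration r <- rP with P_ij = alpha/n + (1-alpha) A_ij/d_i."""
    n = len(adj)
    r = [1.0 / n] * n                      # start from the uniform distribution
    for _ in range(iters):
        new = [alpha / n] * n              # teleportation term alpha/n for every page
        for i, outlinks in enumerate(adj):
            for j in outlinks:             # spread (1-alpha) r_i over i's outlinks
                new[j] += (1 - alpha) * r[i] / len(outlinks)
        r = new
    return r

# Tiny 3-page cycle 0 -> 1 -> 2 -> 0: by symmetry every rank converges to 1/3.
r = pagerank([[1], [2], [0]])
print([round(x, 3) for x in r])  # → [0.333, 0.333, 0.333]
```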

55 / 59

slide-56
SLIDE 56

Fig 14.6 from Easley and Kleinberg, ”Networks, Crowds, and Markets” http://www.cs.cornell.edu/home/kleinber/networks-book/

56 / 59

slide-57
SLIDE 57

[Figure: a network whose nodes carry PageRank values x, x/2, x/2, x/4, x/4, x/4, x/4, x/4. Normalization requires x + 2 · (x/2) + 5 · (x/4) = 13x/4 = 1, i.e., x = 4/13.]

57 / 59

slide-58
SLIDE 58

Fig 14.7 from Easley and Kleinberg, ”Networks, Crowds, and Markets” http://www.cs.cornell.edu/home/kleinber/networks-book/

58 / 59

slide-59
SLIDE 59

Final Exam, probably 4 questions from these topics

Issues in personal/enterprise/web-scale searching; recall/precision, and how they relate to informational/navigational/transactional needs
Issues for modern search engines (e.g., at web scale, tf.idf? recall/precision?)
Web indexing and retrieval: link analysis, PageRank
Clustering: flat, hierarchical (k-means, agglomerative, similarity dendrograms); evaluation of clustering; measures of cluster similarity (single/complete link, average, group average); cluster labeling; feature selection
MapReduce
Recommender systems, adversarial IR
Types of text classification (curated, rule-based, statistical), e.g., Naive Bayes
Vector space classification (Rocchio, kNN)

Fri 17 Dec, 2:00-4:30 p.m., Hollister Hall B14, or Mon 13 Dec, 2:00-4:30 p.m., Upson B17 (register via CMS)

59 / 59