The Art of Predictive Analytics: More Data, Same Models [STUDY SLIDES] - PowerPoint PPT Presentation



SLIDE 1

The Art of Predictive Analytics: More Data, Same Models [STUDY SLIDES]

Joseph Turian joseph@metaoptimize.com @turian MetaOptimize

2012.02.02

SLIDE 2

NOTE: These are the STUDY slides from my talk at the predictive analytics meetup: http://bit.ly/xVLBuS I have removed some graphics, and added some text.

Please email me any questions

SLIDE 3

Who am I?

Engineer with 20 yrs coding exp
PhD, 10 yrs exp: large-scale ML + NLP
Founded MetaOptimize

SLIDE 4

What is MetaOptimize?

Consultancy + community on:
Large-scale ML + NLP
Well-engineered solutions

SLIDE 5

“Both NLP and ML have a lot of folk wisdom about what works and what doesn't. [This site] is crucial for sharing this collective knowledge.” - @aria42

http://metaoptimize.com/qa/

SLIDE 6

http://metaoptimize.com/qa/

SLIDE 7

http://metaoptimize.com/qa/

SLIDE 8

“A lot of expertise in machine learning is simply developing effective biases.”

  • Dan Melamed

(quoted from memory)

SLIDE 9

What's a good choice of learning rate for the second layer of this neural net on image patches? [intuition] (Yoshua Bengio)

0.02!

SLIDE 10

Occam's Razor is a great example of ML intuition

SLIDE 11

Without the aid of prejudice and custom I should not be able to find my way across the room.

  • William Hazlitt
SLIDE 12

It's fun to be a geek

SLIDE 13

Be an artist

SLIDE 14

Be an artist

SLIDE 15

How to build the world's biggest langid (langcat) model?

SLIDE 16
SLIDE 17
SLIDE 18

+ Vowpal Wabbit = Win

SLIDE 19

How to build the world's biggest langid (langcat) model? SOLVED.
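The recipe above is essentially hashed character n-grams into a bounded feature space plus a linear model. A toy Python sketch of the hashing-trick half (a simple count-profile classifier stands in for Vowpal Wabbit here, and the two-sentence corpus is made up):

```python
import zlib
from collections import defaultdict

NUM_BITS = 18                 # 2**18 hashed feature buckets, VW-style
MASK = (1 << NUM_BITS) - 1

def char_ngrams(text, n=3):
    """Hash character n-grams into a fixed-size space so memory stays
    bounded no matter how much data streams past."""
    padded = f" {text.lower()} "
    counts = defaultdict(int)
    for i in range(len(padded) - n + 1):
        counts[zlib.crc32(padded[i:i + n].encode()) & MASK] += 1
    return counts

class NgramLangID:
    """Toy profile-based classifier standing in for a VW linear model."""
    def __init__(self):
        self.profiles = defaultdict(lambda: defaultdict(int))

    def train(self, text, lang):
        for k, v in char_ngrams(text).items():
            self.profiles[lang][k] += v

    def predict(self, text):
        feats = char_ngrams(text)
        def overlap(lang):
            total = sum(self.profiles[lang].values()) or 1
            return sum(v * self.profiles[lang].get(k, 0)
                       for k, v in feats.items()) / total
        return max(self.profiles, key=overlap)

model = NgramLangID()
model.train("the quick brown fox jumps over the lazy dog", "en")
model.train("le renard brun saute par dessus le chien", "fr")
```

Scaling this to "the world's biggest model" is then a data problem, not a model problem: the feature space stays 2^18 buckets regardless of corpus size.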

SLIDE 20

The art of predictive analytics:
1) Know the data out there
2) Know the code out there
3) Intuition (bias)

SLIDE 21

A lot of data with one feature correlated with the label

SLIDE 22

Twitter sentiment analysis?

SLIDE 23
SLIDE 24

Awesome! RT @rupertgrintnet Harry Potter Marks Place in Film History http://bit.ly/Eusxi :)

“Distant supervision” (Go et al., 09) (Use emoticons as labels)
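The distant-supervision recipe is mechanical: treat the emoticon as the (noisy) label, then delete it so the classifier can't cheat. A minimal sketch (the emoticon lists and function names are illustrative, not Go et al.'s exact lexicons):

```python
import re

POS = [":-)", ":)", ":D"]   # illustrative positive-emoticon list
NEG = [":-(", ":("]         # illustrative negative-emoticon list

def distant_label(tweet):
    """Return (cleaned_text, label), or None when no emoticon fires.
    The emoticon serves as the label and is stripped from the text."""
    if any(e in tweet for e in POS):
        label = "pos"
    elif any(e in tweet for e in NEG):
        label = "neg"
    else:
        return None
    # Remove longer emoticons first so ":-)" is not left as ":-".
    for e in sorted(POS + NEG, key=len, reverse=True):
        tweet = tweet.replace(e, "")
    return re.sub(r"\s+", " ", tweet).strip(), label

distant_label("Awesome! RT @rupertgrintnet Harry Potter Marks Place in Film History :)")
# → ('Awesome! RT @rupertgrintnet Harry Potter Marks Place in Film History', 'pos')
```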

SLIDE 25

Recipe: You know a lot about the problem

Smart Priors

SLIDE 26

You know a lot about the problem: Smart Priors

Yarowsky (1995), WSD:
1) One sense per collocation.
2) One sense per discourse.
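The second constraint is easy to sketch: within one discourse (document), collapse every occurrence of the word to its majority-predicted sense. The toy sense labels below are mine:

```python
from collections import Counter

def one_sense_per_discourse(sense_predictions):
    """Apply Yarowsky's 'one sense per discourse' prior: override each
    token's predicted sense with the document-level majority sense."""
    majority, _ = Counter(sense_predictions).most_common(1)[0]
    return [majority] * len(sense_predictions)

one_sense_per_discourse(["plant/factory", "plant/factory", "plant/flora"])
# → ['plant/factory', 'plant/factory', 'plant/factory']
```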

SLIDE 27

Recipe: You know a lot about the problem

Create new features

SLIDE 28

You know a lot about the problem: Create new features

Error-analysis

SLIDE 29

What errors is your model making? DO SOME EXPLORATORY DATA ANALYSIS (EDA)

SLIDE 30

Andrew Ng: “Advice for applying ML” Where do the errors come from?

SLIDE 31

Recipe: You know a little about the problem

Semi-supervised learning

SLIDE 32

You know a little about the problem: Semi-supervised learning

JOINT semi-supervised learning
Ando and Zhang (2005), Suzuki and Isozaki (2008), Suzuki et al. (2009), etc.
=> effective but task-specific

SLIDE 33

You know a little about the problem: Semi-supervised learning

Unsupervised learning, followed by Supervised learning

SLIDE 34

[Diagram: sup data → supervised training → sup model]

How can Bob improve his model?

SLIDE 35

[Diagram: sup data → supervised training → sup model]

Semi-sup training?

SLIDE 36

[Diagram: sup data → supervised training → sup model]

Semi-sup training? More feats

SLIDE 37

[Diagram: two pipelines sharing the extra features: sup data + more feats → sup model, for sup task 1 and sup task 2]

More features can be used on different tasks

SLIDE 38

[Diagram: unsup data + sup data → joint semi-sup training → semi-sup model]

(standard semi-sup setup)

SLIDE 39

[Diagram: unsup data → unsup pretraining → unsup model → semi-sup fine-tuning on sup data → semi-sup model]

Unsupervised, then supervised

SLIDE 40

[Diagram: unsup data → unsup training → unsup model → unsup feats]

Use unsupervised learning to create new features
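This recipe can be sketched with the simplest possible unsupervised learner: a toy 1-D k-means whose learned cluster id becomes a new feature for the downstream supervised stage (all data and names here are made up):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Tiny 1-D k-means. The cluster id assigned to each point is the
    'unsup feat' handed to a supervised learner afterwards."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        assign = [min(range(k), key=lambda j: abs(p - centers[j]))
                  for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, assign

# Unlabelled data with two obvious groups; each point's cluster id
# would be appended to its feature vector for the supervised task.
points = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]
centers, assign = kmeans(points, k=2)
```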

SLIDE 41

[Diagram: unsup feats + sup data → sup training → semi-sup model]

These features can then be shared with other people

SLIDE 42

[Diagram: unsup data → unsup training → unsup feats, feeding sup task 1, sup task 2, sup task 3]

SLIDE 43

Recipe: You know almost nothing about the problem

Build cool generic features

SLIDE 44

Know almost nothing about problem: Build cool generic features

Word features (Turian et al., 2010)

http://metaoptimize.com/projects/wordreprs/

SLIDE 45


Brown clustering (Brown et al. 92)

(image from Terry Koo)

cluster(chairman) = '0010'
2-prefix(cluster(chairman)) = '00'
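Turning cluster paths into features takes a few lines: emit fixed-length prefixes of each word's bit-string path, so shorter prefixes give coarser, better-smoothed word classes. The helper name and feature-string format below are illustrative:

```python
def brown_prefix_features(word, paths, lengths=(2, 4)):
    """Emit prefix features of a word's Brown-cluster bit-string path.
    Words in nearby clusters share the short prefixes."""
    path = paths.get(word)
    if path is None:
        return []   # out-of-vocabulary word: no cluster features
    return [f"brown[:{n}]={path[:n]}" for n in lengths if len(path) >= n]

# Toy cluster paths; real ones come out of Brown clustering.
paths = {"chairman": "0010", "president": "0011"}
brown_prefix_features("chairman", paths)
# → ['brown[:2]=00', 'brown[:4]=0010']
```

Note that "chairman" and "president" share the 2-bit prefix feature, which is exactly what lets a supervised model generalize between them.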

SLIDE 46


50-dim embeddings: Collobert + Weston (2008) t-SNE vis by van der Maaten + Hinton (2008)

SLIDE 47

Know almost nothing about problem: Build cool generic features

Document features:
Document clustering
LSA/LDA
Deep model

SLIDE 48

Document features

Salakhutdinov + Hinton 06

SLIDE 49

Domain adaptation for sentiment analysis (Glorot et al. 11)

Document features example

SLIDE 50

Recipe: You know a little about the problem

Make more REAL training examples

SLIDE 51

Make more real training examples

Because you have some time or a small budget:

Amazon Mechanical Turk

SLIDE 52

Snow et al. 08 “Cheap and Fast – But is it Good?”

~1K Turk labels per dollar
Average over (5) Turks to reduce noise
=> http://crowdflower.com/
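The noise-reduction step is just redundant labeling plus a vote. A minimal sketch (a plain majority vote stands in for the weighted aggregation Snow et al. actually study):

```python
from collections import Counter

def aggregate_votes(labels):
    """Majority vote over redundant annotations; with ~5 Turkers per
    item, independent labeling errors mostly cancel out."""
    return Counter(labels).most_common(1)[0][0]

aggregate_votes(["pos", "pos", "neg", "pos", "pos"])  # → 'pos'
```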

SLIDE 53

Soylent (Bernstein et al. 10)

Find-Fix-Verify: Crowd control design pattern

Find a problem → Fix each problem → Verify quality of each fix
SLIDE 54

Make more real training examples

Active learning

SLIDE 55

Dualist (Settles 11) http://code.google.com/p/dualist/

SLIDE 56

Dualist (Settles 11) http://code.google.com/p/dualist/
Applications: document categorization, WSD, information extraction, Twitter sentiment analysis

SLIDE 57

You know a little about the problem: Make more training examples

FAKE training examples

SLIDE 58

NOISE

SLIDE 59

FAKE training examples

Denoising auto-associators (AA), RBMs

SLIDE 60

MNIST distortions (LeCun et al. 98)
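A minimal sketch of fake-example generation in this spirit: additive pixel noise plus a small translation. The helper name and parameters are mine, and a 1-D circular shift stands in for the 2-D elastic distortions used on MNIST:

```python
import random

def jitter(example, sigma=0.1, shift=1, seed=None):
    """Fake a new training example from a real one: add Gaussian pixel
    noise, then translate by `shift` positions. The label of the
    original example carries over unchanged."""
    rng = random.Random(seed)
    noisy = [x + rng.gauss(0, sigma) for x in example]
    # Circular shift as a crude stand-in for a 1-pixel translation.
    return noisy[-shift:] + noisy[:-shift]

fake = jitter([0.0, 0.1, 0.9, 0.8], seed=0)
```

Every real example can spawn many distorted copies, which is why this counts as "more data" without any new annotation.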

SLIDE 61

No negative examples?

SLIDE 62

FAKE training examples

Multi-view / multi-modal

SLIDE 63

Multi-view / multi-modal

How do you evaluate an IR system, if you have no labels? See how good the title is at retrieving the body text.
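That trick can be sketched directly: rank every body by similarity to each title and check that the matching body comes out on top, scored by mean reciprocal rank. The toy documents and the word-overlap scorer below are stand-ins for a real retrieval function:

```python
def tokens(text):
    return set(text.lower().split())

def mrr_title_retrieves_body(docs):
    """Label-free IR evaluation: for each (title, body) pair, rank all
    bodies by word overlap with the title and record the reciprocal
    rank of the true body. Higher MRR = better retrieval function."""
    total = 0.0
    for i, (title, _) in enumerate(docs):
        sims = [(len(tokens(title) & tokens(body)), j)
                for j, (_, body) in enumerate(docs)]
        ranked = [j for _, j in sorted(sims, key=lambda s: -s[0])]
        total += 1.0 / (ranked.index(i) + 1)
    return total / len(docs)

docs = [("cooking pasta", "boil the pasta in salted water"),
        ("fixing bikes", "tighten the bikes chain and brakes")]
```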

SLIDE 64

2) KNOW THE DATA

SLIDE 65

Know the data

Labelled/structured data: ODP, Freebase, Wikipedia, DBpedia, etc.

SLIDE 66

Know the data

Unlabelled data: WaCky, ClueWeb09, CommonCrawl, Ngram corpora

SLIDE 67

Ngrams

Google, Bing, Google Books
Roll your own: Common Crawl

SLIDE 68

Know the data

Do something stupid on a lot of data

SLIDE 69

Do something stupid on a lot of data: Ngrams

Spell-checking
Phrase segmentation
Word breaking
Synonyms
Language models

See “An Overview of Microsoft Web N-gram Corpus and Applications” (Wang et al 10)
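Word breaking is a good example of "stupid on a lot of data": a Viterbi segmentation over nothing but unigram counts. A sketch with a toy count table standing in for a web-scale n-gram corpus:

```python
import math
from functools import lru_cache

# Toy unigram counts standing in for a web-scale n-gram corpus.
COUNTS = {"choose": 50, "spain": 20, "chooses": 5, "pain": 30, "cho": 1}
TOTAL = sum(COUNTS.values())

def word_break(s):
    """Segment a concatenated string into the maximum-likelihood word
    sequence under a unigram language model (Viterbi over split points)."""
    @lru_cache(None)
    def best(i):
        if i == len(s):
            return 0.0, []
        candidates = []
        for j in range(i + 1, len(s) + 1):
            w = s[i:j]
            if w in COUNTS:
                score, rest = best(j)
                candidates.append((score + math.log(COUNTS[w] / TOTAL),
                                   [w] + rest))
        return max(candidates) if candidates else (float("-inf"), [])
    return best(0)[1]

word_break("choosespain")  # → ['choose', 'spain']
```

With real web-scale counts, "chooses" + "pain" loses to "choose" + "spain" for exactly the reason it does in this toy table: the product of the frequencies is higher.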

SLIDE 70

Do something stupid on a lot of data

Web-scale k-means for NER (Lin and Wu 09)

SLIDE 71

Do something stupid on a lot of data

Web-scale clustering

SLIDE 72

Know the data

Multi-modal learning

SLIDE 73

Multi-modal learning: images and captions

[Diagram: image features ↔ caption features (“facepalm”)]

SLIDE 74

Multi-modal learning: titles and article body

[Diagram: title features ↔ article-body features]

SLIDE 75

Multi-modal learning: audio and tags

[Diagram: audio features ↔ tag features (“upbeat”, “hip hop”)]

SLIDE 76

3) IT'S MODELS ALL THE WAY DOWN

SLIDE 77

Break down a pipeline

1-best (greedy), k-best (Finkel et al. 06)

SLIDE 78

Good code to build on Stanford NLP tools, clustering algorithms, Terry Koo's parser, etc.

SLIDE 79

Good code to build on YOUR MODEL

SLIDE 80

Eat your own dogfood

Bootstrapping (Yarowsky 95)
Co-training (Blum + Mitchell 98)
EM (Nigam et al. 00)
Self-training (McClosky et al. 06)
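One round of any of these loops has the same shape: train, pseudo-label the unlabeled pool, keep the confident guesses, retrain. A sketch with a toy nearest-mean learner (the learner, data, and confidence threshold are illustrative, not from the cited papers):

```python
def fit(data):
    """Toy learner: nearest class mean. Returns model(x) -> (label, conf),
    where conf is the relative distance to the other class's mean."""
    groups = {}
    for x, y in data:
        groups.setdefault(y, []).append(x)
    means = {y: sum(v) / len(v) for y, v in groups.items()}
    def model(x):
        label = min(means, key=lambda y: abs(x - means[y]))
        other = max(means, key=lambda y: abs(x - means[y]))
        d1, d2 = abs(x - means[label]), abs(x - means[other])
        conf = d2 / (d1 + d2) if d1 + d2 else 1.0
        return label, conf
    return model

def self_train(train, unlabeled, confidence=0.8):
    """One bootstrapping round: fit on labeled data, pseudo-label the
    pool, keep confident guesses, refit on real + pseudo examples."""
    model = fit(train)
    pseudo = [(x, y) for x in unlabeled
              for y, p in [model(x)] if p >= confidence]
    return fit(train + pseudo)

train = [(0.0, "small"), (10.0, "big")]
model = self_train(train, [1.0, 2.0, 9.0, 0.5])
```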

SLIDE 81

Dualist (Settles 11): active learning + semi-supervised learning

SLIDE 82

Eat your own dogfood

Cheap bootstrapping: one step of EM (Settles 11)

“Awesome! What a great movie!”

SLIDE 83

It's models all the way down

Use models to annotate
Low recall + high precision + lots of data = win

SLIDE 84

Use models to annotate

Face modeling

SLIDE 85

Pose-invariant face features

SLIDE 86

Pose-invariant face features

SLIDE 87

It's models all the way down

THE FUTURE?

Joins on large noisy data sets

SLIDE 88

Joins on large noisy data sets

ReVerb (Fader et al., 11) http://reverb.cs.washington.edu
Extractions over entire ClueWeb09 (826 MB compressed)

SLIDE 89

ReVerb (Fader et al., 11)

SLIDE 90

Joins on noisy data sets (can clean up the data??)

???

SLIDE 91

The art of predictive analytics:
1) Know the data out there
2) Know the code out there
3) Intuition (bias)

SLIDE 92

Summary of recipes:

Know your problem
Throw in good features
Use others' good models in your pipeline
Make more training examples
Use a lot of data

SLIDE 93

"It especially annoys me when racists are accused of 'discrimination.' The ability to discriminate is a precious facility; by judging all members of one 'race' to be the same, the racist precisely shows himself incapable of discrimination."

  • Christopher Hitchens (RIP)
SLIDE 94

Other cool research to look at:
* Frustratingly easy domain adaptation (Daume 07)
* The Unreasonable Effectiveness of Data (Halevy et al 09)
* Web-scale algorithms (search on http://metaoptimize.com/qa/)
* Self-taught learning (Raina et al 07)

SLIDE 95

Joseph Turian joseph@metaoptimize.com @turian http://metaoptimize.com/qa/

2012.02.02

Please email me any questions