

SLIDE 1

Machine Learning 2

DS 4420 - Spring 2020

Humans-in-the-loop

Byron C. Wallace

SLIDE 2

Today

  • Reducing annotation costs: active learning and crowdsourcing

SLIDE 3

Efficient annotation

Active learning Crowdsourcing

Figure from Settles, ‘08

SLIDE 4

Standard supervised learning

expert"annotator" labeled"data! evaluate"classifier"" test" data" learned" classifier"

SLIDE 5

Active learning

expert"annotator" labeled"data! evaluate"classifier"" test" data" learned" classifier" expert"annotator" labeled"data! learned" classifier" evaluate"classifier"" test" data" select"x*"from" U"for"labeling!

SLIDE 6

Active learning

Figure from Settles, ‘08

SLIDE 7

Learning paradigms

Slide credit: Piyush Rai

SLIDE 8

Unsupervised learning

Slide credit: Piyush Rai

SLIDE 9

Semi-supervised learning

Slide credit: Piyush Rai

SLIDE 10

Active learning

Slide credit: Piyush Rai


SLIDE 15

Motivation

  • Labels are expensive
  • Maybe we can reduce the cost of training a good model by picking training examples cleverly

SLIDE 16

Why active learning?

Suppose classes looked like this

SLIDE 17

Why active learning?

Suppose classes looked like this. We only need 5 labels!


SLIDE 19

Why active learning?

Example from Daniel Ting

[Figure: unlabeled points x along a line, with labels revealed so far reading 0 0 0 1 1 1 1 1]

Labeling points out here is not helpful!

SLIDE 20

Types of AL

  • Stream-based active learning: consider one unlabeled instance at a time and decide whether to query for its label (or to ignore it); a sketch follows this list
  • Pool-based active learning: given a large “pool” of unlabeled examples, rank these with some heuristic that aims to capture informativeness
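A minimal sketch of the stream-based variant, assuming a scikit-learn-style classifier with `predict_proba` and a hypothetical `ask_oracle(x)` call that returns a human label; the margin threshold and budget are illustrative choices, not from the slides:

```python
import numpy as np

def stream_al(stream, model, X_seed, y_seed, threshold=0.2, budget=50):
    """Stream-based AL: look at one instance at a time and query its
    label only if the current model is uncertain (small margin)."""
    X, y = list(X_seed), list(y_seed)
    model.fit(np.array(X), np.array(y))
    for x in stream:
        p = np.sort(model.predict_proba(x.reshape(1, -1))[0])
        margin = p[-1] - p[-2]                   # gap between top two classes
        if margin < threshold and budget > 0:
            X.append(x); y.append(ask_oracle(x)) # ask_oracle: hypothetical human query
            model.fit(np.array(X), np.array(y))  # retrain on the grown set
            budget -= 1
    return model
```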


SLIDE 23

Pool based AL

  • Pool-based active learning proceeds in rounds
    – Each round is associated with a current model, learned using the labeled data seen thus far
  • The model selects the most informative example(s) remaining to be labeled at each step
    – We then pay to acquire these labels
  • New labels are added to the labeled data; the model is re-trained
  • We repeat this process until we are out of $$$
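A sketch of one way this loop might look in code, assuming a NumPy pool `X_pool`, a hypothetical `oracle(i)` that returns (and charges for) the label of pool index `i`, and a `score(model, X)` informativeness heuristic such as the uncertainty measures discussed later:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pool_based_al(X_pool, oracle, seed_idx, score, rounds=20, batch=5):
    """Pool-based AL loop: train on the labels acquired so far, score the
    remaining pool with `score`, and buy labels for the top batch."""
    y = {i: oracle(i) for i in seed_idx}            # pay for seed labels
    labeled = list(seed_idx)
    unlabeled = [i for i in range(len(X_pool)) if i not in y]
    model = None
    for _ in range(rounds):
        if not unlabeled:
            break
        model = LogisticRegression(max_iter=1000)
        model.fit(X_pool[labeled], [y[i] for i in labeled])
        s = score(model, X_pool[unlabeled])         # informativeness per point
        picked = [unlabeled[j] for j in np.argsort(s)[-batch:]]
        for i in picked:
            y[i] = oracle(i)                        # pay to acquire labels
        labeled += picked
        picked_set = set(picked)
        unlabeled = [i for i in unlabeled if i not in picked_set]
    return model
```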
SLIDE 27

How might we pick ‘good’ unlabeled examples?

SLIDE 28

Query by Committee (QBC)

SLIDE 29

Query by Committee (QBC)

Pick the point about which there is most disagreement
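A sketch of vote entropy, one common disagreement measure for QBC; it assumes a committee of already-trained classifiers (e.g., fit on bootstrap resamples of the labeled set) with integer-coded labels:

```python
import numpy as np

def vote_entropy(committee, X_unlabeled, n_classes):
    """Score each unlabeled point by the entropy of the committee's votes;
    the point with the highest entropy (most disagreement) is queried."""
    votes = np.stack([m.predict(X_unlabeled) for m in committee])  # (C, N)
    scores = np.empty(votes.shape[1])
    for j in range(votes.shape[1]):
        p = np.bincount(votes[:, j].astype(int), minlength=n_classes)
        p = p / p.sum()
        p = p[p > 0]
        scores[j] = -(p * np.log(p)).sum()
    return scores  # query np.argmax(scores)
```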

SLIDE 30

[McCallum & Nigam, 1998]

Query by Committee (QBC)

SLIDE 31

Active Learning using Pre-clustering

If the data clusters, we only need labels for a few representative instances from each cluster.

Viagra"“Bargains”" Investment"“OpportuniHes”" Work" Personal" Facebook"

[Ngyuen"&"Smeulders"04]"

Pre-Clustering
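A sketch of the pre-clustering idea, assuming k-means over the unlabeled pool: query a label only for the instance nearest each centroid, then propagate that label to the rest of the cluster. The cluster count is an illustrative choice:

```python
import numpy as np
from sklearn.cluster import KMeans

def representatives(X, n_clusters=5):
    """Cluster the pool, then return one representative index per cluster
    (the point nearest the centroid) plus the cluster assignments."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    reps = []
    for k in range(n_clusters):
        members = np.where(km.labels_ == k)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[k], axis=1)
        reps.append(members[np.argmin(dists)])
    return reps, km.labels_

# Usage sketch: ask for labels only on `reps`, then assign each
# cluster its representative's label.
```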

SLIDE 32

Uncertainty sampling

  • Query the instance the current classifier is most uncertain about
  • Needs a measure of uncertainty and a probabilistic model for prediction
  • Examples:
    – Entropy
    – Least confident predicted label
    – Euclidean distance (e.g. the point closest to the margin in an SVM)
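Minimal sketches of these measures over an (N, K) array of predicted class probabilities; for the SVM case one would instead rank by the absolute decision-function value (smallest = closest to the margin):

```python
import numpy as np

def entropy(probs):
    """Predictive entropy: -sum_k p_k log p_k; higher = more uncertain."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def least_confident(probs):
    """1 - probability of the predicted (argmax) label."""
    return 1.0 - probs.max(axis=1)

def margin(probs):
    """Negative gap between the two most likely classes
    (small gap = high uncertainty, so negate to rank ascending gaps first)."""
    p = np.sort(probs, axis=1)
    return -(p[:, -1] - p[:, -2])
```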


SLIDE 35

Uncertainty sampling

SLIDE 37

Let’s implement this… (“in class” exercise on active learning)
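One way the exercise might look: a self-contained entropy-based uncertainty-sampling loop on synthetic data, compared against random (i.i.d.) acquisition. The dataset, model, and budget are all illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
pool, test = np.arange(700), np.arange(700, 1000)

def run(active, seed_size=20, rounds=30):
    labeled = list(rng.choice(pool, size=seed_size, replace=False))
    seen = set(labeled)
    unlabeled = [i for i in pool if i not in seen]
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        if active:   # entropy-based uncertainty sampling
            p = np.clip(clf.predict_proba(X[unlabeled]), 1e-12, 1.0)
            pick = int(np.argmax(-(p * np.log(p)).sum(axis=1)))
        else:        # i.i.d. (random) acquisition
            pick = int(rng.integers(len(unlabeled)))
        labeled.append(unlabeled.pop(pick))
    return clf.score(X[test], y[test])

print("active:", run(True), " i.i.d.:", run(False))
```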

SLIDE 38

Practical Obstacles to Deploying Active Learning

David Lowell (Northeastern University), Zachary C. Lipton (Carnegie Mellon University), Byron C. Wallace (Northeastern University)

SLIDE 39

Given

  • Pool of unlabeled data P
  • Model parameterized by θ
  • A sorting heuristic h
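In this notation one acquisition round just ranks P by h under the current θ; a minimal sketch (the call signature of h is an assumption):

```python
def acquire(P, theta, h, k=1):
    """One AL round: rank the unlabeled pool P by heuristic h under the
    current parameters theta; return the top-k instances to label."""
    return sorted(P, key=lambda x: h(theta, x), reverse=True)[:k]
```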
SLIDE 42
Some issues

  • Users must choose a single heuristic (AL strategy) from many choices before acquiring more data
  • Active learning couples datasets to the model used at acquisition time

SLIDE 43

Experiments

Active learning involves:

  • A data pool
  • An acquisition model and function
  • A “successor” model (to be trained)

SLIDE 44

Tasks & datasets

  • Classification: movie reviews, subjectivity/objectivity, customer reviews, question type classification
  • Sequence labeling (NER): CoNLL, OntoNotes

SLIDE 45

Models

  • Classification: SVM, CNN, BiLSTM
  • Sequence labeling (NER): CRF, BiLSTM-CNN

SLIDE 46

Uncertainty sampling

SLIDE 47

(For sequences)
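The slide's formula is not recoverable from this transcript; a common uncertainty heuristic for sequence models in this literature is maximum normalized log-probability (MNLP): rank sequences by their length-normalized log-probability under the model and query the lowest. A sketch, assuming access to per-token log-probabilities of the model's best labeling:

```python
import numpy as np

def mnlp_scores(token_logprobs):
    """token_logprobs: list of 1-D arrays, one per sequence, holding the
    log-probability of the model's chosen tag at each token.
    Lower normalized log-prob = less confident = query first."""
    return np.array([lp.sum() / len(lp) for lp in token_logprobs])

# Query order: np.argsort(mnlp_scores(...)) (ascending).
```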

SLIDE 48

Query By Committee (QBC)

SLIDE 49

(For sequences)

SLIDE 50

Results

  • 75.0%: there exists a heuristic that outperforms i.i.d.
  • 60.9%: a specific heuristic outperforms i.i.d.
  • 37.5%: transfer of actively acquired data outperforms i.i.d.
  • But active learning consistently outperforms i.i.d. for sequential tasks

SLIDE 51

(a) Performance of AL relative to i.i.d. across corpora.

SLIDE 52

Results

It is difficult to characterize when AL will be successful. Trends:

  • Uncertainty sampling tends to do well with an SVM or CNN
  • BALD tends to do well with a CNN
  • Transferring actively acquired data to a different model leads to poor results
SLIDE 53

Crowdsourcing

slides derived from Matt Lease

SLIDE 54

Crowdsourcing

  • In ML, supervised learning still dominates (despite the various innovations in self-/un-supervised learning we have seen in this class)
  • Supervision is expensive; modern (deep) models need lots of it
  • One use of crowdsourcing is collecting lots of annotations on the cheap


SLIDE 57

Crowdsourcing

[Diagram: a requester sends data and $$$ to a crowdsourcing platform; the platform routes tasks to “crowdworkers”, who return labels Y in exchange for $$$]

SLIDE 58

Crowdsourcing

Human Intelligence Tasks (HITs)

SLIDE 60

Tasks

Recognizing textual entailment

Cheap and Fast — But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks

Rion Snow and Andrew Y. Ng (Computer Science Dept., Stanford University; {rion,ang}@cs.stanford.edu), Brendan O’Connor (Dolores Labs, Inc.; brendano@doloreslabs.com), Daniel Jurafsky (Linguistics Dept., Stanford University; jurafsky@stanford.edu)

Abstract

Our evaluation of non-expert labeler data vs. expert annotations for five tasks found that for many tasks only a small number of non-expert annotations per item are necessary to equal the performance of an expert annotator.

SLIDE 61

Computer Vision: Sorokin & Forsyth (CVPR 2008)

  • 4K labels for US $60
SLIDE 62

Dealing with noise

Problem: crowd annotations are often noisy. One way to address this is to collect independent annotations from multiple workers. But then how do we combine them?
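The simplest combination rule is a per-item majority vote; a minimal sketch (ties are broken arbitrarily here):

```python
from collections import Counter

def majority_vote(annotations):
    """annotations: dict mapping item id -> list of worker labels.
    Returns one label per item by majority (ties broken arbitrarily)."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in annotations.items()}
```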


SLIDE 65

Dawid-Skene

Define a simple probabilistic model of worker annotations, conditioned on latent “true” labels for instances. The model can be estimated easily via Expectation-Maximization (EM).

SLIDE 66

  • I instances
  • J labelers
  • K categories (classes)
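A compact EM sketch of Dawid-Skene in this notation, assuming the observed annotations arrive as a dict mapping (instance i, labeler j) to a class in {0, …, K−1}, initialized with a soft majority vote:

```python
import numpy as np

def dawid_skene(labels, I, J, K, iters=50):
    """labels: dict {(i, j): k} of observed worker labels.
    Returns (T, pi, C): soft posterior over true labels per instance,
    class prior, and per-worker confusion matrices C[j, true, observed]."""
    # Initialize T with per-instance label counts (soft majority vote).
    T = np.full((I, K), 1e-6)
    for (i, j), k in labels.items():
        T[i, k] += 1.0
    T /= T.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # M-step: re-estimate class prior and worker confusion matrices.
        pi = T.mean(axis=0)
        C = np.full((J, K, K), 1e-6)
        for (i, j), k in labels.items():
            C[j, :, k] += T[i]
        C /= C.sum(axis=2, keepdims=True)
        # E-step: posterior over each instance's true label,
        # p(z_i = t) ∝ pi_t * prod_j C[j, t, l_ij].
        logT = np.tile(np.log(pi), (I, 1))
        for (i, j), k in labels.items():
            logT[i] += np.log(C[j, :, k])
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T, pi, C
```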

SLIDE 67

Aggregating and Predicting Sequence Labels from Crowd Annotations

An T. Nguyen (University of Texas at Austin, atn@cs.utexas.edu), Byron C. Wallace (Northeastern University, byron@ccs.neu.edu), Junyi Jessy Li and Ani Nenkova (University of Pennsylvania, {ljunyi|nenkova}@seas.upenn.edu), Matthew Lease (University of Texas at Austin, ml@utexas.edu)

[Model diagram: a chain of latent true labels h_{i−1}, h_i, h_{i+1}; each of m workers emits a discrete label l_{ij} governed by a per-worker matrix C(j); v_i is a discrete observation at position i]

SLIDE 68

Evidence-based Medicine

“Citizen Science”

SLIDE 69

Combining Crowd and Expert Labels using Decision Theoretic Active Learning

An T. Nguyen (Department of Computer Science, University of Texas at Austin; atn@cs.utexas.edu), Byron C. Wallace and Matthew Lease (School of Information, University of Texas at Austin; {byron.wallace|ml}@utexas.edu)

  • Task routing

Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction

Yinfei Yang (Google AI, yinfeiy@google.com), Oshin Agarwal (University of Pennsylvania, oagarwal@seas.upenn.edu), Chris Tar (Google AI, ctar@google.com), Byron C. Wallace (Northeastern University, b.wallace@northeastern.edu), Ani Nenkova (University of Pennsylvania, nenkova@seas.upenn.edu)

SLIDE 70

Crowdsourcing takeaways

  • If you’re in a position of needing to acquire supervision (annotations), you’ll probably want to use crowdsourcing
  • Invest in good task design and think about how you will aggregate individual annotations
  • It may be worth investing in a small set of “expert” annotations as well