Stacking With Auxiliary Features Nazneen Rajani and Ray Mooney - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Stacking With Auxiliary Features

Nazneen Rajani and Ray Mooney

nrajani@cs.utexas.edu and mooney@cs.utexas.edu

University of Texas at Austin

slide-14
SLIDE 14

Introduction

  • Ensembling algorithms cannot effectively discriminate across:
      • component systems
      • input instances
  • We propose Stacking With Auxiliary Features (SWAF) as a general ML algorithm
  • Demonstrate SWAF on various challenging structured prediction tasks:
      • Slot Filling (SF) [NLP]
      • Entity Discovery and Linking (EDL) [NLP]
      • ImageNet object detection [Vision]

slide-15
SLIDE 15

Slot Filling

  • org: Microsoft
  • 1. city_of_headquarters:
  • 2. website:
  • 3. subsidiaries:
  • 4. employees:
  • 5. shareholders:

Microsoft is a technology company, headquartered in Redmond, Washington, that develops …

city_of_headquarters: Redmond
provenance:
confidence score: 1.0
slide-16
SLIDE 16

Entity Discovery and Linking (EDL)

FreeBase entry: Hillary Diane Rodham Clinton is a US Secretary of State, U.S. Senator, and First Lady of the United States. From 2009 to 2013, she was the 67th Secretary of State, serving under President Barack Obama. She previously represented New York in the U.S. Senate.

Source Corpus Document: Hillary Clinton Not Talking About ’92 Clinton-Gore Confederate Campaign Button..

FreeBase entry: William Jefferson "Bill" Clinton is an American politician who served as the 42nd President of the United States from 1993 to 2001. Clinton was Governor of Arkansas from 1979 to 1981 and 1983 to 1992, and Arkansas Attorney General from 1977 to 1979.


slide-19
SLIDE 19

ImageNet Object Detection


slide-20
SLIDE 20

Ensemble Algorithms

  • Stacking (Wolpert, 1992)

[Diagram: Systems 1 through N each pass a confidence score (conf 1 through conf N) to a trained classifier, which decides whether to accept the output ("Accept?").]
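The stacking setup above can be sketched as a small meta-classifier over per-system confidence scores. This is a minimal illustration, not the authors' implementation: the choice of logistic regression trained by plain gradient descent, the function names, and the toy data are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_stacker(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent.

    X holds one row of system confidences per candidate output;
    y holds 1 if the candidate is correct and 0 otherwise.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def accept(w, b, confs, threshold=0.5):
    """Answer the diagram's "Accept?" question for one candidate output."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, confs)) + b) >= threshold

# Toy data (hypothetical): three systems; the third is often confident when wrong.
X = [[0.9, 0.8, 0.0], [0.7, 0.9, 0.6], [0.1, 0.0, 0.9],
     [0.0, 0.2, 0.8], [0.8, 0.0, 0.1], [0.0, 0.1, 0.0]]
y = [1, 1, 0, 0, 1, 0]
w, b = train_stacker(X, y)
```

The stacker learns how much to trust each system's confidence, which a simple vote cannot do.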

slide-21
SLIDE 21

Stacking With Auxiliary Features (SWAF)

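  • Stacking using two types of auxiliary features:

[Diagram: Systems 1 through N each pass a confidence score (conf 1 through conf N) to a trained meta-classifier. The meta-classifier also receives auxiliary features (provenance features and instance features) and decides whether to accept the output ("Accept?").]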

slide-31
SLIDE 31

Instance Features

  • Enable the stacker to discriminate between input instance types
  • Some systems are better at certain input types
      • SF: slot type (e.g., per:age)
      • EDL: entity type (PER/ORG/GPE/FAC/LOC)
      • Object detection: object category and VGGNet’s fc7 features
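One plausible way to feed a categorical instance feature such as the slot type to the stacker is a one-hot vector appended to the system confidences. The slides do not specify the encoding, so the encoding, the slot-type subset, and the function names below are assumptions for illustration only.

```python
# Hypothetical subset of slot types; a real task would list every slot type.
SLOT_TYPES = ["per:age", "org:city_of_headquarters", "org:website"]

def one_hot(value, vocabulary):
    """Return a vector with 1.0 at the position of `value` and 0.0 elsewhere."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def stacker_input(confidences, slot_type):
    """Concatenate system confidences with the one-hot slot-type feature."""
    return list(confidences) + one_hot(slot_type, SLOT_TYPES)

features = stacker_input([0.9, 0.0, 0.4], "per:age")
```

With the slot type in its input, the meta-classifier can learn, for example, that one system's confidence is more trustworthy on some slot types than on others.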

slide-40
SLIDE 40

Provenance Features

  • Enable the stacker to discriminate between systems
  • Output is reliable if systems agree on source
      • SF and EDL: document and offset provenance
      • Object detection: bounding box provenance

slide-41
SLIDE 41

Document Provenance Feature

  • For a given query and slot, each system i has a feature DP_i:
      • N systems provide a fill for the slot.
      • Of these, n give the same provenance docid as i.
      • DP_i = n/N is the document provenance score.
  • DP_i measures the extent to which systems agree on the document provenance of the slot fill.
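The document provenance score can be sketched in a few lines. The sketch assumes that n counts system i itself along with the systems that agree with it, which the slide leaves open; the function name, system names, and docids are hypothetical.

```python
from collections import Counter

def document_provenance(docids):
    """Compute DP_i = n / N for every system that provided a fill.

    docids maps each system name to the docid it cites as provenance.
    Assumption: n includes system i itself.
    """
    total = len(docids)                 # N: systems that provided a fill
    counts = Counter(docids.values())   # systems citing each docid
    return {system: counts[docid] / total for system, docid in docids.items()}

scores = document_provenance(
    {"sys1": "doc_A", "sys2": "doc_A", "sys3": "doc_B", "sys4": "doc_A"}
)
```

Here three of the four systems cite doc_A, so each of them gets DP = 0.75, while the lone system citing doc_B gets DP = 0.25.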

slide-42
SLIDE 42

Offset Provenance Feature

  • Degree of overlap between systems’ provenance strings.
  • Uses the Jaccard similarity coefficient.
  • Systems with different docids have zero OP.

$$\mathrm{OP}(n) = \frac{1}{|N|} \times \sum_{i \in N,\, i \neq n} \frac{|\mathrm{substring}(i) \cap \mathrm{substring}(n)|}{|\mathrm{substring}(i) \cup \mathrm{substring}(n)|}$$

slide-43
SLIDE 43

Offset Provenance Feature

Former President Barack Obama

Offsets   System 1   System 2   System 3
Start     8          1          18
End       29         16         29

$$\mathrm{OP}(1) = \frac{1}{2} \times \left( \frac{9}{29} + \frac{12}{21} \right)$$
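The offset provenance computation for this example can be sketched as a mean Jaccard overlap between character spans. Several choices here are assumptions rather than details from the slides: offsets are treated as inclusive character positions, all systems are taken to share the same docid, and the sum is divided by the number of other systems, which matches the factor of 1/2 in the worked example. Under inclusive offsets the second term evaluates to 12/22 rather than the 12/21 shown, so the sketch's result differs slightly from the slide's figures.

```python
def char_set(start, end):
    """Character positions covered by a span, treating both offsets as inclusive."""
    return set(range(start, end + 1))

def offset_provenance(spans, target):
    """Mean Jaccard overlap between the target system's span and every other span.

    spans maps system name -> (start, end); all systems are assumed to
    cite the same document.
    """
    target_chars = char_set(*spans[target])
    others = [name for name in spans if name != target]
    if not others:
        return 0.0
    total = 0.0
    for name in others:
        other_chars = char_set(*spans[name])
        total += len(target_chars & other_chars) / len(target_chars | other_chars)
    return total / len(others)

spans = {"System 1": (8, 29), "System 2": (1, 16), "System 3": (18, 29)}
op1 = offset_provenance(spans, "System 1")
```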

slide-44
SLIDE 44

Provenance Features

  • Object detection: measure bounding box (BB) overlap
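The slide does not say which overlap measure is used for bounding boxes. One common choice, and the direct analogue of the Jaccard coefficient used for offset provenance, is intersection over union (IoU). The sketch below assumes that measure and a (x_min, y_min, x_max, y_max) box format; both are illustrative choices.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlap = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

Two 2×2 boxes offset by one unit share a 1×1 region, giving an IoU of 1/7; disjoint boxes give 0.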

slide-45
SLIDE 45

Baselines

  • Mixtures of Experts (MoE) (Jacobs et al., 1991)
      • same intuition as instance auxiliary features
      • partition the problem into sub-spaces
      • learn to switch experts based on input using a gating network
  • Oracle Voting
      • Vary the number of systems that must agree from 1 to n and use the threshold that results in the best performance
      • Upper bound on voting
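The oracle voting baseline can be sketched as a search over voting thresholds scored against the gold answers, which is why it is an upper bound on voting. The sketch assumes F1 is the selection metric and uses hypothetical candidate names and vote counts.

```python
def f1_score(predicted, gold):
    """F1 between a set of accepted outputs and the set of gold outputs."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

def oracle_voting(votes, gold, n_systems):
    """Try every threshold k in 1..n_systems and keep the one with the best F1.

    votes maps each candidate output to the number of systems that produced it.
    """
    best_k, best_f1 = 1, -1.0
    for k in range(1, n_systems + 1):
        accepted = {cand for cand, count in votes.items() if count >= k}
        score = f1_score(accepted, gold)
        if score > best_f1:
            best_k, best_f1 = k, score
    return best_k, best_f1

votes = {"a": 3, "b": 2, "c": 1, "d": 1}   # hypothetical vote counts
gold = {"a", "b"}
best_k, best_f1 = oracle_voting(votes, gold, n_systems=3)
```

In this toy case, requiring at least two votes accepts exactly the gold outputs, so the oracle picks k = 2.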

slide-55
SLIDE 55

Results

  • 2016 SF: 8 component systems

Approach                                    Precision   Recall   F1
Mixtures of Experts (Jacobs et al., 1991)   0.168       0.321    0.180
Oracle voting (>=4)                         0.191       0.379    0.206
Top ranked system (Zhang et al., 2016)      0.265       0.302    0.260
Stacking                                    0.311       0.253    0.279
Stacking + instance features                0.257       0.346    0.295
Stacking + provenance features              0.252       0.377    0.302
SWAF                                        0.258       0.439    0.324

slide-56
SLIDE 56

Results

  • 2016 EDL: 6 component systems

Approach                                    Precision   Recall   F1
Oracle voting (>=4)                         0.588       0.412    0.485
Mixtures of Experts (Jacobs et al., 1991)   0.721       0.494    0.587
Top ranked system (Sil et al., 2016)        0.717       0.517    0.601
Stacking                                    0.723       0.537    0.616
Stacking + instance features                0.752       0.542    0.630
Stacking + provenance features              0.767       0.544    0.637
SWAF                                        0.739       0.600    0.662


slide-58
SLIDE 58

Results

  • 2015 ImageNet object detection: 3 component systems

Approach                                          Mean AP   Median AP
Oracle voting (>=1)                               0.366     0.368
Best standalone system (VGG + selective search)   0.434     0.430
Stacking                                          0.451     0.441
Stacking + instance features                      0.461     0.45
Mixtures of Experts (Jacobs et al., 1991)         0.494     0.489
Stacking + provenance features                    0.502     0.494
SWAF                                              0.506     0.497

slide-59
SLIDE 59

Takeaways

  • SWAF produced state-of-the-art (SOTA) results on SF and EDL
  • Significant improvements on ImageNet object detection
  • Our approach is more robust than MoE in terms of the number of component systems
  • For object detection, SWAF works well for images with multiple instances of the same object