

SLIDE 1

Breaking Out of Local Optima with Count Transforms and Model Recombination

Valentin I. Spitkovsky
with Hiyan Alshawi (Google Inc.) and Daniel Jurafsky (Stanford University)

EMNLP (2013-10-21)


SLIDE 5

Problem: Unsupervised Parsing (and Grammar Induction)

Input: Raw Text (Sentences, Tokens and their Categories)

  ... By most measures, the nation’s industrial sector is now growing very slowly — if at all. Factory payrolls fell in September. So did the Federal Reserve ...

      N       N      V   P     N      ♦
      |       |      |   |     |      |
  Factory payrolls fell in September  .

Output: Syntactic Structures (and a Probabilistic Grammar)


SLIDE 9

Motivation: Unsupervised (Dependency) Parsing

Parsing can be useful...
  ◮ machine translation — word alignment, phrase extraction, reordering;
  ◮ web search — retrieval, query refinement;
  ◮ question answering, speech recognition, etc.

But we don’t always have treebanks...
  ◮ specialized genres (e.g., legal),
  ◮ understudied languages, etc.


SLIDE 15

Hardness: Why is grammar induction difficult?

Requires solving a non-convex optimization problem
  — the problem can be NP-hard (Cohen and Smith, 2010)

  ◮ issue: can’t just hill-climb
    ⋆ learning is very sensitive to initialization, tie-breaking, etc.
    ⋆ hard to replicate others’ results...
  ◮ alternative: use sampling methods
    ⋆ also runs into difficulties (e.g., when to stop?)
    ⋆ but offers useful intuition (i.e., to move away and restart)
  ◮ our approach: combining the best of both


SLIDE 26

Goal: How to not get stuck and make progress?

Challenge:
  ◮ given a (locally optimal) solution, find a better solution
    ⋆ e.g., turn a set of parse trees into a better set

Desiderata:
  ◮ want an informed, medium-size step in parameter space
  ◮ not too big (e.g., random restarts undo all previous work)
  ◮ not too small (i.e., not overly self-similar, as in MCMC)

Algorithm Template: Count Transforms (sketched below)
  ◮ selectively forget (or filter) some aspect of a solution,
  ◮ re-optimize from this new starting point,
  ◮ and take the better of the two.
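To make the template concrete, here is a minimal Python sketch of one transform step. The callables `transform`, `reoptimize`, and `score` are hypothetical stand-ins (a count transform, an EM re-optimizer, and a training-objective evaluator), not names from the paper’s implementation:

    def transform_step(model, transform, reoptimize, score):
        """Forget some aspect of a solution, re-optimize, keep the better."""
        candidate = reoptimize(transform(model))  # informed, medium-size step
        return candidate if score(candidate) > score(model) else model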


SLIDE 31

Transforms: Symmetrizer (Forget Polarity)

learn from the undirected arcs of skeletal structures (sketched below)

      N       N      V   P     N      ♦
      |       |      |   |     |      |
  Factory payrolls fell in September  .

  ◮ once we kind of understand which words go together,
    take another whack at making heads or tails of syntax!
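A minimal sketch of what symmetrizing might look like, under the assumption that a model’s attachment statistics can be summarized as a Counter over directed (head, dependent) arcs; that representation is an illustration, not the paper’s data structure:

    from collections import Counter

    def symmetrize(directed_counts):
        """Fold each directed arc with its reverse, forgetting polarity."""
        undirected = Counter()
        for (head, dep), n in directed_counts.items():
            undirected[tuple(sorted((head, dep)))] += n
        return undirected

    # e.g., Counter({("fell", "payrolls"): 3, ("payrolls", "fell"): 1})
    #       -> Counter({("fell", "payrolls"): 4})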


SLIDE 38

Transforms: Filter (Forget Incomplete Fragments)

start by splitting text on punctuation (Spitkovsky et al., 2012); a sketch follows this slide

“Linguistics” (from simple Wikipedia):
  Linguistics (sometimes called philology) is the science that studies language. Scientists who study language are called linguists.

  ◮ Stage I: train on all inter-punctuation fragments of the passage
  ◮ Stage II: keep only the clean, complete sentence
    (“Scientists who study language are called linguists.”)

  ◮ once we’ve bootstrapped a rudimentary grammar,
    retry from just the clean, simple complete sentences!
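A minimal sketch of the two filtering stages. The “completeness” test here (capitalized start, no internal punctuation) is an illustrative heuristic, not the paper’s actual criterion:

    import re

    def fragments(text):
        """Stage I: split text into inter-punctuation fragments."""
        parts = re.split(r"[,;:()\u2014.!?]", text)
        return [p.strip() for p in parts if p.strip()]

    def complete_sentences(text):
        """Stage II: keep only clean, simple complete sentences."""
        sents = [s.strip() for s in re.findall(r"[^.!?]+[.!?]", text)]
        return [s for s in sents
                if s and s[0].isupper() and "(" not in s and "," not in s]

    passage = ("Linguistics (sometimes called philology) is the science "
               "that studies language. Scientists who study language are "
               "called linguists.")
    print(fragments(passage))           # four inter-punctuation fragments
    print(complete_sentences(passage))  # the one clean, complete sentence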


SLIDE 43

Transforms: Decoder (Forget Unlikely Parses)

discard most interpretations (a step of Viterbi training; sketched below)

  [figure: three candidate parses of “Factory payrolls fell in September.”
   with probabilities 0.4, 0.3 and 0.3; after the Viterbi step, the single
   most likely parse carries all of the probability mass (1.0)]

  ◮ many reasons why Viterbi steps are a good idea:
    e.g., M-step initialization (Klein and Manning, 2004)
    (Cohen and Smith, 2010; Spitkovsky et al., 2010; Allahverdyan and Galstyan, 2011)
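A minimal sketch of the decoder transform, with parses stood in for by opaque keys; in a real system they would be dependency trees:

    def viterbi_step(parse_probs):
        """Collapse a distribution over parses onto its single mode."""
        best = max(parse_probs, key=parse_probs.get)
        return {best: 1.0}

    print(viterbi_step({"parse_a": 0.4, "parse_b": 0.3, "parse_c": 0.3}))
    # -> {'parse_a': 1.0}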


SLIDE 50

Pop-up: This is not specific to grammar induction!

proposed primitive transform operators (unary):
  ◮ model ablation (i.e., forget something you learned);
  ◮ data filtering (e.g., drop complex inputs);
  ◮ Viterbi stepping (i.e., decode your data).

just need operators (binary or higher) to combine them:
  ◮ a robust way to merge alternatives of varying quality...

could construct complex networks that fork/join inputs:
  ◮ useful for many (non-convex) optimization problems!


SLIDE 61

Goal: How to not get stuck and make progress?

Challenge #2:
  ◮ given multiple (local) solutions, find a better one

Algorithm #2: Model Combination
  ◮ compute a mixture model,
  ◮ re-optimize from this new starting point,
  ◮ and take the better of the three.

Improved Algorithm #2: Model Recombination (sketched below)
  ◮ don’t have to stop there...
  ◮ if output is better than the worst input, replace and recurse!
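A minimal sketch of model recombination, assuming models are count dictionaries that can be mixed by summation (an assumption for illustration) and reusing the hypothetical `reoptimize` and `score` callables from before:

    from collections import Counter

    def mix(model_a, model_b):
        """An illustrative mixture: sum the two models' counts."""
        return model_a + model_b

    def recombine(model_a, model_b, reoptimize, score):
        """Mix, re-optimize, and keep recursing while the output improves."""
        while True:
            candidate = reoptimize(mix(model_a, model_b))
            worst, best = sorted((model_a, model_b), key=score)
            if score(candidate) <= score(worst):
                return max(candidate, best, key=score)  # better of the three
            model_a, model_b = best, candidate          # replace and recurse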


SLIDE 67

Theme: Try, try again!!

Story-telling time...
  • Dr. Wiesner, you said “Keep on moving; keep on moving!”

http://web.mit.edu/newsoffice/1995/vest-weisner-0621.html


SLIDE 76

Theme: Many many ways to “keep on moving!”

Challenge #3:
  ◮ everything else has failed,
  ◮ all transformers and combiners are stuck...

Algorithm #3: “lateen EM” (Spitkovsky et al., 2011; sketched below)
  ◮ use multiple objectives (they are all wrong anyway)
  ◮ e.g., if soft EM is stuck, use hard EM to dig it out...

many useful alternative ways to view data:
  ◮ sentence strings or parse trees (Spitkovsky et al., 2010; 2011)
  ◮ all data or just short sentences (Klein and Manning, 2004)
  ◮ words or categories (Paskin, 2001; vs. Carroll and Charniak, 1992)
  ◮ feature-rich or bare-bones models (Cohen and Smith, 2009; vs. Spitkovsky et al., 2012)

never let convergence interfere with your (non-convex) optimization...
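A minimal sketch of the lateen alternation, assuming hypothetical `soft_em` and `hard_em` routines that each run one EM scheme to a local optimum, plus a shared `score`; when one objective gets stuck, the other digs the model out, and we stop once neither makes progress:

    def lateen_em(model, soft_em, hard_em, score, tol=1e-6):
        """Alternate soft and hard EM until neither improves the model."""
        improved = True
        while improved:
            improved = False
            for optimize in (soft_em, hard_em):
                candidate = optimize(model)
                if score(candidate) > score(model) + tol:
                    model, improved = candidate, True
        return model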

SLIDE 87

Networks: Fork/Join (FJ)

  [figure: a fork/join network over counts. A Simple Filter and a
   Symmetrizer fork the incoming counts; a Full Model Optimizer and a
   Sparse Model Optimizer (soft-EM and lexicalized hard-EM decoders)
   produce full and sparse counts, which a Combiner joins back together;
   the filter and symmetrizer branches are labeled F and S]

a “grammar inductor” will represent FJ subnetworks (sketched below):
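A minimal sketch of one such FJ subnetwork, built from the pieces above: fork the current counts through several transforms, re-optimize each branch, then join the branches with a combiner. `transforms`, `reoptimize`, and `combine` are hypothetical callables, not the paper’s API:

    def fork_join(counts, transforms, reoptimize, combine):
        """One FJ subnetwork: fork counts through transforms, then join."""
        branches = [reoptimize(t(counts)) for t in transforms]  # fork
        return combine(branches)                                # join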


SLIDE 93

Networks: Iterated Fork/Join (IFJ)

daisy-chain inductors, as in “baby steps” (Spitkovsky et al., 2009; sketched below):
  inputs up to length one → up to length two → ... → up to length l

  ◮ start with inputs up to length one
    ⋆ they have unique parses — an easy (convex) case
  ◮ output initializes training with slightly longer inputs
    ⋆ gradually extend solutions to the fully complex target task

  — an instance of deterministic annealing (Allgower and Georg, 1990; Rose, 1998)
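A minimal sketch of the daisy chain, assuming the corpus is a list of token sequences and `inductor` is a hypothetical callable (data, counts) -> counts, e.g. the `fork_join` subnetwork above with its other arguments bound:

    def baby_steps(corpus, init_counts, inductor, max_len):
        """Daisy-chain grammar inductors over inputs of increasing length."""
        counts = init_counts
        for l in range(1, max_len + 1):
            data = [s for s in corpus if len(s) <= l]  # inputs up to length l
            counts = inductor(data, counts)            # seed the next stage
        return counts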


SLIDE 101

Networks: Grounded Iterated Fork/Join (GIFJ)

combine purely iterative (IFJ) and static (FJ) networks (sketched below):
  empty set of counts → counts up to (l − 1) → inputs up to length l → full counts up to l

  ◮ full network obtained by unrolling the template (as a DBN)
    ⋆ can specify relatively “deep” learning architectures
    ⋆ without sacrificing (too much) clarity or simplicity
  ◮ a structured way of organizing optimizers into networks:
    ⋆ only a handful of primitives here
    ⋆ would be hard to do without modularity and abstraction
    ⋆ can understand and improve components in isolation
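One possible reading of the template, as a minimal sketch: at each length l, fork from both a fresh start (the empty set of counts, as in a static FJ network) and the previous stage’s counts (the iterated chain), then join the two branches. The `inductor` and `combine` callables are the same hypothetical ones as in the earlier sketches:

    from collections import Counter

    def grounded_baby_steps(corpus, inductor, combine, max_len):
        """Interleave fresh (FJ) restarts with the iterated (IFJ) chain."""
        counts = Counter()                        # empty set of counts
        for l in range(1, max_len + 1):
            data = [s for s in corpus if len(s) <= l]
            fresh = inductor(data, Counter())     # static restart at length l
            carried = inductor(data, counts)      # chained from length l - 1
            counts = combine([fresh, carried])    # full counts up to l
        return counts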

SLIDE 102

Results: Directed Dependency Accuracies

Section 23 of English WSJ (all sentences)

  System                          DDA
  (Gimpel and Smith, 2012)        53.1
  (Gillenwater et al., 2010)      53.3
  (Bisk and Hockenmaier, 2012)    53.3
  (Blunsom and Cohn, 2010)        55.7
  (Tu and Honavar, 2012)          57.0
  (Spitkovsky et al., 2013)       64.4


SLIDE 104

Results: Unlabeled Constituents

Section 23 of English WSJ (all sentences)

  System                          F1     P      R
  F-CCM (Huang et al., 2012)      45.1
  LLCCM (Golland et al., 2012)    47.6
  CCL (Seginer, 2007)             52.8   54.6   51.1
  PRLG (Ponvert et al., 2011)     54.6   60.4   49.8
  (Spitkovsky et al., 2013)       54.2   55.6   52.8
  Dependency-Based Upper Bound    87.2   100    77.3


SLIDE 107

Results: Multi-Lingual Dependencies

2006/7 CoNLL Data (19 languages): Arabic, Basque, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, German, Greek, Hungarian, Italian, Japanese, Portuguese, Slovenian, Spanish, Swedish, Turkish

  System                            DDA
  (Mareček and Žabokrtský, 2012)    40.0
  (Spitkovsky et al., 2012b)        42.9
  (Spitkovsky et al., 2013)         48.6
  (Mareček and Straka, 2013)        48.7


SLIDE 115

Conclusion: Summary

useful way of merging grammars of different quality:
  ◮ not always easy, e.g., in machine translation (Xiao et al., 2010)

exploited multiple views of data:
  ◮ simple sentences — easy to recognize root words
  ◮ fragments split on punctuation — learn word associations
  ◮ skeleton parses — for recovering correct arc polarities

state-of-the-art results for grammar induction:
  ◮ English WSJ (both dependency and constituency)
  ◮ 19 languages of the 2006/7 CoNLL data (dependency)


SLIDE 124

Conclusion: Implications (Why This Matters)

applicable all over NLP, even within sampling methods:
  ◮ transformed models as seeds to multi-chain MCMC
    ⋆ e.g., symmetrized models, which would tend to be rejected
  ◮ combining as an alternative to swapping adjacent chains
    ⋆ e.g., in MCMCMC (Geyer, 1991)

working title: “the power of forgetting and starting over”
  ◮ “unlearning” — an old idea in machine learning
    ⋆ e.g., regularization, pruning of decision trees, etc.
  ◮ also important in neuroscience (Craik and Bialystok, 2006; Low and Cheng, 2006)
    ⋆ e.g., neuronal shedding
  ◮ some things we learn, which are responsible for our early
    success, are also what holds us back later in life...

SLIDE 125

Thanks!

Questions?