Predicting structures: Practical concerns (CS 6355: Structured Prediction)

SLIDE 1

CS 6355: Structured Prediction

Predicting structures: Practical concerns


SLIDE 2

So far…

  • What are structures?
      – A graph
      – A collection of parts that are scored jointly
      – A collection of interconnected decisions

  • Conditional models
      – We want to convert some input to an output
      – Model the conditional distribution of the output
      – Score groups of inter-connected variables

  • Algorithms for learning
      – Local vs. global learning
      – Different algorithms

  • Inference algorithms
      – Predicting the final output
      – Different algorithms, tradeoffs

SLIDE 8

Using the tools: Practical concerns

  • What are structures?
      – A graph
      – A collection of parts that are scored jointly
      – A collection of interconnected decisions

  • Conditional models
      – We want to convert some input to an output
      – Model the conditional distribution of the output
      – Score groups of inter-connected variables

  • Algorithms for learning
      – Local vs. global learning
      – Different algorithms

  • Inference algorithms
      – Predicting the final output
      – Different algorithms, tradeoffs

This lecture:

  • We want to solve a task.
  • Many choices ahead!

What is the graph?

  • Modeling our problem?
  • Identifying variables?
  • Identifying groups that are scored together? (factors)
  • What are features?

The best way to learn?

What inference algorithm?

SLIDE 9

Modeling your problem

  • Understand the problem: What should your program produce?
      – Is there data? Very often, the answer is no.

  • What are the decisions/random variables that constitute the output?

  • How do they interact? Identifying factors/parts
      – Some interactions are natural, some are spurious (specific to your small collection of data)
      – Some interactions make inference impossible for computational reasons
      – What are the feature representations?

  • Learning
      – What are the scoring functions?
      – Should every scoring function be jointly learned?
      – Perhaps learn sub-sections independently and put them together with inference at the end
      – Which learning algorithm?

  • Inference
      – What algorithm? How expensive is it?
      – Exact or approximate?

SLIDE 12

Example 0: Named Entity Recognition

Goal: To identify persons, locations and organizations in text

Facebook CEO Mark Zuckerberg announced new privacy features in the conference in San Francisco

  Organization: Facebook
  Person: Mark Zuckerberg
  Location: San Francisco

Design choices:

  1. What is the set of decisions the predictor needs to make?
  2. How do these decisions interact? Factors?
  3. Features? Factor potentials/scoring functions?
  4. Learning? Inference?
SLIDE 13

Example 0: Named Entity Recognition

What is the set of decisions the predictor needs to make?

One option: Label spans of text

Span                           PER  LOC  ORG  NONE
Facebook                        ✗    ✗    ✓    ✗
Facebook CEO                    ✗    ✗    ✗    ✓
Facebook CEO Mark               ✗    ✗    ✗    ✓
Facebook CEO Mark Zuckerberg    ✗    ✗    ✗    ✓
…
Mark Zuckerberg                 ✓    ✗    ✗    ✗
…
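A minimal sketch of what "label spans of text" means as a set of decisions, assuming whitespace tokenization and a hypothetical cap `max_len` on span length (neither is specified in the slides). Every (span, label) pair is one candidate decision:

```python
from typing import List, Tuple

LABELS = ["PER", "LOC", "ORG", "NONE"]

def enumerate_spans(tokens: List[str], max_len: int) -> List[Tuple[int, int]]:
    """Return every (start, end) span, end exclusive, of at most max_len tokens."""
    spans = []
    for start in range(len(tokens)):
        # The span may not run past the end of the sentence.
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans

tokens = "Facebook CEO Mark Zuckerberg announced".split()
spans = enumerate_spans(tokens, max_len=4)

# One candidate decision per (span, label) pair.
decisions = [(span, label) for span in spans for label in LABELS]
```

The number of spans grows quadratically with sentence length unless span length is capped, which is one reason this modeling choice affects the cost of inference.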

SLIDE 15

Example 0: Named Entity Recognition

How do the decisions interact?

A single word can have only one label

Span                           PER  LOC  ORG  NONE
Facebook                        ✓    ?    ?    ?
Facebook CEO                    ✓    ?    ?    ?
Facebook CEO Mark               ?    ?    ?    ?
Facebook CEO Mark Zuckerberg    ?    ?    ?    ?
…
Mark Zuckerberg                 ?    ?    ?    ?
…

Disallowed together: the two ✓ entries (PER for "Facebook" and PER for "Facebook CEO"), because both spans contain the word "Facebook".
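The "one label per word" interaction can be sketched as a consistency check over a candidate assignment. This is an illustration only: spans are assumed to be (start, end) token offsets with end exclusive, and the helper names are hypothetical.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive

def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one word."""
    return a[0] < b[1] and b[0] < a[1]

def is_consistent(labeled: Dict[Span, str]) -> bool:
    """True if no two entity (non-NONE) spans share a word."""
    entities = [span for span, label in labeled.items() if label != "NONE"]
    return not any(
        overlaps(a, b)
        for i, a in enumerate(entities)
        for b in entities[i + 1:]
    )

# "Facebook" and "Facebook CEO" both labeled PER: disallowed together.
bad = {(0, 1): "PER", (0, 2): "PER"}
# "Facebook" as ORG, "Mark Zuckerberg" as PER, "Facebook CEO" as NONE: fine.
good = {(0, 1): "ORG", (0, 2): "NONE", (2, 4): "PER"}
```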

SLIDE 16

Example 0: Named Entity Recognition

Features? Factor potentials/scoring functions?

Score(span, label)

  • Could be linear in features
  • Could be a neural network
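As a sketch of the "linear in features" option for Score(span, label): the feature templates and weights below are made up for illustration and are not taken from the lecture.

```python
from collections import Counter
from typing import Dict, List

def span_features(tokens: List[str], start: int, end: int, label: str) -> Counter:
    """Sparse feature vector phi(span, label) as feature-name counts (hypothetical templates)."""
    words = tokens[start:end]
    feats = Counter()
    feats[f"label={label}"] += 1
    feats[f"first_word={words[0].lower()}|label={label}"] += 1
    feats[f"all_capitalized={all(w[0].isupper() for w in words)}|label={label}"] += 1
    feats[f"length={len(words)}|label={label}"] += 1
    return feats

def linear_score(weights: Dict[str, float], feats: Counter) -> float:
    """Score(span, label) = w . phi(span, label); unseen features get weight 0."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

tokens = ["Mark", "Zuckerberg", "announced"]
weights = {"all_capitalized=True|label=PER": 1.5, "length=2|label=PER": 0.5}

score_name = linear_score(weights, span_features(tokens, 0, 2, "PER"))
score_long = linear_score(weights, span_features(tokens, 0, 3, "PER"))
```

A neural scorer would replace `linear_score` with a learned function of the span; the interface Score(span, label) stays the same.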

SLIDE 17

Example 0: Named Entity Recognition

Learning and inference

  • Various learning regimes
  • Various inference algorithms
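One possible exact inference algorithm for the span model, shown only as a sketch: a dynamic program over sentence positions (in the style of semi-Markov decoding) that picks the highest-scoring set of non-overlapping entity spans. It assumes the total score is a sum of per-span scores, that leaving a word outside any entity scores 0, and that spans are capped at a hypothetical `max_len`.

```python
from typing import Callable, List, Tuple

ENTITY_LABELS = ["PER", "LOC", "ORG"]

def best_segmentation(
    n: int,
    score: Callable[[int, int, str], float],
    max_len: int,
) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Return (best total score, labeled spans) over n words; spans are end-exclusive."""
    best = [0.0] + [float("-inf")] * n   # best[i]: best score for the first i words
    back = [None] * (n + 1)              # back[i]: (previous position, label or None)
    for end in range(1, n + 1):
        # Option 1: word end-1 belongs to no entity.
        best[end] = best[end - 1]
        back[end] = (end - 1, None)
        # Option 2: an entity span [start, end) finishes here.
        for start in range(max(0, end - max_len), end):
            for label in ENTITY_LABELS:
                cand = best[start] + score(start, end, label)
                if cand > best[end]:
                    best[end] = cand
                    back[end] = (start, label)
    # Follow back-pointers from the end of the sentence.
    spans, pos = [], n
    while pos > 0:
        start, label = back[pos]
        if label is not None:
            spans.append((start, pos, label))
        pos = start
    return best[n], spans[::-1]

# Toy scores for "Facebook CEO Mark Zuckerberg announced"; unlisted spans score -1.
toy = {(0, 1, "ORG"): 2.0, (0, 2, "PER"): 1.5, (2, 4, "PER"): 3.0, (2, 3, "PER"): 1.0}
total, chosen = best_segmentation(5, lambda s, e, l: toy.get((s, e, l), -1.0), max_len=4)
```

Because each word is covered by at most one chosen span, the "disallowed together" constraint holds by construction.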

SLIDE 21

Example 0: Named Entity Recognition

Goal: To identify persons, locations and organizations in text

A different modeling choice: One label per word

B-org     O    B-per  I-per       O          O
Facebook  CEO  Mark   Zuckerberg  announced  new

O        O         O   O    O           O   B-loc  I-loc
privacy  features  in  the  conference  in  San    Francisco

  • B-org = Start of organization
  • B-per = Start of person
  • I-per = In person
  • B-loc = Start of location
  • I-loc = In location
  • O = Not a named entity
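The per-word labels carry the same information as labeled spans. A minimal sketch of the conversion, assuming the tag scheme above; the function name is hypothetical, and an I- tag that does not continue an open entity of the same type is simply ignored here:

```python
from typing import List, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert B-/I-/O tags to (start, end, type) spans, end exclusive."""
    spans = []
    start, label = None, None
    # A trailing "O" sentinel closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

tags = ["B-org", "O", "B-per", "I-per", "O", "O",
        "O", "O", "O", "O", "O", "O", "B-loc", "I-loc"]
entities = bio_to_spans(tags)
```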

SLIDE 22

Example 0: Named Entity Recognition

A different modeling choice: One label per word

This modeling choice offers its own design choices:

  1. How do these decisions interact? Factors?
  2. Features?
  3. Learning? Inference?
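One common answer to "how do these decisions interact?" for per-word labels is a factor between adjacent labels. The sketch below is an assumption, not the lecture's prescribed model: it uses per-word scores plus a hard transition rule (I-x may only follow B-x or I-x) and decodes with the Viterbi algorithm. The tag set is the one listed on the slides.

```python
import math
from typing import Dict, List

TAGS = ["O", "B-org", "B-per", "I-per", "B-loc", "I-loc"]

def allowed(prev: str, cur: str) -> bool:
    """I-x may only follow B-x or I-x; every other transition is allowed."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True

def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """emissions[i][tag] is the score of tag at word i (missing tags score 0)."""
    n = len(emissions)
    best = [{t: -math.inf for t in TAGS} for _ in range(n)]
    back = [{t: None for t in TAGS} for _ in range(n)]
    for t in TAGS:
        if allowed("O", t):  # the sentence start behaves like a preceding "O"
            best[0][t] = emissions[0].get(t, 0.0)
    for i in range(1, n):
        for cur in TAGS:
            for prev in TAGS:
                if not allowed(prev, cur):
                    continue
                cand = best[i - 1][prev] + emissions[i].get(cur, 0.0)
                if cand > best[i][cur]:
                    best[i][cur] = cand
                    back[i][cur] = prev
    # Recover the best path from the best final tag.
    path = [max(TAGS, key=lambda t: best[n - 1][t])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy scores for "Mark Zuckerberg announced": I-loc looks best locally at word 1,
# but it cannot follow B-per, so the jointly best sequence uses I-per.
toy_emissions = [{"B-per": 2.0}, {"I-per": 1.0, "I-loc": 2.5}, {"O": 1.0}]
decoded = viterbi(toy_emissions)
```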
SLIDE 23

Example 1: Detecting objects and parts

[Farhadi et al.]

Let’s discuss the choices we have:

  1. What is the set of decisions the predictor needs to make?
  2. How do these decisions interact? Factors?
  3. Features?
  4. Learning? Inference?
SLIDE 24

Example 2: Information extraction

Philae is a robotic European Space Agency lander that accompanied the Rosetta spacecraft until its designated landing on Comet 67P/Churyumov–Gerasimenko (67P), more than ten years after departing Earth. On 12 November 2014, the lander achieved the first-ever controlled touchdown on a comet nucleus. Its instruments are expected to obtain the first images from a comet's surface and make the first in situ analysis to determine its composition. Philae is tracked and operated from the European Space Operations Centre (ESOC) at Darmstadt, Germany.


SLIDE 25

Example 2: Information extraction

How do we model this problem?

Touchdown
  Lander:       Philae
  Destination:  Comet 67P
  When?         12 November 2014
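One way to make the target output concrete is as a record with one slot per role. This is only a sketch of a possible output representation; the class and field names are hypothetical. Each slot is then a decision about which piece of text (if any) fills it.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional

@dataclass
class TouchdownEvent:
    """Hypothetical frame for the touchdown event; None means the slot is unfilled."""
    lander: Optional[str] = None
    destination: Optional[str] = None
    when: Optional[str] = None

def unfilled_slots(event: TouchdownEvent) -> List[str]:
    """Names of the slots the predictor has not filled yet."""
    return [name for name, value in asdict(event).items() if value is None]

event = TouchdownEvent(lander="Philae", destination="Comet 67P", when="12 November 2014")
partial = TouchdownEvent(lander="Philae")
```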

SLIDE 30

Example 2: Information extraction

[Figure: the event "Touchdown" linked to "Philae" (Lander), "Comet 67P" (Dest.), and "12 November 2014" (When)]

Let’s discuss the choices we have:

  1. What is the set of decisions the predictor needs to make?
  2. How do these decisions interact? Factors?
  3. Features?
  4. Learning? Inference?
SLIDE 31

Modeling your problem

  • Understand the problem: What should your program produce?
      – Is there data? Very often, the answer is no.

  • What are the decisions/random variables that constitute the output?

  • How do they interact? Identifying factors/parts
      – Some interactions are natural, some are spurious (specific to your small collection of data)
      – Some interactions make inference impossible for computational reasons
      – What are the feature representations?

  • Learning
      – What are the scoring functions?
      – Should every scoring function be jointly learned?
      – Perhaps learn sub-sections independently and put them together with inference at the end
      – Which learning algorithm?

  • Inference
      – What algorithm? How expensive is it?
      – Exact or approximate?