10-418 / 10-618 Machine Learning for Structured Data
Lecture 1: Course Introduction
Matt Gormley, Aug. 26, 2019
Machine Learning Department, School of Computer Science, Carnegie Mellon University
STRUCTURED PREDICTION
How to define a structured prediction problem
Structured vs. Unstructured Data
Structured Data Examples
Unstructured Data Examples
(Arabic text example: "Good evening! Welcome to the class!")
Structured vs. Unstructured Data
Select all that apply: Which of the following are structured data?
– spreadsheet
– XML data
– JSON data
– mathematical equations
Answer:
Structured Prediction
In classification:
– Given observations: x = (x1, x2, …, xK)
– Predict a (binary) label: y
In structured prediction:
– Given observations: x = (x1, x2, …, xK)
– Predict a structure: y = (y1, y2, …, yJ)
Sometimes the structure is not observed at all: a latent structure.
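The contrast above can be made concrete in code. This is an illustrative sketch, not from the lecture: both prediction rules below are made up, and only the output types matter (a single label vs. a sequence of labels).

```python
# Illustrative sketch: classification returns one label,
# structured prediction returns a whole structure (here, a tag sequence).

def classify(x):
    """Binary classification: x is a feature tuple, output is a single label."""
    # Hypothetical rule: predict +1 if the feature sum is positive.
    return +1 if sum(x) > 0 else -1

def predict_structure(x):
    """Structured prediction: x is a token sequence, output is a tag sequence."""
    # Hypothetical rule: tag capitalized tokens 'n' (noun), everything else 'v'.
    return tuple('n' if tok[0].isupper() else 'v' for tok in x)

print(classify((1.0, -0.5, 2.0)))            # one number
print(predict_structure(("Time", "flies")))  # one tag per token
```

The point of the sketch: in classification the output space is tiny ({+1, −1}), while in structured prediction the output is a joint assignment to many variables y1, …, yJ.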
Structured Prediction
Classification / Regression:
1. Input can be semi-structured data
2. Output is a single number (integer / real)
3. In linear models, features can be arbitrary combinations of the [input, output] pair
4. Output space is small
5. Inference is trivial
Structured Prediction:
1. Input can be semi-structured data
2. Output is a sequence of numbers representing a structure
3. In linear models, features can be arbitrary combinations of the [input, output] pair
4. Output space may be exponentially large in the input space
5. Inference problems are NP-hard and require approximations
Structured Prediction Examples
– Part-of-speech (POS) tagging
– Handwriting recognition
– Speech recognition
– Object detection
– Scene understanding
– Machine translation
– Protein sequencing
Part-of-Speech (POS) Tagging
Sample 1: time/n flies/v like/p an/d arrow/n
Sample 2: time/n flies/n like/v an/d arrow/n
Sample 3: flies/n fly/v with/p their/n wings/n
Sample 4: with/p time/n you/n will/v see/v
Dataset for Supervised Part-of-Speech (POS) Tagging
Data: D = {(x(n), y(n))} for n = 1, …, N
Sample 1 (x(1), y(1)): time/n flies/v like/p an/d arrow/n
Sample 2 (x(2), y(2)): time/n flies/n like/v an/d arrow/n
Sample 3 (x(3), y(3)): flies/n fly/v with/p their/n wings/n
Sample 4 (x(4), y(4)): with/p time/n you/n will/v see/v
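The dataset notation D = {(x(n), y(n))} can be written out directly. This is a minimal in-code rendering using the four samples from the slide; the representation (tuples of tokens and tags) is one reasonable choice, not the course's prescribed one.

```python
# The POS dataset D = {(x(n), y(n))}, n = 1..N, as a list of (tokens, tags) pairs.
D = [
    (("time", "flies", "like", "an", "arrow"), ("n", "v", "p", "d", "n")),   # Sample 1
    (("time", "flies", "like", "an", "arrow"), ("n", "n", "v", "d", "n")),   # Sample 2
    (("flies", "fly", "with", "their", "wings"), ("n", "v", "p", "n", "n")), # Sample 3
    (("with", "time", "you", "will", "see"), ("p", "n", "n", "v", "v")),     # Sample 4
]

N = len(D)  # number of training pairs
for x, y in D:
    assert len(x) == len(y)  # exactly one tag per token
```

Note that Samples 1 and 2 share the same x but have different y: the same sentence admits more than one plausible structure.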
Handwriting Recognition
[Figure: images of handwritten words with one label per character; figures from (Chatzis & Demiris, 2013)]
Dataset for Supervised Handwriting Recognition
Data: D = {(x(n), y(n))} for n = 1, …, N, where each x(n) is a sequence of handwritten character images and y(n) is the corresponding sequence of character labels.
[Figure: labeled handwriting samples; figures from (Chatzis & Demiris, 2013)]
Dataset for Supervised Phoneme (Speech) Recognition
Data: D = {(x(n), y(n))} for n = 1, …, N, where each x(n) is a speech signal and y(n) is its phoneme sequence (e.g. h#, ih, w, z, iy).
[Figure: two labeled speech samples; figures from (Jansen & Niyogi, 2013)]
Case Study: Object Recognition
Data consists of images x and labels y.
[Figure: four example image-label pairs (x(n), y(n)): pigeon, leopard, llama, rhinoceros]
Case Study: Object Recognition
Data consists of images x and labels y.
An image (e.g. one labeled leopard) can be divided into "patches", with a latent variable describing each of the object's parts (e.g. head, leg, tail, torso, grass). These latent variables are not observed at train or test time, but we can design our model with these latent variables in mind.
[Figure: graphical model with observed patch variables X1–X7, latent part variables Z1–Z7, and label Y]
[Figure: the same model drawn as a factor graph, with potentials ψ connecting each Xi to its Zi, adjacent Z's to each other, and the Z's to the label Y]
Structured Prediction
Preview of challenges to come…
Both settings require finding the highest scoring assignment to the output:
Classification: ŷ = argmax_y p(y|x), where y ∈ {+1, −1}
Structured Prediction: ŷ = argmax_y p(y|x), where y ∈ Y and |Y| is very large
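Why a large |Y| is a challenge can be seen by counting. The sketch below (with a made-up score function standing in for p(y|x)) enumerates every tag sequence for one short sentence; the output space already has |T|^K elements for K tokens and tag set T, so brute-force argmax is hopeless beyond toy sizes.

```python
# Brute-force structured argmax over an exponentially large output space.
from itertools import product

tags = ("n", "v", "p", "d")  # tag set T
K = 5                        # sentence length

print(len(tags) ** K)  # 1024 candidate tag sequences for a single 5-word sentence

def score(y):
    # Hypothetical score standing in for p(y|x):
    # count adjacent (noun, verb) transitions.
    return sum(1 for a, b in zip(y, y[1:]) if (a, b) == ("n", "v"))

# Enumerate all |T|**K sequences and keep the best one.
y_hat = max(product(tags, repeat=K), key=score)
print(score(y_hat))
```

For real sentence lengths and tag sets this enumeration is infeasible, which is why structured prediction leans on inference algorithms that exploit the structure of the score.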
Structured Prediction
The key components: Data, Model, Objective, Learning, Inference.
(Inference is usually called as a subroutine in learning)
[Figure: example sentence "time flies like an arrow" with model variables X1–X5]
Structured Prediction
– The data inspires the structures we want to predict; it also tells us what to optimize (domain knowledge).
– Our model defines a score for each structure (mathematical modeling).
– Learning tunes the parameters of the model (optimization).
– Inference finds the {best structure, marginals, partition function} for a new observation (combinatorial optimization).
Together, these pieces make up ML.
(Inference is usually called as a subroutine in learning)
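"Our model defines a score for each structure" can be sketched as a linear model, score(x, y) = w · f(x, y). The feature template (emission and transition indicators) and the weights below are hypothetical toy choices, not the course's model.

```python
# A minimal linear scoring model for tag sequences: score(x, y) = w . f(x, y).

def f(x, y):
    """Feature counts: word-tag emissions and adjacent tag-tag transitions."""
    feats = {}
    for word, tag in zip(x, y):
        feats[("emit", word, tag)] = feats.get(("emit", word, tag), 0) + 1
    for a, b in zip(y, y[1:]):
        feats[("trans", a, b)] = feats.get(("trans", a, b), 0) + 1
    return feats

# Hypothetical learned weights; any feature not listed has weight 0.
w = {("emit", "time", "n"): 1.0,
     ("emit", "flies", "v"): 1.0,
     ("trans", "n", "v"): 0.5}

def score(x, y):
    return sum(w.get(k, 0.0) * v for k, v in f(x, y).items())

print(score(("time", "flies"), ("n", "v")))  # 1.0 + 1.0 + 0.5 = 2.5
```

Learning would tune w; inference would search over y for the highest score.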
Decomposing a Structure into Parts
Decomposing the structure into parts:
– makes the model amenable to efficient inference
– enables natural parameter sharing during learning
– allows easier definition of fine-grained loss functions
– gives a clearer depiction of the model's uncertainty
– allows easier specification of interactions between the parts
– (may) lead to a natural definition of a search problem for prediction
Scene Understanding
Variables: boundaries of image regions; tags of regions.
Interactions: semantic plausibility of nearby tags; continuity of tags across visually similar regions (i.e. patches). (Li et al., 2009)
[Figure: scene labels with top-down information vs. labels without top-down information]
Word Alignment / Phrase Extraction
Variables: for each (Chinese phrase, English phrase) pair, are they linked?
Interactions: word fertilities; few "jumps" (discontinuities); syntactic reorderings; "ITG constraint" on alignments; phrases are disjoint (?).
(Burkett & Klein, 2012)
Congressional Voting
Variables: text of all speeches of a representative; local contexts of references between two representatives.
Interactions: words used by a representative and their vote; pairs of representatives and their local context.
(Stoyanov & Eisner, 2012)
Medical Diagnosis
Variables: content of text fields; checkmarks; dropdown menus.
Interactions: groups of related symptoms (e.g. that are predictive of a disease); social history (e.g. smoker) and symptoms; risk factors (e.g. infant) and lab results.
Exercise: Wikipedia Infoboxes
Question: Suppose you want to populate missing infobox fields.
1. What are the variables?
2. What are the interactions?
Answer:
ROADMAP
Roadmap by Contrasts
Models:
– locally normalized vs. globally normalized
– generative vs. discriminative
– treewidth: high vs. low
– cyclic vs. acyclic graphical models
– exponential family vs. neural
– deep vs. shallow (when viewed as a neural network)
Inference:
– exact vs. approximate (and which models admit which)
– dynamic programming vs. sampling vs. optimization
– MAP vs. marginal vs. partition function
Learning:
– fully-supervised vs. partially-supervised (latent variable models) vs. unsupervised
– partially-supervised vs. semi-supervised (missing some variable values vs. missing labels for entire instances)
– loss-aware vs. not
– probabilistic vs. non-probabilistic
– frequentist vs. Bayesian
Roadmap by Example
Whiteboard:
– Starting point: fully supervised HMM
– Modifications to the model, inference, and learning
– Corresponding technical terms of the result
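The roadmap's starting point, a fully supervised HMM, can be trained by simple counting: maximum likelihood estimates of the emission and transition distributions. A minimal sketch using two of the POS samples from earlier slides; smoothing and log-space arithmetic are omitted for brevity.

```python
# Supervised HMM training by counting (maximum likelihood estimation).
from collections import Counter

data = [
    (("time", "flies", "like", "an", "arrow"), ("n", "v", "p", "d", "n")),
    (("with", "time", "you", "will", "see"), ("p", "n", "n", "v", "v")),
]

emit, trans, tag_count = Counter(), Counter(), Counter()
for x, y in data:
    for word, tag in zip(x, y):
        emit[(tag, word)] += 1
        tag_count[tag] += 1
    for a, b in zip(y, y[1:]):
        trans[(a, b)] += 1

# MLE emission probability: p(word | tag) = count(tag, word) / count(tag)
p_time_given_n = emit[("n", "time")] / tag_count["n"]
print(p_time_given_n)  # 2 noun occurrences of "time" out of 4 noun tokens = 0.5
```

Each modification on the whiteboard (hiding labels, changing the graph, swapping the training objective) changes which of these quantities can still be read off by counting and which require inference.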
SYLLABUS HIGHLIGHTS
Syllabus Highlights
The syllabus is located on the course webpage: http://418.mlcourse.org http://618.mlcourse.org The course policies are required reading.
Syllabus Highlights
Grading:
– 10-418: midterm, 25% final, 5% participation
– 10-618: midterm, 15% final, 5% participation, 15% project
Exams: midterm on Thu, Oct. 17; final TBD.
Homework:
– 6 grace days for homework assignments
– Late submissions: 80% day 1, 60% day 2, 40% day 3, 20% day 4
– No submissions accepted after 4 days w/o extension
– Extension requests: see syllabus
Recitations: same time/place as lecture (optional, interactive sessions); recommended for after lecture.
Homework platforms: Autolab (programming), Canvas (quiz-style), Gradescope (open-ended)
Academic integrity:
– Collaboration encouraged, but must be documented
– Solutions must always be written independently
– No re-use of found code / past assignments
– Severe penalties (i.e. failure)
Office hours: calendar on "People" page
Lectures
Asking questions:
– Interrupting (by raising a hand) to ask your question is strongly encouraged
– Asking questions later (or in real time) on Piazza is also great
When I ask a question:
– I want you to answer
– Even if you don't answer, think it through as though I'm about to call on you
(… at my office hours)
Textbooks
You are not required to read a textbook, but Koller & Friedman is a thorough reference text that includes a lot of the topics we cover.
Prerequisites
What they are:
– An introductory machine learning course (i.e. 10-301, 10-315, 10-601, 10-701)
– Proficiency in a general programming language. Some homework may require you to use Python, so you will need to at least be proficient in the basics of Python.
– Background in probability, linear algebra, and discrete mathematics.
Project (10-618 only)
Goals:
– Present an empirical comparison of competing methods for a task of your choice
– For example: …
– Gain a deeper understanding of methods in a real-world application
Milestones:
1. Team Formation
2. Proposal
3. Midway Report
4. Final Report
5. Video Presentation