Light-Supervision of Structured Prediction Energy Networks



SLIDE 1

Light-Supervision of Structured Prediction Energy Networks

Generalized Expectation [Mann; Druck 2010-12] + SPENs [2016]

Andrew McCallum
David Belanger (UMass PhD → Google Brain)
Greg Druck (UMass PhD → Yummly)
Pedram Rooshenas (Oregon PhD → UMass Postdoc)
Aishwarya Kamath (UMass MS)

SLIDE 2

Prior Knowledge as Generalized Expectation

Light-Supervision

Structured Prediction

SPENs

Complex dependencies: prior knowledge induces extra structural dependencies.

SLIDE 3

Generalized Expectation

Chapter 1

SLIDE 4

Learning from small labeled data

SLIDE 5

Leverage unlabeled data

SLIDE 6

Family 1: Expectation Maximization

[Dempster, Laird, Rubin, 1977]

SLIDE 7

Family 2: Graph-Based Methods

[Zhu, Ghahramani, 2002] [Szummer, Jaakkola, 2002]

SLIDE 8

Family 3: Auxiliary-Task Methods

[Ando and Zhang, 2005]

SLIDE 9

Family 4: Boundary in Sparse Region

Transductive SVMs [Joachims, 1999]: sparsity measured by margin
Entropy Regularization [Grandvalet & Bengio, 2005]: minimize label entropy

SLIDE 10

Family 4: Boundary in Sparse Region

Transductive SVMs [Joachims, 1999]: sparsity measured by margin
Entropy Regularization [Grandvalet & Bengio, 2005]: minimize label entropy
But is the boundary in the sparse region the best solution?

[Plot: label proportions (50-100%) for the Student and Faculty classes]

Family 5: Generalized Expectation Criteria

[Mann, McCallum 2010; Druck, Mann, McCallum 2011, Druck McCallum 2012]

E[p(y)]: label prior expectations
E[p(y | f(x))]: label-given-feature expectations

SLIDE 11

Expectations on Labels | Features


Classifying Baseball versus Hockey

Traditional: human labeling effort, then (semi-)supervised training via maximum likelihood.

Generalized Expectation: brainstorm a few keywords (puck, ice, stick; ball, field, bat), then semi-supervised training via Generalized Expectation, e.g. p(HOCKEY | "puck") = 0.9.

SLIDE 12

Labeling Features

[Plot: test accuracy as keyword features are labeled, training with ~1000 unlabeled examples. Feature groups and accuracies: hockey, baseball, HR, Mets (85%); goal, Buffalo, Leafs, puck, Lemieux (92%); batting, base, NHL, Bruins, Penguins (96%); ball, Oilers, Sox, Pens, runs (94.5%). Annotations: Toronto Maple Leafs, Pittsburgh Penguins, Edmonton Oilers.]

SLIDE 13

Accuracy per Human Effort

[Plot: test accuracy vs. labeling time in seconds, comparing labeling features against labeling instances]

SLIDE 14

Prior Knowledge

Keywords for baseball/hockey classification: baseball, hockey, hit, puck, braves, goal, runs, nhl

resources on the web

W. H. Enright. Improving the efficiency of matrix operations in the numerical solution of stiff ordinary differential equations. ACM Trans. Math. Softw., 4(2), 127-136, June 1978.


data from related tasks

Feature labels from humans, and many other sources

SLIDE 15

Generalized Expectation (GE)

O(\theta) = S\big(\mathbb{E}_{\tilde p(x)}[\mathbb{E}_{p(y|x;\theta)}[g(x, y)]]\big) + r(\theta)

x: input variables. y: output variables.
g: constraint features, e.g. returns 1 if x contains "hit" and y is BASEBALL.

SLIDE 16

Generalized Expectation (GE)

O(\theta) = S\big(\mathbb{E}_{\tilde p(x)}[\mathbb{E}_{p(y|x;\theta)}[g(x, y)]]\big) + r(\theta)

p(y|x; θ): model distribution, e.g. the model's probability of BASEBALL when x contains "hit".
f: model features. Assume a general CRF [Lafferty et al. 01]:

p(y \mid x; \theta) = \frac{1}{Z_{\theta,x}} \exp\big(\theta^\top f(x, y)\big)
SLIDE 17

Generalized Expectation (GE)

O(\theta) = S\big(\mathbb{E}_{\tilde p(x)}[\mathbb{E}_{p(y|x;\theta)}[g(x, y)]]\big) + r(\theta)

\tilde p(x): empirical distribution. The nested expectation can be defined as, e.g., the model's probability that documents that contain "hit" are labeled BASEBALL.

SLIDE 18

Generalized Expectation (GE)

O(\theta) = S\big(\mathbb{E}_{\tilde p(x)}[\mathbb{E}_{p(y|x;\theta)}[g(x, y)]]\big) + r(\theta)

S: score function, giving a larger score when the model expectation matches the prior knowledge (a soft expectation constraint).

SLIDE 19

Generalized Expectation (GE)

Objective Function

O(\theta) = S\big(\mathbb{E}_{\tilde p(x)}[\mathbb{E}_{p(y|x;\theta)}[g(x, y)]]\big) + r(\theta)

r(θ): regularization.

SLIDE 20

GE Score Functions

\hat g: target expectations. g_\theta: model expectations (stacked in blocks per constraint feature, e.g. one block for "puck", one for "hit").

Squared error: S_{\ell_2^2}(\theta) = -\|\hat g - g_\theta\|_2^2

KL divergence: S_{KL}(\theta) = -\sum_q \hat g_q \log \frac{\hat g_q}{g_{\theta,q}}

O(\theta) = S\big(\mathbb{E}_{\tilde p(x)}[\mathbb{E}_{p(y|x;\theta)}[g(x, y)]]\big) + r(\theta)

SLIDE 21

Estimating Parameters with GE

The gradient is a violation term times the estimated covariance between the model features and the constraint features.

Violation term: squared error: v_i = -2(\hat g_i - g_{\theta,i}); KL: v_i = \hat g_i / g_{\theta,i}

O(\theta) = S\big(\mathbb{E}_{\tilde p(x)}[\mathbb{E}_{p(y|x;\theta)}[g(x, y)]]\big) + r(\theta)

\nabla_\theta O(\theta) = v^\top \Big( \mathbb{E}_{\tilde p(x)}\big[ \mathbb{E}_{p(y|x;\theta)}[g(x, y) f(x, y)^\top] - \mathbb{E}_{p(y|x;\theta)}[g(x, y)]\, \mathbb{E}_{p(y|x;\theta)}[f(x, y)]^\top \big] \Big) + \nabla_\theta r(\theta)
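To make this concrete, here is a minimal sketch of GE training with the squared-error score for a binary HOCKEY/BASEBALL classifier. This is not the authors' implementation: it assumes PyTorch, toy unlabeled data, an illustrative vocabulary size, and a hypothetical index for the labeled feature "puck"; autograd supplies the covariance-style gradient above.

    import torch

    V = 1000                                        # vocabulary size (illustrative)
    X_u = (torch.rand(500, V) < 0.02).float()       # toy unlabeled bag-of-words docs
    theta = torch.zeros(V, requires_grad=True)      # logistic-regression parameters

    puck = 7                                        # hypothetical index of "puck"
    g_hat = torch.tensor(0.9)                       # target: p(HOCKEY | "puck") = 0.9

    def ge_objective(theta):
        p_hockey = torch.sigmoid(X_u @ theta)       # p(y = HOCKEY | x; theta)
        docs = X_u[:, puck] > 0                     # documents containing "puck"
        g_theta = p_hockey[docs].mean()             # model expectation g_theta
        score = -(g_hat - g_theta) ** 2             # squared-error GE score S
        return score - 1e-3 * theta.pow(2).sum()    # plus L2 regularizer r(theta)

    opt = torch.optim.SGD([theta], lr=1.0)
    for _ in range(200):
        opt.zero_grad()
        (-ge_objective(theta)).backward()           # maximize O(theta) by SGD
        opt.step()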

SLIDE 22

Learning About Unconstrained Features

GE training with unlabeled data and two labeled features ("hit", "puck") produces a trained model that also weights unconstrained words (goal, run, pitcher, NHL): learned through covariance, the model generalizes beyond the prior knowledge.

SLIDE 23

Generalized Expectation criteria


Easy communication with domain experts

  • Inject domain knowledge into parameter estimation
  • Like an "informative prior"...
  • ...but rather than the "language of parameters" (difficult for humans to understand)...
  • ...use the "language of expectations" (natural for humans)

SLIDE 24

Example: Spam Filtering

[Diagram: four emails, each independently classified as Spam or Not Spam]

"Classification", e.g. logistic regression: observed X, predicted Y, IID examples.

Structured Prediction: the predicted Y are interdependent given X, no longer IID.

SLIDE 25

e.g. "sequence labeling": Chinese Word Segmentation. Each character is tagged Start / Not Start, as in segmenting "C h i n e s e P e o p l e".

[Input: Chinese news text. 羅穆尼頭號對手桑托倫在三州勝選,而金瑞契只贏得喬治亞州的初選。羅穆尼面臨的一大挑戰是,其他共和黨總統參選人目前均表 ("Romney's chief rival Santorum won in three states, while Gingrich took only the Georgia primary. A major challenge facing Romney is that the other Republican presidential candidates currently all...")]

O(\theta) = S\big(\mathbb{E}_{\tilde p(x)}[\mathbb{E}_{p(y|x;\theta)}[g(x, y)]]\big) + r(\theta)

GE gradient for a linear-chain CRF:

v^\top \sum_y \sum_i \sum_j p(y_{i-1}, y_i, y_j \mid x; \theta)\, g(x, y_j, j)\, f(x, y_{i-1}, y_i, i)^\top

a marginal over three, non-consecutive positions.

SLIDE 26

Natural Expectations Lead to Difficult Training-Time Inference

Anna Popescu (2004), “Interactive Clustering,” Wei Li (Ed.), Learning Handbook, Athos Press, Souroti.

AUTHOR AUTHOR EDITOR EDITOR LOCATION

“AUTHOR field should be contiguous, only appearing once.”

p(y_{i-1}, y_i, y_j, y_k)

The downfall of GE.

SLIDE 27

Structured Prediction Energy Networks

Chapter 2

A framework providing easier inference for complex dependencies? Deep Learning + Structured Prediction

SLIDE 28

Example: Spam Filtering

[Diagram: four emails classified as Spam or Not Spam, now with factors connecting the predictions]

"Classification", e.g. logistic regression: observed X, predicted Y.

Structured Prediction as an energy-based model: E(Y; X) = a sum over factor scores, with prediction Y^* = \arg\min_Y E(Y; X).

SLIDE 29

Example: Chinese Word Segmentation, e.g. "sequence labeling".

[Diagram: Start / Not Start tags over "C h i n e s e P e o p l e"; input is the Chinese news text.]

Energies E(Y; X) and E(Y, Y); prediction Y^* = \arg\min_Y E(Y; X).

SLIDE 30

Example: Chinese Word Segmentation (same segmentation diagram).

E(Y, Y): label-label dependencies. E(Y; X): observation terms, built by feature engineering.

SLIDE 31

Example: Chinese Word Segmentation (same segmentation diagram).

E(Y, Y): label-label dependencies. E(Y; X): observation terms, built by feature engineering.

SLIDE 32

Example: Chinese Word Segmentation (same segmentation diagram).

E(Y, Z; X): hidden units Z1...Z4 replace hand feature engineering.

"Hidden Unit Conditional Random Fields," Maaten, Welling, Saul, AISTATS 2011.

SLIDE 33

Example: Chinese Word Segmentation (same segmentation diagram).

Energies E(Y, Y) and E(Y, Z; X), with hidden units Z1...Z4.

SLIDE 34

Example: Chinese Word Segmentation (same segmentation diagram).

Energies E(Y, Y) and E(Y, Z; X), with hidden units Z1...Z4 also supplying the dependency structure.

SLIDE 35

Example: Chinese Word Segmentation (same segmentation diagram).

Combined energy E(X, Z.., Y): hidden units Z1...Z4 cover both feature engineering and dependency structure.

SLIDE 36

Example: Multi-label Document Classification, e.g. "multi-label classification".

Labels: barley, gold, wheat, zinc. Energies E(Y, Y) and E(X, Y) stand in for dependency structure and feature engineering.

Input text: LONDON, March 3 - The U.K. exported 535,460 tonnes of wheat and 336,750 tonnes of barley in January, the Home Grown Cereals Authority (HGCA) said, quoting adjusted Customs and Excise figures. Based on the previous January figures issued on February 9, wheat exports increased by nearly 64,000 tonnes and barley by about 7,000 tonnes. The new figures bring cumulative wheat exports for the period July 1/February 13 to 2.99 mln tonnes, and barley to 2.96 mln compared with 1.25 and 1.89 mln tonnes respectively a year...

SLIDE 37

Example: Multi-label Image Classification, e.g. "multi-label classification".

Labels: road, fish, tree, desk. Energies E(Y, Y) and E(X, Y); dependency structure and feature engineering as before.

SLIDE 38

Example: Scene Understanding

[Diagram: an image grid labeled tile-by-tile (sky, tree, road) with hidden units Z1...Z4; energies E(Y, Y) and E(X, Y) for dependency structure and feature engineering.]

SLIDE 39

Example: Scene Understanding (same grid diagram).

Desiderata for the dependency structure:
  • Expressivity of dependencies
  • Parsimony of parameterization
  • Tractability of inference
SLIDE 40

Example: Scene Understanding (same grid diagram).

Sampling inference: tile labels are resampled step by step.

SLIDE 41

Example: Scene Understanding (same grid diagram; another sampling step).

SLIDE 42

Example: Scene Understanding (same grid diagram; another sampling step).

SLIDE 43

Example: Scene Understanding (same grid diagram).

Variational inference, e.g. belief-propagation message updates:

m^{(t+1)}_{i \to j}(x_j) = \sum_{x_i} \Phi_{ij}(x_i, x_j)\, \Phi_i(x_i) \prod_{k \in N(i) \setminus j} m^{(t)}_{k \to i}(x_i)

SLIDE 44

Bayesian Networks vs. Deep Learning

Bayesian Network                     | Deep Learning
Sparsely connected                   | Densely connected (learn connectivity)
Hand-designed representations        | Learned, distributed representations
Loopy/iterated inference (typically) | Feed-forward inference (typically)
Cautious about capacity              | Wild about high capacity
"Statistically conscientious"        | "Wild West" 😄

SLIDE 45

Deep Learning

[Network diagram: observed inputs x1..x4, hidden layers z1, z2, predicted output y]

z^1_1 = \sigma\Big(\sum_i w^1_{1i} x_i\Big), \quad z_1 = \sigma(W_1 x), \quad z_2 = \sigma(W_2 z_1), \quad y = \sigma(W_3 z_2)

\sigma(\cdot) = \max(\cdot, 0), \qquad y = \sigma(W_3\, \sigma(W_2\, \sigma(W_1 x)))
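A minimal NumPy sketch of this forward pass (the layer sizes and random weights are illustrative):

    import numpy as np

    def sigma(a):                                   # ReLU: sigma(.) = max(., 0)
        return np.maximum(a, 0.0)

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(5, 4))                    # sizes are illustrative
    W2 = rng.normal(size=(5, 5))
    W3 = rng.normal(size=(1, 5))

    x = np.array([1.0, 0.0, 2.0, 1.0])              # inputs x1..x4
    z1 = sigma(W1 @ x)                              # first hidden layer
    z2 = sigma(W2 @ z1)                             # second hidden layer
    y = sigma(W3 @ z2)                              # y = sigma(W3 sigma(W2 sigma(W1 x)))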

SLIDE 46

Deep Learning

[Network diagram: observed x, weights W1, W2, W3, hidden layers z1, z2, predicted y = F(x; W)]

Training data: \{(x^{(i)}, y^{(i)})\}_{i=1}^N

Training loss: L = \sum_i L\big(F(x^{(i)}; W), y^{(i)}\big), e.g. squared error, cross-entropy, ...

Training: \arg\min_W L by gradient descent: W_{new} = W_{old} - \alpha\, \partial L(W) / \partial W

Key tools: (1) back-propagation, (2) stochastic gradient descent.
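As a sketch of the update rule, assuming NumPy, a squared-error loss, and a linear model standing in for F(x; W) so the gradient fits on one line:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(8, 4))                     # toy training inputs x^(i)
    Y = rng.normal(size=(8,))                       # toy targets y^(i)
    W = np.zeros(4)
    alpha = 0.1                                     # learning rate

    for _ in range(100):
        pred = X @ W                                # F(x; W)
        grad = 2.0 * X.T @ (pred - Y) / len(Y)      # dL/dW for squared error
        W = W - alpha * grad                        # W_new = W_old - alpha dL/dW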

SLIDE 47

Deep Learning

[Network diagram as before: observed x, weights W1, W2, W3, hidden layers z1, z2, predicted y = F(x; W)]

SLIDE 48

Deep Learning

[Network diagram as before]

Back-propagation: the "chain rule".

(g(f(x)))' = g'(f(x)) \cdot f'(x), \qquad \frac{\partial (g \circ f)}{\partial x} = \frac{\partial g}{\partial f} \cdot \frac{\partial f}{\partial x}

For nested functions j(i(h(g(f(x))))), as in y = \sigma(W_3\, \sigma(W_2\, \sigma(W_1 x))):

\frac{\partial\, j \circ i \circ h \circ g \circ f}{\partial x} = \frac{\partial j}{\partial i}\, \frac{\partial i}{\partial h}\, \frac{\partial h}{\partial g}\, \frac{\partial g}{\partial f}\, \frac{\partial f}{\partial x}

SLIDE 49

Deep Learning

[Computation graph: x \to W_1(x) \to W_2 \circ W_1(x) \to W_3 \circ W_2 \circ W_1(x) \to L(y, y^{(i)}); local partials \partial z_1/\partial W_1, \partial z_2/\partial W_2, \partial y/\partial W_3; back-propagated gradients \partial L/\partial y, \partial L/\partial W_3, \partial L/\partial W_2, \partial L/\partial W_1, \partial L/\partial x]

We can get the gradient of the loss with respect to the parameters at any depth from (1) the local partial-derivative functions and (2) the numeric gradient arriving from above.

Differentiable Computation Graph

SLIDE 50

Deep Learning

Example: CNNs for Object Classification in Images

Lee, Grosse, Ranganath, Ng. “Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations “

Representation Learning

SLIDE 51

Motivation for SPENs

Provide an alternative to graphical models: linear-chain MRF inference is fast, but high-order MRF inference is very slow.

1. Black-box interaction with the model.
2. Use the power of deep learning for structure learning.
3. Inference by gradient descent, exchanging gradients with the model.

[Diagram: x → ??? → y; a gradient-descent loop between Model and Gradients]

SLIDE 52

Example input characters: 中 国 人 民, with labels Y ∈ {0, 1}; energies E(Y, Y), E(X, Z.., Y).

Structured Prediction Energy Networks [Belanger, McCallum, ICML 2016]

Linear-chain energy: \Psi_0[y_0, y_1] + \Psi_1[y_1, y_2] + \Psi_2[y_2, y_3]

SLIDE 53

Example input characters: 中 国 人 民, with labels Y ∈ {0, 1}; energies E(Y, Y), E(X, Z.., Y).

Structured Prediction Energy Networks

[Belanger, McCallum, ICML 2016]

SLIDE 54

Structured Prediction Energy Networks

[Diagram: x, hidden units z, outputs y; energies E(y, z; x) and E(y, y)]

[Belanger, McCallum, ICML 2016]

SLIDE 55

Structured Prediction Energy Networks

[Diagram: x, hidden units z, outputs y; combined energy E(y, y, z; x)]

[Belanger, McCallum, ICML 2016]

SLIDE 56

Structured Prediction Energy Networks

[Diagram: x → feature network → energy network over y; combined energy E(y, y, z; x)]

Relax y to be continuous: y \in \{0, 1\}^L \to \bar y \in [0, 1]^L

Feature network: F(x). Energy network: E(\bar y; F(x)).

Soft prediction: \bar y^* = \arg\min_{\bar y \in [0,1]^L} E(\bar y; F(x)), found by gradient descent on \partial E(\bar y; F(x)) / \partial \bar y.

[Belanger, McCallum, ICML 2016]
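A minimal sketch of this gradient-descent inference, assuming PyTorch; the tiny feature_net and energy_net here are illustrative stand-ins, not the architectures from the paper:

    import torch

    L_out, D = 10, 16                               # label count, feature width
    feature_net = torch.nn.Linear(32, D)            # F(x)
    energy_net = torch.nn.Sequential(               # E(y_bar; F(x))
        torch.nn.Linear(D + L_out, 32), torch.nn.Softplus(), torch.nn.Linear(32, 1))

    def spen_infer(x, steps=50, lr=0.1):
        f = feature_net(x).detach()                 # compute features once, cache
        y = torch.full((L_out,), 0.5, requires_grad=True)   # relaxed y_bar
        opt = torch.optim.SGD([y], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            energy_net(torch.cat([f, y])).squeeze().backward()   # dE/dy_bar
            opt.step()
            with torch.no_grad():
                y.clamp_(0.0, 1.0)                  # stay inside the box [0, 1]^L
        return y.detach()

    y_star = spen_infer(torch.randn(32))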

SLIDE 57

SPEN Inference Graph

x → initialization network → y0; the feature network computes features once and caches them; then the energy network evaluates E(y0) and ∂E/∂y0, a gradient step produces y1, the energy network evaluates E(y1) and ∂E/∂y1, a gradient step produces y2, and so on (Inference Steps 1, 2, 3, ...).

SLIDE 58

SPEN Inference Graph

[Inference graph repeated from the previous slide]

SLIDE 59

“A Neural Algorithm for Artistic Style” [Gatys et al. 2015]

Gradient used to Modify Inputs

SPENs use similar idea: Optimize energy using backprop all the way down to the raw pixels.

SLIDE 60

Learning Algorithm 1: Structured SVM (Taskar et al., 2004; Tsochantaridis et al., 2004)

Training loss, summed over the training data \{x^{(i)}, y^{(i)}\}:

L = \sum_i \max_{\bar y} \Big[ \Delta(y^{(i)}, \bar y) - \big( E(\bar y; x^{(i)}) - E(y^{(i)}; x^{(i)}) \big) \Big]_+

\Delta penalizes the predicted \bar y against the true y^{(i)} and must be differentiable; the max searches for the worst violation of the model's energy difference, via loss-augmented inference:

\arg\min_{\bar y} \big( -\Delta(y^{(i)}, \bar y) + E(\bar y; x^{(i)}) \big)

Train W by stochastic gradient on \partial L / \partial W. (Belanger, McCallum, ICML 2016)
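A sketch of one SSVM update for a SPEN, assuming PyTorch and the illustrative energy_net from the earlier sketch; Δ is taken here to be a squared distance so that loss-augmented inference stays differentiable:

    import torch

    def loss_augmented_infer(f, y_true, steps=20, lr=0.1):
        y = torch.full_like(y_true, 0.5, requires_grad=True)
        opt = torch.optim.SGD([y], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            delta = ((y - y_true) ** 2).sum()       # differentiable margin Delta
            obj = energy_net(torch.cat([f, y])).squeeze() - delta
            obj.backward()                          # descend (-Delta + E)
            opt.step()
            with torch.no_grad():
                y.clamp_(0.0, 1.0)
        return y.detach()

    def ssvm_step(f, y_true, params_opt):
        y_bar = loss_augmented_infer(f, y_true)     # worst violator
        delta = ((y_bar - y_true) ** 2).sum()
        e_bar = energy_net(torch.cat([f, y_bar])).squeeze()
        e_true = energy_net(torch.cat([f, y_true])).squeeze()
        hinge = torch.relu(delta - (e_bar - e_true))   # the [.]_+ loss above
        params_opt.zero_grad(); hinge.backward(); params_opt.step()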

SLIDE 61

Learning Algorithm 2: End-to-end "backprop through inference" (Belanger, McCallum, ICML 2017)

A direct application of Justin Domke, "Generic Methods for Optimization-Based Modeling," AISTATS 2012.

Direct risk minimization: rather than

\min \sum_i L\big(y^{(i)}, F_W(x^{(i)})\big) \quad \text{or} \quad \min \sum_i L\Big(y^{(i)}, \arg\min_y E_W(y; x^{(i)})\Big),

minimize the loss of the inference algorithm itself:

\min \sum_i L\big(y^{(i)}, \mathrm{Algorithm}_W(x^{(i)})\big)

The algorithm is gradient-descent inference, unrolled over "time steps" t = 1..T:

\bar y^* = \bar y^{[0]} + \sum_{t=1}^{T} \alpha_t\, \frac{\partial}{\partial \bar y} E_W(x, \bar y^{[t-1]})

Training-loss gradient, also a sum over the time steps of inference:

\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \bar y^*}\, \frac{\partial \bar y^*}{\partial W} = \sum_{t=1}^{T} \alpha_t\, \frac{\partial L}{\partial \bar y^*} \left( \frac{\partial}{\partial W}\, \frac{\partial}{\partial y} E_W(x, \bar y^{[t-1]}) \right)

The Hessian-vector product can be approximated using one-dimensional finite differences.
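A sketch of training by unrolling T inference steps, assuming PyTorch and the illustrative networks above; autograd differentiates through the unrolled updates (computing the Hessian-vector products exactly rather than by finite differences), and params_opt is assumed to cover both feature_net and energy_net:

    import torch

    def unrolled_infer(f, T=10, alpha=0.1):
        y = torch.full((10,), 0.5, requires_grad=True)      # y_bar[0]
        for _ in range(T):
            E = energy_net(torch.cat([f, y])).squeeze()
            g, = torch.autograd.grad(E, y, create_graph=True)   # keep graph for dL/dW
            y = (y - alpha * g).clamp(0.0, 1.0)             # descend the energy
        return y

    def train_step(x, y_true, params_opt):
        y_star = unrolled_infer(feature_net(x))             # no detach: W gets gradients
        loss = ((y_star - y_true) ** 2).sum()               # L(y^(i), y_bar*)
        params_opt.zero_grad(); loss.backward(); params_opt.step()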

SLIDE 62

Learning Algorithm 2 Graph

Forward: x → initialization network → y0 → [energy network, gradient step] → y1 → [cached features, energy network, gradient step] → y2 → L(y, y^{(i)}).

Backward: ∂L/∂y flows back through each gradient step via Hessian-vector products.

Domke, 2012. "Generic Methods for Optimization-Based Modeling."

SLIDE 63

Light-Supervision Training of Structured Prediction Energy Networks

Chapter 3

  • 1. Human writes arbitrary prior knowledge (Turing complete!).
  • 2. Learn a model with arbitrary dependencies (SPEN).
  • 3. Efficient inference by gradient descent.

SLIDE 64

Human writes arbitrary prior knowledge…

“AUTHOR field should be contiguous, only appearing once.”

Anna Popescu (2004), “Interactive Clustering,” Wei Li (Ed.), Learning Handbook, Athos Press, Souroti.

AUTHOR AUTHOR EDITOR EDITOR LOCATION

score = 0
score -= 1 foreach AUTHOR non-contiguous
score -= 1 if has both JOURNAL & BOOKTITLE
score -= 1 foreach "using" not in TITLE
score -= 1 foreach [A-Z]\. not AUTHOR|EDITOR
score -= 1 if PUBLISHER before JOURNAL
...

…as a scoring function V(x=citation, y=labeling)

(like rule-based AI before ML was popular)
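A sketch of such a scoring function in Python (an illustration, not the authors' function): x is a token list, y a parallel list of field labels, and only the penalties shown above are implemented.

    import re

    def V(x, y):
        score = 0
        # Penalize each non-contiguous AUTHOR run after the first.
        runs = sum(1 for i, lab in enumerate(y)
                   if lab == "AUTHOR" and (i == 0 or y[i - 1] != "AUTHOR"))
        score -= max(0, runs - 1)
        # A citation should not carry both a JOURNAL and a BOOKTITLE.
        if "JOURNAL" in y and "BOOKTITLE" in y:
            score -= 1
        for tok, lab in zip(x, y):
            if tok == "using" and lab != "TITLE":
                score -= 1                      # "using" belongs in the TITLE
            if re.fullmatch(r"[A-Z]\.", tok) and lab not in ("AUTHOR", "EDITOR"):
                score -= 1                      # initials belong to people fields
        if "PUBLISHER" in y and "JOURNAL" in y and \
                y.index("PUBLISHER") < y.index("JOURNAL"):
            score -= 1                          # PUBLISHER should follow JOURNAL
        return score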

SLIDE 65

Why use ML if we have a rule-based scoring function?

  • It doesn't generalize:
  • it examines just a few features,
  • while SPENs will learn correlated features and labels.
  • It provides no inference procedure, just scores for a given (x, y):
  • stochastic optimization over y is slow,
  • while SPENs provide gradient-descent inference.
SLIDE 66

Learning Algorithm 3: "ranking successive gradient steps"

Training loss: … (Rooshenas, McCallum, et al., forthcoming)
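The loss itself is left unspecified on the slide. As a purely speculative sketch of the ranking idea (an assumption, not the forthcoming method): take two successive inference iterates, rank them with the human score V(x, y), and push the energy to agree via a margin ranking loss, reusing the illustrative energy_net from above.

    import torch

    def rank_pair_loss(f, y_prev, y_next, v_prev, v_next, margin=1.0):
        # v_prev, v_next: human scores V(x, .) of the two (discretized) iterates
        e_prev = energy_net(torch.cat([f, y_prev])).squeeze()
        e_next = energy_net(torch.cat([f, y_next])).squeeze()
        if v_next > v_prev:                         # y_next is better under V
            return torch.relu(margin - (e_prev - e_next))
        return torch.relu(margin - (e_next - e_prev))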

SLIDE 67

Preliminary Experiments

(…much more work and comparisons in future…)

SLIDE 68

Weak-Sup SPEN: simple test

Multi-label Document Classification

x = Medical bag-of-words

[amount, cystourethrogram, diagnosed, episode, evaluate, exam, fever, grade, growth, hematuria, infection, interval, kidney, left, lower, occurred, patient, pole, previously, purpose, reflux, renal, scar, scarring, small, study, tract, urinary, vesicoureteral, voiding, year]

y = multiple ICD-9-CM codes

[593-70, 599-00]

Keyword descriptions of ICD-9-CM codes (not gathering any labeled correlation knowledge). 593-70: vesicoureteral, reflux, unspecified, nephropathy. V79-99: viral, chlamydial, infection, conditions, unspecified. 753-00: renal, agenesis, dysgenesis.

Plus human background knowledge: a scoring function that gives +1 for each label:keyword co-occurrence, together with a sparsity constraint on the label set.
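A sketch of this scorer in Python; code_keywords, the word sets, and the sparsity weight are illustrative:

    def keyword_score(doc_words, labels, code_keywords, sparsity_weight=0.5):
        score = 0.0
        for code in labels:
            # +1 for each keyword of an assigned code that occurs in the document
            score += sum(1 for w in code_keywords.get(code, []) if w in doc_words)
        score -= sparsity_weight * len(labels)      # sparsity: prefer small label sets
        return score

    code_keywords = {"593-70": ["vesicoureteral", "reflux", "unspecified", "nephropathy"]}
    print(keyword_score({"reflux", "renal", "scar"}, ["593-70"], code_keywords))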

SLIDE 69

Does the SPEN generalize over the human scoring function?

ICD-9-CM code data set; evaluation: F1 of the predicted label set.

Human scoring function, exhaustive search over label sets of size ≤ N:
N≤1: 15.5, N≤2: 18.3, N≤3: 19.6, N≤4: 20.5, N≤5: 21.1, N≤6: 20.3

SPEN: 22.6 (~10x faster)

SLIDE 70

Weak-Sup SPEN: better test

Citation Field Extraction

x = Citation Token Sequence

Anna Popescu (2004), "Interactive Clustering," Wei Li (Ed.), Learning Handbook, Athos Press, Souroti.

y = sequence of labels over 14 field types

AUTHOR AUTHOR YEAR TITLE TITLE EDITOR, EDITOR EDITOR BOOKTITLE, BOOKTITLE PUBLISHER PUBLISHER LOCATION

Human-written scoring function: 50 lines of code, written in ~1 hour.
score -= 1 foreach AUTHOR non-contiguous
score -= 1 if has both JOURNAL & BOOKTITLE
score -= 1 foreach "using" not in TITLE
…

Plus human background knowledge:

~4000 unlabeled examples, 0 labeled. Scoring function advice:

  • Penalties only, so 0 = best.
  • Can use varying magnitudes, -1, -5, -10.
  • Debug with some stochastic optimization.
SLIDE 71

Citation Field Extraction Accuracy

Method (no labeled data) | Token accuracy | Time (sec/citation) | Ave. V() score
GE [Mann & McCallum '10] | 37%            | ?                   | N/A
V search 10              | 34%            | 14                  | -1.86
V search 100             | 39%            | 170                 | -0.98
V search 1000            | 42%            | 1240                | -0.62
SPEN                     | 52%            | 0.0008              | ~ -20

Example text: Wright, A. K. Simple imperative polymorphism. Lisp and Symbolic Computation 8, 4 (Dec. 1995), 343-356.

V search 100 output: AUTHOR TITLE AUTHOR AUTHOR AUTHOR AUTHOR NOTE NOTE NOTE NOTE NOTE NOTE DATE DATE PUB PUB

SPEN output: AUTHOR TITLE TITLE TITLE TITLE TITLE TITLE TITLE TITLE TITLE DATE DATE DATE PAGES PAGES

TITLE PUB PAGES

SLIDE 72

Related Work

  • Deep Value Networks [Gygli, Norouzi, Angelova 2017 ICML]
  • Matching magnitude (rather than just ranking).
  • Hurts accuracy? 5% vs. SPEN's 52%.
  • Constraint-Driven Learning [Chang, Ratinov, Roth 2007 ACL]
  • Supervised training ➡ pseudo-label data with constraints ↩
  • Snorkel: Rapid Training Data Creation with Weak Supervision [Ratner, Bach, Ehrenberg, Fries, Wu, Ré 2017 VLDB]
  • Rules ➡ pseudo-labeled data ➡ supervised (self) training
  • Label-Free Supervision of NNs w/ … Domain Knowledge [Stewart, Ermon 2017 AAAI]
  • Constraints ➡ loss function ➡ train feed-forward NN.
SLIDE 73

GE Related Work

Related frameworks:

Measurements: Liang, Jordan, Klein (2009)
Generalized Expectation: Mann, Druck, McCallum (2007)
Distribution Matching: Quadrianto et al. (2009)
Posterior Regularization: Graça, Ganchev, Taskar (2007)
Coupled Semi-Supervised Learning: Carlson et al. (2010)
Constraint-Driven Learning

Approximations used across these: variational approximation; Jensen's inequality; MAP approximation; log E[p_N(b|\phi)] \approx \log p_N(b|E[\phi]).

SLIDE 74

Summary

  • Generalized Expectation
  • Learning from unlabeled data + "labeled features"
  • Hard to do inference
  • Structured Prediction Energy Networks
  • Representation learning for output variables
  • Test-time inference by gradient descent
  • New SPEN training method: ranking
  • Experiments
  • Multi-label classification: ICD-9-CM
  • Sequence labeling: citation field extraction
  • Next
  • Training on corpus-wide expectations
  • Interactive tools for score-function development
SLIDE 75