
CSCE 970 Lecture 8: Structured Prediction

Stephen Scott and Vinod Variyam

(Adapted from Sebastian Nowozin and Christoph H. Lampert)

sscott@cse.unl.edu


Introduction

Out with the old ...

We now know how to answer the question: Does this picture contain a cat? E.g., convolutional layers feeding fully connected layers feeding a softmax


Introduction

... and in with the new.

What we want to know now is: Where are the cats? No longer a classification problem; need more sophisticated (structured) output


Outline

- Definitions
- Applications
- Graphical modeling of probability distributions
- Training models
- Inference


Definitions

Structured Outputs

Most machine learning approaches learn a function f : X → R
- Inputs x ∈ X are any kind of objects
- Output y is a real number (classification, regression, density estimation, etc.)

Structured output learning approaches learn a function f : X → Y
- Inputs x ∈ X are any kind of objects
- Outputs y ∈ Y are complex (structured) objects (images, text, audio, etc.)


Definitions

Structured Outputs (2)

Can think of structured data as consisting of parts, where each part contains information, as well as how the parts fit together:
- Text: word sequence matters
- Hypertext: links between documents matter
- Chemical structures: relative positions of molecules matter
- Images: relative positions of pixels matter


Applications

Image Processing

Semantic image segmentation:

f : {images} → {masks}, where {images} = {0, . . . , 255}^{3(m×n)} and {masks} = {0, 1}^{m×n}


Applications

Image Processing (2)

Pose estimation:

f : {images} → {K positions & angles}, where {images} = {0, . . . , 255}^{3(m×n)} and {K positions & angles} = R^{3K}


Applications

Image Processing (3)

Point matching: f : {image pairs} → {mappings between images}


Applications

Image Processing (4)

Object localization: f : {images} → {bounding box coordinates}


Applications

Others

- Natural language processing (e.g., translation; output is sentences)
- Bioinformatics (e.g., structure prediction; output is graphs)
- Speech processing (e.g., recognition; output is sentences)
- Robotics (e.g., planning; output is an action plan)
- Image denoising (output is "clean" version of image)


Graphical Models

Probabilistic Modeling

To represent structured outputs, we will often employ probabilistic modeling:
- Joint distributions (e.g., P(A, B, C))
- Conditional distributions (e.g., P(A | B, C))

Can estimate joint and conditional probabilities by counting and normalizing, but have to be careful about representation


Graphical Models

Probabilistic Modeling (2)

E.g., I have a coin with unknown probability p of heads, and I want to estimate the probability of flipping it ten times and getting the sequence HHTTHHTTTT

One way of representing this joint distribution is a single, big lookup table:
- Each experiment consists of ten coin flips
- For each outcome, increment its counter
- After n experiments, divide HHTTHHTTTT's counter by n to get the estimate

Will this work?

Outcome       Count
TTHHTTHHTH    1
HHHTHTTTHH    1
HTTTTTHHHT    1
TTHTHTHHTT    1
...           ...


Graphical Models

Probabilistic Modeling (3)

Problem: Number of possible outcomes grows exponentially with number of variables (flips)

⇒ Most outcomes will have count = 0, a few with 1, probably none with more ⇒ Lousy probability estimates

Ten flips is bad enough, but consider 100.

How would you solve this problem?


Graphical Models

Factoring a Distribution

Of course, we recognize that all flips are independent, so Pr[HHTTHHTTTT] = p^4 (1 − p)^6

So we can count heads in n coin flips to estimate p and use the formula above. I.e., we factor the joint distribution into independent components and multiply the results:

Pr[HHTTHHTTTT] = Pr[f1 = H] Pr[f2 = H] Pr[f3 = T] · · · Pr[f10 = T]

We greatly reduce the number of parameters to estimate
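A minimal Python sketch of this factored estimate; the flip record `flips` is hypothetical, and `p_hat` is the single parameter we estimate:

```python
# Estimate Pr[HHTTHHTTTT] by factoring the joint distribution:
# all flips are independent, so we only estimate one parameter p.
flips = "HTHHTHHTHT" * 10              # hypothetical record of 100 past flips
p_hat = flips.count("H") / len(flips)  # estimate of Pr[heads]

def seq_prob(seq: str, p: float) -> float:
    """Probability of a specific flip sequence under independence."""
    prob = 1.0
    for f in seq:
        prob *= p if f == "H" else (1 - p)
    return prob

print(seq_prob("HHTTHHTTTT", p_hat))   # equals p_hat^4 * (1 - p_hat)^6
```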


Graphical Models

Factoring a Distribution (2)

Another example: relay racing team of Alice, then Bob, then Carol
- Let tA = Alice's finish time (in seconds), tB = Bob's, tC = Carol's
- Want to model the joint distribution Pr[tA, tB, tC]
- Let tA, tB, tC ∈ {1, . . . , 1000}
- How large would the table be for Pr[tA, tB, tC]?
- How many races must they run to populate the table?


Graphical Models

Factoring a Distribution (3)

But we can factor this distribution by observing that tA is independent of tB and tC
⇒ Can estimate tA on its own

Also, tB directly depends on tA, but is independent of tC; tC directly depends on tB, and indirectly on tA

Can display this graphically:

[Figure: chain tA → tB → tC]


Graphical Models

Factoring a Distribution (4)

This directed graphical model (often called a Bayesian network or Bayes net) represents conditional dependencies among variables

Makes factoring easy: Pr[tA, tB, tC] = Pr[tA] Pr[tB | tA] Pr[tC | tB]
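The chain factorization can be sketched directly in Python; a hypothetical two-value domain with made-up tables stands in for {1, . . . , 1000}:

```python
# Chain factorization Pr[tA, tB, tC] = Pr[tA] Pr[tB | tA] Pr[tC | tB]
# over a tiny hypothetical domain {0, 1} with made-up tables.
p_tA = {0: 0.3, 1: 0.7}
p_tB_given_tA = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # [tA][tB]
p_tC_given_tB = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}  # [tB][tC]

def joint(ta, tb, tc):
    return p_tA[ta] * p_tB_given_tA[ta][tb] * p_tC_given_tB[tb][tc]

# A valid factorization must still sum to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)
```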


Graphical Models

Factoring a Distribution (5)

Pr[tA, tB, tC] = Pr[tA] Pr[tB | tA] Pr[tC | tB]

Table for Pr[tA] requires¹ 1000 entries, while Pr[tB | tA] requires 10^6, as does Pr[tC | tB]
⇒ Total 2.001 × 10^6, versus 10^9

Idea easily extends to continuous distributions by changing discrete probability Pr[·] to pdf p(·)

¹Technically, we only need 999 entries, since the value of the last one is implied because probabilities must sum to one. However, then the analysis requires the use of a lot of "9"s, and that's not something I'm willing to take on at this point in my life.
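The savings can be checked with one line of arithmetic:

```python
M = 1000                       # values per variable
full_joint = M ** 3            # entries in the table for Pr[tA, tB, tC]
factored = M + M * M + M * M   # Pr[tA] + Pr[tB | tA] + Pr[tC | tB]
print(full_joint, factored)    # 1000000000 2001000
```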


Directed Models

Conditional Independence

Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z; that is, if

(∀xi, yj, zk) Pr[X = xi | Y = yj, Z = zk] = Pr[X = xi | Z = zk]

More compactly, we write Pr[X | Y, Z] = Pr[X | Z]

Example: Thunder is conditionally independent of Rain, given Lightning:
Pr[Thunder | Rain, Lightning] = Pr[Thunder | Lightning]


Directed Models

Definition

[Figure: Bayes net with nodes Storm, BusTourGroup, Lightning, Campfire, Thunder, ForestFire, plus the conditional probability table for Campfire given Storm and BusTourGroup]

Network (directed acyclic graph) represents a set of conditional independence assertions:
- Each node is asserted to be conditionally independent of its nondescendants, given its immediate predecessors
- E.g., given Storm and BusTourGroup, Campfire is CI of Lightning and Thunder


Directed Models

Causality

Can think of edges in a Bayes net as representing a causal relationship between nodes
- E.g., rain causes wet grass
- Probability of wet grass depends on whether there is rain


Directed Models

Generative Models

Represents joint probability distribution over ⟨Y1, . . . , Yn⟩, e.g., Pr[Storm, BusTourGroup, . . . , ForestFire]

[Figure: the same forest-fire Bayes net and Campfire table as before]

In general, for yi = value of Yi:

Pr[y1, . . . , yn] = ∏_{i=1}^{n} Pr[yi | Parents(Yi)]

where Parents(Yi) denotes the immediate predecessors of Yi. E.g.,

Pr[S, B, C, ¬L, ¬T, ¬F] = Pr[S] · Pr[B] · Pr[C | B, S] · Pr[¬L | S] · Pr[¬T | ¬L] · Pr[¬F | S, ¬L, ¬C]

where Pr[C | B, S] = 0.4 from the table.

If variables are continuous, use pdf p(·) instead of Pr[·]
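As a sketch, the product over Pr[yi | Parents(Yi)] for a hypothetical little v-structure a → c ← b (made-up tables, not the forest-fire numbers):

```python
# Joint probability of a directed model as a product over
# Pr[y_i | Parents(Y_i)], shown on a hypothetical net a -> c <- b.
p_a = {0: 0.75, 1: 0.25}
p_b = {0: 0.5, 1: 0.5}
p_c_given_ab = {  # [(a, b)][c], made-up numbers
    (0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.4, 1: 0.6},
    (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.1, 1: 0.9},
}

def joint(a, b, c):
    # a and b have no parents; c's parents are (a, b)
    return p_a[a] * p_b[b] * p_c_given_ab[(a, b)][c]

total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)
```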


Directed Models

Predicting Most Likely Label

We sometimes call graphical models generative (vs discriminative) models since they can be used to generate instances ⟨Y1, . . . , Yn⟩ according to the joint distribution

Can use for classification:
- Label r to predict is one of the variables, represented by a node
- If we can determine the most likely value of r given the rest of the nodes, can predict label
- One idea: go through all possible values of r, compute the joint distribution (previous slide) with that value and the other attribute values, then return the value that maximizes it


Directed Models

Predicting Most Likely Label (cont’d)

[Figure: the same forest-fire Bayes net and Campfire table as before]

E.g., if Storm (S) is the label to predict, and we are given values of B, C, ¬L, ¬T, and ¬F, can use the formula to compute Pr[S, B, C, ¬L, ¬T, ¬F] and Pr[¬S, B, C, ¬L, ¬T, ¬F], then predict the more likely one
- Easily handles unspecified attribute values
- Issue: takes time exponential in the number of values of unspecified attributes
- More efficient approach: Pearl's message passing algorithm for chains, trees, and polytrees (at most one path between any pair of nodes)
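The brute-force idea can be sketched on a hypothetical toy net a → c ← b (made-up tables; a plays the role of the label, with b = 1 and c = 1 observed):

```python
# Brute-force label prediction: plug each candidate label value into
# the joint and return the value that maximizes it.
p_a = {0: 0.75, 1: 0.25}
p_b = {0: 0.5, 1: 0.5}
p_c_given_ab = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}  # Pr[c=1|a,b]

def joint(a, b):
    # joint probability with c = 1 observed
    return p_a[a] * p_b[b] * p_c_given_ab[(a, b)]

prediction = max((0, 1), key=lambda a: joint(a, b=1))
print(prediction)
```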


Undirected Models

Since directed edges imply causal relationships, might want to use undirected edges if causality is not modeled

E.g., let hy = 1 if you are healthy, 0 if sick; hr same but for your roommate, hc for your coworker
- hy and hr directly influence each other, but causality is unknown and irrelevant
- hy and hc also directly influence each other
- hr and hc influence each other only indirectly, via hy

Can model Pr[hr, hy, hc] with an undirected model, aka Markov random field (MRF), aka Markov network


Undirected Models

Factors

In directed models, factors are defined by a node's parents: conditionally independent of nondescendants given parents

In undirected models, factors are defined by maximal cliques (complete subgraphs): conditionally independent of all other variables given neighbors
- In the graph above, the cliques are {{hr, hy}, {hy, hc}}
- In the graph below, the cliques are {{a, d}, {a, b}, {b, c}, {b, e}, {e, f}}


Undirected Models

Factors (2)

Given clique C ∈ G and yC = values on nodes in C, factor φC(yC) describes how likely they are to co-exist

Not quite a probability; need to normalize it first. First go through all cliques C, computing the factor on C using values from y:

˜P(y) = ∏_{C∈G} φC(yC)

Can convert this to a probability of y by normalizing:

Pr[y] = ˜P(y)/Z, where Z = ∑_{y∈Y} ˜P(y)

comes from summing (or integrating) over all possible values across all nodes. Z doesn't change if the model doesn't


Undirected Models

Factors (3)

Model:

φ(Cry)    hy = 0   hy = 1
hr = 0    2        1
hr = 1    1        10

φ(Cyc)    hy = 0   hy = 1
hc = 0    5        1
hc = 1    2        15

Distribution:

hr  hy  hc  φ(Cry)  φ(Cyc)  ˜P(y)  Pr[y]
0   0   0   2       5       10     0.051
0   0   1   2       2       4      0.020
0   1   0   1       1       1      0.005
0   1   1   1       15      15     0.076
1   0   0   1       5       5      0.025
1   0   1   1       2       2      0.010
1   1   0   10      1       10     0.051
1   1   1   10      15      150    0.762
                    Z = 197        1.0

What is the time complexity of the brute-force approach?
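The table can be reproduced with a few lines of Python; note the brute-force normalization touches all K^|V| = 2^3 assignments:

```python
# Unnormalized product of clique factors, then normalize by Z
# (the sum over all 2^3 assignments).
phi_ry = {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 10}  # keyed (hr, hy)
phi_yc = {(0, 0): 5, (0, 1): 2, (1, 0): 1, (1, 1): 15}  # keyed (hy, hc)

def p_tilde(hr, hy, hc):
    return phi_ry[(hr, hy)] * phi_yc[(hy, hc)]

states = [(r, y, c) for r in (0, 1) for y in (0, 1) for c in (0, 1)]
Z = sum(p_tilde(*s) for s in states)
pr = {s: p_tilde(*s) / Z for s in states}
print(Z, pr[(1, 1, 1)])
```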


Undirected Models

Factor Graphs

How do we interpret this MRF? Could be one factor, φ({a, b, c}), or three: φ({a, b}), φ({a, c}), φ({b, c})

A factor graph makes explicit the scope of each factor φ: one graph for φ({a, b, c}), another for φ({a, b}), φ({a, c}), φ({b, c})

Bipartite graph, so no circle-to-circle or square-to-square connections


Undirected Models

Factor Graphs (2)

Formally, a factor graph is a bipartite graph (V, F, E), where V = variable nodes, F = factor nodes, and edges E ⊆ V × F each have one endpoint in V and one in F

The scope N : F → 2^V of factor f ∈ F is the set of neighboring variables: N(f) = {i ∈ V : (i, f) ∈ E}

Now compute the distribution similar to before:

Pr[y] = (1/Z) ∏_{f∈F} φf(y_{N(f)})


Undirected Models

Conditional Random Fields

A conditional random field (CRF) is a factor graph used to directly model a conditional distribution Pr[Y = y | X = x]

E.g., the probability that a specific pixel y is part of a cat given the observation (input image) x:

Pr[Yi = yi, Yj = yj | Xi = xi, Xj = xj] = (1/Z(xi, xj)) φi(yi; xi) φj(yj; xj) φi,j(yi, yj)

In general:

Pr[Y = y | X = x] = (1/Z(x)) ∏_{f∈F} φf(yf; xf)

Z now depends on x


Undirected Models

Energy-Based Functions

We now know how to factor the distribution graphically, but what form will φ(·) take?
- Want to learn the factors in order to infer a distribution
- Need ˜p(y) > 0 for all y in order to get a distribution

Define an energy function Ef : Y_{N(f)} → R for factor f, then define φf = exp(−Ef(yf)) > 0 and get

p(Y = y) = (1/Z) ∏_{f∈F} φf(yf) = (1/Z) ∏_{f∈F} exp(−Ef(yf)) = (1/Z) exp(−∑_{f∈F} Ef(yf))
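A quick numerical check of this identity, using random made-up energies:

```python
import math
import random

# phi_f = exp(-E_f) makes each factor positive, and the product of
# factors equals exp of the negated sum of energies.
random.seed(0)
energies = [random.uniform(-2.0, 2.0) for _ in range(5)]  # one E_f per factor

prod_of_factors = 1.0
for E_f in energies:
    prod_of_factors *= math.exp(-E_f)

assert prod_of_factors > 0
assert math.isclose(prod_of_factors, math.exp(-sum(energies)))
```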


Undirected Models

Energy-Based Functions (2)

Using this form of φ allows us to factor our energy function as well!

E(a, b, c, d, e, f) = Ea,b(a, b)+Eb,c(b, c)+Ea,d(a, d)+Eb,e(b, e)+Ee,f (e, f)


Undirected Models

Energy-Based Functions (3)

Still need a form for E(·) to parameterize and learn

Define Ef(yf; w) to depend on a weight vector w ∈ Rd: Ef : Y_{N(f)} × Rd → R

E.g., say we are doing binary image segmentation. Want adjacent pixels to tend to take the same value, so define Ef : {0, 1} × {0, 1} × R2 → R as

Ef(0, 0; w) = Ef(1, 1; w) = w1
Ef(0, 1; w) = Ef(1, 0; w) = w2

We learn w1 and w2 from training data, expecting w1 < w2 (since φf = exp(−Ef), lower energy means higher probability, and agreement should be more likely). More on this later
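A small sketch of such a pairwise factor with hypothetical weights; since φ = exp(−E), the lower-energy (agreeing) configurations come out more probable:

```python
import math

# Pairwise factor for two adjacent binary pixels: energy w1 when they
# agree, w2 when they differ.  Hypothetical learned weights:
w1, w2 = 0.1, 1.5   # agreement energy below disagreement energy

def phi(y1, y2):
    return math.exp(-(w1 if y1 == y2 else w2))

Z = sum(phi(a, b) for a in (0, 1) for b in (0, 1))
p_agree = (phi(0, 0) + phi(1, 1)) / Z
print(p_agree)      # agreeing neighbors are more probable
```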


Separation and D-Separation

An edge between two nodes indicates a direct interaction between the variables; paths between nodes indicate indirect interactions

Observing (instantiating) some variables changes the interactions between others. Useful to know which subsets of variables are conditionally independent from each other, given the values of other variables

Say that a set of variables A is separated (if undirected model) or d-separated (if directed) from set B given set S if the graph implies that A and B are conditionally independent given S


Separation and D-Separation

Example

Recall the example on the health of you, your roommate, and your coworker

hr   Pr[hc = 0 | hr]
0    (10 + 1)/(10 + 4 + 1 + 15) = 11/30
1    (5 + 10)/(5 + 2 + 10 + 150) = 15/167

⇒ Pr[hc = 0] influenced by hr

What if we know that you are healthy (hy = 1)?

hr   Pr[hc = 0 | hy = 1, hr]
0    1/(1 + 15) = 1/16
1    10/(10 + 150) = 10/160 = 1/16

⇒ Given hy, hc is CI of hr

hr  hy  hc  ˜P(y)
0   0   0   10
0   0   1   4
0   1   0   1
0   1   1   15
1   0   0   5
1   0   1   2
1   1   0   10
1   1   1   150
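The two conditional probabilities can be verified directly from the unnormalized table:

```python
# Verify the conditional-independence claim numerically from the
# unnormalized table ~P(y) of the health example.
p_tilde = {  # keyed (hr, hy, hc)
    (0, 0, 0): 10, (0, 0, 1): 4,  (0, 1, 0): 1,  (0, 1, 1): 15,
    (1, 0, 0): 5,  (1, 0, 1): 2,  (1, 1, 0): 10, (1, 1, 1): 150,
}

def pr_hc0_given(hr, hy=None):
    """Pr[hc = 0 | hr (, hy)] as a ratio of unnormalized mass."""
    rows = {k: v for k, v in p_tilde.items()
            if k[0] == hr and (hy is None or k[1] == hy)}
    return sum(v for k, v in rows.items() if k[2] == 0) / sum(rows.values())

print(pr_hc0_given(0), pr_hc0_given(1))              # differ: 11/30 vs 15/167
print(pr_hc0_given(0, hy=1), pr_hc0_given(1, hy=1))  # equal: 1/16 both
```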


Separation and D-Separation

Separation in Undirected Models

If a variable is observed, it blocks all paths through it

In an undirected model, two nodes are separated if all paths between them are blocked
- E.g., a and c are blocked, as are d and c, but not a and d (even though one of their paths is blocked)


Separation and D-Separation

D-Separation in Directed Models

In directed models, d-separation is more complicated: it depends on the direction of the edges involved

When considering nodes a and b connected via c, we can classify the connection as tail-to-tail, head-to-tail, or head-to-head

For each case, assuming no other path exists (ignoring edge direction) between a and b, we will determine if a and b are independent, or conditionally independent given c


Separation and D-Separation

D-Separation in Directed Models: Tail-to-Tail

E.g., a = car won’t start, b = lights work, c = battery low

Pr[c = 1] = 1/2 c Pr[a = 1 | c] 1/3 1 1/2 c Pr[b = 1 | c] 4/5 1 1/10 Factorization: Pr[a, b, c] = Pr[a | c] Pr[b | c] Pr[c] When c unknown, get Pr[a, b] by marginalizing: Pr[a, b] = X

c

Pr[a | c] Pr[b | c] Pr[c] , which generally does not equal Pr[a] Pr[b] ⇒ a and b not independent

E.g., Pr[a = 1, b = 1] = 0.292 6= 0.321 = (0.583)(0.550) = Pr[a = 1] Pr[b = 1] 40 / 80


Separation and D-Separation

D-Separation in Directed Models: Tail-to-Tail (2)

E.g., c = 1 (battery low). When conditioning on c:

Pr[a, b | c] = Pr[a, b, c] / Pr[c] = Pr[c] Pr[a | c] Pr[b | c] / Pr[c] = Pr[a | c] Pr[b | c]

Thus a and b are conditionally independent given c (car not starting independent of lights working)

Say that the connection between a and b is blocked by c when it is observed and unblocked when unobserved. Always true for uncoupled tail-to-tail connections (where there's no edge between a and b)


Separation and D-Separation

D-Separation in Directed Models: Head-to-Tail

E.g., a = leave on time, b = on time for work, c = catch the ferry

Pr[a = 1] = 1/2

a   Pr[c = 1 | a]
0   1/3
1   1/2

c   Pr[b = 1 | c]
0   1/5
1   9/10

Factorization: Pr[a, b, c] = Pr[a] Pr[c | a] Pr[b | c]

When c unknown, get Pr[a, b] by marginalizing:

Pr[a, b] = Pr[a] ∑_c Pr[c | a] Pr[b | c] = Pr[a] Pr[b | a],

which generally does not equal Pr[a] Pr[b] ⇒ a and b not independent


Separation and D-Separation

D-Separation in Directed Models: Head-to-Tail (2)

E.g., c = 1 (catch ferry). When conditioning on c:

Pr[a, b | c] = Pr[a, b, c] / Pr[c] = Pr[a] Pr[c | a] Pr[b | c] / Pr[c] = Pr[a | c] Pr[b | c]

Thus a and b are conditionally independent given c (on time for work independent of leaving on time)

Say that the connection between a and b is blocked by c when it is observed and unblocked when unobserved. Always true for uncoupled head-to-tail connections


Separation and D-Separation

D-Separation in Directed Models: Head-to-Head

E.g., a = rain, b = sprinkler, c = wet grass

Pr[a = 1] = 1/4, Pr[b = 1] = 1/3

a   b   Pr[c = 1 | a, b]
0   0   1/10
0   1   6/10
1   0   4/5
1   1   10/11

Factorization: Pr[a, b, c] = Pr[a] Pr[b] Pr[c | a, b]

When c unknown, get Pr[a, b] by marginalizing:

Pr[a, b] = Pr[a] Pr[b] ∑_c Pr[c | a, b] = Pr[a] Pr[b] ⇒ a and b are independent


Separation and D-Separation

D-Separation in Directed Models: Head-to-Head (2)

E.g., c = 1 (grass wet). When conditioning on c:

Pr[a, b | c] = Pr[a, b, c] / Pr[c] = Pr[a] Pr[b] Pr[c | a, b] / Pr[c],

which generally does not equal Pr[a | c] Pr[b | c]

The a-b connection is blocked by c when c is unobserved and unblocked when observed (also unblocks if one of c's descendants is observed)

E.g., if grass is wet and it's not raining, Pr[b = 1] increases

Always true for uncoupled head-to-head connections
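The explaining-away effect can be checked numerically with the tables above:

```python
# Explaining away in the head-to-head net a -> c <- b
# (a = rain, b = sprinkler, c = wet grass).
p_a = {0: 0.75, 1: 0.25}
p_b = {0: 2 / 3, 1: 1 / 3}
p_c1 = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.8, (1, 1): 10 / 11}  # Pr[c=1|a,b]

def joint(a, b):
    # Pr[a, b, c = 1]
    return p_a[a] * p_b[b] * p_c1[(a, b)]

pr_c1 = sum(joint(a, b) for a in (0, 1) for b in (0, 1))
pr_b1_given_c1 = sum(joint(a, 1) for a in (0, 1)) / pr_c1
pr_b1_given_c1_a0 = joint(0, 1) / (joint(0, 0) + joint(0, 1))
# Grass wet and not raining: the sprinkler becomes more likely.
print(pr_b1_given_c1, pr_b1_given_c1_a0)
```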


Separation and D-Separation

D-Separation in Directed Models: Example

W and T:
- [W, Y, R, T] blocked by Y or R
- [W, Y, X, Z, R, T] blocked by X or Z or R
- [W, Y, X, Z, S, R, T] blocked by X or Z or R, but not by S, since observing S unblocks the chain

Y and T:
- [Y, R, T] blocked by R
- [Y, X, Z, R, T] blocked by X or Z or R
- [Y, X, Z, S, R, T] blocked by X or Z or R


Separation and D-Separation

D-Separation in Directed Models: Example (2)

W and S:
- [W, Y, R, S] blocked by Y or R
- [W, Y, X, Z, R, S] blocked by X or Z or R
- [W, Y, X, Z, S] blocked by X or Z
- [W, Y, R, Z, S] blocked by Y or Z

Y and S:
- [Y, R, S] blocked by R
- [Y, R, Z, S] blocked by Z
- [Y, X, Z, R, S] blocked by X or Z or R
- [Y, X, Z, S] blocked by X or Z

Thus {W, Y} and {S, T} are CI given {R, Z}


Separation and D-Separation

D-Separation in Directed Models: Example (3)

W and X:
- Chain [W, Y, X] blocked by Y when not observed
- Chain [W, Y, R, Z, X] blocked by R when not observed
- Chain [W, Y, R, S, Z, X] blocked by S when not observed

Thus W and X are independent


Markov Blankets

Let V be a set of random variables (nodes), and X ∈ V. A Markov blanket MX of X is any set of variables such that X is CI of all other variables given MX

If no proper subset of MX is a Markov blanket, then MX is a Markov boundary

Theorem: The set of X's parents, children, and co-parents (other parents of X's children) forms a Markov blanket of X

Node X has Markov blanket {T, Y, Z}


Learning Graphical Models

Conditional Random Fields

Learning a CRF with input x, parameterized by weight vector w:

Pr[y | x, w] = (1/Z(x, w)) exp(−E(y, x, w)), where Z(x, w) = ∑_{y∈Y} exp(−E(y, x, w))

Let the energy function be E(y, x, w) = ⟨w, ϕ(x, y)⟩, i.e., a weighted sum of features produced by feature function ϕ(x, y)
- ϕ(x, y) could be a deep network, possibly trained earlier
- w is trained to get PrP[y | x, w] "close" to the true distribution PrD[y | x]


Learning Graphical Models

Conditional Random Fields (2)

Want w such that PrP[y | x, w] is close to the true distribution PrD[y | x]

Measure distance via Kullback-Leibler (KL) divergence: for any x ∈ X we have

KL(P‖D) = ∑_{y∈Y} PrD[y | x] log ( PrD[y | x] / PrP[y | x, w] )

Averaging over all x ∈ X we get

KLtot(P‖D) = ∑_{x∈X} PrD[x] ∑_{y∈Y} PrD[y | x] log ( PrD[y | x] / PrP[y | x, w] )


Learning Graphical Models

Conditional Random Fields (3)

Goal is to find weights yielding a close distribution, so

w∗ = argmin_{w∈Rd} KLtot(P‖D)
   = argmax_{w∈Rd} ∑_{x∈X} PrD[x] ∑_{y∈Y} PrD[y | x] log PrP[y | x, w]
   = argmax_{w∈Rd} ∑_{x∈X} ∑_{y∈Y} PrD[x] PrD[y | x] log PrP[y | x, w]
   = argmax_{w∈Rd} ∑_{x∈X} ∑_{y∈Y} PrD[x, y] log PrP[y | x, w]
   = argmax_{w∈Rd} E_{(x,y)∼D} [log PrP[y | x, w]]
   ≈ argmax_{w∈Rd} ∑_{(xn,yn)∈D} log PrP[yn | xn, w] for training data D


Learning Graphical Models

Conditional Random Fields: RMCL

I.e., we choose a model (w∗) that maximizes the conditional log likelihood of the data

If all (x, y) instances are drawn iid, then w∗ maximizes the probability of seeing all the ys given all the xs

Throw in a regularizer for good measure

Definition: Let Pr[y | x, w] = (1/Z(x, w)) exp(−⟨w, ϕ(x, y)⟩) be a probability distribution parameterized by w ∈ Rd, and let D = {(xn, yn)}n=1,...,N be a set of training examples. For any λ > 0, regularized maximum conditional likelihood (RMCL) training chooses

w∗ = argmin_{w∈Rd} λ‖w‖² + ∑_{n=1}^{N} ⟨w, ϕ(xn, yn)⟩ + ∑_{n=1}^{N} log Z(xn, w)
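A toy sketch of the RMCL objective, with a hypothetical feature function ϕ and made-up data (not the lecture's notation for any particular model):

```python
import math

# RMCL objective for a toy CRF with two labels, phi(x, y) in R^2,
# Pr[y|x,w] = exp(-<w, phi(x,y)>) / Z(x,w).
labels = (0, 1)

def phi(x, y):                         # made-up feature function
    return (x * (1 if y == 0 else -1), float(y))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_Z(x, w):
    return math.log(sum(math.exp(-dot(w, phi(x, y))) for y in labels))

def rmcl_objective(w, data, lam):
    return (lam * dot(w, w)
            + sum(dot(w, phi(x, y)) for x, y in data)
            + sum(log_Z(x, w) for x, _ in data))

data = [(1.0, 0), (2.0, 1), (0.5, 0)]  # hypothetical (x_n, y_n) pairs
print(rmcl_objective((0.0, 0.0), data, lam=0.1))
```

At w = 0 every label is equally likely, so the objective reduces to N log |Y|.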


Learning Graphical Models

Conditional Random Fields: RMCL (2)

Goal: find w minimizing

L(w) = λ‖w‖² + ∑_{n=1}^{N} ⟨w, ϕ(xn, yn)⟩ + ∑_{n=1}^{N} log Z(xn, w)

Compute the gradient:

∇w L(w) = 2λw + ∑_{n=1}^{N} [ ϕ(xn, yn) − ∑_{y∈Y} ( exp(−⟨w, ϕ(xn, y)⟩) / ∑_{y′∈Y} exp(−⟨w, ϕ(xn, y′)⟩) ) ϕ(xn, y) ]
        = 2λw + ∑_{n=1}^{N} [ ϕ(xn, yn) − ∑_{y∈Y} PrP[y | xn, w] ϕ(xn, y) ]
        = 2λw + ∑_{n=1}^{N} [ ϕ(xn, yn) − E_{y∼P(y|xn,w)}[ϕ(xn, y)] ]
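The gradient formula can be sanity-checked against finite differences on a toy model (hypothetical feature function and data):

```python
import math

# Check the analytic RMCL gradient against finite differences on a toy
# model: Pr[y|x,w] = exp(-<w, phi(x,y)>)/Z(x,w), labels {0, 1}.
labels = (0, 1)

def phi(x, y):                      # hypothetical feature function
    return (x if y == 0 else -x, float(y))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def objective(w, data, lam):
    val = lam * dot(w, w)
    for x, y in data:
        val += dot(w, phi(x, y))
        val += math.log(sum(math.exp(-dot(w, phi(x, yy))) for yy in labels))
    return val

def gradient(w, data, lam):
    g = [2 * lam * wi for wi in w]  # derivative of the regularizer
    for x, y in data:
        probs = [math.exp(-dot(w, phi(x, yy))) for yy in labels]
        Z = sum(probs)
        expect = [sum(p / Z * phi(x, yy)[d] for p, yy in zip(probs, labels))
                  for d in range(len(w))]
        for d in range(len(w)):
            g[d] += phi(x, y)[d] - expect[d]
    return g

data = [(1.0, 0), (2.0, 1)]
w = [0.3, -0.2]
g = gradient(w, data, lam=0.1)
eps = 1e-6
for d in range(2):
    wp = list(w); wp[d] += eps
    wm = list(w); wm[d] -= eps
    fd = (objective(wp, data, 0.1) - objective(wm, data, 0.1)) / (2 * eps)
    assert abs(g[d] - fd) < 1e-5   # analytic and numeric gradients agree
```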


Learning Graphical Models

Conditional Random Fields: RMCL (3)

The gradient has a nice, compact form, and the objective is convex
⇒ Any local optimum is a global one

Problem: Computing the expectation requires summing over exponentially many combinations of values of y

We can factor the energy function, and therefore its derivative, and therefore the expectation of its derivative. Let's focus on an individual factor f:

E_{yf∼P(yf|xn,w)}[ϕf(xn, yf)] = ∑_{yf∈Yf} PrP[yf | xn, w] ϕf(xn, yf)

The summation still has exponentially many terms, but instead of K^|V| now it's K^|N(f)| (more manageable). Still need to compute each factor's marginal probability

55 / 80

slide-56
SLIDE 56


Inference

Efficient inference of marginal probabilities and Z in a graphical model is itself a major research area
Depends on the structural model we're using
Start with belief propagation in acyclic models
Then approximate loopy belief propagation for cyclic models

56 / 80

slide-57
SLIDE 57


Inference: Sum-Product Algorithm

Belief propagation is a general approach to inference in directed and undirected graphical models
Generally, some node i sends a message to another node j regarding i's belief about variable y

i informs j of its belief about the marginal probability Pr[y]
E.g., a high message value ⇒ belief that Pr[y] is also high
Each node messages each of its neighbors about its belief for each value of the random variable

The Sum-Product Algorithm uses belief propagation to find marginal probabilities and Z in tree-structured factor graphs (connected and acyclic)
Each edge (i, f) ∈ E ⊆ V × F has

1. q_{Y_i→f} ∈ ℝ^{|Y_i|}, a variable-to-factor message
2. r_{f→Y_i} ∈ ℝ^{|Y_i|}, a factor-to-variable message

Note they are vector quantities, one component per value of Y_i

57 / 80

slide-58
SLIDE 58


Inference: Sum-Product Algorithm (2)

Variable-to-Factor Message
For variable i ∈ V, let M(i) = {f ∈ F : (i, f) ∈ E} be the set of factors adjacent to i
For each value y_i of variable i, the variable-to-factor message is

q_{Y_i→f}(y_i) = Σ_{f′∈M(i)\{f}} r_{f′→Y_i}(y_i)

Variable node i sums up the factor-to-variable messages from all factors except f and transmits the result to f

58 / 80

slide-59
SLIDE 59


Inference: Sum-Product Algorithm (3)

Factor-to-Variable Message
For factor f ∈ F, recall N(f) = {i ∈ V : (i, f) ∈ E} is the set of variables adjacent to f
For each value y_i of variable i, the factor-to-variable message is

r_{f→Y_i}(y_i) = log Σ_{y′_f∈Y_f : y′_i=y_i} exp( −E_f(y′_f) + Σ_{j∈N(f)\{i}} q_{Y_j→f}(y′_j) )

Factor node f sums up all variable-to-factor messages from all variables except i and transmits the result to i

59 / 80

slide-60
SLIDE 60


Inference: Sum-Product Algorithm (4)

Since we have a tree structure, there is always at least one variable adjacent to only one factor, or one factor adjacent to only one variable
These messages depend on nothing, so start there
Then order the other message computations via a precedence graph
Designate an arbitrary variable node to be the root
Two phases of the algorithm:

1. Leaf-to-root phase: start at the leaves and compute messages toward the root
2. Root-to-leaf phase: start at the root and compute messages toward the leaves

60 / 80

slide-61
SLIDE 61


Inference: Sum-Product Algorithm (5)

After two phases, all messages computed

61 / 80

slide-62
SLIDE 62


Inference: Sum-Product Algorithm (6)

To compute Z, sum over the factor-to-variable messages directed to the root Y_r:

log Z = log Σ_{y_r∈Y_r} exp( Σ_{f∈M(r)} r_{f→Y_r}(y_r) )

62 / 80

slide-63
SLIDE 63


Inference: Sum-Product Algorithm (7)

To compute factor marginals:

μ_f(y_f) = Pr[Y_f = y_f] = exp( −E_f(y_f) + Σ_{i∈N(f)} q_{Y_i→f}(y_i) − log Z )

63 / 80

slide-64
SLIDE 64


Inference: Sum-Product Algorithm (8)

To compute variable marginals:

Pr[Y_i = y_i] = exp( Σ_{f∈M(i)} r_{f→Y_i}(y_i) − log Z )
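The two message phases, log Z, and both kinds of marginals can be traced end-to-end on a tiny chain-structured factor graph and checked against brute-force enumeration (the chain, K, and the random energy tables below are illustrative assumptions, not from the lecture):

```python
import numpy as np

# Chain factor graph  Y0 -- f01 -- Y1 -- f12 -- Y2, each variable with K
# states; energies are random toy tables.  Messages are kept in log space.
rng = np.random.default_rng(2)
K = 3
E01 = rng.normal(size=(K, K))      # E_{f01}(y0, y1)
E12 = rng.normal(size=(K, K))      # E_{f12}(y1, y2)

def lse(a, axis):                  # numerically stable log-sum-exp
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)), axis=axis)

# Leaf-to-root phase (root = Y2); leaf messages depend on nothing.
q0_to_f01 = np.zeros(K)                              # Y0 touches no other factor
r_f01_to_1 = lse(-E01 + q0_to_f01[:, None], axis=0)  # r_{f01 -> Y1}
q1_to_f12 = r_f01_to_1                               # Y1's only other factor is f01
r_f12_to_2 = lse(-E12 + q1_to_f12[:, None], axis=0)  # r_{f12 -> Y2}
log_Z = lse(r_f12_to_2, axis=0)

# Root-to-leaf phase.
q2_to_f12 = np.zeros(K)
r_f12_to_1 = lse(-E12 + q2_to_f12[None, :], axis=1)
q1_to_f01 = r_f12_to_1
r_f01_to_0 = lse(-E01 + q1_to_f01[None, :], axis=1)

# Marginals assembled from the collected messages.
p0 = np.exp(r_f01_to_0 - log_Z)
p1 = np.exp(r_f01_to_1 + r_f12_to_1 - log_Z)
p2 = np.exp(r_f12_to_2 - log_Z)
mu_f01 = np.exp(-E01 + q0_to_f01[:, None] + q1_to_f01[None, :] - log_Z)

# Brute-force check over all K^3 joint assignments.
joint = np.exp(-(E01[:, :, None] + E12[None, :, :]))  # indexed [y0, y1, y2]
Z = joint.sum()
assert np.isclose(log_Z, np.log(Z))
assert np.allclose(p0, joint.sum(axis=(1, 2)) / Z)
assert np.allclose(p1, joint.sum(axis=(0, 2)) / Z)
assert np.allclose(p2, joint.sum(axis=(0, 1)) / Z)
assert np.allclose(mu_f01, joint.sum(axis=2) / Z)
```

On a tree the messages are exact, so every quantity above matches the enumeration to machine precision.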

64 / 80

slide-65
SLIDE 65


Inference: Sum-Product Algorithm: Pictorial Structures Example

E.g., E^{(1)}_{f_top}(y_top; x) is the energy function for factor f_top, representing the top of a person
x is the observed image, and Y_top is a tuple (a, b, s, θ), where (a, b) are coordinates, s is scale, and θ is rotation
E^{(2)}_{f_top,head}(y_top, y_head) relates adjacent pairs of variables

65 / 80

slide-66
SLIDE 66


Inference: Loopy Belief Propagation

When the graph has a cycle, we can still perform message passing to approximate Z and the marginal probabilities
Initialize messages to a fixed value
Perform updates in random order until convergence

Factor-to-variable messages r_{f→Y_i} are computed as before
Variable-to-factor messages are computed differently

66 / 80

slide-67
SLIDE 67


Inference: Loopy Belief Propagation (2)

Variable-to-factor messages:

q̄_{Y_i→f}(y_i) = Σ_{f′∈M(i)\{f}} r_{f′→Y_i}(y_i)

δ = log Σ_{y_i∈Y_i} exp( q̄_{Y_i→f}(y_i) )

q_{Y_i→f}(y_i) = q̄_{Y_i→f}(y_i) − δ
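A minimal sketch of this normalization step (the message values below are toy numbers standing in for a sum of incoming r messages):

```python
import numpy as np

# Unnormalized log-space message q̄_{Y_i -> f}(y_i) over 3 states.
q_bar = np.array([2.0, -1.0, 0.5])
delta = np.log(np.exp(q_bar).sum())   # δ = log Σ_{y_i} exp(q̄(y_i))
q = q_bar - delta                     # normalized message

# After subtracting δ the message exponentiates to a distribution,
# which keeps repeated loopy updates from drifting off to ±∞.
assert np.isclose(np.exp(q).sum(), 1.0)
```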

67 / 80

slide-68
SLIDE 68


Inference: Loopy Belief Propagation (3)

To compute factor marginals:

μ̄_f(y_f) = −E_f(y_f) + Σ_{j∈N(f)} q_{Y_j→f}(y_j)

z_f = log Σ_{y_f∈Y_f} exp( μ̄_f(y_f) )

μ_f(y_f) = exp( μ̄_f(y_f) − z_f )

68 / 80
slide-69
SLIDE 69


Inference: Loopy Belief Propagation (4)

To compute variable marginals:

μ̄_i(y_i) = Σ_{f′∈M(i)} r_{f′→Y_i}(y_i)

z_i = log Σ_{y_i∈Y_i} exp( μ̄_i(y_i) )

μ_i(y_i) = exp( μ̄_i(y_i) − z_i )

69 / 80

slide-70
SLIDE 70


Inference: Loopy Belief Propagation (5)

To compute Z:

log Z = Σ_{i∈V} (|M(i)| − 1) [ Σ_{y_i∈Y_i} μ_i(y_i) log μ_i(y_i) ] − Σ_{f∈F} Σ_{y_f∈Y_f} μ_f(y_f) ( E_f(y_f) + log μ_f(y_f) )
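On a tree this Bethe-style expression recovers log Z exactly when the marginals are exact, which can be verified by enumeration on a tiny chain (toy energies; the variable degrees |M(i)| are hard-coded for this particular chain):

```python
import numpy as np

# Chain Y0 -- f01 -- Y1 -- f12 -- Y2 with binary variables and toy energies.
rng = np.random.default_rng(3)
K = 2
E01 = rng.normal(size=(K, K))
E12 = rng.normal(size=(K, K))

# Exact Z and exact marginals by brute-force enumeration.
joint = np.exp(-(E01[:, :, None] + E12[None, :, :]))   # indexed [y0, y1, y2]
Z = joint.sum()
mu_v = [joint.sum(axis=(1, 2)) / Z,                    # μ_0(y_0)
        joint.sum(axis=(0, 2)) / Z,                    # μ_1(y_1)
        joint.sum(axis=(0, 1)) / Z]                    # μ_2(y_2)
mu_f01 = joint.sum(axis=2) / Z
mu_f12 = joint.sum(axis=0) / Z

# log Z = Σ_i (|M(i)|-1) Σ_{y_i} μ_i log μ_i
#         - Σ_f Σ_{y_f} μ_f (E_f + log μ_f)
deg = [1, 2, 1]                                        # |M(i)| on this chain
var_term = sum((deg[i] - 1) * (mu * np.log(mu)).sum() for i, mu in enumerate(mu_v))
fac_term = ((mu_f01 * (E01 + np.log(mu_f01))).sum()
            + (mu_f12 * (E12 + np.log(mu_f12))).sum())
assert np.isclose(var_term - fac_term, np.log(Z))
```

With a cycle the same expression is only an approximation, which is the price of loopy belief propagation.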

70 / 80

slide-71
SLIDE 71


Conditional Random Fields: Case Study

Chen et al. (2015): Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
Adapted a DCNN (ResNet-101, trained for image classification) to the task of semantic segmentation
Replaced the fully connected layer with a "de-convolution" layer to upscale to the original resolution for the segmented image
The result was effective, but segment edges were blurred
Used a CRF to sharpen them

71 / 80

slide-72
SLIDE 72


Conditional Random Fields: Case Study (2): Overview

Score map generated as the output of the DCNN, interpolated to the input resolution
Right area, but the boundary of the high-scoring region is fuzzy
CRF sharpens to the final output

72 / 80

slide-73
SLIDE 73


Conditional Random Fields: Case Study (2): CRF

Energy function:

E(y) = Σ_i θ_i(y_i) + Σ_{i,j} θ_ij(y_i, y_j)

where y_i ∈ {0, 1} is the label assignment for pixel i
Use θ_i(y_i) = −log P(y_i) and

θ_ij(y_i, y_j) = μ(y_i, y_j) [ w₁ exp( −‖p_i − p_j‖²/(2σ_α²) − ‖I_i − I_j‖²/(2σ_β²) ) + w₂ exp( −‖p_i − p_j‖²/(2σ_γ²) ) ]

where

μ(y_i, y_j) = 1 iff y_i ≠ y_j (different labels)
p_i = position of pixel i
I_i = RGB color of pixel i
σ_α, σ_β, σ_γ = parameters

Inference via specialized algorithms for Gaussian-based functions
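A rough sketch of this pairwise potential for a single pixel pair (the weights w1, w2 and bandwidths σ_α, σ_β, σ_γ below are made-up toy values, not the paper's learned or cross-validated settings):

```python
import numpy as np

# Toy parameters (assumed for illustration).
w1, w2 = 5.0, 3.0
sig_a, sig_b, sig_g = 50.0, 10.0, 3.0     # σ_α, σ_β, σ_γ

def theta_ij(yi, yj, pi, pj, Ii, Ij):
    if yi == yj:                           # μ(y_i, y_j) = 1 iff labels differ
        return 0.0
    dp = np.sum((pi - pj) ** 2)            # ||p_i - p_j||^2
    dI = np.sum((Ii - Ij) ** 2)            # ||I_i - I_j||^2
    appearance = w1 * np.exp(-dp / (2 * sig_a**2) - dI / (2 * sig_b**2))
    smoothness = w2 * np.exp(-dp / (2 * sig_g**2))
    return appearance + smoothness

p = np.array([10.0, 10.0])
# Nearby pixels of similar color pay a high penalty for taking different
# labels; the penalty vanishes when the labels agree, and decays with
# distance and color difference.
near_similar = theta_ij(0, 1, p, p + 1.0,
                        np.array([100., 100., 100.]), np.array([102., 100., 100.]))
far_different = theta_ij(0, 1, p, p + 200.0,
                         np.array([100., 100., 100.]), np.array([0., 0., 0.]))
assert theta_ij(1, 1, p, p + 1.0, np.zeros(3), np.zeros(3)) == 0.0
assert near_similar > far_different
```

This is what pulls the segmentation boundary toward image edges: label changes are cheap only where position or color already changes sharply.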

73 / 80

slide-74
SLIDE 74


Conditional Random Fields: Case Study (3): CRF Training Example

74 / 80