Augmented Statistical Models: Exploiting Generative Models in Discriminative Classifiers
Martin Layton & Mark Gales
9 December 2005
Cambridge University Engineering Department
NIPS 2005
Overview
- Generative models in discriminative classifiers
  – Fisher score-space
  – Generative score-space
- Augmented Statistical Models
  – extension of standard models, e.g. GMMs and HMMs
  – allows additional dependencies to be represented
- Discriminative training
  – maximum margin
  – conditional maximum likelihood
- TIMIT results
Generative Models in Discriminative Classifiers
The Hidden Markov Model
(Figures: (a) standard HMM phone topology with transition probabilities $a_{22}, a_{23}, a_{33}, a_{34}, a_{44}, a_{45}$ and output distributions $b_2(\cdot), b_3(\cdot), b_4(\cdot)$; (b) HMM Dynamic Bayesian Network with states $q_t$ and observations $o_t$.)
- Observations are conditionally independent of other observations given the state.
- States are conditionally independent of other states given the previous state.
- Poor model of the speech process: piecewise-constant state-space.
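Concretely, under these two assumptions the joint distribution of an observation sequence $O = o_1, \ldots, o_T$ and state sequence $q = q_1, \ldots, q_T$ factorises as (a standard HMM identity, stated here for clarity):

$$p(O, q; \lambda) = P(q_1)\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)$$

where $a_{ij}$ are the transition probabilities from the topology above and $b_j(o_t)$ are the state output distributions.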
Fisher Score-spaces
- Jaakkola & Haussler (1999)
- Method of incorporating generative models within a discriminative framework
- Define a base generative model $\hat{p}(O;\lambda)$
  – 1-dimensional log-likelihood: not enough information for good classification
- Instead use a score-space $\phi_F(O;\lambda)$
  – tangent space captures the essence of the generative process
  $$\phi_F(O;\lambda) = \nabla_{\lambda} \ln \hat{p}(O;\lambda)$$
  – dimensionality of the score-space: number of parameters in $\lambda$
  – suitable for discriminative training (SVMs, etc.)
  – has been applied to many tasks, e.g. computational biology and speech recognition
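As an illustration, a minimal sketch of a Fisher score-space for a single diagonal-Gaussian base model, where the gradient of the log-likelihood with respect to the mean and variance is available in closed form; the function name and toy data are assumptions, not from the slides:

```python
import numpy as np

def fisher_score(O, mu, var):
    """Fisher score phi_F(O; lambda) = grad_lambda ln p_hat(O; lambda)
    for a diagonal Gaussian base model with parameters (mu, var).
    O: (T, d) observation sequence."""
    diff = O - mu                                                # (T, d)
    d_mu = (diff / var).sum(axis=0)                              # gradient wrt the means
    d_var = (0.5 * (diff**2 / var**2 - 1.0 / var)).sum(axis=0)   # gradient wrt the variances
    return np.concatenate([d_mu, d_var])                         # score vector, dimension 2d

# toy usage
O = np.random.randn(100, 2) + 1.0
print(fisher_score(O, mu=np.zeros(2), var=np.ones(2)))
```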
Generative Score-spaces
- Smith & Gales (2002)
- Extension for supervised binary classification tasks
- Define class-conditional base models $\hat{p}(O;\lambda^{(1)})$ and $\hat{p}(O;\lambda^{(2)})$
  – includes the log-likelihood ratio to improve discrimination
  – avoids wrap-around (different $O$'s mapping to the same point in score-space)
- Score-space $\phi_{LL}(O;\lambda)$
  $$\phi_{LL}(O;\lambda) = \begin{bmatrix} \ln \hat{p}(O;\lambda^{(1)}) - \ln \hat{p}(O;\lambda^{(2)}) \\ \nabla_{\lambda^{(1)}} \ln \hat{p}(O;\lambda^{(1)}) \\ -\nabla_{\lambda^{(2)}} \ln \hat{p}(O;\lambda^{(2)}) \end{bmatrix}$$
  – suitable for discriminative training (SVMs)
  – no probabilistic interpretation
  – restricted to binary problems
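A minimal sketch of this score-space, assuming two single-Gaussian class-conditional base models and reusing the hypothetical `fisher_score` helper from the previous sketch:

```python
import numpy as np
from scipy.stats import multivariate_normal

def generative_score(O, params1, params2):
    """phi_LL(O): [log-likelihood ratio; grad wrt class-1 params; -grad wrt class-2 params]."""
    (mu1, var1), (mu2, var2) = params1, params2
    ll1 = multivariate_normal.logpdf(O, mean=mu1, cov=np.diag(var1)).sum()
    ll2 = multivariate_normal.logpdf(O, mean=mu2, cov=np.diag(var2)).sum()
    return np.concatenate([[ll1 - ll2],
                           fisher_score(O, mu1, var1),
                           -fisher_score(O, mu2, var2)])
```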
Augmented Statistical Models
Dependency Modelling
- Speech data is dynamic — observations are not of a fixed length
- Dependency modelling essential part of speech recognition
  $$p(o_1, \ldots, o_T;\lambda) = p(o_1;\lambda)\,p(o_2|o_1;\lambda)\cdots p(o_T|o_1,\ldots,o_{T-1};\lambda)$$
  – impractical to model directly in this form
  – make extensive use of conditional independence
- Two possible forms of conditional independence
  – latent (unobserved) variables
  – observed variables
- Even given a set of dependencies (the form of a Bayesian network)
  – need to determine how the dependencies interact
Dependency Modelling
(Figure: Dynamic Bayesian Network with states $q_{t-1}, \ldots, q_{t+2}$ and observation dependencies spanning times $t-1$ to $t+2$.)
- Commonly use a member (or mixture) of the exponential family
  $$p(O;\alpha) = \frac{1}{\tau(\alpha)}\, h(O)\, \exp\!\left(\alpha^{\mathsf{T}} T(O)\right)$$
  – $h(O)$ is the reference distribution, $\alpha$ are the natural parameters, $\tau$ is the normalisation term, $T(O)$ are the sufficient statistics
- What is the appropriate form of the statistics $T(O)$?
  – for the diagram above, $T(O) = \sum_{t=1}^{T-2} o_t\, o_{t+1}\, o_{t+2}$
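A minimal sketch of this third-order sufficient statistic, assuming scalar observations; the function name is an illustration only:

```python
import numpy as np

def third_order_stat(o):
    """T(O) = sum_{t=1}^{T-2} o_t * o_{t+1} * o_{t+2} for a scalar sequence o."""
    o = np.asarray(o, dtype=float)
    return np.sum(o[:-2] * o[1:-1] * o[2:])

print(third_order_stat([1.0, 2.0, 3.0, 4.0]))  # 1*2*3 + 2*3*4 = 30.0
```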
Augmented Statistical Models
- Augmented statistical models (related to fibre bundles)
  $$p(O;\lambda,\alpha) = \frac{1}{\tau(\lambda,\alpha)}\,\hat{p}(O;\lambda)\,\exp\!\left(\alpha^{\mathsf{T}} \begin{bmatrix} \nabla_{\lambda} \ln \hat{p}(O;\lambda) \\ \frac{1}{2!}\,\mathrm{vec}\!\left(\nabla^{2}_{\lambda} \ln \hat{p}(O;\lambda)\right) \\ \vdots \\ \frac{1}{\rho!}\,\mathrm{vec}\!\left(\nabla^{\rho}_{\lambda} \ln \hat{p}(O;\lambda)\right) \end{bmatrix}\right)$$
- Two sets of parameters:
  – $\lambda$: parameters of the base distribution $\hat{p}(O;\lambda)$
  – $\alpha$: natural parameters of the local exponential model
- Normalisation term $\tau(\lambda,\alpha)$ ensures a valid PDF
  $$\int p(O;\lambda,\alpha)\,dO = 1; \qquad p(O;\lambda,\alpha) = \frac{\bar{p}(O;\lambda,\alpha)}{\tau(\lambda,\alpha)}$$
  – can be very difficult to estimate
Example: Augmented GMM
- Use a GMM as the base distribution: $\hat{p}(o;\lambda) = \sum_{m=1}^{M} c_m\, \mathcal{N}(o;\mu_m,\Sigma_m)$
  $$p(o;\lambda,\alpha) = \frac{1}{\tau} \sum_{m=1}^{M} c_m\, \mathcal{N}(o;\mu_m,\Sigma_m)\, \exp\!\left( \sum_{n=1}^{M} P(n|o;\lambda)\,\alpha_n^{\mathsf{T}} \Sigma_n^{-1}(o-\mu_n) \right)$$
- Simple two-component one-dimensional example:
  (Figure: augmented densities for $\alpha = [0.0, 0.0]^{\mathsf{T}}$, $\alpha = [-1.0, -1.0]^{\mathsf{T}}$ and $\alpha = [1.0, -1.0]^{\mathsf{T}}$.)
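A minimal sketch of the unnormalised augmented density for a two-component 1-D example of this kind; the component parameters are made up for illustration, and the normaliser $\tau$ is approximated numerically on a grid, which is only feasible in a toy setting like this one:

```python
import numpy as np
from scipy.stats import norm

# hypothetical two-component 1-D base GMM
c  = np.array([0.5, 0.5])
mu = np.array([-3.0, 3.0])
sd = np.array([1.5, 1.5])

def aug_gmm_unnorm(o, alpha):
    """Unnormalised augmented GMM density p_bar(o; lambda, alpha)."""
    comp = c * norm.pdf(o[:, None], mu, sd)          # (N, M) weighted component likelihoods
    post = comp / comp.sum(axis=1, keepdims=True)    # component posteriors P(n | o; lambda)
    expo = np.exp((post * alpha * (o[:, None] - mu) / sd**2).sum(axis=1))
    return comp.sum(axis=1) * expo

# normalise numerically on a grid to obtain a valid density for plotting
x = np.linspace(-10, 10, 2001)
for alpha in ([0.0, 0.0], [-1.0, -1.0], [1.0, -1.0]):
    p_bar = aug_gmm_unnorm(x, np.array(alpha))
    p = p_bar / np.trapz(p_bar, x)                   # approximate tau by numerical integration
    print(alpha, p.max())
```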
Augmented Model Dependencies
- If the base distribution is a latent-variable model (GMM, HMM, ...)
  – the sufficient statistics contain first-order derivatives such as
  $$\nabla_{\mu_{jm}} \ln \hat{p}(O;\lambda) = \sum_{t=1}^{T} P(\theta_t = \{s_j, m\}|O;\lambda)\,\Sigma_{jm}^{-1}(o_t - \mu_{jm})$$
  – depends on a posterior
  – compact representation of the effects of all observations
- Augmented models of this form:
  – retain independence assumptions of the base model
  – remove conditional independence assumptions of the base model, since the local exponential model depends on a posterior
- For HMM base models:
  – observations are dependent on all observations and all latent states
  – higher-order derivatives create increasingly powerful models
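As an illustration of the derivative above in the simpler GMM case, where the posterior reduces to a component responsibility, a minimal sketch; for an HMM base model the responsibilities would be replaced by forward-backward state/component posteriors. The function name and toy data are assumptions:

```python
import numpy as np
from scipy.stats import norm

def mean_derivative_score(O, c, mu, sd):
    """grad_{mu_m} ln p_hat(O; lambda) for a 1-D GMM base model:
    sum_t P(m | o_t; lambda) * (o_t - mu_m) / sd_m^2."""
    comp = c * norm.pdf(O[:, None], mu, sd)            # (T, M) weighted component likelihoods
    post = comp / comp.sum(axis=1, keepdims=True)      # component posteriors P(m | o_t; lambda)
    return (post * (O[:, None] - mu) / sd**2).sum(axis=0)   # (M,) score vector

O = np.random.randn(200) * 2.0
print(mean_derivative_score(O, np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])))
```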
Discriminative Training
Maximum Margin Estimation
- Consider the simplified two-class problem
- Bayes' decision rule (consider $\lambda$ fixed)
  $$\frac{P(\omega_1|O)}{P(\omega_2|O)} = \frac{P(\omega_1)\,\tau(\lambda^{(2)},\alpha^{(2)})\,\bar{p}(O;\lambda^{(1)},\alpha^{(1)})}{P(\omega_2)\,\tau(\lambda^{(1)},\alpha^{(1)})\,\bar{p}(O;\lambda^{(2)},\alpha^{(2)})} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; 1$$
  – class priors $P(\omega_1)$ and $P(\omega_2)$
- Can be rewritten as a linear decision boundary in a generative score-space,
  $$\underbrace{\frac{1}{T}\ln\frac{\bar{p}(O;\lambda^{(1)},\alpha^{(1)})}{\bar{p}(O;\lambda^{(2)},\alpha^{(2)})}}_{w^{\mathsf{T}}\phi_{LL}(O;\lambda)} \;+\; \underbrace{\frac{1}{T}\ln\frac{P(\omega_1)\,\tau(\lambda^{(2)},\alpha^{(2)})}{P(\omega_2)\,\tau(\lambda^{(1)},\alpha^{(1)})}}_{b} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; 0$$
  – no need to explicitly calculate $\tau(\lambda^{(1)},\alpha^{(1)})$ or $\tau(\lambda^{(2)},\alpha^{(2)})$
- Note: restrictions on α’s required to ensure a valid PDF
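Because the boundary is linear in the score-space, the parameters $(w, b)$ can be estimated with a standard linear SVM on the score-space features. A minimal sketch assuming scikit-learn; the score-space features are replaced here by random stand-ins, whereas in the actual system they would be $\phi_{LL}(O;\lambda)$ vectors computed from the base models:

```python
import numpy as np
from sklearn.svm import SVC

# toy stand-ins for score-space features phi_LL(O; lambda), one row per utterance
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 1.0, size=(50, 5)),   # class omega_1 scores
               rng.normal(-1.0, 1.0, size=(50, 5))])  # class omega_2 scores
y = np.array([1] * 50 + [-1] * 50)

svm = SVC(kernel='linear', C=1.0)                      # maximum-margin estimation
svm.fit(X, y)
w, b = svm.coef_.ravel(), svm.intercept_[0]            # linear boundary w^T phi + b in score-space
print(w, b)
```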
Maximum Margin Estimation (cont.)
- First-order generative score-space given by
  $$\phi_{LL}(O;\lambda) = \frac{1}{T}\begin{bmatrix} \ln \hat{p}(O;\lambda^{(1)}) - \ln \hat{p}(O;\lambda^{(2)}) \\ \nabla_{\lambda^{(1)}} \ln \hat{p}(O;\lambda^{(1)}) \\ -\nabla_{\lambda^{(2)}} \ln \hat{p}(O;\lambda^{(2)}) \end{bmatrix}$$
  – independent of the augmented parameters $\alpha$
- Linear decision boundary specified by
  $$w^{\mathsf{T}} = \begin{bmatrix} 1 & \alpha^{(1)\mathsf{T}} & \alpha^{(2)\mathsf{T}} \end{bmatrix}$$
  – only a function of the exponential model parameters $\alpha$
- Bias calculated as a by-product of training — depends on both α and λ
- Potentially many parameters to estimate:
  – maximum margin estimation (MME) is a good choice (SVM training)
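Under the parameterisation above, the augmented parameters can be read directly off a trained SVM weight vector once it is rescaled so that the log-likelihood-ratio weight equals one. A minimal sketch; the per-class score dimensionalities `d1` and `d2` are assumptions:

```python
import numpy as np

def extract_alphas(w, d1, d2):
    """Split a trained weight vector w = [w0, alpha1, alpha2] into the two
    sets of exponential-model parameters, after rescaling so w0 = 1."""
    w = np.asarray(w, dtype=float)
    w = w / w[0]                         # normalise the log-likelihood-ratio weight to 1
    alpha1 = w[1:1 + d1]
    alpha2 = w[1 + d1:1 + d1 + d2]
    return alpha1, alpha2
```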
Conditional Augmented Models
- Often impossible to calculate the normalisation term for generative augmented models
  – restricted to binary tasks
  – cannot use direct training
- Instead, consider conditional augmented models
  $$p(\omega_j|O;\lambda,\alpha) = \frac{1}{Z(\lambda,\alpha)}\,\hat{p}(O;\lambda)\,\exp\!\left(\alpha^{\mathsf{T}} \begin{bmatrix} \nabla_{\lambda} \ln \hat{p}(O;\lambda) \\ \frac{1}{2!}\,\mathrm{vec}\!\left(\nabla^{2}_{\lambda} \ln \hat{p}(O;\lambda)\right) \\ \vdots \\ \frac{1}{\rho!}\,\mathrm{vec}\!\left(\nabla^{\rho}_{\lambda} \ln \hat{p}(O;\lambda)\right) \end{bmatrix}\right)$$
  – directly model decision surfaces between classes
  – normalisation calculated as an expectation over classes, so it is easy to calculate
Conditional Maximum Likelihood Estimation
- Maximum likelihood of conditional model
  $$\{\tilde{\lambda}, \tilde{\alpha}\} = \underset{\lambda,\alpha}{\operatorname{argmax}} \sum_{i=1}^{n} \ln P(y_i|O_i;\lambda,\alpha)$$
  – $O_i$ are training examples; $y_i$ are class labels
  – no closed-form solution
- Use stochastic gradient descent
  – use noisy estimates of the conditional log-likelihood gradient
  $$\nabla_{\alpha} \ln P(y_i|O_i;\lambda,\alpha) = T(y_i, O_i;\lambda) - \sum_{\omega\in\Omega} P(\omega|O_i;\lambda,\alpha)\,T(\omega, O_i;\lambda)$$
  – $\Omega = \{\omega_1, \ldots\}$ is the set of all class labels
  – $T(y_i, O_i;\lambda)$ are the augmented model sufficient statistics
  – optimisation is convex
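A minimal sketch of the stochastic gradient step implied by this expression, treating the base parameters $\lambda$ as fixed; the feature function `T(label, O)` and base log-likelihood `log_base(label, O)` are hypothetical stand-ins for the augmented sufficient statistics and class-conditional base models:

```python
import numpy as np

def cml_sgd_step(alpha, O_i, y_i, labels, T, log_base, lr=0.1):
    """One stochastic gradient ascent step on ln P(y_i | O_i; lambda, alpha).
    T(label, O)        -> sufficient-statistic vector (hypothetical)
    log_base(label, O) -> base-model log-likelihood for that class (hypothetical)"""
    feats = {w: T(w, O_i) for w in labels}
    # unnormalised class log-scores: base log-likelihood + alpha^T T(w, O)
    scores = np.array([log_base(w, O_i) + alpha @ feats[w] for w in labels])
    post = np.exp(scores - scores.max())
    post /= post.sum()                                   # P(w | O_i; lambda, alpha)
    grad = feats[y_i] - sum(p * feats[w] for p, w in zip(post, labels))
    return alpha + lr * grad                             # ascend the conditional log-likelihood
```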
TIMIT Results
TIMIT
- Phone classification task
- Training
  – 462 speakers: 3,696 sentences
  – 48 possible phones (classes)
- Testing
  – 24 speakers: 192 sentences
  – 48 phones mapped to a 39-class set for scoring purposes
- Data encoded using standard features: MFCC_0_D_A
  – 3-emitting-state HMMs with 10 or 20 mixture components
  – first-order score-space: means, variances and component priors
TIMIT
Classification error rates (%) by number of mixture components:

Classifier   Criterion (λ)   Criterion (α)   10 comp.   20 comp.
HMM          ML              –               29.4       27.3
C-Aug        ML              CML             25.6       –
HMM          MMI             –               25.3       24.8
C-Aug        MMI             CML             24.1       –
- Conditional augmented models outperform HMMs
– given a base model, it is better to augment it instead of increasing the number of mixture components
- Maximum-margin outperforms Conditional MLE (results not shown)
  – restricted to binary tasks
  – partly due to CML overtraining (regularisation required)
Summary
- Augmented statistical models
  – allow complex dependencies to be added in a systematic fashion
  – break conditional independence assumptions of the base model
  – simple to train using MM or CML estimation
- Preliminary results positive
  – outperform ML and MMI HMMs with similar numbers of parameters
  – CML optimisation is simple and easy to extend...
- Current work
  – Regularisation of CML
  – Updates of base model λ
  – Recognition
Extra Slides
Binary Classifiers and LVCSR
- Many classifiers (e.g. SVMs) are inherently binary:
  – speech recognition has a vast number of possible classes
  – how to map to a simple binary problem?
- Use pruned confusion networks (Venkataramani et al., ASRU 2003):
  (Figure: example word lattice, confusion network, and pruned confusion network.)
  – use a standard HMM decoder to generate a word lattice
  – generate confusion networks (CN) from the word lattice; gives a posterior for each arc being correct
  – prune the CN to a maximum of two arcs (based on posteriors)
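A minimal sketch of the pruning step, assuming each confusion-network bin is simply a list of (word, posterior) pairs; the data layout and example words are illustrative only:

```python
def prune_confusion_network(bins, max_arcs=2):
    """Keep only the top `max_arcs` arcs per confusion-network bin, ranked by posterior."""
    pruned = []
    for arcs in bins:                      # arcs: list of (word, posterior) pairs
        top = sorted(arcs, key=lambda a: a[1], reverse=True)[:max_arcs]
        pruned.append(top)
    return pruned

cn = [[("BUT", 0.7), ("IN", 0.2), ("!NULL", 0.1)],
      [("IT", 0.6), ("A", 0.4)]]
print(prune_confusion_network(cn))         # each bin reduced to a binary word-pair decision
```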
8-Fold Cross-Validation LVCSR Results
Word Pair (Examples/class)   Classifier   Base (λ)   Aug (α)   WER (%)
CAN/CAN'T (3761)             HMM          ML         —         11.0
                             HMM          MMI        —         10.4
                             A-HMM        ML         MM        9.5
KNOW/NO (4475)               HMM          ML         —         27.7
                             HMM          MMI        —         27.1
                             A-HMM        ML         MM        23.8
- A-HMM outperforms both ML and MMI HMM
  – also outperforms an HMM using an "equivalent" number of parameters
  – difficult to separate dependency-modelling gains from the change in training criterion
Evaluation Data LVCSR Results
- Baseline performance using Viterbi and Confusion Network decoding
Decoding             trigram LM (% WER)
Viterbi              30.8
Confusion Network    30.1
- Rescore word-pairs using 3-state/4-component A-HMM+βCN
SVM Rescoring   #corrected / #pairs   % corrected
10 SVMs         56/1250               4.5%

  – only 1.6% of the 76,157 hypothesised words were rescored: more SVMs required!
- More suitable for smaller tasks, e.g. digit recognition in low-SNR conditions