A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms


SLIDE 1

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

Behrad Moniri Mahdiyar Shahbazi

Department of Electrical Engineering Sharif University of Technology

December 30, 2019

SLIDE 2

Accepted at ICLR 2020. Code available on GitHub.

SLIDE 3

Introduction

Idea 1: What are the right representations? Causal variables explaining the data.

SLIDE 4

Introduction

Idea 1: What are the right representations? Causal variables explaining the data.

Idea 2: How to modularize knowledge for easier re-use and adaptation, and good transfer? How to disentangle the unobserved explanatory variables?

SLIDE 5

Hypotheses about how the environment changes

Main assumptions:
• Changing one mechanism does not change the others (Peters, Janzing & Schölkopf 2017).
• Non-stationarities and changes in distribution involve few mechanisms (e.g., the result of a single-variable intervention).

SLIDE 6

Claims

Under the hypothesis of independent mechanisms and small changes across distributions, a smaller sample complexity suffices to recover from a distribution change, e.g., in transfer learning, agent learning, domain adaptation, etc.

SLIDE 7

Learning a Causal Graph with Two Discrete Variables

If we have the right knowledge representation, then we should get fast adaptation to the transfer distribution when starting from a model that is well trained on the training distribution.

Core idea: a "regret" function based on the speed of adaptation.

However, it is clear that much more work will be needed to evaluate the proposed approach in a diversity of settings and with different specific parametrizations, training objectives, environments, etc.
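To make the core idea concrete, here is a minimal sketch (our own pseudocode, not the authors' released implementation) of how an adaptation episode could be scored: the regret accumulates the negative log-likelihood of each transfer example before the model updates on it, so a model that adapts faster incurs less regret. The `model` interface (`log_likelihood`, `sgd_step`) is assumed for illustration.

```python
def adaptation_regret(model, transfer_data, lr=0.1):
    """Score a model by its speed of adaptation: accumulate the negative
    log-likelihood of each transfer example *before* updating on it."""
    regret = 0.0
    for example in transfer_data:
        regret -= model.log_likelihood(example)  # evaluate first...
        model.sgd_step(example, lr)              # ...then adapt online
    return regret
```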

SLIDE 8

Let both A and B be discrete variables, each taking N possible values, and consider the following two parametrizations:

$$P_{A\to B}(A, B) = P_{A\to B}(A)\,P_{A\to B}(B \mid A)$$
$$P_{B\to A}(A, B) = P_{B\to A}(B)\,P_{B\to A}(A \mid B)$$

This amounts to four modules: $P_{A\to B}(A)$, $P_{A\to B}(B \mid A)$, $P_{B\to A}(B)$, and $P_{B\to A}(A \mid B)$. We will train both models independently. Maximum-likelihood estimation of these parameters: normalized relative frequencies. $\theta$ denotes the parameters of all these modules: $\theta_{A|B}$, $\theta_{B|A}$, $\theta_B$, $\theta_A$.

SLIDE 9

$$\theta_i = P_{A\to B}(A = i) \qquad \theta_{j|i} = P_{A\to B}(B = j \mid A = i)$$
$$\eta_j = P_{B\to A}(B = j) \qquad \eta_{i|j} = P_{B\to A}(A = i \mid B = j)$$

The maximum-likelihood estimates from the counts are

$$\hat{\theta}_i = n_i/n \qquad \hat{\theta}_{j|i} = n_{ij}/n_i \qquad \hat{\eta}_j = n_j/n \qquad \hat{\eta}_{i|j} = n_{ij}/n_j$$

We can now compute the likelihood for each model:

$$\hat{P}_{A\to B}(A = i, B = j) = \hat{\theta}_i\,\hat{\theta}_{j|i} = n_{ij}/n \qquad \hat{P}_{B\to A}(A = i, B = j) = \hat{\eta}_j\,\hat{\eta}_{i|j} = n_{ij}/n$$

Which direction can adapt faster? Answer: the causal direction.
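As a sanity check on these count-based estimators, the sketch below (ours; variable names are illustrative) fits both factorizations by normalized relative frequencies and verifies that they assign the identical joint likelihood $n_{ij}/n$ to the training data; the two directions only come apart in adaptation speed once the distribution changes.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10  # number of values A and B can each take

# Ground truth is A -> B: draw a random training distribution.
p_a = rng.dirichlet(np.ones(N))             # true P(A)
p_b_given_a = rng.dirichlet(np.ones(N), N)  # row i holds true P(B | A=i)

a = rng.choice(N, size=10_000, p=p_a)
b = np.array([rng.choice(N, p=p_b_given_a[i]) for i in a])

counts = np.zeros((N, N))
np.add.at(counts, (a, b), 1)  # n_ij
n, n_i, n_j = counts.sum(), counts.sum(1), counts.sum(0)

# Maximum-likelihood modules: normalized relative frequencies.
# (Clipping the denominators avoids 0/0 for values never observed.)
theta_i   = n_i / n                               # P_{A->B}(A = i)
theta_j_i = counts / np.maximum(n_i, 1)[:, None]  # P_{A->B}(B = j | A = i)
eta_j     = n_j / n                               # P_{B->A}(B = j)
eta_i_j   = counts / np.maximum(n_j, 1)[None, :]  # P_{B->A}(A = i | B = j)

# Both factorizations assign the same joint likelihood n_ij / n:
assert np.allclose(theta_i[:, None] * theta_j_i, counts / n)
assert np.allclose(eta_j[None, :]   * eta_i_j,   counts / n)
```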

SLIDE 10

Simulation

[Figure: adaptation curves of $\log P(D)$ (y-axis, roughly $-5.0$ to $-4.2$) versus number of examples (100 to 400) for the $A \to B$ and $B \to A$ models.]

SLIDE 11

Proposition. The expected gradient over the transfer distribution of the regret (the accumulated negative log-likelihood during the adaptation episode) with respect to the module parameters is zero for the parameters of the modules that (a) were correctly learned in the training phase and (b) have the correct set of causal parents, corresponding to the ground-truth causal graph, if (c) the corresponding ground-truth conditional distributions did not change from the training distribution to the transfer distribution.

SLIDE 12

As a consequence, the effective number of parameters that need to be adapted, when one has the correct causal graph structure, is reduced to those of the mechanisms that actually changed from the training to the transfer distribution.

SLIDE 13

Proposition. Consider conditional probability modules $P_{\theta_i}(V_i \mid \mathrm{pa}(i, V, B_i))$, where $B_{ij} = 1$ indicates that $V_j$ is among the parents $\mathrm{pa}(i, V, B_i)$ of $V_i$ in a directed acyclic causal graph. Consider ground-truth training distribution $P_1$ and transfer distribution $P_2$ over these variables, and ground-truth causal structure $B$. The joint log-likelihood $L(V)$ for a sample $V$ with respect to the module parameters $\theta$, decomposed into module parameters $\theta_i$, is

$$L(V) = \sum_i \log P_{\theta_i}(V_i \mid \mathrm{pa}(i, V, B_i)).$$

If (a) the model has the correct causal structure $B$, (b) it has been trained perfectly on $P_1$, leading to estimated parameters $\theta$, and (c) the ground truths $P_1$ and $P_2$ differ from each other only in some $P(V_i \mid \mathrm{pa}(i, V, B_i))$ with $i \in C$, then

$$\mathbb{E}_{V \sim P_2}\left[\frac{\partial L(V)}{\partial \theta_i}\right] = 0 \quad \text{for } i \notin C.$$

SLIDE 14

Bivariate Example

• The transfer distribution only changed the true $P(A)$ (the cause).
• For the correct model, only $N - 1$ parameters need to be re-estimated.
• In the backward model, all $N(N - 1) + (N - 1) = N^2 - 1$ parameters must be re-estimated (e.g., for $N = 10$: 9 versus 99 parameters).

SLIDE 15

More than two variables

• We won't be able to enumerate all DAGs and pick the best one after observing episodes of adaptation.

• We can parameterize our belief about an exponentially large set of hypotheses by keeping track of the probability of each directed edge of the graph being present.

SLIDE 16

Formalization

• Model each edge as $B_{ij} \sim \mathrm{Bernoulli}(p_{ij})$, with $P(B) = \prod_{ij} P(B_{ij})$.

• The parents of $V_i$, given $B$, are the set of $V_j$'s such that $B_{ij} = 1$: $\mathrm{pa}(i, V, B_i) = \{V_j \mid B_{ij} = 1,\ j \neq i\}$.

• The structural causal model: $V_i = f_i(\theta_i, B_i, V, N_i)$, where $N_i$ is an independent noise source used to generate $V_i$ and $f_i$ parametrizes the generator (input $V_j$ is active if $B_{ij} = 1$).
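A minimal sketch of this parametrization (ours; only NumPy is assumed): each potential edge carries a meta-parameter $\gamma_{ij}$, the edge probability is $\sigma(\gamma_{ij})$, and a structure $B$ is drawn by independent Bernoulli sampling.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4  # number of variables V_1 .. V_M

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

gamma = np.zeros((M, M))  # meta-parameters: p_ij = sigmoid(gamma_ij) = 0.5 initially

def sample_structure(gamma):
    """Draw B_ij ~ Bernoulli(sigmoid(gamma_ij)), with no self-edges."""
    B = (rng.random(gamma.shape) < sigmoid(gamma)).astype(int)
    np.fill_diagonal(B, 0)
    return B

B = sample_structure(gamma)
# Parent set of V_i under this draw: pa(i, V, B_i) = {V_j : B_ij = 1}
parents = {i: [j for j in range(M) if B[i, j]] for i in range(M)}
```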

SLIDE 17

The conditional likelihood $P_{B_i}(V_i = v_i^t \mid \mathrm{pa}(i, v^t, B_i))$ measures how well the model that uses the incoming edges $B_i$ for node $i$ performs on example $v^t$:

$$L_{B_i} = \prod_t P_{B_i}(V_i = v_i^t \mid \mathrm{pa}(i, v^t, B_i)). \quad (1)$$

The overall exponentiated regret for a given graph structure $B$ is $L_B = \prod_i L_{B_i}$, and for the generalized multi-variable case

$$R = -\log \mathbb{E}_B[L_B]. \quad (2)$$

SLIDE 18

Proposition. The overall regret (Equation (2)) rewrites as

$$R = -\sum_i \log \sum_{B_i} P(B_i) L_{B_i} \quad (3)$$

and, if we are willing to consider multiple samples of $B$ in parallel, a biased but asymptotically unbiased (as the number $K$ of these samples $B^{(k)}$ increases to infinity) estimator of the gradient of the overall regret with respect to the meta-parameters can be defined:

$$g_{ij} = \frac{\sum_k \left(\sigma(\gamma_{ij}) - B_{ij}^{(k)}\right) L_{B_i}^{(k)}}{\sum_k L_{B_i}^{(k)}} \quad (4)$$

where the $(k)$ index indicates the values obtained for the $k$-th draw of $B$.
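Here is a sketch of this estimator (ours), assuming we have already computed, for each of the $K$ sampled structures $B^{(k)}$, the per-node adaptation log-likelihoods $\log L_{B_i}^{(k)}$; the likelihood weights are formed in log-space for numerical stability.

```python
import numpy as np

def structure_gradient(gamma, B_samples, logL):
    """Monte Carlo estimator of dR/dgamma_ij (Equation (4)).

    gamma     : (M, M) edge meta-parameters gamma_ij
    B_samples : (K, M, M) sampled adjacency matrices B^(k)
    logL      : (K, M) per-node log-likelihoods log L_{B_i}^{(k)}
    """
    sigma = 1.0 / (1.0 + np.exp(-gamma))  # edge probabilities, shape (M, M)
    # w[k, i] = L_{B_i}^{(k)} / sum_k' L_{B_i}^{(k')}, via a stable softmax.
    w = np.exp(logL - logL.max(axis=0))
    w /= w.sum(axis=0)
    # g_ij = sum_k w[k, i] * (sigma(gamma_ij) - B_ij^{(k)})
    return np.einsum('ki,kij->ij', w, sigma[None] - B_samples)
```

The meta-parameters would then be updated by gradient descent on the regret, e.g. `gamma -= lr * structure_gradient(gamma, B_samples, logL)`.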

SLIDE 19

Recall that $L_B = \prod_i L_{B_i}$, so we can rewrite the regret as follows:

$$R = -\log \mathbb{E}_B[L_B] = -\log \sum_B P(B) L_B = -\log \sum_{B_1} \sum_{B_2} \cdots \sum_{B_M} \prod_i P(B_i) L_{B_i} = -\log \prod_i \left( \sum_{B_i} P(B_i) L_{B_i} \right) = -\sum_i \log \sum_{B_i} P(B_i) L_{B_i}$$

So the regret gradient with respect to the meta-parameters $\gamma_i$ of node $i$ is

$$\frac{\partial R}{\partial \gamma_i} = -\frac{\mathbb{E}_{B_i}\left[L_{B_i}\,\frac{\partial \log P(B_i)}{\partial \gamma_i}\right]}{\mathbb{E}_{B_i}[L_{B_i}]}$$

(the minus sign, combined with $\partial \log P(B_{ij})/\partial \gamma_{ij} = B_{ij} - \sigma(\gamma_{ij})$ derived on the next slide, yields the $\sigma(\gamma_{ij}) - B_{ij}^{(k)}$ factor in Equation (4)).

SLIDE 20

Note that with the sigmoidal parametrization of $P(B_{ij})$,

$$\log P(B_{ij}) = B_{ij} \log \sigma(\gamma_{ij}) + (1 - B_{ij}) \log(1 - \sigma(\gamma_{ij})),$$

as in the cross-entropy loss. Its gradient can similarly be simplified:

$$\frac{\partial \log P(B_{ij})}{\partial \gamma_{ij}} = \frac{B_{ij}}{\sigma(\gamma_{ij})}\,\sigma(\gamma_{ij})(1 - \sigma(\gamma_{ij})) - \frac{1 - B_{ij}}{1 - \sigma(\gamma_{ij})}\,\sigma(\gamma_{ij})(1 - \sigma(\gamma_{ij})) = B_{ij} - \sigma(\gamma_{ij}) \quad (5)$$
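Equation (5) is easy to verify numerically; the following check (ours) compares the closed form $B_{ij} - \sigma(\gamma_{ij})$ against a central finite difference of the log-likelihood.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_p(b, gamma):
    """Bernoulli log-likelihood under the sigmoidal parametrization."""
    s = sigmoid(gamma)
    return b * np.log(s) + (1 - b) * np.log(1 - s)

gamma, eps = 0.7, 1e-6
for b in (0, 1):
    numeric = (log_p(b, gamma + eps) - log_p(b, gamma - eps)) / (2 * eps)
    assert np.isclose(numeric, b - sigmoid(gamma))  # Equation (5)
```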

SLIDE 21

$$\frac{\partial R}{\partial \gamma_i} = -\frac{\mathbb{E}_{B_i}\left[L_{B_i}\,\frac{\partial \log P(B_i)}{\partial \gamma_i}\right]}{\mathbb{E}_{B_i}[L_{B_i}]}$$

A biased but asymptotically unbiased estimator of $\partial R / \partial \gamma_{ij}$ is thus obtained by sampling $K$ graphs (over which the sums below run):

$$g_{ij} = \frac{\sum_k \left(\sigma(\gamma_{ij}) - B_{ij}^{(k)}\right) L_{B_i}^{(k)}}{\sum_{k'} L_{B_i}^{(k')}} \quad (6)$$

where index $(k)$ indicates the $k$-th draw of $B$.

SLIDE 22

Representation Learning

So far, we have assumed that the system has unrestricted access to the true underlying causal variables, $A$ and $B$. This is not always the case!

$$\begin{bmatrix} X \\ Y \end{bmatrix} = R(\theta_D) \begin{bmatrix} A \\ B \end{bmatrix}$$

In that case, our working assumption (that the correct causal graph is sparsely connected, made of independent components, and affected sparsely by distributional shifts) cannot be expected to hold in general in the space of observed variables.
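A minimal sketch of this observation model (ours): the latent causal variables $(A, B)$ are mixed by a ground-truth rotation $R(\theta_D)$, and an encoder with a learnable angle applies the inverse rotation to produce candidate causal variables. The name `theta_E` for the encoder angle is our own.

```python
import numpy as np

def rotation(theta):
    """2-D rotation matrix R(theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(2)
A, B = rng.normal(size=(2, 1000))            # latent causal variables

theta_D = 0.5                                # ground-truth decoder angle (unknown)
X, Y = rotation(theta_D) @ np.stack([A, B])  # observed variables

theta_E = 0.0                                # learnable encoder angle
A_hat, B_hat = rotation(-theta_E) @ np.stack([X, Y])
# When theta_E = theta_D the encoder inverts the mixing exactly; theta_E is
# trained jointly with the structural meta-parameters on the same regret.
```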

SLIDE 23

Learn the encoder parameters as well!

SLIDE 24

Future Work!

• This work is only a first step in the direction of optimizing causal structure based on the speed of adaptation to modified distributions.
• The paper has only tested the approach on artificial data. Can we think of an application?
• The representation learning has only been experimentally tested with a very simple rotation matrix.
• Convergence rates for SGD remain to be analyzed, and the gradient estimator is biased.
• On the experimental side, many settings other than those studied here should be considered, with different kinds of parametrizations, richer and larger causal graphs, different kinds of optimization procedures, etc.
