
The Regional Dimension: A Bayesian Network Analysis
Marco Scutari, scutari@idsia.ch
Dalle Molle Institute for Artificial Intelligence (IDSIA)
December 19, 2019

Overview
Bayesian networks:
• Network analysis: graphs and arcs.
• Arcs and correlation.
• Arcs and causality.
• Network analysis and linear regression.
• Model selection.
• Parameter estimation.
• Sensitivity analysis.
To showcase a proof-of-concept model we will use a sample of SDG indicators from a group of African countries and a group of Asian countries.

The General Idea
A network analysis is based on the idea that:
• quantities of interest can be associated with the nodes of a graph; and
• we can use arcs to represent which variables are correlated with each other.
Hence nodes and variables are referred to interchangeably, as are arcs and correlations (or, more generally, associations). The conceptual steps are:
1. identify the variables that are the quantities of interest and draw one node for each of them;
2. collect data on them, gathering a sample of observations;
3. measure whether they are significantly correlated using the data and draw an arc between each such pair.
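The three conceptual steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated rather than real indicators, the coefficients are made up, and the Fisher z-test is one common way to test a correlation, not necessarily the test used in this analysis. The helper names `pearson` and `significant` are hypothetical.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def significant(r, n, critical=1.96):
    """Fisher z-test of zero correlation at (approximately) the 5% level."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > critical

# Steps 1 and 2: choose the variables and collect a sample (simulated here).
random.seed(42)
n = 200
n1 = [random.gauss(0, 1) for _ in range(n)]
data = {
    "N1": n1,
    "N2": [random.gauss(0, 1) for _ in range(n)],        # unrelated to the others
    "N4": [0.8 * a + random.gauss(0, 0.5) for a in n1],  # depends on N1 (assumed)
}

# Step 3: draw an undirected arc between each significantly correlated pair.
arcs = [
    (a, b)
    for a, b in combinations(data, 2)
    if significant(pearson(data[a], data[b]), n)
]
print(arcs)
```

The pair (N1, N4) is picked up because N4 was generated from N1; pairs involving N2 should appear only by chance, at roughly the false-positive rate of the test.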

Data, Variables and Networks
[Figure: a data table with one column for each of N1, N2, N3, N4, N5, N6 and the network drawn over the same six nodes, illustrating the steps: 1. identify the variables; 2. collect data; 3. measure correlations and draw the arcs.]

Arcs and Correlations
How do we interpret arcs?
• Nodes that are directly connected with each other are directly correlated: changes in one node suggest changes in the nodes that are directly connected with it.
• For nodes that are only indirectly connected, changes are mediated by the other nodes that are in between.
• If two nodes are not connected at all, that is, there is no sequence of arcs that allows us to reach one node from the other, changes in one node will have no effect at all on the other node.
This difference is key to understanding network analysis: any action that affects one node will produce effects that can propagate to directly connected nodes, and from there to nodes that are indirectly connected; but the changes thus produced will become smaller and smaller the farther apart two nodes are.
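A small simulation can illustrate how effects shrink with distance. The sketch below assumes a linear Gaussian chain N1 → N2 → N3 with made-up coefficients; in that setting the correlation between the two end nodes is the product of the correlations along the path, so it is necessarily weaker than either direct correlation.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(7)
n = 5000

# Chain N1 -> N2 -> N3; the coefficients 0.8 and noise sd 0.6 are assumptions.
n1 = [random.gauss(0, 1) for _ in range(n)]
n2 = [0.8 * a + random.gauss(0, 0.6) for a in n1]
n3 = [0.8 * b + random.gauss(0, 0.6) for b in n2]

r12 = pearson(n1, n2)  # directly connected
r23 = pearson(n2, n3)  # directly connected
r13 = pearson(n1, n3)  # indirectly connected, mediated by N2

print(round(r12, 2), round(r23, 2), round(r13, 2), round(r12 * r23, 2))
```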

Arcs and Causality
We can give a causal interpretation to arcs, in addition to the statistical interpretation in terms of correlation. Causation implies correlation: a change in the variable that is identified as the cause may induce a change in the variable that is identified as the effect, and this implies that the two variables are correlated. The additional information the arc direction provides is that inducing a change in the variable identified as the effect does not imply a change in the variable identified as the cause. This is not the same as saying that the node identified as the effect provides no information on the node identified as the cause; different effects can be attributed to different causes.

Arcs and Causality: an Example
[Figure: a network with nodes Disease 1, Disease 2, Symptom 1, Symptom 2, Symptom 3, Symptom 4.]

Arcs and Causality: an Example
When a patient requests a visit, the medical doctor will observe various symptoms to diagnose one of several possible diseases.
• Arcs should point from diseases to symptoms since the former cause the latter.
• Diseases and symptoms are correlated, and that is what makes it possible for the medical doctor to decide which disease the patient most likely has; or to present a prognosis with a likely set of symptoms for a given disease.
• Prescribing a therapy that only cures the symptoms will not cure the disease itself; and symptoms will resurface after the therapy stops.
• Directly curing the disease with the appropriate therapy will also eliminate the symptoms.
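The asymmetry between observing and acting can be made concrete with a toy simulation of a single disease and a single symptom. All probabilities below are invented for illustration, including a small background rate at which the symptom appears without the disease; the function name `symptom_given` is hypothetical.

```python
import random

random.seed(3)
n = 10000

def symptom_given(disease):
    """Symptom is likely with the disease, rare without it (assumed rates)."""
    return random.random() < (0.9 if disease else 0.1)

# Observational regime: the arc points from disease to symptom.
disease = [random.random() < 0.3 for _ in range(n)]
symptom = [symptom_given(d) for d in disease]

# Observing the symptom is informative about the disease (diagnosis).
with_symptom = [d for d, s in zip(disease, symptom) if s]
p_disease_given_symptom = sum(with_symptom) / len(with_symptom)

# Treating only the symptom forces the effect but leaves the cause untouched.
disease_after = list(disease)
symptom_after = [False] * n
p_disease_after_treating_symptom = sum(disease_after) / n

# Curing the disease acts on the cause, and the change propagates along the arc.
cured = [False] * n
symptom_after_cure = [symptom_given(d) for d in cured]
p_symptom_after_cure = sum(symptom_after_cure) / n

print(round(p_disease_given_symptom, 2),
      round(p_disease_after_treating_symptom, 2),
      round(p_symptom_after_cure, 2))
```

Under these assumptions, suppressing the symptom leaves the disease rate where it was, while curing the disease drops the symptom to its background rate.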

Difficulties in Getting Reliable Causal Networks
[Figure: two versions of a network over the nodes Disease 2, Symptom 2 and Symptom 3.]

Difficulties in Getting Reliable Causal Networks
The ability of network analysis to correctly link nodes with arcs requires that all relevant variables are observed and included in the model. If that is not the case, arcs may actually represent indirect correlations (as opposed to direct correlations) or even spurious correlations, and their causal interpretation is more difficult to defend. This is a difficult assumption to defend in any analysis involving socio-economic data such as SDGs.
Even when a large amount of data is available and all relevant variables are included in the network, it is not always possible to identify which node is the cause and which is the effect just using data.

Fundamental Connections
[Figure: the three fundamental connections over the nodes N1, N2 and N3: serial connection, divergent connection, convergent connection.]

Fundamental Connections
We can identify some arc directions just from the data. The serial and divergent connections describe the same statistical distribution since
\[
\underbrace{P(N1)\, P(N2 \mid N1)\, P(N3 \mid N2)}_{N1 \rightarrow N2 \rightarrow N3}
= P(N2, N1)\, P(N3 \mid N2)
= \underbrace{P(N1 \mid N2)\, P(N2)\, P(N3 \mid N2)}_{N1 \leftarrow N2 \rightarrow N3},
\]
but the convergent connection is not equivalent to the other two. This makes it possible to identify it when it fits the data best, and thus assign directions to most arcs whether we are giving them a causal interpretation or not.
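The equivalence of the serial and divergent connections can be checked numerically. The sketch below uses binary variables with invented probability tables, builds the joint distribution from the serial factorisation, re-expresses the first two factors with Bayes' theorem, and confirms that the divergent factorisation gives the same joint.

```python
from itertools import product

# Hypothetical probability tables for binary variables (values 0 and 1).
p_n1 = {0: 0.7, 1: 0.3}
p_n2_given_n1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}    # indexed [n1][n2]
p_n3_given_n2 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # indexed [n2][n3]

# Serial connection N1 -> N2 -> N3: P(N1) P(N2 | N1) P(N3 | N2).
serial = {
    (a, b, c): p_n1[a] * p_n2_given_n1[a][b] * p_n3_given_n2[b][c]
    for a, b, c in product((0, 1), repeat=3)
}

# Bayes' theorem: P(N1) P(N2 | N1) = P(N2, N1) = P(N1 | N2) P(N2).
p_n2 = {b: sum(p_n1[a] * p_n2_given_n1[a][b] for a in (0, 1)) for b in (0, 1)}
p_n1_given_n2 = {
    b: {a: p_n1[a] * p_n2_given_n1[a][b] / p_n2[b] for a in (0, 1)}
    for b in (0, 1)
}

# Divergent connection N1 <- N2 -> N3: P(N1 | N2) P(N2) P(N3 | N2).
divergent = {
    (a, b, c): p_n1_given_n2[b][a] * p_n2[b] * p_n3_given_n2[b][c]
    for a, b, c in product((0, 1), repeat=3)
}

same = all(abs(serial[k] - divergent[k]) < 1e-12 for k in serial)
print(same)
```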

Bayesian Networks
Formally, we call this class of models a Bayesian network (BN). It comprises:
• a directed acyclic graph, that is, a graph in which all arcs have a direction and there are no cycles;
• there is one probability distribution associated with each variable, which in turn is associated with one node in the graph;
• and each distribution is the conditional distribution of the variable given the variables that correspond to the parent nodes in the graph.
The key advantages of BNs are:
• they decompose large models into simpler ones, one for each variable;
• the graph makes it easy to reason qualitatively about the model.

Bayesian Networks
[Figure: the network over the nodes N1, N2, N3, N4, N5, N6.]
\[
P(N1, N2, N3, N4, N5, N6) = P(N1) \cdot P(N2) \cdot P(N3) \cdot P(N4 \mid N1) \cdot P(N5 \mid N4) \cdot P(N6 \mid N3, N5)
\]
Each of these distributions is univariate since it only contains one dependent variable.
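The decomposition follows mechanically from the graph: each node contributes one term, conditional on its parents. The sketch below encodes the parent sets read off the factorisation above in a plain dictionary (the representation and the helper `local_term` are illustrative choices, not part of the original analysis), checks that the graph is acyclic with the standard-library `graphlib` module (Python 3.9 or later), and prints the factorisation.

```python
from graphlib import TopologicalSorter

# Parent sets of each node, as implied by the factorisation of the example BN.
parents = {
    "N1": [], "N2": [], "N3": [],
    "N4": ["N1"],
    "N5": ["N4"],
    "N6": ["N3", "N5"],
}

# static_order() raises graphlib.CycleError if the graph contains a cycle,
# so obtaining an ordering confirms that the graph is a valid DAG.
order = list(TopologicalSorter(parents).static_order())

def local_term(node):
    """One local distribution: the node given its parents."""
    pa = parents[node]
    return f"P({node} | {', '.join(pa)})" if pa else f"P({node})"

factorization = " ".join(local_term(node) for node in sorted(parents))
print(factorization)
```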

Distributions and Linear Models
In the case of continuous variables, we model each node with a linear regression in which:
• the node is the response variable;
• the parents of the node are the explanatory variables;
• there is an error term which contains errors that follow a normal distribution with mean zero.
Hence, the parameters of each linear regression are the regression coefficients associated with each parent and the standard error of the residuals.
All parameters are unknown and thus must be estimated from the data. This can be done with a textbook ordinary least squares regression, individually for each node. The only quantity we need to do that, apart from the data, is the graph, which we need to learn from the data as well since we do not know which arcs it contains.

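Distributions and Linear Models
Before we saw that the BN induces the decomposition
\[
P(N1, N2, N3, N4, N5, N6) = P(N1)\, P(N2)\, P(N3)\, P(N4 \mid N1)\, P(N5 \mid N4)\, P(N6 \mid N3, N5)
\]
for the nodes; and the distributions for the individual nodes are the linear regressions:
\begin{align*}
P(N1) &: \; N1 = \mu_{N1} + \varepsilon_{N1}, & \varepsilon_{N1} &\sim N(0, \sigma^2_{N1}) \\
P(N2) &: \; N2 = \mu_{N2} + \varepsilon_{N2}, & \varepsilon_{N2} &\sim N(0, \sigma^2_{N2}) \\
P(N3) &: \; N3 = \mu_{N3} + \varepsilon_{N3}, & \varepsilon_{N3} &\sim N(0, \sigma^2_{N3}) \\
P(N4 \mid N1) &: \; N4 = \mu_{N4} + N1\, \beta_{N1} + \varepsilon_{N4}, & \varepsilon_{N4} &\sim N(0, \sigma^2_{N4}) \\
P(N5 \mid N4) &: \; N5 = \mu_{N5} + N4\, \beta_{N4} + \varepsilon_{N5}, & \varepsilon_{N5} &\sim N(0, \sigma^2_{N5}) \\
P(N6 \mid N3, N5) &: \; N6 = \mu_{N6} + N3\, \beta_{N3} + N5\, \beta_{N5} + \varepsilon_{N6}, & \varepsilon_{N6} &\sim N(0, \sigma^2_{N6})
\end{align*}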
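To make the node-by-node estimation concrete, here is a minimal sketch that simulates data from the six regressions with invented parameter values and then fits one ordinary least squares regression per node. It uses only the standard library; the helpers `solve` and `ols` are hypothetical names, and in practice a statistics package would normally do this job.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def ols(y, regressors):
    """Regress y on an intercept plus the given columns.

    Returns (coefficients, residual standard error); the intercept comes first.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    coef = solve(XtX, Xty)
    resid = [y[i] - sum(c * x for c, x in zip(coef, X[i])) for i in range(n)]
    return coef, (sum(e * e for e in resid) / (n - k)) ** 0.5

# Simulate from the network; every numeric value here is an assumption.
random.seed(1)
n = 2000
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("N1", "N2", "N3")}
data["N4"] = [1.0 + 0.5 * a + random.gauss(0, 1) for a in data["N1"]]
data["N5"] = [-0.7 * a + random.gauss(0, 1) for a in data["N4"]]
data["N6"] = [2.0 + 0.3 * a + 0.9 * b + random.gauss(0, 0.5)
              for a, b in zip(data["N3"], data["N5"])]

parents = {"N1": [], "N2": [], "N3": [],
           "N4": ["N1"], "N5": ["N4"], "N6": ["N3", "N5"]}

# One regression per node: the node is the response, its parents the regressors.
fitted = {node: ols(data[node], [data[p] for p in pa]) for node, pa in parents.items()}
for node, (coef, sd) in fitted.items():
    print(node, [round(c, 2) for c in coef], round(sd, 2))
```

With a sample of this size the estimates should land close to the values used in the simulation, which is all the sketch is meant to show.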

Model Selection and Estimation
The process of estimating such a model is called *learning*, and consists of two steps:
1. structure learning: learning which arcs are present in the graph, that is, which nodes are statistically significant regressors for which other nodes;
2. parameter learning: learning the parameters that regulate the effect sizes of those dependencies, that is, the regression coefficients associated with the parents of each node.
The former corresponds to model selection, and the latter to model estimation. Both can be carried out with standard methods from the classic literature on linear regression models.
We will showcase how to learn a BN along those lines using, as a proof of concept, a set of indicators for SDGs.
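As a rough illustration of the two steps for a single node, the sketch below decides whether N1 should be a parent of N4 by comparing the BIC of the regressions with and without it, and then reports the estimated coefficient. BIC is used here only as one standard model selection criterion from the regression literature; the slides do not say which criterion or test the analysis relies on. The data and the function `gaussian_bic` are hypothetical.

```python
import math
import random

def gaussian_bic(y, x=None):
    """BIC (lower is better) of a Gaussian regression of y on an intercept
    and, optionally, one regressor x. Also returns the fitted slope."""
    n = len(y)
    my = sum(y) / n
    if x is None:
        slope, fitted, k = None, [my] * n, 2   # intercept + residual variance
    else:
        mx = sum(x) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        slope = sxy / sxx
        fitted = [my + slope * (a - mx) for a in x]
        k = 3                                  # intercept + slope + variance
    rss = sum((b - f) ** 2 for b, f in zip(y, fitted))
    loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return -2 * loglik + k * math.log(n), slope

# Simulated data in which N4 truly depends on N1 (assumed coefficient 0.5).
random.seed(11)
n = 500
n1 = [random.gauss(0, 1) for _ in range(n)]
n4 = [1.0 + 0.5 * a + random.gauss(0, 1) for a in n1]

# Structure learning (model selection): keep the arc N1 -> N4 if it lowers BIC.
bic_without, _ = gaussian_bic(n4)
bic_with, slope = gaussian_bic(n4, n1)
keep_arc = bic_with < bic_without

# Parameter learning (model estimation): the coefficient of the retained parent.
print(keep_arc, round(slope, 2))
```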

SDG Indicators, 2012–2014
The indicators have been recorded on two separate sets of 30 African countries and 26 Asian countries:
• African countries: Angola, Botswana, Burkina Faso, Burundi, Cote d’Ivoire, Cabo Verde, Cameroon, Congo, Eswatini, Ethiopia, Ghana, Kenya, Lesotho, Madagascar, Malawi, Mali, Mauritius, Mozambique, Namibia, Niger, Nigeria, Rwanda, Senegal, Seychelles, Sierra Leone, South Africa, Togo, Uganda, United Republic of Tanzania, Zambia.
• Asian countries: Afghanistan, Armenia, Azerbaijan, Bangladesh, Bhutan, Cambodia, China, Georgia, India, Indonesia, Iran (Islamic Republic of), Kazakhstan, Kyrgyzstan, Lao People’s Democratic Republic, Malaysia, Maldives, Mongolia, Myanmar, Nepal, Pakistan, Philippines, Sri Lanka, Thailand, Timor-Leste, Uzbekistan, Viet Nam.
