

SLIDE 1

DAGs with NO TEARS

Continuous Optimization for Structure Learning

Xun Zheng, Bryon Aragam, Pradeep Ravikumar, Eric Xing

Machine Learning Department Carnegie Mellon University

November 28, 2018

Xun Zheng (CMU) DAGs with NO TEARS November 28, 2018 1 / 8

SLIDE 2

Background

Graphical models: compact models of p(x1, ..., xd).

Sampling from a graph over x1, ..., x4 yields a data matrix:

  [graph over x1, x2, x3, x4]  --sample-->     x1     x2     x3     x4
                                             4.00  -1.14   0.20  -2.37
                                            -1.05   0.35  -0.66  -0.39
                                              ...    ...    ...    ...

Structure learning: what graph fits the data best?

  [? graph over x1, x2, x3, x4]  <--estimate--  (the same data matrix)
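The sample step above can be sketched as drawing from a linear structural equation model (SEM). The graph and edge weights below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical weighted DAG over x1..x4: x1 -> x2, x2 -> x3, x1 -> x4
W = np.array([
    [0.0, 1.2, 0.0, -0.8],
    [0.0, 0.0, 0.9,  0.0],
    [0.0, 0.0, 0.0,  0.0],
    [0.0, 0.0, 0.0,  0.0],
])

# Linear SEM: x = W^T x + z, so rows of X satisfy X = X W + Z,
# i.e. X = Z (I - W)^{-1} for noise matrix Z.
rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 4))          # 5 samples of Gaussian noise
X = Z @ np.linalg.inv(np.eye(4) - W)  # 5 rows, one sample (x1..x4) each
print(X)
```

Structure learning then asks for the reverse direction: given only X, recover W.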


SLIDE 3

Structure Learning: Where Are We?

                              MNs   BNs   Comments
  constraint-based             ✓     ✓    need faithfulness
  score-based, local search    ✓     ✓    combinatorial opt.


SLIDE 4

Structure Learning: Where Are We?

                              MNs   BNs   Comments
  constraint-based             ✓     ✓    need faithfulness
  score-based, local search    ✓     ✓    combinatorial opt.
  score-based, global search   †     ?*   continuous opt.

† Breakthrough in Markov networks: huge success of methods like the graphical lasso, widely applied in various fields, e.g. bioinformatics.

* Challenges in Bayesian networks: directed graph → asymmetric matrix; acyclic graph → combinatorial constraint.


SLIDE 5

Structure Learning: Where Are We?

                              MNs        BNs         Comments
  constraint-based             ✓          ✓          need faithfulness
  score-based, local search    ✓          ✓          combinatorial opt.
  score-based, global search   †          this work  continuous opt.

† Breakthrough in Markov networks: huge success of methods like the graphical lasso, widely applied in various fields, e.g. bioinformatics.

* Challenges in Bayesian networks: directed graph → asymmetric matrix; acyclic graph → combinatorial constraint.


SLIDE 6

tl;dr

  max_G  score(G)   s.t.  G ∈ DAGs       (combinatorial)
                 ⟺
  max_W  score(W)   s.t.  h(W) ≤ 0       (smooth)


SLIDE 7

tl;dr

  max_G  score(G)   s.t.  G ∈ DAGs       (combinatorial)
                 ⟺
  max_W  score(W)   s.t.  h(W) ≤ 0       (smooth)

Smooth Characterization of DAGs

Such a function exists: h(W) = tr(e^(W∘W)) − d. Moreover, it has a simple gradient: ∇h(W) = (e^(W∘W))ᵀ ∘ 2W.
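The characterization and its gradient translate directly into a few lines of NumPy/SciPy. This is a minimal sketch (the names `h` and `grad_h` are chosen here for illustration, not taken from the paper's code):

```python
import numpy as np
from scipy.linalg import expm

def h(W):
    """Acyclicity measure h(W) = tr(e^(W∘W)) - d.
    h(W) = 0 iff the weighted adjacency matrix W encodes a DAG;
    h(W) > 0 when the graph contains a cycle."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d  # W * W is the Hadamard square W∘W

def grad_h(W):
    """Gradient from the slide: ∇h(W) = (e^(W∘W))^T ∘ 2W."""
    return expm(W * W).T * 2 * W

# A DAG (single edge x1 -> x2) gives h = 0; a 2-cycle gives h > 0.
dag = np.array([[0.0, 1.5], [0.0, 0.0]])
cyc = np.array([[0.0, 1.5], [1.5, 0.0]])
print(h(dag))  # ~0
print(h(cyc))  # > 0
```

The trace of the matrix exponential counts weighted closed walks of every length, which is why it vanishes exactly when no cycles exist.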


SLIDE 8

NO TEARS

Available at: github.com/xunzheng/notears

30 lines (function, gradient) + 20 lines (optimize) ≈ 50 lines
Existing algorithms: ≫ 1000 lines
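A minimal sketch of this recipe, not the reference implementation: it assumes linear-SEM data X, a least-squares score, no ℓ1 penalty, and hypothetical parameter defaults, and enforces h(W) ≤ 0 with a standard augmented-Lagrangian loop:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def notears_sketch(X, rho=1.0, alpha=0.0, max_iter=12, h_tol=1e-8,
                   w_threshold=0.3):
    """Estimate a DAG from data X (n samples x d variables) by solving
    min_W 0.5/n * ||X - X W||_F^2  s.t.  h(W) <= 0."""
    n, d = X.shape

    def _h_and_grad(W):
        E = expm(W * W)                       # e^(W∘W)
        return np.trace(E) - d, E.T * 2 * W   # h(W), ∇h(W)

    def _obj(w, rho, alpha):
        # Augmented Lagrangian: loss + 0.5*rho*h^2 + alpha*h, with gradient.
        W = w.reshape(d, d)
        R = X - X @ W
        loss = 0.5 / n * (R ** 2).sum()
        g_loss = -(X.T @ R) / n
        hW, g_h = _h_and_grad(W)
        obj = loss + 0.5 * rho * hW ** 2 + alpha * hW
        grad = g_loss + (rho * hW + alpha) * g_h
        return obj, grad.ravel()

    w = np.zeros(d * d)
    for _ in range(max_iter):
        sol = minimize(_obj, w, args=(rho, alpha), jac=True,
                       method="L-BFGS-B")
        w = sol.x
        hW, _ = _h_and_grad(w.reshape(d, d))
        if hW <= h_tol:
            break
        alpha += rho * hW   # dual ascent step
        rho *= 10           # tighten the penalty
    W = w.reshape(d, d)
    W[np.abs(W) < w_threshold] = 0.0  # prune small weights
    return W
```

For example, data generated from the chain x1 → x2 → x3 should yield an estimate dominated by those two edges. The released code adds an ℓ1 penalty (handled by splitting W into positive and negative parts), which this sketch omits for brevity.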


SLIDE 9

Results: Recovering Erdős–Rényi Graph

[Figure: heat maps of weighted adjacency matrices: ground truth W, this work's estimate Ŵ, and the baseline (FGS); color scale from −2 to +2.]


SLIDE 10

Results: Recovering Scale-free Graph

[Figure: heat maps of weighted adjacency matrices: ground truth W, this work's estimate Ŵ, and the baseline (FGS); color scale from −2 to +2.]


SLIDE 11

Summary

A smooth characterization of DAGs:

  h(W) = tr(e^(W∘W)) − d ≤ 0   ⟺   G(W) ∈ DAGs

Use existing solvers for the constrained optimization problem:

  max_W  score(W)   s.t.  h(W) ≤ 0

Bridging optimization and structure learning.
