Error Exponents for Composite Hypothesis Testing of Markov Forest Distributions


SLIDE 1

Error Exponents for Composite Hypothesis Testing of Markov Forest Distributions

Vincent Tan, Anima Anandkumar, Alan S. Willsky

Stochastic Systems Group, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology

ISIT (Jun 18, 2010)


SLIDE 2

Motivation

Continuation of a line of work on error exponents for learning tree-structured graphical models:

  • Discrete case: Tan, Anandkumar, Tong, Willsky, ISIT 2009.
  • Gaussian case: Tan, Anandkumar, Willsky, Trans. SP 2010.

Instead of learning, we focus here on hypothesis testing. This provides intuition for which classes of graphical models are easy to learn, in terms of the detection error exponent. Is there a relation between the detection error exponent and the exponent associated with structure learning?

SLIDE 3

Background on Tree-Structured Graphical Models

Graphical model: a family of multivariate probability distributions that factorize according to a given graph G = (V, E). Vertices in V = {1, . . . , d} correspond to variables, and edges in E ⊂ \binom{V}{2} correspond to conditional independences.

Example for a tree-structured P(x) with d = 4.

Figure: star tree with hub X1 and leaves X2, X3, X4.

$$P(x_1, x_2, x_3, x_4) = P_1(x_1) \cdot \frac{P_{1,2}(x_1, x_2)}{P_1(x_1)} \cdot \frac{P_{1,3}(x_1, x_3)}{P_1(x_1)} \cdot \frac{P_{1,4}(x_1, x_4)}{P_1(x_1)}.$$
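As a concrete illustration of this factorization, here is a minimal sketch, our own toy example rather than anything from the talk: a binary star model (hub X1, each leaf equal to X1 flipped with probability 0.1; the model and the flip probability are assumptions for illustration) whose joint probability is evaluated via the tree factorization above.

    import numpy as np
    from itertools import product

    # Toy binary star: X1 is the hub; each leaf Xk equals X1 flipped w.p. 0.1.
    P1 = np.array([0.5, 0.5])                        # marginal of X1
    flip = np.array([[0.9, 0.1], [0.1, 0.9]])        # P(Xk = xk | X1 = x1)
    P_pair = {k: P1[:, None] * flip for k in (2, 3, 4)}   # pairwise marginals P_{1,k}

    def p_star(x1, x2, x3, x4):
        """Tree factorization P = P1 * prod_k P_{1,k} / P1 for the 4-node star."""
        val = P1[x1]
        for k, xk in zip((2, 3, 4), (x2, x3, x4)):
            val *= P_pair[k][x1, xk] / P1[x1]
        return val

    # Sanity check: the 2^4 configuration probabilities sum to 1.
    print(sum(p_star(*x) for x in product((0, 1), repeat=4)))   # ≈ 1.0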

SLIDE 4

Learning vs Hypothesis Testing

Canonical problem: given x^1, . . . , x^n ∼ P, learn the structure of P. If P is a tree, Chow and Liu (1968) can be used as an efficient implementation of ML. Denote the set of distributions Markov on a tree T0 ∈ T as D(T0); the set of distributions Markov on any tree is D(T).

Composite hypothesis testing problem considered here:

$$H_0 : x^1, \dots, x^n \sim \Lambda_0 \subset D(\mathcal{T}), \qquad H_1 : x^1, \dots, x^n \sim \Lambda_1 \subset D(\mathcal{T}),$$

with each Λi closed and Λ0 ∩ Λ1 = ∅.

SLIDE 5

Definition of the Worst-Case Type-II Error Exponent

Neyman-Pearson setup, with acceptance regions (An).

Def: Type-II error exponent for a fixed Q ∈ Λ1, given (An):

$$J(\Lambda_0, Q; A_n) := \liminf_{n \to \infty} \, -\frac{1}{n} \log Q^n(A_n)$$

Def: Optimal type-II error exponent:

$$J^*(\Lambda_0, Q) := \sup_{A_n : \, P^n(A_n) \le \alpha, \ \forall P \in \Lambda_0} J(\Lambda_0, Q; A_n)$$

Def: Worst-case optimal type-II error exponent:

$$J^*(\Lambda_0, \Lambda_1) := \inf_{Q \in \Lambda_1} J^*(\Lambda_0, Q)$$

The optimizing distribution Q* is called the least favorable distribution.

SLIDE 6

Why Difficult?

Many trees: if there are d nodes, there are d^(d−2) trees (Cayley's formula)! Searching for the dominant error event may therefore be intractable.

Natural questions: Are there closed-form expressions for the worst-case error exponent for special Λ0, Λ1? How does it depend on the true distribution? Are there connections to learning? What intuition and characterization can be given for the least favorable distribution?

SLIDE 7

A Simplification

Assume that H0 is simple and P is Markov on T0 = (V, E0):

$$H_0 : x^1, \dots, x^n \sim \{P\}, \qquad H_1 : x^1, \dots, x^n \sim \Lambda_1 = D(\mathcal{T}) \setminus D(T_0)$$

Figure: P lies in D(T0); the least favorable distribution Q* lies in Λ1 = D(T) \ D(T0), at "distance" J*(P) from P.

$$J^*(P) := J^*(\{P\}, \, D(\mathcal{T}) \setminus D(T_0))$$

SLIDE 8

Setup for Main Result

For a non-edge e′ = (i, j), let Path(e′) be the unique path joining i and j, and let L(i, j) be the number of hops between i and j.

Figure: a five-node tree in which the non-edge e′ = (i, j) has Path(e′) = {e1 = (i, k), e2 = (k, j)}, so L(i, j) = 2.

The mutual information of the joint distribution Pe = Pi,j is denoted I(Pe).

SLIDE 9

Main Result

Proposition.

$$J^*(P) = \min_{\substack{e' = (i,j) \notin E_0 \\ L(i,j) = 2}} \ \min_{e \in \mathrm{Path}(e')} \, \{ I(P_e) - I(P_{e'}) \}.$$

Illustration: the four-node star with hub X1, with mutual informations 5, 6, 4 on the true edges and 1.5, 3.9, 2 on the (two-hop) non-edges. The minimum is attained by the non-edge with MI 3.9, whose path contains the true edge with MI 4:

$$J^*(P) = 4 - 3.9 = 0.1.$$
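The proposition reduces to a small combinatorial computation once the mutual informations are known. Below is a minimal sketch (ours, not the authors' code) that evaluates J*(P) on the illustration above; the assignment of the slide's MI values to specific node pairs is our assumption, chosen so that the binding pair is the edge with MI 4 and the non-edge with MI 3.9, reproducing J*(P) = 0.1.

    from itertools import combinations

    def jstar(nodes, edges, I):
        """J*(P): min over two-hop non-edges e' of min over e in Path(e')
        of I[e] - I[e'].  I maps frozenset pairs to mutual informations."""
        adj = {v: set() for v in nodes}
        for i, j in edges:
            adj[i].add(j)
            adj[j].add(i)
        best = float("inf")
        for i, j in combinations(nodes, 2):
            if j in adj[i]:
                continue                     # (i, j) is a true edge
            for k in adj[i] & adj[j]:        # common neighbour  <=>  L(i, j) = 2
                for e in (frozenset({i, k}), frozenset({k, j})):   # Path(e')
                    best = min(best, I[e] - I[frozenset({i, j})])
        return best

    # Star with hub 1: edge MIs 5, 6, 4; two-hop non-edge MIs 1.5, 3.9, 2.
    I = {frozenset({1, 2}): 5.0, frozenset({1, 3}): 6.0, frozenset({1, 4}): 4.0,
         frozenset({2, 3}): 1.5, frozenset({3, 4}): 3.9, frozenset({2, 4}): 2.0}
    print(jstar([1, 2, 3, 4], [(1, 2), (1, 3), (1, 4)], I))   # ≈ 0.1 (= 4 - 3.9)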

SLIDE 10

Least Favorable Distribution

The least favorable distribution Q* is characterized by

$$E_{Q^*} = \operatorname*{argmax}_{E \neq E_0, \ E \text{ acyclic}} \ \sum_{e \in E} I(P_e),$$

a second-best max-weight spanning tree problem, and

$$Q^*_i(x_i) = P_i(x_i) \quad \forall\, i \in V, \qquad Q^*_{i,j}(x_i, x_j) = P_{i,j}(x_i, x_j) \quad \forall\, (i,j) \in E_{Q^*}.$$
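For small d, the defining optimization can be checked by brute force. Here is a sketch (our code, reusing the MI values assumed in the J*(P) example above) that enumerates all spanning trees other than E0 and picks the one with the largest total mutual information; it recovers the tree obtained from E0 by swapping out the edge with MI 4 for the non-edge with MI 3.9.

    from itertools import combinations

    def spanning_trees(nodes, pairs):
        """Yield every spanning tree (as a set of frozenset edges) over `nodes`."""
        for E in combinations(pairs, len(nodes) - 1):
            parent = {v: v for v in nodes}       # union-find acyclicity check
            def find(v):
                while parent[v] != v:
                    v = parent[v]
                return v
            ok = True
            for i, j in E:
                ri, rj = find(i), find(j)
                if ri == rj:
                    ok = False
                    break
                parent[ri] = rj
            if ok:
                yield {frozenset(e) for e in E}

    I = {frozenset({1, 2}): 5.0, frozenset({1, 3}): 6.0, frozenset({1, 4}): 4.0,
         frozenset({2, 3}): 1.5, frozenset({3, 4}): 3.9, frozenset({2, 4}): 2.0}
    E0 = {frozenset({1, 2}), frozenset({1, 3}), frozenset({1, 4})}
    E_Qstar = max((E for E in spanning_trees([1, 2, 3, 4], list(I)) if E != E0),
                  key=lambda E: sum(I[e] for e in E))
    print(sorted(tuple(sorted(e)) for e in E_Qstar))   # [(1, 2), (1, 3), (3, 4)]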

SLIDE 11

Proof Outline

The optimization for the worst-case exponent is

$$\inf_{Q \in D(\mathcal{T}) \setminus D(T_0)} D(Q \,\|\, P) \;=\; \min_{T \in \mathcal{T} \setminus \{T_0\}} \ \inf_{Q \in D(T)} D(Q \,\|\, P).$$

Use the tree decomposition (junction tree theorem):

$$Q(x) = \prod_{i \in V(T)} Q_i(x_i) \prod_{(i,j) \in E(T)} \frac{Q_{i,j}(x_i, x_j)}{Q_i(x_i)\, Q_j(x_j)}.$$

Emulate Chow and Liu (1968). The second-best max-weight spanning tree differs from the best one by a single edge [Cormen et al. 2003]. Apply the data processing inequality.

SLIDE 12

Intuition

$$J^*(P) = \min_{\substack{e' = (i,j) \notin E_0 \\ L(i,j) = 2}} \ \min_{e \in \mathrm{Path}(e')} \, \{ I(P_e) - I(P_{e'}) \}.$$

The smaller the difference between the mutual information on a true edge and that on a non-edge along its path, the smaller the detection error exponent. The detection error exponent depends only on such bottleneck edges.

SLIDE 13

Comparison to Existing Results

$$J^*(P) = \min_{\substack{e' = (i,j) \notin E_0 \\ L(i,j) = 2}} \ \min_{e \in \mathrm{Path}(e')} \, \{ I(P_e) - I(P_{e'}) \}.$$

This is intuitive in light of the Chow-Liu algorithm for learning trees,

$$\hat{E}_{\mathrm{ML}} := \operatorname*{argmax}_{E \text{ acyclic}} \ \sum_{e \in E} I(\hat{\mu}_e),$$

where \hat{\mu}_e is the pairwise type on edge e.

Learning error exponent in the very-noisy regime:

$$K(P) := \min_{e' \notin E_0} \ \min_{e \in \mathrm{Path}(e')} \ \frac{(I(P_e) - I(P_{e'}))^2}{2\, \mathrm{Var}(S_e - S_{e'})}.$$

Both J*(P) and K(P) depend on differences of mutual informations.
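A sketch (ours) of how K(P) could be evaluated on a binary star model like the one used earlier (hub node 0, leaves flipped with probability 0.1), under the assumption, not spelled out on the slide, that S_e denotes the information density s_e(x_i, x_j) = log[P_e(x_i, x_j) / (P_i(x_i) P_j(x_j))]:

    import numpy as np
    from itertools import combinations, product

    # Binary star with hub node 0: P(x) = P(x0) * prod_k P(xk | x0).
    P0 = np.array([0.5, 0.5])
    flip = np.array([[0.9, 0.1], [0.1, 0.9]])
    joint = {x: P0[x[0]] * flip[x[0], x[1]] * flip[x[0], x[2]] * flip[x[0], x[3]]
             for x in product((0, 1), repeat=4)}

    def pair_marg(i, j):
        m = np.zeros((2, 2))
        for x, p in joint.items():
            m[x[i], x[j]] += p
        return m

    def s(i, j, x):
        """Information density of the pair (i, j) at configuration x."""
        m = pair_marg(i, j)
        return np.log(m[x[i], x[j]] / (m.sum(1)[x[i]] * m.sum(0)[x[j]]))

    def mi(i, j):
        return sum(p * s(i, j, x) for x, p in joint.items())

    K = float("inf")
    for i, j in combinations((1, 2, 3), 2):       # leaf pairs = the non-edges e'
        for e in ((0, i), (0, j)):                # Path(e') in the star
            gap = mi(*e) - mi(i, j)               # I(P_e) - I(P_e')
            var = sum(p * (s(*e, x) - s(i, j, x)) ** 2
                      for x, p in joint.items()) - gap ** 2    # Var(S_e - S_e')
            K = min(K, gap ** 2 / (2 * var))
    print(K)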

SLIDE 14

Performing the Hypothesis Test

It is known that the worst-case error exponent is achieved by the Hoeffding test, but that test is hard to implement for tree distributions. The generalized likelihood ratio test (GLRT) has acceptance regions

$$A_n := \left\{ x^n : \ \frac{1}{n} \log \frac{\max_{Q \in \Lambda_1} Q^n(x^n)}{\max_{P \in \Lambda_0} P^n(x^n)} \ \ge \ \gamma \right\}.$$

When the null hypothesis is simple, the GLRT also simplifies:

$$H_0 : x^1, \dots, x^n \sim \{P\}, \qquad H_1 : x^1, \dots, x^n \sim \Lambda_1 = D(\mathcal{T}) \setminus D(T_0).$$

SLIDE 15

The Generalized Likelihood Ratio Test

Denote the joint type of x^n as \hat{\mu} := \hat{\mu}(\cdot\,; x^n), the pairwise type on e as \hat{\mu}_e, and the true set of edges as E0.

Proposition. The GLRT simplifies as

$$A_n = \left\{ x^n : \ \sum_{e \in E^*} I(\hat{\mu}_e) - \sum_{e \in E_0} I(\hat{\mu}_e) \ \ge \ \gamma \right\},$$

where the "dominating edge set" is

$$E^* = \operatorname*{argmax}_{E \neq E_0, \ E \text{ acyclic}} \ \sum_{e \in E} I(\hat{\mu}_e).$$
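To make the test concrete, here is an end-to-end sketch, our code rather than the authors' implementation: it estimates the pairwise types from binary samples, computes their mutual informations, takes E* as the best spanning tree different from E0 (the max-weight tree if that already differs from E0, otherwise a single-edge swap, using the fact cited on the proof outline slide), and thresholds the gap. The samples, the threshold gamma, and the availability of networkx are all placeholders/assumptions.

    import numpy as np
    import networkx as nx
    from itertools import combinations

    def pairwise_mi(X, i, j):
        """Mutual information of the pairwise type mu_hat_{i,j} (binary data)."""
        m = np.zeros((2, 2))
        for a, b in zip(X[:, i], X[:, j]):
            m[a, b] += 1
        m /= len(X)
        pi, pj = m.sum(1), m.sum(0)
        nz = m > 0
        return float((m[nz] * np.log(m[nz] / np.outer(pi, pj)[nz])).sum())

    def glrt_statistic(E0, I_hat):
        """sum_{E*} I(mu_e) - sum_{E0} I(mu_e), with E* the best tree != E0."""
        G = nx.Graph()
        for (i, j), w in I_hat.items():
            G.add_edge(i, j, weight=w)
        best = nx.maximum_spanning_tree(G)
        E_best = {frozenset(e) for e in best.edges()}
        if E_best != E0:
            return sum(I_hat[e] for e in E_best) - sum(I_hat[e] for e in E0)
        # E0 is itself the max-weight tree: E* differs from it by one swap.
        stat = -float("inf")
        for e_new in set(I_hat) - E0:
            i, j = tuple(e_new)
            path = nx.shortest_path(best, i, j)          # unique path in the tree
            for e_old in (frozenset(p) for p in zip(path, path[1:])):
                stat = max(stat, I_hat[e_new] - I_hat[e_old])
        return stat

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(500, 4))                # placeholder samples
    I_hat = {frozenset({i, j}): pairwise_mi(X, i, j)
             for i, j in combinations(range(4), 2)}
    E0 = {frozenset({0, 1}), frozenset({0, 2}), frozenset({0, 3})}
    gamma = 0.05                                         # placeholder threshold
    print("declare H1" if glrt_statistic(E0, I_hat) >= gamma else "accept H0")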

SLIDE 16

Interpretation and Extensions

It is easy to implement the GLRT for testing between trees: the tree structure E* can be found efficiently once the pairwise types \hat{\mu}_e have been computed. Extensions of the error exponent and the GLRT to forest-structured distributions are straightforward. There is also recent work on high-dimensional learning of forest-structured distributions.¹

¹ V. Tan, A. Anandkumar, A. Willsky, "Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates", submitted to JMLR, May 2010.

SLIDE 17

Concluding Remarks

We analyzed the worst-case type-II error exponent for composite hypothesis testing of Markov forest distributions and found close relations to learning. Possible extension 1: a Bayesian formulation (Chernoff information). Possible extension 2: decomposable graphical models.