SLIDE 1

Kernel methods for hypothesis testing and inference

MLSS Tübingen, 2015

Arthur Gretton Gatsby Unit, CSML, UCL

SLIDE 2

Some motivating questions...

SLIDE 3

Detecting differences in brain signals

The problem: Do local field potential (LFP) signals change when measured near a spike burst?

[Figure: two panels of LFP amplitude against time, "LFP near spike burst" and "LFP without spike burst"]

SLIDE 6

Detecting differences in amplitude-modulated signals

[Figure: samples from P and samples from Q]

SLIDE 7

Adversarial training of deep neural networks

From ICML 2015:

Generative Moment Matching Networks

Yujia Li¹, Kevin Swersky¹, Richard Zemel¹,²

¹Department of Computer Science, University of Toronto, Toronto, ON, Canada; ²Canadian Institute for Advanced Research, Toronto, ON, Canada

arXiv:1502.02761v1 [cs.LG], 10 Feb 2015

From UAI 2015:

Training generative neural networks via Maximum Mean Discrepancy optimization

Gintare Karolina Dziugaite (University of Cambridge), Daniel M. Roy (University of Toronto), Zoubin Ghahramani (University of Cambridge)

Idea: In adversarial nets (Goodfellow et al., NIPS 2014), replace the discriminator network with the maximum mean discrepancy (MMD), a kernel distance between distributions.
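To make the MMD concrete, here is a minimal sketch of the usual unbiased estimate of the squared MMD between samples x ~ P and y ~ Q, using a Gaussian kernel. The function names, the fixed bandwidth `sigma = 1.0` and the toy data are illustrative assumptions; this is not the code used in either paper above.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased estimate of MMD^2(P, Q) from x ~ P (m x d) and y ~ Q (n x d)."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    # Exclude the diagonal k(x_i, x_i) terms so the within-sample means are unbiased.
    mean_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    mean_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return mean_xx + mean_yy - 2.0 * kxy.mean()


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 1))
y_same = rng.normal(0.0, 1.0, size=(200, 1))
y_shift = rng.normal(1.0, 1.0, size=(200, 1))
print("same distribution:", mmd2_unbiased(x, y_same))
print("mean shifted by 1:", mmd2_unbiased(x, y_shift))
```

For a second sample from the same distribution the estimate should hover around zero (it can be slightly negative, since the estimator is unbiased), while the shifted sample should give a clearly positive value. In the generative-network setting above, the same quantity computed between generated and real samples plays the role the discriminator played.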

SLIDE 9

Case of discrete domains

  • How do you compare distributions...
  • ...in a discrete domain?

[Read and Cressie, 1988]

X1: Now disturbing reports out of Newfoundland show that the fragile snow crab industry is in serious decline. First the west coast salmon, the east coast salmon and the cod, and now the snow crabs off Newfoundland.

Y1: Honourable senators, I have a question for the Leader of the Government in the Senate with regard to the support funding to farmers that has been announced. Most farmers have not received any money yet.

X2: To my pleasant surprise he responded that he had personally visited those wharves and that he had already announced money to fix them. What wharves did the minister visit in my riding and how much additional funding is he going to provide for Delaps Cove, Hampton, Port Lorne, ...

Y2: On the grain transportation system we have had the Estey report and the Kroeger report. We could go on and on. Recently programs have been announced over and over by the government such as money for the disaster in agriculture on the prairies and across Canada.

· · ·

P_X = P_Y? Are the pink extracts from the same distribution as the gray ones?
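The classical discrete-domain answer is a goodness-of-fit statistic from the power divergence family of Read and Cressie. Below is a minimal sketch, assuming each set of extracts has already been reduced to counts over a fixed set of K categories (for instance, a fixed vocabulary of words), giving a 2 × K contingency table. The function name and the example counts are hypothetical, and turning the statistic into a p-value (via the χ² approximation with K − 1 degrees of freedom, or a permutation test) is not shown.

```python
import numpy as np


def power_divergence_2sample(counts_x, counts_y, lam=2.0 / 3.0):
    """Cressie-Read power divergence statistic for a 2 x K table of counts.

    Row 0 holds category counts from sample X, row 1 from sample Y. Under
    H0: P_X = P_Y the expected counts follow from the pooled category totals.
    lam = 1 gives Pearson's chi-squared statistic; lam = 0 (taken as a limit)
    gives the likelihood-ratio G statistic; lam = 2/3 is Cressie and Read's
    recommended value; lam = -1 is not handled. Assumes every category occurs
    at least once in the pooled sample, so no expected count is zero.
    """
    observed = np.array([counts_x, counts_y], dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals * col_totals / observed.sum()
    if lam == 0:
        nonzero = observed > 0  # 0 * log(0) is taken to be 0
        return 2.0 * np.sum(observed[nonzero] * np.log(observed[nonzero] / expected[nonzero]))
    return 2.0 / (lam * (lam + 1.0)) * np.sum(observed * ((observed / expected) ** lam - 1.0))


# Hypothetical word counts over a three-word vocabulary for the two sources.
print(power_divergence_2sample([12, 3, 7], [10, 5, 6]))
```

The whole family only needs counts per category, which is exactly what becomes scarce once the domain is continuous and has to be binned.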

SLIDE 10

Detecting statistical dependence, continuous domain

  • How do you detect dependence...
  • ...in a continuous domain?

[Figure: a sample from P_XY (axes X and Y, range −1.5 to 1.5), with arrows to two candidates: dependent P_XY, or independent P_XY = P_X P_Y]

SLIDE 11

Detecting statistical dependence, continuous domain

  • How do you detect dependence...
  • ...in a continuous domain?

[Figure: a sample from P_XY (axes X and Y), with arrows to the discretized empirical P_XY and the discretized empirical P_X P_Y]
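The discretization step can be sketched as follows for scalar X and Y: bin the sample on a grid, normalise the 2-D histogram to get the discretized empirical P_XY, and form P_X P_Y as the outer product of its marginals. The bin count, the toy data and the use of total variation distance to compare the two tables are illustrative choices for this sketch, not the statistic used in the talk.

```python
import numpy as np


def discretized_joint_and_product(x, y, bins=4):
    """Histogram estimates of P_XY and of P_X P_Y on the same bins x bins grid."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint /= joint.sum()
    p_x = joint.sum(axis=1)  # marginal over the x-bins
    p_y = joint.sum(axis=0)  # marginal over the y-bins
    return joint, np.outer(p_x, p_y)


def total_variation(p, q):
    """Total variation distance between two probability tables of equal shape."""
    return 0.5 * np.abs(p - q).sum()


rng = np.random.default_rng(1)
x = rng.uniform(-1.5, 1.5, size=2000)
y_indep = rng.uniform(-1.5, 1.5, size=2000)
y_dep = x + 0.1 * rng.normal(size=2000)
for name, y in [("independent", y_indep), ("dependent", y_dep)]:
    joint, product = discretized_joint_and_product(x, y)
    print(name, total_variation(joint, product))
```

This is workable for scalar X and Y with a handful of bins per axis; the number of cells, however, grows exponentially with the dimension of (X, Y).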

SLIDE 13

Detecting statistical dependence, continuous domain

  • How do you detect dependence...
  • ...in a continuous domain?
  • Problem: fails even in “low” dimensions!

[NIPS07a, ALT08]

– X and Y in ℝ⁴, statistic = power divergence, samples = 1024, cases where dependence detected = 0/500

  • Too few points per bin
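A quick count shows why there are too few points per bin. Assuming a regular grid with the same number of bins along each of the 4 + 4 = 8 coordinates, and the 1024 samples quoted above, the average occupancy is computed below; the function name is hypothetical. This is only an average: with unevenly spread data, many cells receive even fewer points.

```python
def grid_occupancy(n_samples, dim_x, dim_y, bins_per_dim):
    """Number of cells in a regular grid over (X, Y) and mean samples per cell."""
    n_cells = bins_per_dim ** (dim_x + dim_y)
    return n_cells, n_samples / n_cells


for bins in (2, 3, 4):
    cells, per_cell = grid_occupancy(1024, 4, 4, bins)
    print(f"{bins} bins per coordinate: {cells} cells, {per_cell:.3f} points per cell")
# 2 bins per coordinate: 256 cells, 4.000 points per cell
# 3 bins per coordinate: 6561 cells, 0.156 points per cell
# 4 bins per coordinate: 65536 cells, 0.016 points per cell
```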
SLIDE 14

Detecting statistical dependence, discrete domain

  • How do you detect dependence...
  • ...in a discrete domain?

[Read and Cressie, 1988]

X1: Honourable senators, I have a question for the Leader of the Government in the Senate with regard to the support funding to farmers that has been announced. Most farmers have not received any money yet.

Y1: Honorables sénateurs, ma question s’adresse au leader du gouvernement au Sénat et concerne l’aide financière qu’on a annoncée pour les agriculteurs. La plupart des agriculteurs n’ont encore rien reçu de cet argent.

X2: No doubt there is great pressure on provincial and municipal governments in relation to the issue of child care, but the reality is that there have been no cuts to child care funding from the federal government to the provinces. In fact, we have increased federal investments for early childhood development.

Y2: Il est évident que les ordres de gouvernements provinciaux et municipaux subissent de fortes pressions en ce qui concerne les services de garde, mais le gouvernement n’a pas réduit le financement qu’il verse aux provinces pour les services de garde. Au contraire, nous avons augmenté le financement fédéral pour le développement des jeunes enfants.

· · ·

P_XY = P_X P_Y? Are the French text extracts translations of the English ones?

SLIDE 17

Detecting a higher order interaction

  • How to detect V-structures with pairwise weak (or nonexistent) dependence?
  • X ⊥⊥ Y, Y ⊥⊥ Z, X ⊥⊥ Z

[Figure: scatter plots of X vs Y, Y vs Z, X vs Z, and XY vs Z; diagram of the V-structure X → Z ← Y]

  • X, Y i.i.d. ∼ N(0, 1)
  • Z | X, Y ∼ sign(XY) Exp(1/√2)

Faithfulness violated here
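The toy model above can be simulated in a few lines. This sketch assumes Exp(1/√2) means an exponential with scale 1/√2; the slide does not say whether the parameter is a rate or a scale, and the correlations below do not depend on it. Correlation only probes linear dependence, so near-zero values for (X, Z) and (Y, Z) are consistent with, not proof of, pairwise independence. Here independence does hold: given X alone, sign(XY) is a fair ±1 coin, independent of the exponential magnitude. The function names are hypothetical.

```python
import numpy as np


def sample_v_structure(n, rng):
    """Draw (X, Y, Z) with X, Y ~ N(0, 1) i.i.d. and Z = sign(XY) * E."""
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    # Assumption: Exp(1/sqrt(2)) is read as an exponential with scale 1/sqrt(2).
    e = rng.exponential(scale=1.0 / np.sqrt(2.0), size=n)
    z = np.sign(x * y) * e
    return x, y, z


def corr(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


rng = np.random.default_rng(0)
x, y, z = sample_v_structure(100_000, rng)
print("corr(X, Z)  =", corr(x, z))
print("corr(Y, Z)  =", corr(y, z))
print("corr(XY, Z) =", corr(x * y, z))
```

The first two correlations should be near zero, while corr(XY, Z) should be near √2/π ≈ 0.45: the dependence is visible only through the product of X and Y, which no pairwise test can see.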

SLIDE 19

V-structure Discovery

[Diagram: V-structure X → Z ← Y]

Assume X ⊥⊥ Y has been established. The V-structure can then be detected by:

  • CI test: H0 : X ⊥⊥ Y | Z (Zhang et al 2011), or
  • Factorisation test: H0 : (X, Y) ⊥⊥ Z ∨ (X, Z) ⊥⊥ Y ∨ (Y, Z) ⊥⊥ X (multiple two-variable independence tests)

– compute p-values for each of the marginal tests for (Y, Z) ⊥⊥ X, (X, Z) ⊥⊥ Y, or (X, Y) ⊥⊥ Z
– apply the Holm-Bonferroni (HB) sequentially rejective correction (Holm 1979)
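The Holm-Bonferroni step of the factorisation test is short enough to sketch. The three p-values are assumed to come from two-variable independence tests that are not shown, and the function names are hypothetical. Since H0 is a disjunction of three factorisations, this sketch rejects it, and so reports a V-structure, only when all three marginal nulls are rejected after correction.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's sequentially rejective procedure; returns a reject flag per p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; all larger p-values are kept
    return reject


def factorisation_test(p_yz_x, p_xz_y, p_xy_z, alpha=0.05):
    """Reject H0 (some factorisation holds) only if all three nulls are rejected."""
    return all(holm_bonferroni([p_yz_x, p_xz_y, p_xy_z], alpha))


print(holm_bonferroni([0.01, 0.04, 0.03]))     # [True, False, False]
print(factorisation_test(0.001, 0.01, 0.02))   # True
```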

SLIDE 20

V-structure Discovery (2)

  • How to detect V-structures with pairwise weak (or nonexistent) dependence?
  • X ⊥⊥ Y, Y ⊥⊥ Z, X ⊥⊥ Z

[Figure: scatter plots of X1 vs Y1, Y1 vs Z1, X1 vs Z1, and X1*Y1 vs Z1; diagram of the V-structure over X, Y, Z]

  • X1, Y1 i.i.d. ∼ N(0, 1)
  • Z1 | X1, Y1 ∼ sign(X1 Y1) Exp(1/√2)
  • X2:p, Y2:p, Z2:p i.i.d. ∼ N(0, I_{p−1})

Faithfulness violated here

SLIDE 21

V-structure Discovery (3)

[Figure: V-structure discovery, Dataset A. Null acceptance rate (Type II error) against dimension (1 to 19); legend: "CI: X ⊥⊥ Y | Z" and "2var: Factor"]

Figure 1: CI test for X ⊥⊥ Y | Z from Zhang et al (2011), and a factorisation test with a HB correction, n = 500

SLIDE 22

Outline

  • Intro to reproducing kernel Hilbert spaces (RKHS)
  • An RKHS metric on the space of probability measures

– Distance between means in feature space (RKHS)
– Characteristic kernels: feature-space mappings of probabilities are unique
– Nonparametric two-sample test

  • Dependence detection

– Covariance in feature space and test

  • Relation with energy distance and distance covariance
  • Advanced topics

– Interactions with three (or more) variables, conditional independence test
– Optimal kernel choice
– Bayesian inference without models
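For the dependence part of the outline, one standard estimator of covariance in feature space is the Hilbert-Schmidt Independence Criterion (HSIC), whose biased empirical form is trace(K H L H)/n², with Gram matrices K and L and centring matrix H = I − (1/n)11ᵀ. Below is a minimal sketch with Gaussian kernels of fixed bandwidth and a permutation null; the bandwidths, permutation count, seed and function names are illustrative assumptions rather than choices made in the talk.

```python
import numpy as np


def rbf_gram(a, sigma=1.0):
    """Gaussian-kernel Gram matrix for the rows of a (n x d)."""
    sq = np.sum(a**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * a @ a.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def hsic_permutation_test(x, y, sigma_x=1.0, sigma_y=1.0, n_perm=200, seed=0):
    """Biased HSIC statistic trace(K H L H) / n^2 and a permutation p-value."""
    rng = np.random.default_rng(seed)
    n = len(x)
    h = np.eye(n) - 1.0 / n  # centring matrix H = I - (1/n) 1 1^T
    k_c = h @ rbf_gram(x, sigma_x) @ h  # centred K, computed once
    l_gram = rbf_gram(y, sigma_y)
    # trace(K H L H) = trace(H K H L) = sum of the elementwise product (L symmetric).
    stat = np.sum(k_c * l_gram) / n**2
    null = np.empty(n_perm)
    for b in range(n_perm):
        p = rng.permutation(n)  # shuffling the Y sample breaks any dependence on X
        null[b] = np.sum(k_c * l_gram[np.ix_(p, p)]) / n**2
    p_value = (1 + np.sum(null >= stat)) / (1 + n_perm)
    return stat, p_value


rng = np.random.default_rng(3)
x = rng.normal(size=(200, 1))
y_dep = x**2 + 0.1 * rng.normal(size=(200, 1))  # dependent, yet uncorrelated with x
y_ind = rng.normal(size=(200, 1))
print(hsic_permutation_test(x, y_dep))
print(hsic_permutation_test(x, y_ind))
```

Because y_dep = x² + noise has essentially zero linear correlation with x, a correlation test would miss it, whereas the kernel statistic should give a small p-value.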

SLIDE 23

References

  • T. Read and N. Cressie. Goodness-of-Fit Statistics for Discrete Multivariate Analysis. Springer-Verlag, New York, 1988.