
Multivariate Analysis: A Unified Perspective
Harrison B. Prosper, Florida State University
Advanced Statistical Techniques in Particle Physics


SLIDE 1

Multivariate Analysis: A Unified Perspective

Harrison B. Prosper
Florida State University
Advanced Statistical Techniques in Particle Physics
Durham, UK, 20 March 2002

SLIDE 2

Outline

- Introduction
- Some Multivariate Methods
  - Fisher Linear Discriminant (FLD)
  - Principal Component Analysis (PCA)
  - Independent Component Analysis (ICA)
  - Self-Organizing Map (SOM)
  - Random Grid Search (RGS)
  - Probability Density Estimation (PDE)
  - Artificial Neural Network (ANN)
  - Support Vector Machine (SVM)
- Comments
- Summary

SLIDE 3

Introduction – i

Multivariate analysis is hard! Our mathematical intuition, based on analysis in one dimension, often fails rather badly for spaces of very high dimension.

One should distinguish the problem to be solved from the algorithm to solve it. Typically, the problems to be solved, when viewed with sufficient detachment, are relatively few in number, whereas algorithms to solve them are invented every day.

SLIDE 4

Introduction – ii

So why bother with multivariate analysis? Because the variables we use to describe events are usually statistically dependent. Therefore, the N-dimensional density of the variables contains more information than is contained in the set of 1-d marginal densities f_i(x_i). This extra information may be useful.

SLIDE 5

[Figure: DØ 1995 top quark discovery. Distributions of aplanarity versus H_T (GeV) for p p̄ → t t̄ → l + jets candidates and for the W + jets background.]

SLIDE 6

Introduction – iii

Problems that may benefit from multivariate analysis:

- Signal to background discrimination
- Variable selection (e.g., to give maximum signal/background discrimination)
- Dimensionality reduction of the feature space, f : ℝ^N → ℝ^1
- Finding regions of interest in the data
- Simplifying optimization
- Model comparison
- Measuring stuff (e.g., tan β in SUSY)

SLIDE 7

Fisher Linear Discriminant

Purpose: signal/background discrimination.

For two Gaussian densities g with means μ₁, μ₂ and common covariance Σ, the log-likelihood ratio is linear in x:

log [ g(x | μ₁, Σ) / g(x | μ₂, Σ) ] = −½ [ χ²(x, μ₁) − χ²(x, μ₂) ] → w·x + b

where g is a Gaussian. One classifies events according to which side of the plane they fall: w·x + b > c versus w·x + b < c.
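As a sketch of this in Python with NumPy (the toy data, means, and covariance below are hypothetical, not from the slide): for two Gaussian classes with a shared covariance Σ, the Fisher direction is w ∝ Σ⁻¹(μ₁ − μ₂).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes with a shared covariance (hypothetical toy data).
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
mu_s, mu_b = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
signal = rng.multivariate_normal(mu_s, cov, size=2000)
background = rng.multivariate_normal(mu_b, cov, size=2000)

# Fisher direction: w = Sigma^-1 (mu_1 - mu_2); b centers the cut between the means.
w = np.linalg.solve(cov, mu_s - mu_b)
b = -0.5 * w @ (mu_s + mu_b)

# Classify by the sign of the linear discriminant w.x + b.
acc = 0.5 * (np.mean(signal @ w + b > 0) + np.mean(background @ w + b < 0))
print(f"accuracy = {acc:.3f}")
```

For equal-covariance Gaussians this linear cut is exactly the Bayes rule, which is the point of the slide's log-ratio derivation.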

SLIDE 8

Principal Component Analysis

Purpose: reduce dimensionality.

Project each point x_i onto a unit vector w: d_i = w · x_i. The 1st principal axis maximizes the summed squared projections,

w₁ = argmax_w Σ_{i=1}^{K} d_i(w)²

The 2nd principal axis maximizes the same quantity after the component along w₁ has been removed,

w₂ = argmax_w Σ_{i=1}^{K} [ w · (x_i − (w₁·x_i) w₁) ]²

with |w| = 1 in both cases.

SLIDE 9

PCA algorithm in practice

Transform from X = (x₁, .., x_N)ᵀ to U = (u₁, .., u_N)ᵀ in which lowest-order correlations are absent:

- Compute Cov(X).
- Compute its eigenvalues λ_i and eigenvectors v_i.
- Construct the matrix T = Col(v_i)ᵀ; then U = TX.

Typically, one eliminates the u_i with the smallest amount of variation.
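The steps above translate directly into NumPy (the toy data and the choice to keep a single component are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated 2-d toy data (hypothetical): variance is concentrated along one axis.
A = np.array([[3.0, 1.0], [0.0, 0.5]])
X = rng.standard_normal((5000, 2)) @ A

# PCA as on the slide: covariance -> eigenvalues/eigenvectors -> rotation U = T X.
C = np.cov(X, rowvar=False)          # Cov(X)
lam, V = np.linalg.eigh(C)           # eigenvalues ascending, eigenvectors in columns
order = np.argsort(lam)[::-1]        # sort principal axes by decreasing variance
lam, V = lam[order], V[:, order]

U = X @ V                            # components u_i along the principal axes
u1 = U[:, 0]                         # keep the u_i with the most variation
print(lam)
```

The variance of each u_i equals the corresponding eigenvalue λ_i, which is what justifies dropping the components with the smallest λ_i.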
SLIDE 10

Independent Component Analysis

Purpose: find statistically independent variables; dimensionality reduction.

Basic idea: assume X = (x₁, .., x_N)ᵀ is a linear sum, X = AS, of independent sources S = (s₁, .., s_N)ᵀ. Both A, the mixing matrix, and S are unknown. Find a de-mixing matrix T such that the components of U = TX are statistically independent.

SLIDE 11

ICA – Algorithm

Given two densities f(U) and g(U), one measure of their "closeness" is the Kullback-Leibler divergence

K(f | g) = ∫ f(U) log [ f(U) / g(U) ] dU ≥ 0

which is zero if, and only if, f(U) = g(U). We set

g(U) = Π_i f_i(u_i)

and minimize K(f | g) (now called the mutual information) with respect to the de-mixing matrix T.
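In practice the mutual-information objective is usually attacked with a fixed-point scheme such as FastICA. The NumPy sketch below (the toy sources, mixing matrix, and tanh contrast are assumptions, not from the slide) whitens the mixtures and then extracts one component at a time by deflation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two independent non-Gaussian sources (hypothetical): uniform noise and a +/-1 signal.
n = 20000
S = np.vstack([rng.uniform(-1, 1, n), np.sign(rng.standard_normal(n))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # unknown mixing matrix
X = A @ S                                 # observed mixtures, X = A S

# Whiten: remove second-order correlations (the PCA step).
X = X - X.mean(axis=1, keepdims=True)
lam, E = np.linalg.eigh(np.cov(X))
Z = (E / np.sqrt(lam)).T @ X

# FastICA-style fixed-point iteration with a tanh contrast, one unit at a time,
# as a stand-in for directly minimizing the slide's mutual information.
W = np.zeros((2, 2))
for i in range(2):
    w = rng.standard_normal(2)
    for _ in range(200):
        wz = w @ Z
        w_new = (Z * np.tanh(wz)).mean(axis=1) - (1 - np.tanh(wz)**2).mean() * w
        w_new -= W[:i].T @ (W[:i] @ w_new)   # deflate against found components
        w = w_new / np.linalg.norm(w_new)
    W[i] = w

U = W @ Z   # estimated independent components
```

Each recovered component should match one source up to sign and scale, which is the best ICA can do.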

SLIDE 12

Self-Organizing Map

Purpose: find regions of interest in data, that is, clusters; summarize data.

Basic idea (Kohonen, 1988): map each of K feature vectors X = (x₁, .., x_N)ᵀ into one of M regions of interest, defined by the vectors w_m, so that every X mapped to a given w_m is closer to it than to all remaining w_m. Basically, perform a coarse-graining of the feature space.

SLIDE 13

Grid Search

Purpose: signal/background discrimination.

Apply cuts at each grid point in the (x, y) plane: x > x_i, y > y_i. We refer to (x_i, y_i) as a cut-point.

Number of cut-points ~ N_bin^N_dim

SLIDE 14

Random Grid Search

Take each point of the signal class as a cut-point: x > x_i, y > y_i. For each cut-point, compute for the signal and background samples

N_tot = # events before cuts
N_cut = # events after cuts
Fraction = N_cut / N_tot

and plot the signal fraction against the background fraction, each between 0 and 1.

H.B.P. et al., Proceedings, CHEP 1995
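A NumPy sketch of the counting described above (the toy samples and the figure of merit used to pick a cut are hypothetical; the slide only specifies the fractions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy samples (hypothetical): signal sits at larger x and y than background.
sig = rng.normal(2.0, 1.0, size=(1000, 2))
bkg = rng.normal(0.0, 1.0, size=(1000, 2))

def fractions(cut_points, sample):
    """Fraction N_cut / N_tot of `sample` passing x > x_i, y > y_i for each cut-point."""
    passed = (sample[None, :, :] > cut_points[:, None, :]).all(axis=2)
    return passed.mean(axis=1)

# Random grid search: every signal event serves as a cut-point (x_i, y_i).
eff_s = fractions(sig, sig)   # signal fraction for every cut-point
eff_b = fractions(sig, bkg)   # background fraction for every cut-point

# One could now plot eff_s vs eff_b and pick a cut by some figure of merit,
# e.g. the (illustrative) significance-like ratio below.
fom = eff_s / np.sqrt(np.maximum(eff_b, 1.0 / len(bkg)))
best = sig[np.argmax(fom)]
```

The (eff_b, eff_s) scatter is the efficiency curve the slide plots, obtained without scanning the full N_bin^N_dim grid.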

SLIDE 15

Probability Density Estimation

Purpose: signal/background discrimination; parameter estimation.

Basic idea: Parzen estimation (1960s),

p(x) = (1/N) Σ_{n=1}^{N} (1/h^d) φ( (x − x_n) / h )

and mixtures,

p(x) = Σ_{j=1}^{J} q_j φ(x | x_j),   J << N
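The Parzen formula with a Gaussian kernel φ can be sketched in a few lines (the sample, grid, and bandwidth h below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# 1-d toy sample from a standard Gaussian (hypothetical), so d = 1.
xs = rng.standard_normal(5000)

def parzen(x, sample, h):
    """Parzen estimate p(x) = (1/N) sum_n (1/h^d) phi((x - x_n)/h), Gaussian phi."""
    u = (x[:, None] - sample[None, :]) / h
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return phi.mean(axis=1) / h

grid = np.linspace(-3, 3, 61)
p_hat = parzen(grid, xs, h=0.3)
p_true = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)
```

With enough events and a sensible h, the estimate tracks the true density; the mixture form on the slide trades the N kernels for J << N components.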

SLIDE 16

Artificial Neural Networks

Purpose: signal/background discrimination; parameter estimation; function estimation; density estimation.

Basic idea: encode the mapping (Kolmogorov, 1950s) using a set of 1-d functions:

f : U^N → U^M,   f(x) = F[φ₁, .., φ_K]

SLIDE 17

Feedforward Networks

Input nodes, hidden nodes, output node. Each hidden node i forms a weighted sum of the inputs x_j,

a_i = Σ_j w_ij x_j + θ_i

and the output node combines the hidden-node responses f(a_i),

n(x, w) = f( Σ_{i=1}^{5} w_i f(a_i) + θ )

where f(a) is a sigmoid function.
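The two formulas translate directly into a forward pass (the weights below are random placeholders, not a trained network):

```python
import numpy as np

rng = np.random.default_rng(6)

def sigmoid(a):
    """The 1-d squashing function f(a)."""
    return 1.0 / (1.0 + np.exp(-a))

# The slide's architecture: 2 inputs, 5 hidden nodes, 1 output node.
W = rng.standard_normal((5, 2))    # w_ij: input j -> hidden i
theta = rng.standard_normal(5)     # hidden thresholds theta_i
w = rng.standard_normal(5)         # w_i: hidden i -> output
theta_out = rng.standard_normal()  # output threshold theta

def n(x):
    a = W @ x + theta                           # a_i = sum_j w_ij x_j + theta_i
    return sigmoid(w @ sigmoid(a) + theta_out)  # n(x,w) = f(sum_i w_i f(a_i) + theta)

y = n(np.array([0.5, -1.0]))
```

The sigmoid output lies in (0, 1), which is what lets n(x, w) be interpreted as a probability on the next slide.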

SLIDE 18

ANN – Algorithm

Minimize the empirical risk function with respect to ω:

R(ω) = (1/N) Σ_{i=1}^{N} [ t_i − n(x_i, ω) ]²

Solution (for large N):

n(x, ω) = ∫ t p(t | x) dt

If t(x) = δ[1 − I(x)], where I(x) = 1 if x is of class k and 0 otherwise, then

n(x, ω) = p(k | x) = p(x | k) p(k) / Σ_k p(x | k) p(k)

D.W. Ruck et al., IEEE Trans. Neural Networks 1(4), 296-298 (1990)
E.A. Wan, IEEE Trans. Neural Networks 1(4), 303-305 (1990)
```
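The Ruck/Wan result can be checked numerically: train a single sigmoid unit on the empirical risk for a problem whose Bayes posterior is known in closed form. For two unit-width 1-d Gaussians at ±1 with equal priors, p(S|x) = 1/(1 + e^(−2x)); everything else below (sample sizes, learning rate, iteration count) is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(7)

# Two 1-d Gaussian classes with equal priors (hypothetical toy problem).
x_s = rng.normal(+1.0, 1.0, 5000)
x_b = rng.normal(-1.0, 1.0, 5000)
x = np.concatenate([x_s, x_b])
t = np.concatenate([np.ones(5000), np.zeros(5000)])   # t = 1 for class S

# Gradient descent on R(w) = (1/N) sum [t_i - n(x_i,w)]^2 for n = sigmoid(w x + b).
w, b = 0.0, 0.0
for _ in range(5000):
    n_x = 1.0 / (1.0 + np.exp(-(w * x + b)))
    grad = (n_x - t) * n_x * (1 - n_x)   # chain rule through the sigmoid
    w -= 2.0 * np.mean(grad * x)
    b -= 2.0 * np.mean(grad)

# Exact Bayes posterior for this problem: p(S|x) = 1 / (1 + exp(-2x)).
grid = np.linspace(-3, 3, 31)
net = 1.0 / (1.0 + np.exp(-(w * grid + b)))
bayes = 1.0 / (1.0 + np.exp(-2.0 * grid))
```

With 0/1 targets, minimizing the empirical risk drives the network output toward the class posterior p(k|x), as the slide's large-N argument states.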

SLIDE 19

Support Vector Machines

Purpose: signal/background discrimination.

Basic idea: data that are non-separable in N dimensions have a higher chance of being separable if mapped into a space of higher dimension. Use a linear discriminant to partition the high-dimensional feature space:

D(x) = w · φ(x) + b,   φ : ℝ^N → ℝ^M with huge M

SLIDE 20

SVM – Kernel Trick

Or how to cope with a possibly infinite number of parameters!

Example: map 2-d points with labels y = ±1 into three dimensions, φ : (x₁, x₂) → (z₁, z₂, z₃).

The discriminant D(x) = w · φ(x) + b can be rewritten in terms of the training points x_i,

D(x) = Σ_i α_i y_i [ φ(x_i) · φ(x) ] + b

so only inner products of mapped points are needed. Define the kernel

K(x, x_j) ≡ φ(x) · φ(x_j)

Try different kernels K, because the mapping φ is unknown!
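A concrete instance of the trick (a standard quadratic kernel, not necessarily the map on the slide): with φ(x₁, x₂) = (x₁², √2 x₁x₂, x₂²), the inner product φ(x)·φ(y) equals (x·y)², so K can be evaluated without ever constructing φ:

```python
import numpy as np

rng = np.random.default_rng(8)

def phi(x):
    """Explicit 2-d -> 3-d map: (x1, x2) -> (z1, z2, z3)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def K(x, y):
    """The same inner product, computed in the original 2-d space."""
    return (x @ y)**2

x, y = rng.standard_normal(2), rng.standard_normal(2)
lhs = phi(x) @ phi(y)   # inner product in the high-dimensional space
rhs = K(x, y)           # kernel evaluated without phi
```

Since D(x) depends on φ only through such inner products, the feature space can be huge, or infinite-dimensional, at no extra cost.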

SLIDE 21

Comments – i

Every classification task tries to solve the same fundamental problem: after adequately pre-processing the data, find a good, and practical, approximation to the Bayes decision rule. Given X, if P(S|X) > P(B|X), choose hypothesis S; otherwise choose B.

If we knew the densities p(X|S) and p(X|B) and the priors p(S) and p(B), we could compute the Bayes Discriminant Function (BDF):

D(X) = P(S|X) / P(B|X)
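If the densities and priors really were known, D(X) would be directly computable. A toy 1-d illustration (the Gaussian densities and priors are hypothetical):

```python
import numpy as np

def gauss(x, mu, sigma):
    """Gaussian density for the (assumed known) class densities p(X|S), p(X|B)."""
    return np.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

p_S, p_B = 0.5, 0.5   # priors p(S), p(B)

def D(x):
    # D(X) = P(S|X) / P(B|X) = p(X|S) p(S) / [ p(X|B) p(B) ]
    return gauss(x, 1.0, 1.0) * p_S / (gauss(x, -1.0, 1.0) * p_B)

# Decide S when D(x) > 1, i.e. P(S|x) > P(B|x); here the boundary is x = 0.
```

Every method in the talk can be read as an attempt to approximate D(X), or a monotonic function of it, from finite samples.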

SLIDE 22

Comments – ii

The Fisher linear discriminant (FLD), random grid search (RGS), probability density estimation (PDE), artificial neural network (ANN) and support vector machine (SVM) are simply different algorithms to approximate the Bayes discriminant function D(X), or a function thereof.

It follows, therefore, that if a method is already close to the Bayes limit, then no other method, however sophisticated, can be expected to yield dramatic improvements.

SLIDE 23

Summary

Multivariate analysis is hard, but useful if it is important to extract as much information from the data as possible.

For classification problems, the common methods provide different approximations to the Bayes discriminant.

There is considerable empirical evidence that, as yet, no uniformly most powerful method exists. Therefore, be wary of claims to the contrary!