A Plot for Visualizing Multivariate Data Rida E. A. Moustafa - - PowerPoint PPT Presentation



SLIDE 1

A Plot for Visualizing Multivariate Data

Rida E. A. Moustafa
George Mason University
ADM Group, AAL
rmoustaf@galaxy.gmu.edu
rmustafa@aalcpas.com

SLIDE 2

Talk Outline

  • The theory of the MV-plot.
  • Detecting linear structures with the MV-plot.
  • Detecting nonlinear structures with the MV-plot.
  • Comparisons with other methods and application to real data.

SLIDE 3

MV-Plot Theory

Given an observation $x = (x_1, x_2, \ldots, x_d)$, we define $m$ and $v$ as follows:

$$m = f(x) = \frac{1}{d}\sum_{j=1}^{d} |x_j|$$

$$v = g(x, f(x)) = \sqrt{\frac{1}{d}\sum_{j=1}^{d} |x_j - f(x)|^2}$$

Computing m and v for every observation produces a vector of m values and a vector of v values. What is the relationship between m and v?
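The definitions can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function name `mv_coordinates` is hypothetical, and it assumes v is the root-mean-square deviation of an observation's coordinates from m.

```python
import numpy as np

def mv_coordinates(X):
    """Return (m, v) for every row (observation) of X.

    m: mean of the absolute coordinate values.
    v: root-mean-square deviation of the coordinates from m
       (an assumption about the exact form of v).
    """
    X = np.asarray(X, dtype=float)
    m = np.abs(X).mean(axis=1)
    v = np.sqrt(((X - m[:, None]) ** 2).mean(axis=1))
    return m, v

# Two toy observations in 3 dimensions (illustrative values only)
X = np.array([[0.2, 0.4, 0.6],
              [0.5, 0.5, 0.5]])
m, v = mv_coordinates(X)
```

Plotting v against m for all observations then gives the two-dimensional MV-plot; an observation whose coordinates are all equal lands on the m axis (v = 0).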

SLIDE 4

MV-Relationship in 2-d

$$m_i = \frac{1}{2}\sum_{j=1}^{2} |x_{ij}| = \frac{1}{2}\left(|x_{i1}| + |x_{i2}|\right)$$

$$v_i = \sqrt{\frac{1}{2}\sum_{j=1}^{2} |x_{ij} - m_i|^2} = \frac{1}{2}\,|x_{i1} - x_{i2}|$$

  • Normalizing the data to the range (0, 1) avoids the absolute value when computing m.
  • In 2-d, the MV-plot is close to the principal component (PC) plot.
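One way to read the 2-d relationship is as a 45-degree rotation of the data followed by a fold: m is proportional to the projection onto the direction (1, 1), and v to the absolute projection onto (1, -1). The sketch below checks this numerically on uniform random data in (0, 1); the data and variable names are illustrative assumptions, and v is taken to be the root-mean-square deviation from m.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))   # synthetic data already in (0, 1)

# MV coordinates (no absolute value needed because the data are non-negative)
m = X.mean(axis=1)
v = np.sqrt(((X - m[:, None]) ** 2).mean(axis=1))

# Unit vectors along the diagonal and the anti-diagonal
u_sum = np.array([1.0, 1.0]) / np.sqrt(2.0)
u_diff = np.array([1.0, -1.0]) / np.sqrt(2.0)
proj_sum = X @ u_sum
proj_diff = X @ u_diff

# m and v are the rotated coordinates scaled by 1/sqrt(2), with v folded
m_from_rotation = proj_sum / np.sqrt(2.0)
v_from_rotation = np.abs(proj_diff) / np.sqrt(2.0)
```

For data whose principal axes lie near the diagonals, this rotation nearly coincides with the principal component axes, which is a plausible reading of the second bullet.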
SLIDE 5

MV-plot detects linear structure(s)

$$x_{i2} = w_1 x_{i1} + w \;\Rightarrow\; m_i = \frac{1}{2}\left[(1 + w_1)\,x_{i1} + w\right], \quad v_i = \frac{1}{2}\left|(1 - w_1)\,x_{i1} - w\right|$$

Up to the absolute value in $v_i$, both coordinates have the form $a_1 x_{i1} + a$, with coefficients determined by $w_1$ and $w$.

If the data are linear in the original space, they will be linear in the MV-space!
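A quick numerical check of this claim in 2-d, as a sketch under stated assumptions: the slope `w1 = 0.5` and intercept `w = 0.6` are hypothetical values chosen so that x2 > x1 on (0, 1), which keeps the absolute value in v from folding the line, and v is taken to be the root-mean-square deviation from m.

```python
import numpy as np

# Hypothetical line x2 = w1 * x1 + w (values chosen for illustration)
w1, w = 0.5, 0.6
x1 = np.linspace(0.0, 1.0, 50)
X = np.column_stack([x1, w1 * x1 + w])

# MV coordinates of every point on the line
m = np.abs(X).mean(axis=1)
v = np.sqrt(((X - m[:, None]) ** 2).mean(axis=1))

# Straight-line fits: m and v against x1, then v against m (the MV-plot)
m_slope, m_intercept = np.polyfit(x1, m, 1)
v_slope, v_intercept = np.polyfit(x1, v, 1)
mv_slope, mv_intercept = np.polyfit(m, v, 1)
mv_residual = np.max(np.abs(v - (mv_slope * m + mv_intercept)))
```

Here m and v are each exactly linear in x1, so the points fall on a straight line in the MV-plane. If (1 - w1) x1 - w changes sign over the data range, the absolute value reflects part of the line and the image becomes V-shaped instead.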

SLIDE 6

MV-plot detects linear structure(s)

For a linear structure in d dimensions, $x_{id} = \sum_{j=1}^{d-1} w_j x_{ij} + w$:

$$m_i = \frac{1}{d}\left[\sum_{j=1}^{d-1} (1 + w_j)\,x_{ij} + w\right] = \sum_{j=1}^{d-1} a_j x_{ij} + a$$

$$v_i \approx \sum_{j=1}^{d-1} a'_j x_{ij} + a'$$

SLIDE 7

Detecting Linear structure(s) Example I

SLIDE 8

Detecting Linear structure(s) Example II

SLIDE 9

Detecting Linear structure(s) Example III

SLIDE 10

Detecting nonlinear data with MV-plot

The MV-plot can detect nonlinear structure in the data set without any changes to the equations.

SLIDE 11

Detecting nonlinear structure

$$x,\ \sin(x) \;\to\; m = x + \sin(x), \quad v = |x - \sin(x)|$$

$$x,\ \cos(x) \;\to\; m = x + \cos(x), \quad v = |x - \cos(x)|$$
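The sine case can be reproduced numerically. The sketch below assumes the 2-d definitions with the factor 1/2 (mean of the two coordinates, root-mean-square deviation from it), so the computed values are half of the expressions on this slide; the constant factor only rescales the axes of the plot. The sampling range [0, π] is an illustrative choice that keeps both coordinates non-negative.

```python
import numpy as np

# Points on the curve (x, sin(x)) for x in [0, pi]
x = np.linspace(0.0, np.pi, 100)
X = np.column_stack([x, np.sin(x)])

# MV coordinates with the 1/d normalization (d = 2)
m = np.abs(X).mean(axis=1)
v = np.sqrt(((X - m[:, None]) ** 2).mean(axis=1))

# Expressions from the slide, which omit the factor 1/2
m_slide = x + np.sin(x)
v_slide = np.abs(x - np.sin(x))
```

Plotting v against m traces a curved path, so the nonlinear structure stays visible in MV-space without changing the definitions.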

SLIDE 12

Detecting Sphere(s)

Case I:

  • The sphere has radius R
  • The sphere center is the origin

$$v_i^2 = \frac{1}{d}\sum_{j=1}^{d} (x_{ij} - m_i)^2 = \frac{1}{d}\left(\sum_{j=1}^{d} x_{ij}^2 - d\,m_i^2\right) \;\Rightarrow\; v_i^2 + m_i^2 = \frac{R^2}{d}$$
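The Case I identity, v² + m² = R²/d, can be checked on synthetic data. This is a sketch under stated assumptions: the dimension, radius, and sample size are arbitrary, and m is the signed coordinate mean (no absolute value), which the identity requires because points on an origin-centered sphere have negative coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
d, R, n = 5, 2.0, 1000   # illustrative dimension, radius, sample size

# Uniform points on the sphere: normalize Gaussian vectors, then scale by R
G = rng.standard_normal((n, d))
X = R * G / np.linalg.norm(G, axis=1, keepdims=True)

m = X.mean(axis=1)   # signed mean of the coordinates
v = X.std(axis=1)    # population standard deviation (ddof=0)
lhs = v**2 + m**2    # should equal R**2 / d for every point
```

Every point on the sphere therefore maps onto a circle of radius R/√d centered at the origin of the MV-plane.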
SLIDE 13

Detecting Sphere(s)

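Case II:

  • The sphere has radius R
  • The sphere center $x^c$ is not the origin

With $m_i$ the mean of the centered coordinates $x_{ij} - x_j^c$:

$$v_i^2 = \frac{1}{d}\sum_{j=1}^{d} \left((x_{ij} - x_j^c) - m_i\right)^2 = \frac{1}{d}\left(\sum_{j=1}^{d} (x_{ij} - x_j^c)^2 - d\,m_i^2\right) \;\Rightarrow\; v_i^2 + m_i^2 = \frac{R^2}{d}$$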
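For a sphere that is not centered at the origin, the same circle relation holds once the center is subtracted. The sketch below illustrates this on synthetic data; the center vector is a hypothetical choice, and the assumption that m and v are computed from centered coordinates is an interpretation, not something the slide states explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
d, R, n = 4, 1.5, 500
center = np.array([3.0, -1.0, 0.5, 2.0])   # hypothetical sphere center

# Uniform points on a sphere of radius R around `center`
G = rng.standard_normal((n, d))
X = center + R * G / np.linalg.norm(G, axis=1, keepdims=True)

# MV coordinates of the centered data satisfy v^2 + m^2 = R^2 / d
Y = X - center
m = Y.mean(axis=1)
v = Y.std(axis=1)
centered_ok = np.allclose(v**2 + m**2, R**2 / d)

# Without centering, the relation does not hold for this data
raw_ok = np.allclose(X.std(axis=1) ** 2 + X.mean(axis=1) ** 2, R**2 / d)
```

In this example `centered_ok` is true and `raw_ok` is false, which shows why the center has to be removed (or estimated) before the circle appears in the MV-plane.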
SLIDE 14

Detecting Sphere(s)

SLIDE 15

Application on Real Data

  • Fisher’s IRIS data (150 × 4): 3 classes (50 points each)
  • Process control data (600 × 60): 6 classes (100 points each)
  • Pollen data (3,848 × 5) (Wegman’s data): 2 classes (linear and nonlinear)

SLIDE 16

Related Dimension Reduction Methods

  • Multidimensional Scaling
  • Fisher Discriminant Analysis
  • Principal Component Analysis

SLIDE 17

IRIS (R. A. Fisher) Dataset

150 cases in 4 dimensions

SLIDE 18

Time Series Dataset

600 cases in 60 dimensions

SLIDE 19

Pollen dataset

3,848 points in 5 dimensions

Other methods require more storage and computing time. Even if they work, we expect poor results on this particular dataset (Wegman 2002).

SLIDE 20

Pollen dataset

Mixed linear and nonlinear structures.

SLIDE 21

The linear structure in the Pollen data set

17 + 16 + 18 + 17 + 14 + 16 = 98 linear points; 3,750 nonlinear points

SLIDE 22

Summary

  • The MV-algorithm can discover linear and nonlinear patterns at the same time.
  • The MV-algorithm can discover symmetric data.
  • The MV-algorithm can handle large multivariate data sets.