Visual Data Mining for Quantized, Spatial Data Amy Braverman Jet - - PowerPoint PPT Presentation

SLIDE 1

Visual Data Mining for Quantized, Spatial Data

Amy Braverman Jet Propulsion Laboratory California Institute of Technology Mail Stop 169-237 4800 Oak Grove Drive Pasadena, CA 91109-8099

Amy.Braverman@jpl.nasa.gov

SLIDE 2

Outline

  1. Motivation
  2. Approach
  3. AIRS data collection
  4. Quantization
  5. Visual data mining (I)
  6. Visual data mining (II)
  7. Hierarchical Quantization
  8. Visual data mining (III)
  9. Summary

SLIDE 3

Motivation

  • Earth Observing System satellites return “massive” data volumes.
  • Traditional approach to data exploration: produce maps of one-degree averages and standard deviations for each parameter of interest.
  • Good news: this is easy and practical, and everybody understands it.
  • Bad news: the method throws away almost all of the distributional information in the data, including covariances and higher-order statistics.
  • Need: to “mine” the data, i.e., to ask how the characteristics of joint distributions change in time and space and across resolutions, and to characterize forcings and feedbacks.
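To make the "bad news" concrete, here is a small illustration with made-up numbers (not AIRS data). Two hypothetical grid cells have two parameters with identical means and standard deviations, so their average and standard-deviation maps would look the same, yet the parameters are perfectly positively correlated in one cell and perfectly negatively correlated in the other. The function names are illustrative only.

```python
import statistics


def summary(pairs):
    """Per-parameter (mean, standard deviation): what a one-degree average/SD map keeps."""
    a = [p[0] for p in pairs]
    b = [p[1] for p in pairs]
    return ((statistics.mean(a), statistics.pstdev(a)),
            (statistics.mean(b), statistics.pstdev(b)))


def correlation(pairs):
    """Pearson correlation of the two parameters: part of what such maps discard."""
    a = [p[0] for p in pairs]
    b = [p[1] for p in pairs]
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(pairs)
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))


# Two hypothetical grid cells, each with two parameters per observation.
values = [1.0, 2.0, 3.0, 4.0]
cell_pos = [(v, v) for v in values]        # parameters rise together
cell_neg = [(v, 5.0 - v) for v in values]  # one rises as the other falls

print(summary(cell_pos) == summary(cell_neg))           # identical maps
print(correlation(cell_pos), correlation(cell_neg))     # opposite dependence
```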

SLIDE 4

Approach

  • New approach: produce an estimate of the joint (empirical) probability distribution of the variables of interest within each one-degree grid cell.
  • Use a clustering algorithm such as K-means to partition the data into groups, and represent each group by its centroid and (normalized) membership count.
  • The collection of all 180 × 360 = 64,800 grid-cell distribution estimates serves as a proxy for the original data.
  • How do we find relationships? We need to visualize multivariate relationships while maintaining spatial context.
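A minimal sketch of the per-cell summaries described above, assuming each observation arrives as a (lat, lon, features) tuple and using plain Lloyd's K-means with a fixed K per cell and random initialization. The names grid_cell, kmeans, and quantize_cells are hypothetical; this is an illustration, not the AIRS processing code.

```python
import math
import random
from collections import defaultdict

N_LAT, N_LON = 180, 360  # one-degree grid: 180 x 360 = 64,800 cells


def grid_cell(lat, lon):
    """Index of the one-degree cell containing (lat, lon); lat in [-90, 90], lon in degrees."""
    row = min(math.floor(lat + 90.0), N_LAT - 1)  # put lat = 90 in the northernmost row
    col = math.floor(lon + 180.0) % N_LON         # wrap lon = 180 back to column 0
    return row * N_LON + col


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's K-means on one cell's data.

    Returns [(centroid, normalized membership count)]; empty clusters are dropped.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, min(k, len(points)))]
    sizes = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        groups = defaultdict(list)
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Update step: each non-empty group is represented by its mean.
        keys = sorted(groups)
        new = [tuple(sum(c) / len(groups[j]) for c in zip(*groups[j])) for j in keys]
        sizes = [len(groups[j]) for j in keys]
        converged = new == centroids
        centroids = new
        if converged:
            break
    n = len(points)
    return [(c, s / n) for c, s in zip(centroids, sizes)]


def quantize_cells(observations, k):
    """Group (lat, lon, features) observations by grid cell and summarize each cell."""
    by_cell = defaultdict(list)
    for lat, lon, features in observations:
        by_cell[grid_cell(lat, lon)].append(tuple(features))
    return {cell: kmeans(pts, k) for cell, pts in by_cell.items()}


# Hypothetical observations in two grid cells, with two features each.
obs = [
    (34.2, -118.1, (0.0, 0.0)), (34.7, -118.9, (0.0, 0.2)),
    (34.5, -118.5, (10.0, 10.0)), (34.1, -118.3, (10.0, 10.2)),
    (-10.5, 20.5, (5.0, 5.0)), (-10.2, 20.9, (5.0, 5.0)), (-10.8, 20.1, (5.0, 5.0)),
]
summaries = quantize_cells(obs, k=2)
```

Each cell's summary is a short list of (centroid, weight) pairs whose weights sum to 1; the set of all such lists is the proxy for the original data.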

SLIDE 5

AIRS Data Collection

SLIDE 6

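Quantization

[Figure: AIRS granules in geographic space (scale labels 1500 km and 2250 km; panel labels "135 footprints" and "9 footprints") overlaid on a 1 degree lat/lon grid. The observations x_1, x_2, ..., x_N in a cell, each with weight 1, live in a high-dimensional data space (!) and are replaced by representatives y_1, y_2, ..., y_K with weights N_1, N_2, ..., N_K.]

$$N_k = \sum_{n=1}^{N} \mathbf{1}[x_n \in k], \qquad y_k = \frac{1}{N_k} \sum_{n=1}^{N} x_n\, \mathbf{1}[x_n \in k]$$

$$Y = E(X \mid Y)$$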
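The two quantization formulas can be checked directly. This sketch assumes the partition is already given as one cluster label per observation (in practice it would come from a clustering step such as K-means); quantize and weighted_mean are hypothetical names. Because Y = E(X | Y), the count-weighted mean of the representatives reproduces the mean of the original data, which the last lines compare.

```python
from collections import defaultdict


def quantize(xs, labels):
    """N_k = sum_n 1[x_n in k] and y_k = (1 / N_k) sum_n x_n 1[x_n in k]."""
    counts = defaultdict(int)
    sums = {}
    for x, k in zip(xs, labels):
        counts[k] += 1                      # adds the indicator 1[x_n in k]
        if k not in sums:
            sums[k] = [0.0] * len(x)
        for d, v in enumerate(x):
            sums[k][d] += v                 # adds x_n 1[x_n in k], coordinate by coordinate
    reps = {k: [s / counts[k] for s in sums[k]] for k in counts}
    return dict(counts), reps


def weighted_mean(vectors, weights):
    """Coordinate-wise mean of `vectors` under the given weights."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total for d in range(dim)]


# Hypothetical two-channel data with a given partition into two clusters.
xs = [(1.0, 2.0), (3.0, 2.0), (10.0, 0.0), (12.0, 4.0), (11.0, 2.0)]
labels = [0, 0, 1, 1, 1]
counts, reps = quantize(xs, labels)

data_mean = weighted_mean(xs, [1.0] * len(xs))                               # E(X)
quant_mean = weighted_mean([reps[k] for k in counts], [counts[k] for k in counts])  # E(Y)
print(counts, reps, data_mean, quant_mean)
```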

SLIDE 7

Visual Data Mining (I)

  • Data: 11 AIRS channels observed over 3 days (July 20–22, 2002).
  • Compare joint distributions among grid cells:
      • Are the grid cell data homogeneous or heterogeneous?
      • What physical processes account for the shapes of the representatives and of the distribution?
      • What physical processes might account for differences between grid cells?
      • Are there “outliers”?
SLIDE 8

Visual Data Mining (II)

  • Data in this region: 10,498 clusters representing 60,681 observations.
  • Can we summarize the whole region as one?

SLIDE 9

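Hierarchical Quantization

[Figure: four 1 degree × 1 degree grid cells with data X_1, X_2, X_3, X_4, quantized separately to Y_1, Y_2, Y_3, Y_4; the pooled representatives Y are quantized again to W.]

$$Y_j = E(X_j \mid Y_j), \qquad X = \sum_{j=1}^{4} X_j\, I(V = j), \qquad Y = \sum_{j=1}^{4} Y_j\, I(V = j), \qquad P(V = j) = \frac{N_j}{\sum_{i=1}^{4} N_i}$$

$$W = E(Y \mid W)$$

$$\delta(X_j, Y_j) = E\|X_j - Y_j\|^2, \qquad \delta(X, Y) = E\|X - Y\|^2, \qquad \delta(Y, W) = E\|Y - W\|^2$$

$$\delta(X, W) = \delta(Y, W) + \sum_{j=1}^{4} \delta(X_j, Y_j)\, P(V = j)$$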
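A numerical check of the distortion decomposition, under stated assumptions: synthetic Gaussian data stand in for four grid cells, V is taken to be the index of the cell an observation comes from, and a weighted K-means (our choice; the slide does not name the algorithm) quantizes each cell and then the pooled, count-weighted representatives. The identity holds because, when Y = E(X | Y) and W depends on the data only through Y, the cross term 2E[(X − Y)·(Y − W)] vanishes, so δ(X, W) = δ(X, Y) + δ(Y, W), and δ(X, Y) is the P(V = j)-weighted average of the per-cell distortions. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def weighted_kmeans(points, weights, k, iters=25, seed=0):
    """Weighted Lloyd's K-means. The loop ends with an update step, so every returned
    centroid is the weighted mean of its members (the centroid condition Y = E(X | Y))."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step, then relabel to 0..m-1 so empty clusters disappear.
        raw = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        compact = {old: new for new, old in enumerate(sorted(set(raw)))}
        labels = [compact[j] for j in raw]
        # Update step: weighted mean of each cluster's members.
        centroids = []
        for j in range(len(compact)):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == j]
            total = sum(w for _, w in members)
            centroids.append(tuple(sum(w * p[d] for p, w in members) / total
                                   for d in range(len(points[0]))))
    return labels, centroids


def distortion(points, weights, labels, centroids):
    """E||X - Y||^2 under the weighted empirical distribution of `points`."""
    total = sum(weights)
    return sum(w * sq_dist(p, centroids[l]) for p, w, l in zip(points, weights, labels)) / total


# Synthetic stand-ins for four grid cells X_1..X_4 (unequal sizes N_j).
rng = random.Random(1)
cells = [[(rng.gauss(j, 1.0), rng.gauss(-j, 1.0)) for _ in range(20 + 5 * j)] for j in range(4)]

# Level 1: quantize each cell X_j separately to Y_j.
level1 = [weighted_kmeans(x, [1.0] * len(x), k=3) for x in cells]
n_total = sum(len(x) for x in cells)
p_v = [len(x) / n_total for x in cells]                         # P(V = j)
within = sum(p * distortion(x, [1.0] * len(x), lab, cen)        # sum_j delta(X_j, Y_j) P(V = j)
             for p, x, (lab, cen) in zip(p_v, cells, level1))

# Level 2: pool the representatives, weighted by membership counts, and quantize to W.
reps, counts, rep_index = [], [], []
for lab, cen in level1:
    offset = len(reps)
    reps.extend(cen)
    counts.extend(float(lab.count(i)) for i in range(len(cen)))
    rep_index.append([offset + l for l in lab])
w_labels, w_cents = weighted_kmeans(reps, counts, k=3)
delta_yw = distortion(reps, counts, w_labels, w_cents)          # delta(Y, W)

# Direct delta(X, W): send each observation to W through its representative.
all_x = [p for x in cells for p in x]
x_to_w = [w_labels[r] for idx in rep_index for r in idx]
delta_xw = distortion(all_x, [1.0] * len(all_x), x_to_w, w_cents)
print(delta_xw, delta_yw + within)
```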

SLIDE 10

Visual Data Mining (III)

  • Where do the clusters come from?

More questions:

  • How do the distributions change as you move from east to west? (Suggested approach: subdivide the region into western and eastern halves. Summarize each half separately and compare the halves to each other and to the summary of the whole. Subdivide again, etc.) From north to south?
  • What other regions are similar to this one? Are they the ones we expect based on physics? Does spatial resolution matter for answering this question? If so, how?
  • Where are the regions of high complexity (variability or distribution entropy)? Does the physics support this?
  • How does the regression of channel 1 on channel 2 change spatially?
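Two of these questions can be computed directly from a cell's quantized summary. This sketch assumes each summary is a list of representatives with membership counts, measures complexity as the Shannon entropy of the normalized counts, and fits a count-weighted least-squares regression of channel 1 on channel 2. Mapping channel 1 and channel 2 to vector indices 0 and 1, the example numbers, and the function names are assumptions for illustration. The regression describes the quantized distribution, which omits the spread within each cluster.

```python
import math


def distribution_entropy(weights):
    """Shannon entropy (nats) of a cell's normalized membership counts."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)


def weighted_regression(reps, weights, x_dim=1, y_dim=0):
    """Count-weighted least squares of channel y_dim on channel x_dim.

    Returns (intercept, slope). Assumes channel x_dim varies across representatives.
    """
    total = sum(weights)
    mx = sum(w * r[x_dim] for r, w in zip(reps, weights)) / total
    my = sum(w * r[y_dim] for r, w in zip(reps, weights)) / total
    sxy = sum(w * (r[x_dim] - mx) * (r[y_dim] - my) for r, w in zip(reps, weights))
    sxx = sum(w * (r[x_dim] - mx) ** 2 for r, w in zip(reps, weights))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical cell summary: representatives (channel 1, channel 2) with counts.
reps = [(2.0, 1.0), (4.0, 2.0), (6.0, 3.0)]
counts = [10, 20, 10]
print(distribution_entropy(counts))        # higher means a more complex cell
print(weighted_regression(reps, counts))   # (intercept, slope)
```

Evaluating both quantities cell by cell and mapping the results is one way to look for regions of high complexity and for spatial changes in the regression slope.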

SLIDE 11

Summary

  • Accept coarser spatial resolution (one degree) to achieve replication and estimate distributions.
  • Explore the quantized data interactively by comparing distributions at different levels of aggregation and in different locations (and times).
  • We are mining the data, not making inferences: no spatial statistical models.
  • AIRS data will be available at http://daac.gsfc.nasa.gov/atmodyn/airs/index.html.
  • More information about AIRS: http://www-airs.jpl.nasa.gov.