Gaussian Process Models of Spatial Aggregation Algorithms Naren - - PowerPoint PPT Presentation


SLIDE 1

Gaussian Process Models of Spatial Aggregation Algorithms

Naren Ramakrishnan

Virginia Tech Computer Science http://people.cs.vt.edu/~ramakris/

Chris Bailey-Kellogg

Purdue Computer Sciences http://www.cs.purdue.edu/homes/cbk/

SLIDE 2

Big Picture

Spatial Aggregation: generic mechanism for spatial data mining, parameterized by domain knowledge.

[Figure: Spatial Aggregation hierarchy — Input Field (Sample/Interpolate) → Lower-Level Objects → Aggregate → Spatial N-graph → Classify → Equivalence Classes → Redescribe → Higher-Level Objects → Abstract Description; Localize maps back down, and ambiguities trigger further sampling]

Gaussian Processes: generic framework for spatial statistical modeling, parameterized by covariance structure. SA+GP: model the mining mechanism for meta-level reasoning, e.g. targeting samples and characterizing sensitivity to parameters and inputs.

SLIDE 3

Example: Wireless System Configuration

Optimize performance (e.g. signal-to-noise, bit error probability) of wireless system configuration (e.g. distance between antennae). Simulate across range of configurations (hours to days per simulation).

[Plot: configuration space, axes SNR1 (dB) and SNR2 (dB), each 10–40 dB]

Aggregate structures in configuration space. In shaded region, 99% confidence that average error is acceptable. Analyze structures to characterize performance. Configs in upper right less sensitive to power imbalance (region width).

SLIDE 4

General Features

Problem: scarce spatial data mining in physical domains

  • Expensive data collection. Much implicit but little explicit data.
  • Control over data collection.
  • Available physical knowledge — continuity, locality, symmetry, etc.

Approach: multi-level qualitative analysis

  • Exploit domain knowledge to uncover qualitative structures in data.
  • Sample optimization driven by model selection — maximize expected information gain, minimize expense, ...
  • Decisions explainable in terms of problem structures & physical knowledge.

SLIDE 5

Mining Mechanism: Spatial Aggregation (SA)

Local operations for finding multi-level structures in spatial data.

  • Input: numerical field. Ex: weather maps, numerical simulation output.
  • Output: high-level description of structure, behavior, and design. Ex: fronts, stability regions in dynamical systems.
  • Bridge quantitative ↔ qualitative via increasingly abstract structural descriptions.
  • Key domain knowledge: locality in domain, similarity in feature.

[Figure: Spatial Aggregation hierarchy diagram, repeated from Slide 2]

SLIDE 6

Spatial Aggregation Example

Goal: find flows in vector field (e.g. wind velocity, temp. gradient).
(a) Input. (b) Localize (distance < r). (c) Test similarity (angle < θ). (d) Select successor (d · distance + angle).

SLIDE 7

(e) Select predecessor (d · distance + angle). (f) Redescribe (points → curve). (g) Bundle curves by higher-level locality, similarity.
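The localize and test-similarity steps above can be sketched in a few lines. A minimal sketch on toy data — the arrays, thresholds `r` and `theta`, and variable names are illustrative choices, not SAL's actual API:

```python
import numpy as np

# Toy 1D row of vectors: positions and their directions (radians).
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [5.0, 0.0]])
angles = np.array([0.1, 0.15, 0.12, 2.0])
r, theta = 1.5, 0.3   # illustrative locality and similarity thresholds

# (b) Localize: neighborhood graph from pairwise distance < r.
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
neighbors = (dist < r) & (dist > 0)

# (c) Test similarity: keep edges whose angular difference < theta.
dangle = np.abs(angles[:, None] - angles[None, :])
edges = neighbors & (dangle < theta)

# Surviving edges are candidate same-direction flows.
print(np.argwhere(edges))
```

Here the distant, differently-oriented fourth vector drops out at both the locality and similarity tests, leaving only the three aligned neighbors connected.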

SLIDE 8

Reasoning About SA Applications

  • Sensitivity to input?
  • Sensitivity to parameters (locality, similarity metrics)?
  • Optimization of additional samples?

Approach: probabilistic model of spatial relationships, in terms of Gaussian Processes.

[Figure: Spatial Aggregation hierarchy diagram, repeated from Slide 2]

SLIDE 9

Gaussian Processes: Intuition

  • 1D version of vector flow analysis:

[Plot: gradient vs. x, samples over x ∈ [2, 20], gradient ∈ [−3, 3]]

Qualitative structure: same-direction flow.

  • Regression: given angles at some sample points, predict at new, unobserved points.

[Plot: vector angle (values or distributions, radians) vs. x]

Gaussian conditional distribution; covariance structure captures locality.

SLIDE 10
  • Classification: apply logistic (higher-D: softmax) function to estimate latent variable representing class:

[Plots: latent gradient function vs. x; logistic squashing function; resulting class probabilities in [0, 1] vs. x]
SLIDE 11

GP as Spatial Interpolation (Kriging)

  • Given set of observations {(x1, y1), . . . , (xk, yk)} (vector angles at positions), want to model y = f(x).
  • Possible form f(x) = α + Z(x).
  • Model Z with Gaussian: mean 0, covariance σ²R.
  • Key: structure of R captures neighborhood relationships among samples. Ex: R(xi, xj) = exp(−ρ |xi − xj|²)

[Plots: GP interpolants of the same samples for ρ = 0.1 and ρ = 1]

Note: exact interpolation at data points.
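How ρ controls the neighborhood size can be seen directly from the correlation matrix itself. A small sketch — the sample points and ρ values are illustrative, not from the talk:

```python
import numpy as np

# Gaussian correlation R(xi, xj) = exp(-rho * |xi - xj|^2).
def R(x, rho):
    d = x[:, None] - x[None, :]
    return np.exp(-rho * d ** 2)

x = np.array([0.0, 1.0, 2.0])
print(R(x, 0.1))  # slow decay: distant points remain strongly correlated
print(R(x, 1.0))  # fast decay: correlation concentrated near each point
```

With ρ = 0.1 the off-diagonal entries stay near 1 (a smooth, wide-neighborhood interpolant); with ρ = 1 they fall off quickly, giving the tighter, wigglier fit shown on the slide.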

SLIDE 12
  • Optimize parameters given observations, to estimate f′. Ex: minimize mean squared error E{(f′ − f)²}: maximize over ρ the quantity −(k/2)(ln σ² + ln |R|), where R is the k × k symmetric correlation matrix built from R(·,·).
  • One-D optimization straightforward; higher-D requires MCMC.
  • Once optimized, prediction for xk+1 is easy, based on correlation to samples:

      f′(xk+1) = α̂ + rᵀ(xk+1) R⁻¹ (y − α̂ Ik)

    where r is the correlation vector for xk+1 vs. the sample points, and α̂ estimates α:

      α̂ = (Ikᵀ R⁻¹ Ik)⁻¹ Ikᵀ R⁻¹ y

    Then the estimate’s variance is

      σ̂² = (y − α̂ Ik)ᵀ R⁻¹ (y − α̂ Ik) / k
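The prediction formulas can be exercised in a few lines. A minimal sketch with ρ held fixed rather than optimized; the sample positions, observations, and names (`alpha_hat`, `sigma2_hat`, `predict`) are illustrative choices, not from the talk:

```python
import numpy as np

rho = 1.0                                          # fixed, not MLE-optimized
x = np.array([0.0, 0.5, 1.0, 2.0])                 # sample positions
y = np.sin(2 * x)                                  # toy observations

R = np.exp(-rho * (x[:, None] - x[None, :]) ** 2)  # k x k correlation matrix
Rinv = np.linalg.inv(R)
I = np.ones(len(x))

# alpha_hat = (I^T R^-1 I)^-1 I^T R^-1 y  (generalized least-squares mean)
alpha_hat = (I @ Rinv @ y) / (I @ Rinv @ I)
resid = y - alpha_hat * I
# sigma2_hat = resid^T R^-1 resid / k
sigma2_hat = resid @ Rinv @ resid / len(x)

def predict(xnew):
    r = np.exp(-rho * (xnew - x) ** 2)             # correlation to samples
    return alpha_hat + r @ Rinv @ resid

print(predict(0.5) - y[1])   # ~0: exact interpolation at a data point
print(predict(0.75))         # smooth estimate between samples
```

At a sample point, r is a row of R, so rᵀR⁻¹ picks out that observation and the predictor interpolates exactly, matching the note on the previous slide.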

SLIDE 13

Gaussian Processes in General

[Plot: sample functions drawn from a Gaussian process]

Keys:

  • Bayesian modeling, with prior directly on function space.
  • Generalize Gaussian distribution over finite vectors to one over

functions, using mean and covariance functions.

  • Fully specified by distributions on finite sample sets, so still only perform nice matrix operations.
SLIDE 14

Related Work:

  • Rasmussen: unifying framework for multivariate regression.
  • Williams and Barber: classification.
  • MacKay: pattern recognition.
  • Neal: model for neural networks.
  • Sacks: model deterministic computer experiments with stochastic processes.

SLIDE 15

Multi-Layer GP

  • SAL programs repeatedly aggregate/classify/redescribe, up an abstraction hierarchy. → sequence of GP models, each with covariance; superpose for composite.
  • Input data field: interpolated surrogate for sparse samples.
  • Locality (neighborhood graph — “close enough”) modeled by

      R(x(k), x(l)) = ζ ∏i=1..n exp(−ρi |xi(k) − xi(l)|)

  • Similarity in feature (equivalence predicate — “good-direction flow”) only applicable when combined with locality. ⇒ Combined hyperparameters for position and direction. Hierarchical prior allows for determination of relative importance.
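The product-form correlation above is straightforward to evaluate per pair of points. A sketch, assuming illustrative per-dimension hyperparameters ρi and scale ζ (not fitted values from the case study):

```python
import numpy as np

# R(x, x') = zeta * prod_i exp(-rho_i * |x_i - x'_i|)
def corr(xa, xb, rho, zeta=1.0):
    return zeta * np.prod(np.exp(-rho * np.abs(xa - xb)))

rho = np.array([2.0, 0.5])        # tighter locality in x1 than in x2
a, b = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(corr(a, b, rho))            # exp(-2) * exp(-0.5) = exp(-2.5)
```

Because the exponentials multiply, per-dimension hyperparameters add in the exponent, which is what lets a hierarchical prior apportion relative importance across position and direction coordinates.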

SLIDE 16

Case Study: Pocket Identification

Abstract wireless problem with de Boor “pocket” function.

    α(X) = cos( ∑i=1..n 2^i (1 + xi/|xi|) ) − 2
    δ(X) = ‖X − 0.5 I‖
    p(X) = α(X)(1 − δ²(X)(3 − 2δ(X))) + 1

[Plot: pocket function surface over [−1, 1]², values roughly in [−2, 1]]

Goal: identify number & locations of pockets (not func. approx.), with minimal # samples.
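The pocket function can be coded directly from these definitions. A sketch, reading xi/|xi| as sign(xi) (undefined at xi = 0); the 2D test point is an illustrative choice:

```python
import numpy as np

# de Boor pocket function as reconstructed from the slide.
def pocket(X):
    X = np.asarray(X, dtype=float)
    i = np.arange(1, X.shape[-1] + 1)
    # alpha(X) = cos(sum_i 2^i (1 + sign(x_i))) - 2
    alpha = np.cos(np.sum(2.0 ** i * (1 + np.sign(X)), axis=-1)) - 2
    # delta(X) = ||X - 0.5 I||
    delta = np.linalg.norm(X - 0.5, axis=-1)
    return alpha * (1 - delta ** 2 * (3 - 2 * delta)) + 1

print(pocket([0.3, 0.3]))
```

Since sign(xi) is constant within each orthant, α switches discretely between orthants while δ shapes the smooth bump around 0.5·I, producing the pockets the SAL analysis is asked to count and locate.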

SLIDE 17

SAL Pocket Finding

[Plot: SAL-identified pockets over [−1, 1]²]

SLIDE 18

Test

Vary parameters (close-enough wrt r, similar-enough angle wrt θ, weight d for combining distance and angle):

    r ∈ {1, √2, 1.5, √3, 2}
    θ ∈ {0.7, 0.8, 0.85, 0.9, 0.95}
    d ∈ {0.01, 0.02, 0.03, 0.04, 0.05}

Construct GP (i.e. estimate covariance terms) for flow classes using Neal’s software, hybrid MC.

SLIDE 19

Number of Pockets

  • d had little effect in this field, due to symmetry.
  • Averaged over d, at varying (r, θ):

[Chart: # pockets found vs. r ∈ {1, 1.414, 1.5, 1.732, 2}, one curve per θ ∈ {0.70, 0.80, 0.85, 0.90, 0.95}]

  • Abrupt jump at θ = 0.95 — stringent vector similarity.
SLIDE 20

Covariance Contributions

[Charts: covariance contributions ρx and ρy vs. r ∈ {1, 1.414, 1.5, 1.732, 2}, one curve per θ ∈ {0.70, 0.80, 0.85, 0.90, 0.95}]

  • Basically symmetric.
  • Increase quadratically with # pockets — can’t stray “too far” for prediction.
  • Characteristic length, 1/ρ, decreases with # pockets — identified pockets occupy less of the space.

SLIDE 21

Discussion

  • Model qualitative spatial data mining with stochastic process framework, summarizing transformation from input to high-level abstractions.
  • Probabilistic basis allows sample optimization, studies of parameter sensitivity, reasoning about algorithm applicability.
  • Next steps: combined modeling of sensitivity to input and parameters.
  • Thanks to Feng Zhao (PARC), Layne T. Watson (Va. Tech).
  • Funding: NR (NSF EIA-9974956, EIA-9984317, and EIA-0103660) and CBK (NSF IIS-0237654).