SLIDE 1 Gaussian Process Models of Spatial Aggregation Algorithms
Naren Ramakrishnan
Virginia Tech Computer Science http://people.cs.vt.edu/~ramakris/
Chris Bailey-Kellogg
Purdue Computer Sciences http://www.cs.purdue.edu/homes/cbk/
SLIDE 2 Big Picture
Spatial Aggregation: generic mechanism for spatial data mining, parameterized by domain knowledge.
[Figure: SA pipeline: the Input Field is sampled/interpolated and aggregated into a spatial N-graph, classified into equivalence classes, and redescribed as higher-level objects, up to an Abstract Description; Localize maps ambiguities back down.]
Gaussian Processes: generic framework for spatial statistical modeling, parameterized by covariance structure.
SA+GP: model the mining mechanism for meta-level reasoning, e.g. targeting samples and characterizing sensitivity to parameters and inputs.
SLIDE 3 Example: Wireless System Configuration
Optimize performance (e.g. signal-to-noise, bit error probability) of wireless system configuration (e.g. distance between antennae). Simulate across range of configurations (hours to days per simulation).
[Plot: simulated configuration space, axes SNR1 (dB) and SNR2 (dB), 10 to 40 dB.]
Aggregate structures in configuration space. In shaded region, 99% confidence that average error is acceptable. Analyze structures to characterize performance. Configs in upper right less sensitive to power imbalance (region width).
SLIDE 4 General Features
Problem: spatial data mining in data-scarce physical domains
- Expensive data collection: much implicit but little explicit data.
- Control over data collection.
- Available physical knowledge: continuity, locality, symmetry, etc.
Approach: multi-level qualitative analysis
- Exploit domain knowledge to uncover qualitative structures in data.
- Sample optimization driven by model selection — maximize
expected information gain, minimize expense, . . . .
- Decisions explainable in terms of problem structures & physical
knowledge.
SLIDE 5 Mining Mechanism: Spatial Aggregation (SA)
Local operations for finding multi-level structures in spatial data. Ex: weather maps, numerical simulation output.
- Output: high-level description of structure, behavior, and design. Ex: fronts, stability regions in dynamical systems.
- Bridge quantitative ↔ qualitative via increasingly abstract structural descriptions.
- Key domain knowledge: locality in domain, similarity in feature.
[Figure: SA pipeline (same diagram as Slide 2).]
SLIDE 6 Spatial Aggregation Example
Goal: find flows in vector field (e.g. wind velocity, temp. gradient).
(a) Input
(b) Localize (distance < r)
(c) Test similarity (angle < θ)
(d) Select succ (d · distance + angle)
SLIDE 7
(e) Select pred (d · distance + angle)
(f) Redescribe (points → curve)
(g) Bundle curves by higher-level locality, similarity
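As a hedged sketch of steps (b) and (c), the localize and test-similarity operations reduce to simple filters over point pairs. The points, angles, and thresholds below are invented for illustration, not taken from the slides:

```python
import numpy as np

# Toy vector field: positions on a line, each with a direction angle (radians).
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [5.0, 0.0]])
angles = np.array([0.0, 0.1, 0.05, 2.0])

r, theta = 1.5, 0.5  # close-enough radius, similar-enough angle threshold

# Localize: neighborhood graph over pairs closer than r.
edges = [(i, j) for i in range(len(points)) for j in range(len(points))
         if i < j and np.linalg.norm(points[i] - points[j]) < r]

# Classify: keep only edges whose endpoint directions are similar.
flow_edges = [(i, j) for (i, j) in edges
              if abs(angles[i] - angles[j]) < theta]
```

Here the surviving edges (0,1) and (1,2) form one same-direction flow; the outlying fourth vector is excluded by both distance and angle.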
SLIDE 8 Reasoning About SA Applications
- Sensitivity to input?
- Sensitivity to parameters (locality, similarity metrics)?
- Optimization of additional samples?
Approach: probabilistic model of spatial relationships, in terms of Gaussian Processes.
[Figure: SA pipeline (same diagram as Slide 2).]
SLIDE 9 Gaussian Processes: Intuition
- 1D version of vector flow analysis:
[Plot: gradient directions vs. x.]
Qualitative structure: same-direction flow.
- Regression: given angles at some sample points, predict at new,
unobserved points.
[Plot: vector angle (values or distributions, radians) vs. x, with predictions at unobserved points.]
Gaussian conditional distribution; covariance structure captures locality.
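A minimal sketch of this regression step, assuming a squared-exponential covariance and a zero-mean prior (sample locations and angles invented for illustration):

```python
import numpy as np

# Squared-exponential covariance between two sets of 1-D inputs.
def kernel(a, b, rho=1.0):
    return np.exp(-rho * (a[:, None] - b[None, :]) ** 2)

x_obs = np.array([2.0, 6.0, 10.0, 14.0])
y_obs = np.sin(0.3 * x_obs)        # toy "vector angle" observations

x_new = np.array([8.0])            # unobserved point to predict at
K = kernel(x_obs, x_obs) + 1e-8 * np.eye(len(x_obs))  # jitter for stability
k_star = kernel(x_new, x_obs)

# Gaussian conditional distribution: posterior mean and variance at x_new.
mean = k_star @ np.linalg.solve(K, y_obs)
var = kernel(x_new, x_new) - k_star @ np.linalg.solve(K, k_star.T)
```

The covariance structure does the work: nearby samples dominate the prediction, which is the locality property the slide highlights.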
SLIDE 10
- Classification: apply logistic (higher-D: softmax) function to
estimate latent variable representing class:
[Plots: gradient data; logistic squashing function; resulting class probabilities over x.]
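A sketch of the squashing step: the latent variable values below are placeholders, not slide data.

```python
import numpy as np

# Logistic function maps a latent GP value to a class probability
# (softmax generalizes this in higher dimensions).
def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

latent = np.array([-4.0, 0.0, 4.0])  # illustrative latent values along x
prob = logistic(latent)              # probability of the positive class
```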
SLIDE 11 GP as Spatial Interpolation (Kriging)
- Given set of observations {(x1, y1), …, (xk, yk)} (vector angles at positions), want to model y = f(x).
- Possible form: f(x) = α + Z(x).
- Model Z as Gaussian: mean 0, covariance σ²R.
- Key: structure of R captures neighborhood relationships among samples. Ex: R(xi, xj) = exp(−ρ |xi − xj|²)
[Plots: fitted interpolants for ρ = 0.1 vs. ρ = 1.]
Note: exact interpolation at data points.
SLIDE 12
- Optimize parameters given observations, to estimate f′. Ex: minimize mean squared error E{(f′ − f)²}; for ρ, maximize the concentrated log-likelihood:
    max_ρ −(1/2) (k ln σ̂² + ln |R|)
  where R is the k × k symmetric correlation matrix built from R(·, ·).
- 1-D optimization straightforward; higher-D requires MCMC.
- Once optimized, prediction for x_{k+1} is easy, based on correlation to samples:
    f′(x_{k+1}) = α̂ + r^T(x_{k+1}) R^{-1} (y − α̂ I_k)
  where r is the correlation vector of x_{k+1} vs. the sample points, and α̂ estimates α:
    α̂ = (I_k^T R^{-1} I_k)^{-1} I_k^T R^{-1} y
  The estimate's variance is
    σ̂² = (y − α̂ I_k)^T R^{-1} (y − α̂ I_k) / k
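The prediction formulas can be sketched directly in code (notation assumed: Ik is the all-ones vector; positions, values, and ρ invented for illustration):

```python
import numpy as np

rho = 0.5
x = np.array([0.0, 1.0, 2.0, 3.5])
y = np.array([1.2, 0.8, 1.1, 1.9])
k = len(x)

# Correlation matrix and the all-ones vector Ik.
R = np.exp(-rho * (x[:, None] - x[None, :]) ** 2)
Ik = np.ones(k)
Rinv_y = np.linalg.solve(R, y)
Rinv_I = np.linalg.solve(R, Ik)

# alpha_hat = (Ik' R^-1 Ik)^-1 Ik' R^-1 y  (generalized least-squares mean)
alpha_hat = (Ik @ Rinv_y) / (Ik @ Rinv_I)
resid = y - alpha_hat * Ik

# sigma2_hat = resid' R^-1 resid / k  (variance estimate)
sigma2_hat = (resid @ np.linalg.solve(R, resid)) / k

# f'(x_new) = alpha_hat + r(x_new)' R^-1 (y - alpha_hat Ik)
def f_prime(x_new):
    r_vec = np.exp(-rho * (x_new - x) ** 2)
    return alpha_hat + r_vec @ np.linalg.solve(R, resid)
```

As on the previous slide, the predictor interpolates exactly at the sample points.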
SLIDE 13 Gaussian Processes in General
[Plot: sample functions drawn from a GP prior.]
Keys:
- Bayesian modeling, with prior directly on function space.
- Generalize Gaussian distribution over finite vectors to one over
functions, using mean and covariance functions.
- Fully specified by distributions on finite sample sets, so still only need nice matrix operations.
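A sketch of this finite-marginal view: drawing one "function" from a GP prior is just one multivariate-Gaussian draw on a grid (the kernel and grid here are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(-3, 3, 50)

# Mean and covariance functions, evaluated on the finite grid.
mean = np.zeros(len(xs))
cov = (np.exp(-0.5 * (xs[:, None] - xs[None, :]) ** 2)
       + 1e-9 * np.eye(len(xs)))      # jitter keeps cov positive definite

sample = rng.multivariate_normal(mean, cov)  # one draw from the prior
```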
SLIDE 14 Related Work
- Rasmussen: unifying framework for multivariate regression.
- Williams and Barber: classification.
- MacKay: pattern recognition.
- Neal: model for neural networks.
- Sacks: model deterministic computer experiments with
stochastic processes.
SLIDE 15 Multi-Layer GP
- SAL programs repeatedly aggregate/classify/redescribe, up an abstraction hierarchy. ⇒ sequence of GP models, each with its own covariance; superpose for a composite model.
- Input data field: interpolated surrogate for sparse samples.
- Locality (neighborhood graph, "close enough") modeled by
    R(x^(k), x^(l)) = ζ ∏_{i=1}^{n} exp(−ρ_i |x_i^(k) − x_i^(l)|^η)
- Similarity in feature (equivalence predicate, "good-direction flow") only applicable when combined with locality. ⇒ Combined hyperparameters for position and direction. A hierarchical prior allows determination of their relative importance.
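A sketch of this product-form correlation with per-dimension scales ρ_i; the ζ, η, and test inputs below are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Product correlation over input dimensions:
# R(xk, xl) = zeta * prod_i exp(-rho_i * |xk_i - xl_i|^eta)
def corr(xk, xl, rho, eta=2.0, zeta=1.0):
    return zeta * np.prod(np.exp(-rho * np.abs(xk - xl) ** eta))

rho = np.array([0.5, 2.0])   # per-dimension scales can weigh differently
a = np.array([0.0, 0.0])
b = np.array([1.0, 1.0])
val = corr(a, b, rho)        # exp(-0.5) * exp(-2.0)
```

Separate ρ_i per dimension is what lets the hierarchical prior judge the relative importance of position vs. direction features.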
SLIDE 16 Case Study: Pocket Identification
Abstract wireless problem with de Boor “pocket” function.
α(X) = cos( ∑_{i=1}^{n} 2^i x_i / |x_i| )
δ(X) = ‖X − 0.5 I‖
p(X) = α(X) (1 − δ²(X) (3 − 2 δ(X))) + 1
[Surface plot of p over [−1, 1]².]
Goal: identify number & locations of pockets (not func. approx.), with minimal # samples.
SLIDE 17 SAL Pocket Finding
[Figure: SAL pocket finding over [−1, 1]².]
SLIDE 18
Test
Vary parameters (close-enough wrt r, similar-enough angle wrt θ, weight d for combining distance and angle):
r ∈ {1, √2, 1.5, √3, 2}; θ ∈ {0.7, 0.8, 0.85, 0.9, 0.95}; d ∈ {0.01, 0.02, 0.03, 0.04, 0.05}
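The sweep over these parameter sets is their cross product (125 combinations); a quick sketch:

```python
import itertools

rs = [1, 2 ** 0.5, 1.5, 3 ** 0.5, 2]        # close-enough radii r
thetas = [0.7, 0.8, 0.85, 0.9, 0.95]        # similar-enough thresholds
ds = [0.01, 0.02, 0.03, 0.04, 0.05]         # distance/angle weights d

# Every (r, theta, d) combination run through the SAL pocket finder.
grid = list(itertools.product(rs, thetas, ds))
```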
Construct GP (i.e. estimate covariance terms) for flow classes using Neal’s software, hybrid MC.
SLIDE 19 Number of Pockets
- d had little effect in this field, due to symmetry.
- Averaged over d, at varying (r, θ):
[Chart: # pockets found vs. r ∈ {1, 1.414, 1.5, 1.732, 2}, one curve per θ ∈ {0.70, 0.80, 0.85, 0.90, 0.95}.]
- Abrupt jump at θ = 0.95 — stringent vector similarity.
SLIDE 20 Covariance Contributions
[Charts: covariance contributions ρx and ρy vs. r, one curve per θ.]
- Basically symmetric.
- Increase quadratically with # pockets: can't stray “too far” for prediction.
- Characteristic length, 1/ρ, decreases with # pockets: identified pockets occupy less of the space.
SLIDE 21 Discussion
- Model qualitative spatial data mining with stochastic process
framework, summarizing transformation from input to high-level abstractions.
- Probabilistic basis allows sample optimization, studies of
parameter sensitivity, reasoning about algorithm applicability.
- Next steps: combined modeling of sensitivity to input and
parameters.
- Thanks to Feng Zhao (PARC), Layne T. Watson (Va. Tech).
- Funding: NR (NSF EIA-9974956, EIA-9984317, and
EIA-0103660) and CBK (NSF IIS-0237654).