Active Data Mining of Correspondence for Qualitative Assessment of Scientific Computations (PowerPoint PPT Presentation)



SLIDE 1

Active Data Mining of Correspondence for Qualitative Assessment of Scientific Computations

Chris Bailey-Kellogg

Purdue Computer Sciences http://www.cs.purdue.edu/homes/cbk/

Naren Ramakrishnan

Virginia Tech Computer Science http://people.cs.vt.edu/~ramakris/

SLIDE 2

Data-Driven Characterization of Scientific Computations

  • Choice of solver depends on problem characteristics (e.g. matrix sensitivity) and algorithm performance (e.g. convergence).

  • Empirical characterization (rather than analytical) is appropriate under imperfect domain knowledge or a lack of theory: low-level computational experiments → high-level properties.

  • Example: a spectral portrait illustrates eigenstructure under perturbations of different magnitudes.

[Figure: spectral portrait; level curves labeled 2–10 over the complex plane]

Eigenvalues inside a given curve are indistinguishable under perturbation of that magnitude; this suggests the numerical precision necessary.

SLIDE 3

Active Data Mining with SAL

  • Spatial aggregation (bottom-up): uniform operators and data types for extracting multi-layer structures in spatial data.

  • Ambiguity-directed sampling (top-down): focus data collection on difficult choice points.
  • Underlying domain knowledge: continuity, locality.

[Diagram: SAL pipeline. Input Field → Aggregate (spatial neighborhood graph) → Classify (equivalence classes) → Redescribe (higher-level objects) → Abstract Description; Ambiguities feed back via Sample, Interpolate, and Localize to lower-level objects]

SLIDE 4

Simple example: Input points (values not shown) → Aggregate (localize computation) → Classify (group connected points with similar-enough value).

[Figure: the point set before aggregation, after aggregation, and after classification]
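The Aggregate/Classify steps above can be sketched with a brute-force neighborhood test plus union-find; the `radius` and `tol` parameters here are hypothetical stand-ins for SAL's locality and similarity predicates, and the sample points are made up:

```python
import numpy as np

def classify(points, values, radius, tol):
    """Group points that are within `radius` of each other (aggregation
    neighborhood) and whose values differ by at most `tol` (classification
    predicate), using union-find with path compression."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            close = np.linalg.norm(points[i] - points[j]) <= radius
            similar = abs(values[i] - values[j]) <= tol
            if close and similar:
                parent[find(i)] = find(j)

    return [find(i) for i in range(n)]

pts = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
vals = np.array([1.0, 1.1, 1.0, 3.0])
labels = classify(pts, vals, radius=1.5, tol=0.5)
# points 0 and 1 are close and similar; 2 and 3 are close but dissimilar
```

The quadratic neighbor scan stands in for the localized computation a real neighborhood graph (e.g. Delaunay) would provide.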

SLIDE 5

Correspondence Extension to SAL

  • Key idea: identify mutually-reinforcing relationships among features of spatial objects, in order to combat noise and sparsity.
  • SAL is particularly conducive: aggregation composes hierarchical spatial objects.

SLIDE 6
  • Mechanism:
  • 1. Establish analogy as a relation among lower-level constituents of higher-level objects. Ex: adjacent points of neighboring iso-contours.

[Figure: adjacent points matched across neighboring iso-contours]

  • 2. Abstract the lower-level analogy into a higher-level correspondence. Ex: parameterized curve deformation.
  • Bridges the lower-/higher-level gap: the analogy’s meaning is derived from higher-level context; abstraction enables computation of global properties (containment, breaks, overall quality).
  • Directly usable in ambiguity-directed sampling to address difficulties in correspondence.

SLIDE 7

Application 1: Matrix Spectral Portrait Analysis

The spectral portrait for matrix A plots the complex map

$$P(z) = \log_{10}\left(\|A\|_2\,\|(A - zI)^{-1}\|_2\right)$$

[Figure: spectral portrait; level curves labeled 2–10 over the complex plane]

Singularities occur at the eigenvalues; level curves capture “equivalent” eigenvalues with respect to perturbations (i.e. the curve at level k contains the eigenvalues of all perturbed matrices A + E with $\|E\|_2 \le 10^{-k}\|A\|_2$). Perturbation-equivalence indicates sensitivity to numerical error. Ex: eigenvalues 2 & 3 are most sensitive, then 4, then 1.
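For concreteness, the portrait can be evaluated pointwise using the identity ‖(A − zI)⁻¹‖₂ = 1/σ_min(A − zI). This is an illustrative sketch (the function, grid, and test matrix are our own choices, not the tool used in the talk):

```python
import numpy as np

def spectral_portrait(A, re, im):
    """Evaluate P(z) = log10(||A||_2 * ||(A - zI)^{-1}||_2) on a grid.

    ||(A - zI)^{-1}||_2 equals 1 / (smallest singular value of A - zI),
    so each grid point costs one small SVD."""
    n = A.shape[0]
    norm_a = np.linalg.norm(A, 2)
    P = np.empty((len(im), len(re)))
    for i, y in enumerate(im):
        for j, x in enumerate(re):
            smin = np.linalg.svd(A - complex(x, y) * np.eye(n),
                                 compute_uv=False)[-1]
            P[i, j] = np.log10(norm_a / smin)
    return P

# portrait of a 2x2 Jordan block with eigenvalue 1: P peaks near z = 1
A = np.array([[1.0, 1.0], [0.0, 1.0]])
re = np.linspace(0.05, 1.95, 20)
im = np.linspace(-0.95, 0.95, 20)
P = spectral_portrait(A, re, im)
```

The grid deliberately avoids z = 1 exactly, where P is infinite.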

SLIDE 8

Correspondence-Based Merge Identification

Approach: compute merge tree, indicating perturbation levels at which eigenvalues become indistinguishable, by finding correspondences among curves.

  • 1. Sample perturbation levels on a regular grid; interpolate iso-curves.

[Figure: sampled grid and interpolated iso-curves]

SLIDE 9
  • 2. Aggregate curve points in a Delaunay triangulation.

[Figure: Delaunay triangulation of the curve points]

  • 3. Analogy: cross-curve edges in triangulation.

[Figure: cross-curve edges highlighted in the triangulation]
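Steps 2–3 might look like the following sketch, which triangulates all curve points with SciPy and keeps only the edges whose endpoints lie on different curves; the concentric-circle data is hypothetical:

```python
import numpy as np
from scipy.spatial import Delaunay

def cross_curve_edges(points, curve_id):
    """Delaunay-triangulate the curve points, then keep edges whose
    endpoints belong to different curves: candidate 'analogy' relations
    between neighboring iso-curves."""
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:
        for a, b in [(0, 1), (1, 2), (0, 2)]:
            i, j = sorted((simplex[a], simplex[b]))
            if curve_id[i] != curve_id[j]:
                edges.add((i, j))
    return sorted(edges)

# two concentric "curves" sampled at 8 points each (hypothetical data)
theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
inner = np.c_[np.cos(theta), np.sin(theta)]
outer = 2 * np.c_[np.cos(theta), np.sin(theta)]
points = np.vstack([inner, outer])
curve_id = np.array([0] * 8 + [1] * 8)
edges = cross_curve_edges(points, curve_id)
```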

SLIDE 10
  • 4. Correspondence abstraction: merge events in tree.

[Figure: merge trees over the four eigenvalues, annotated with perturbation levels]

  • 5. Evaluate confidence in a correspondence: fraction of points matched, angular separation between “separating” samples.
  • 6. Sample to ensure curve locations are adequately constrained by separating samples (so the curves couldn’t have merged at a smaller perturbation level).
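Step 4's abstraction of pairwise merges into tree events can be sketched with union-find over level-sorted candidate merges; the `(level, i, j)` tuples below are hypothetical stand-ins for what the curve correspondences would produce:

```python
def merge_tree(n_eigs, merges):
    """Build merge events from (level, i, j) tuples, processed in order
    of perturbation level: record a merge only when eigenvalues i and j
    are still in different groups."""
    parent = list(range(n_eigs))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    events = []
    for level, i, j in sorted(merges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            events.append((level, i, j))
    return events

# eigenvalues 1 and 2 (0-indexed) merge first, then 3 joins, then 0;
# the (8, 1, 3) candidate is redundant and gets dropped
events = merge_tree(4, [(5, 1, 2), (7, 2, 3), (9, 0, 1), (8, 1, 3)])
```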

SLIDE 11

Results

  • (2n − 3)!! possible binary merge trees; most are never explicitly considered (they would have low confidence).
  • Initial grid: one sample between each pair of eigenvalues, one unit larger than the bounding box.
  • Subsample or expand the grid when merge events are poorly separated.
  • Tested on a variety of polynomial companion matrices: different numbers / spacings of roots.
  • High-confidence model selection after 1–3 subsamples, 1–3 grid expansions.
  • Substantially less computation than “one-size-fits-all”; provides a confidence metric and explainability.

SLIDE 12

Application 2: Matrix Jordan Form

Jordan form analysis:

  • Input: matrix A of dimension n, with r ≤ n independent eigenvectors whose eigenvalues λᵢ have multiplicity ρᵢ.
  • Jordan decomposition into r upper-triangular “blocks”:

$$B^{-1}AB = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_r \end{pmatrix}, \qquad J_i = \begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix}$$

  • Typical algorithms are numerically unstable.
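The instability is easy to demonstrate: a perturbation of size δ in one entry of a ρ-fold Jordan block moves the eigenvalues by about δ^(1/ρ). A minimal numpy sketch (the block and δ are our illustrative choices):

```python
import numpy as np

# 3x3 Jordan block with eigenvalue 7: one eigenvector, multiplicity 3
J = 7 * np.eye(3) + np.diag([1.0, 1.0], k=1)

# perturbing the bottom-left entry by delta gives exact eigenvalues
# 7 + delta**(1/3) * (the three cube roots of unity): a 1e-9 change
# moves the eigenvalues by about 1e-3, six orders of magnitude larger
delta = 1e-9
Jp = J.copy()
Jp[2, 0] = delta
spread = np.abs(np.linalg.eigvals(Jp) - 7).max()
```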
SLIDE 13

Graphical Analysis of Jordan Form

  • Infer multiplicity by eigenvalue perturbations: perturbing by δ yields computed values

$$\lambda_i + |\delta|^{1/\rho_i}\, e^{i\phi/\rho_i}$$

  • As the phase φ of the perturbation δ ranges over multiples of π, the computed values are vertices of a regular 2ρᵢ-gon, centered on λᵢ, with diameter determined by |δ|^{1/ρᵢ}.

  • Ex: 8-by-8 Brunet matrix with structure (−1)¹(−2)¹(7)³(7)³, focusing on the Jordan block for the first (7)³:

[Figure: computed eigenvalues near 7, spread at radius ~10⁻³ in the complex plane]
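The 2ρ-gon prediction can be checked numerically on a single Jordan block; the matrix, eigenvalue, and perturbation magnitude below are our own illustrative choices:

```python
import numpy as np

# one rho-fold Jordan block with eigenvalue lam
rho, lam, d = 3, 7.0, 1e-6
J = lam * np.eye(rho) + np.diag(np.ones(rho - 1), k=1)

computed = []
for phi in (0.0, np.pi):                   # phase of delta: multiples of pi
    Jp = J.astype(complex)
    Jp[rho - 1, 0] = d * np.exp(1j * phi)  # perturb one entry by |delta| = d
    computed.extend(np.linalg.eigvals(Jp))

# the 2*rho computed eigenvalues sit at radius |delta|**(1/rho) around lam,
# in 2*rho evenly spaced directions: the vertices of a regular 2*rho-gon
offsets = np.array(computed) - lam
```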

SLIDE 14

Correspondence-Based Symmetry Analysis

Approach: compute the Jordan structure by identifying symmetry of the portrait (i.e. auto-correspondence), abstracted as a rotation by π/ρ around the eigenvalue.

  • 1. Sample points by random normwise perturbations at the magnitude(s) of interest.

[Figure: sampled perturbed eigenvalues scattered around 7]

  • 2. Aggregate triples into triangles.
SLIDE 15
  • 3. Analogy among triangle vertices by congruence (computed via geometric hashing).

[Figure: congruent triangles identified among the samples]

  • 4. Correspondence as a rotation (x, y, θ) overlaying vertices of congruent triangles.

[Figure: overlaid congruent triangles]
Eigenvalue = (7.00, 0.00); rotation = 60.46°; ρ = 3. Eigenvalue = (7.00, 0.00); rotation = 60.13°; ρ = 3.
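A simplified stand-in for steps 3–4: rather than geometric hashing over triangles, this sketch estimates the symmetry order directly from the angular gaps between sample directions around a known center (the hexagonal sample data is hypothetical):

```python
import numpy as np

def symmetry_order(points, center):
    """Estimate rotational symmetry about `center`: the typical angular
    gap between sorted sample directions approximates the rotation angle
    theta, and the relation theta = pi / rho gives rho."""
    ang = np.sort(np.angle(points - center) % (2 * np.pi))
    gaps = np.diff(np.append(ang, ang[0] + 2 * np.pi))  # include wraparound
    theta = np.median(gaps)
    return np.pi / theta

# hexagon of perturbed-eigenvalue samples around 7 (hypothetical data)
pts = 7 + 0.01 * np.exp(1j * np.pi / 3 * np.arange(6))
est_rho = symmetry_order(pts, 7.0)  # close to 3
```

The median gap makes the estimate robust to a few noisy samples, loosely mirroring the talk's confidence-weighted correspondence.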

SLIDE 16
  • 5. Evaluate confidence in a correspondence: distance between points and their partners, regularity of the polygon’s sides.
  • 6. Sample when the entropy of the candidate models is high.
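Step 6's entropy criterion might be computed as the Shannon entropy of the normalized confidences over candidate models; the confidence values below are made up for illustration:

```python
import numpy as np

def model_entropy(confidences):
    """Shannon entropy (bits) of the normalized confidence distribution
    over candidate models; high entropy = ambiguity, so sample more."""
    p = np.asarray(confidences, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                       # 0 * log(0) contributes nothing
    return float(-(p * np.log2(p)).sum())

# one dominant model -> low entropy; three equal models -> high entropy
low = model_entropy([0.98, 0.01, 0.01])
high = model_entropy([1.0, 1.0, 1.0])
```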
SLIDE 17

Results

  • 10 matrices; 4–10 perturbation levels; 6–8 samples each round.
  • Vary the number of models generated by varying the congruence tolerance.
  • Three sample-collection policies:
  • 1. Collect at the same level: 1.0–2.7 rounds.
  • 2. Collect at the next higher level: better when policy 1 uses a low level.
  • 3. Collect at the same level until models begin “hallucinating”: better for Brunet-type matrices.
  • Symmetry quickly eliminates bad models.
  • No real advantage to varying the perturbation level (estimates are independent, irrespective of level).
  • A small amount of computation suffices for a high-confidence assessment of the Jordan form.

SLIDE 18

Some Related Work

  • F. Chaitin-Chatelin and V. Frayssé: graphical analysis of scientific computations (spectral portraits).
  • A. Edelman and Y. Ma: Jordan perturbation phenomena.
  • X. Huang and F. Zhao: correspondence in weather-data iso-contours.
  • Lots of work in vision on computing and tracking correspondence.
  • D.A. Cohn, Z. Ghahramani, and M.I. Jordan: active learning.
SLIDE 19

Discussion

  • The correspondence mechanism within Spatial Aggregation leverages hierarchical spatial objects and relationships.
  • First systematic algorithms for performing complete imagistic analyses (not relying on human visual inspection) of matrix eigenstructure.
  • Efficient, focused sampling and iterative model evaluation until high confidence is obtained.
  • Overcome noise and sparsity by utilizing locality and continuity to identify mutually-reinforcing interpretations.
  • Many thanks to the reviewers!
  • Funding: CBK (NSF IIS-0237654) and NR (NSF EIA-9974956, EIA-9984317, and EIA-0103660).