
SLIDE 1

Data Mining in Aeronautics, Science, and Exploration Systems 2007 Conference

June 26-27, 2007 Computer History Museum Mountain View, California, USA

Sponsored by NASA Engineering and Safety Center Science Mission Directorate Aeronautics Research Mission Directorate - IVHM

SLIDE 2

Data Mining in Aeronautics, Science, and Exploration Systems 2007 Conference

Computer History Museum Mountain View, CA June 26-27, 2007

Numerous disciplines, including aeronautics, the physical sciences, and space exploration, have benefited from recent advances in data and text mining, machine learning, and statistics. The Data Mining in Aeronautics, Science, and Exploration Systems (DMASES) 2007 conference provides the data mining community with an opportunity to share these advances across the larger communities of engineers and scientists working in aeronautics, aerospace, and science. This single-track conference features in-depth lectures, tutorials, discussion, and a poster session.

Conference Organizers

  • Ashok N. Srivastava, Ph.D., Intelligent Systems Division, NASA Ames Research Center
  • Dawn M. McIntosh, Intelligent Systems Division, NASA Ames Research Center
  • Bob Beil, Systems Engineering Office, NASA Engineering and Safety Center

Session Chairs

  • Kevin H. Knuth, Ph.D. (Sciences), Department of Physics, State University of New York, Albany
  • Michael D. New, Capt., Ph.D. (Aeronautics), Delta Airlines, Inc.
  • Anindya Ghoshal, Ph.D. (Exploration Systems), United Technologies Research Center, United Technologies Corp.

SLIDE 3

Conference Agenda

Tuesday, June 26

8:00 AM  Registration
8:30 AM  Morning Announcements/Introductions
8:35 AM  Mining Future Datascapes – Srivastava/NASA Ames Research Center
9:15 AM  Ascent Summary Data Analysis Tool for Shuttle Wing Leading Edge Impact Detection – McIntosh/NASA Ames Research Center

Exploration Systems Session
9:35 AM  Distributed Mobility Management for Target Tracking in Mobile Sensor Networks – Chakrabarty/Duke University
10:20 AM * break *
10:45 AM A Structural Neural System for Data Mining and Anomaly Detection – Schulz/University of Cincinnati
11:25 AM Current Trends in Performance Prognostics Using Integrated Simulation and Sensors – Baca/Sandia National Laboratories
12:25 PM * Poster Session/Lunch *

Sciences Session
2:00 PM  Problem Solving Strategies: Sampling & Heuristics – Knuth/State University of New York, Albany
2:20 PM  Making the Sky Searchable: Rapid Indexing for Automated Astrometry – Roweis/Google
2:30 PM  Bayesian Analysis of the Cosmic Microwave Background – Jewell/NASA Jet Propulsion Laboratory
3:00 PM  Efficient & Stable Gaussian Process Calculations – Foster/San Jose State University
3:30 PM  * break *
4:00 PM  Understanding Large-Scale Structure in Earth Science Remote Sensing Data Sets – Braverman/NASA Jet Propulsion Laboratory
4:30 PM  Data-driven Modeling for Understanding Climate-Vegetation Interactions – Nemani/NASA Ames Research Center
5:00 PM  END

SLIDE 4

Wednesday, June 27

8:00 AM  Registration
8:30 AM  Morning Announcements
8:35 AM  Tutorial, session I – Principles of Bayesian Methods – Sansó/University of California, Santa Cruz
10:00 AM * break *
10:30 AM Tutorial, session II – Principles of Bayesian Methods – Sansó/University of California, Santa Cruz
12:30 PM * Collaboration Discussions & Networking/Lunch *

Aeronautics Session
1:30 PM  National Aeronautics Research & Development Policy – Overview and Outreach – Schlickenmaier/NASA Headquarters
2:00 PM  Applying Knowledge Representation to Runway Incursion – Wilczynski/University of Southern California
3:00 PM  The Role of Data Mining in Aviation Safety Decision Making – McVenes/Air Line Pilots Association, International
3:30 PM  * break *
4:00 PM  Sifting NOAA Archived ACARS Data for Wind Variation to Improve Traffic Efficiency – Ren/Georgia Institute of Technology
4:30 PM  Data & Text Mining in Boeing – Kao/Boeing Phantom Works
5:00 PM  Concluding Remarks – Srivastava
5:10 PM  END

SLIDE 5

Invited Presentations

Conference Coordinator Presentations

Mining Future Datascapes – Ashok Srivastava, NASA Ames Research Center
Ascent Summary Data Analysis Tool for Shuttle Wing Leading Edge Impact Detection – Dawn McIntosh, NASA Ames Research Center
NASA Engineering and Safety Center Data Mining and Trending Working Group – Bob Beil, NASA Engineering and Safety Center

Tuesday, June 26

Distributed Mobility Management for Target Tracking in Mobile Sensor Networks – Krishnendu Chakrabarty, Duke University
A Structural Neural System for Data Mining and Anomaly Detection – Mark Schulz, University of Cincinnati
Current Trends in Performance Prognostics Using Integrated Simulation and Sensors – Thomas J. Baca, Sandia National Laboratories
Problem Solving Strategies: Sampling and Heuristics – Kevin Knuth, SUNY Albany
Making the Sky Searchable: Rapid Indexing for Automated Astrometry – Sam Roweis, Google
Bayesian Analysis of the Cosmic Microwave Background – Jeff Jewell, NASA Jet Propulsion Laboratory
Efficient and Stable Gaussian Process Calculations – Leslie Foster, San Jose State University
Understanding Large-Scale Structure in Earth Science Remote Sensing Data Sets – Amy Braverman, NASA Jet Propulsion Laboratory
Data-Driven Modeling for Understanding Climate-Vegetation Interactions – Ramakrishna Nemani, NASA Ames Research Center

SLIDE 6

Efficient & Stable Gaussian Process Calculations

Leslie Foster

San Jose State University

The Gaussian process technique is one popular approach for analyzing and making predictions related to large data sets. However, the traditional Gaussian process approach requires solving a system of linear equations that, in many cases, is so large that it is not practical to solve in a reasonable amount of time. We describe how low-rank approximations can be used to solve these equations approximately. The resulting algorithm is fast, accurate, numerically stable, and general. We illustrate the application of the algorithm to the prediction of redshifts using broad-spectrum measurements of the light from galaxies.

SLIDE 7

OUTLINE: The Problem and Background – Low Rank Approximation – Numerical Stability and Rank Selection – Results – Conclusions

EFFICIENT AND STABLE GAUSSIAN PROCESS CALCULATIONS

Leslie Foster, Nabeela Aijaz, Michael Hurley, Apolo Luis, Joel Rinsky, Chandrika Satyavolu, Alex Waagen (team leader)
Mathematics, San Jose State University
foster@math.sjsu.edu
DMASES 2007, June 26, 2007

LESLIE FOSTER, NABEELA AIJAZ, MICHAEL HURLEY, APOLO LUIS, JOEL RINSKY, CHANDRIKA SATYAVOLU, ALEX WAAGEN (TEAM LEADER) – EFFICIENT AND STABLE GAUSSIAN PROCESS CALCULATIONS

SLIDE 8

ABSTRACT

The Gaussian process technique is one popular approach for analyzing and making predictions related to large data sets. However, the traditional Gaussian process approach requires solving a system of linear equations that, in many cases, is so large that it is not practical to solve in a reasonable amount of time. We describe how low-rank approximations can be used to solve these equations approximately. The resulting algorithm is fast, accurate, numerically stable, and general. We illustrate the application of the algorithm to the prediction of redshifts using broad-spectrum measurements of the light from galaxies.

SLIDE 9

OUTLINE

  • I. The Problem and Background
  • II. Low Rank Approximation
  • III. Numerical Stability and Rank Selection
  • IV. Results
  • V. Conclusions

SLIDE 10

PREDICTION AND ESTIMATION

Training data: X – n × d matrix of observations; y – n × 1 vector of target data.
Testing data: X* – n* × d matrix of new observations.
Goals: predict y* corresponding to X*; estimate y corresponding to X.

SLIDE 11

Approaches for prediction with large data sets: traditional regression, neural networks, support vector machines, E-model, ..., Gaussian processes.

SLIDE 12

GAUSSIAN PROCESS SOLUTION

Form the covariance matrix K (n × n) and the cross-covariance matrix K* (n* × n), and select the parameter λ.
Predict y* using ŷ* = K*(λ²I + K)⁻¹y.
(λ²I + K) is large – for example, 180000 × 180000.
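The prediction formula is a single linear solve. A minimal sketch with made-up data (the squared-exponential kernel and the value of λ here are illustrative assumptions, not the talk's settings):

```python
import numpy as np

def se_kernel(A, B, length=1.0):
    """Squared-exponential covariance between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))         # training inputs, n x d
y = np.sin(X.sum(axis=1))            # training targets, n x 1
X_star = rng.normal(size=(10, 5))    # test inputs, n* x d

lam = 0.1
K = se_kernel(X, X)                  # covariance matrix, n x n
K_star = se_kernel(X_star, X)        # cross-covariance, n* x n
# y-hat* = K* (lam^2 I + K)^{-1} y -- solve the system, never invert
y_hat = K_star @ np.linalg.solve(lam**2 * np.eye(len(X)) + K, y)
print(y_hat.shape)                   # (10,)
```

The point of the slides that follow is that this solve is O(n³) in time and O(n²) in storage, which is what the low-rank machinery avoids.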

SLIDE 13

COVARIANCE FUNCTIONS AND MATRICES

Definition: a covariance function k(x, x′) measures the covariance between input points x and x′.
Covariance matrix (SPD): Kij = k(xi, xj).
Examples: polynomial, squared exponential, neural network, rational quadratic, Matérn class, ...

SLIDE 14

COMPUTATIONAL CHALLENGES

Memory: storing the covariance matrix – O(n²).
Time: solving the linear system – O(n³).
Numerical stability: accurate calculations.

SLIDE 15

APPLICATION: REDSHIFT CALCULATION

A redshift indicates that an object is moving away from you: it is the change in wavelength divided by the initial wavelength. For example, the sound from a train is shifted and changes pitch when the train is moving away.

SLIDE 16

APPLICATION: REDSHIFT CALCULATION Scientists want to determine the position of galaxies in the universe. Useful for understanding the structure of the universe.

SLIDE 17

APPLICATION: REDSHIFT CALCULATION Five photometric observations for each galaxy denoted U,G,R,I,Z

SLIDE 18

APPLICATION: REDSHIFT CALCULATION

We have 180,045 examples with known U, G, R, I, Z and redshift. The goal is to predict the redshift of a new galaxy given its U, G, R, I, Z data. Testing set: 20,229 galaxies.

SLIDE 19

BACKGROUND: LEAST SQUARES PROBLEMS

Given: an n × m matrix A (n ≥ m), an n × 1 vector y, and an n* × m matrix A*.
Solve: min ‖y − Ax‖.
Estimate y: ŷ = Ax. Predict y*: ŷ* = A*x.

SLIDE 20

BACKGROUND: NORMAL EQUATIONS

x = (AᵀA)⁻¹Aᵀy
Advantage: fast.
Disadvantage: cond(AᵀA) = cond²(A), and the relative error in x is ∝ cond²(A) – always a potential numerical instability.

SLIDE 21

BACKGROUND: ORTHOGONAL (QR) FACTORIZATION

Form A = QR, where Q is n × m with orthonormal columns and R is m × m right triangular; then x = R⁻¹Qᵀy.
Disadvantages: can be slower, more memory (in Matlab).
Advantages: numerically stable, can be more accurate.
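The accuracy penalty from squaring the condition number, and the QR cure, show up clearly on a small synthetic problem (illustrative data; the column scaling just manufactures an ill-conditioned A):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 8
A = rng.normal(size=(n, m)) * np.logspace(0, -7, m)  # cond(A) ~ 1e7
x_true = np.ones(m)
y = A @ x_true

print(np.linalg.cond(A), np.linalg.cond(A.T @ A))    # second ~ first squared

x_ne = np.linalg.solve(A.T @ A, A.T @ y)  # normal equations
Q, R = np.linalg.qr(A)                    # A = QR
x_qr = np.linalg.solve(R, Q.T @ y)        # x = R^{-1} Q^T y

# QR loses roughly cond(A)*eps; normal equations roughly cond(A)^2*eps
print(np.linalg.norm(x_ne - x_true), np.linalg.norm(x_qr - x_true))
```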

SLIDE 22

LOW RANK APPROXIMATION

Partition the covariance matrices, with K11 the leading m × m block:

K = [ K11  K12 ]      K11: m × m
    [ K21  K22 ]      K22: (n−m) × (n−m)

K = [ K1  K2 ],   K1: n × m          K* = [ K1*  K2* ],   K1*: n* × m

Low-rank approximations:

K ≈ K̃ ≡ K1 K11⁻¹ K1ᵀ
K* ≈ K̃* ≡ K1* K11⁻¹ K1ᵀ

SLIDE 23

LOW RANK APPROXIMATION: SR FORMULA

Recall ŷ* = K*(λ²I + K)⁻¹y. Replace K with K̃ and K* with K̃*, so that

ŷ* ≈ K̃*(λ²I + K̃)⁻¹y = ... = K1*(λ²K11 + K1ᵀK1)⁻¹K1ᵀy

the Subset of Regressors formula [Wahba, 1990].
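The identity behind the SR formula can be verified numerically. A sketch on assumed toy data (OU covariance, illustrative sizes and λ):

```python
import numpy as np

# Check: K~*(lam^2 I + K~)^{-1} y  ==  K1*(lam^2 K11 + K1^T K1)^{-1} K1^T y
n, n_star, m, lam = 300, 40, 60, 0.5
x = np.linspace(0.0, 6.0, n)
x_star = np.linspace(0.05, 5.95, n_star)
k = lambda a, b: np.exp(-np.abs(a[:, None] - b[None, :]))  # OU covariance

idx = np.linspace(0, n - 1, m).astype(int)
K1, K11, K1_star = k(x, x[idx]), k(x[idx], x[idx]), k(x_star, x[idx])
y = np.sin(x)

# Left side: full n x n solve with the low-rank replacements
K_tilde = K1 @ np.linalg.solve(K11, K1.T)
K_star_tilde = K1_star @ np.linalg.solve(K11, K1.T)
lhs = K_star_tilde @ np.linalg.solve(lam**2 * np.eye(n) + K_tilde, y)

# Right side: SR formula, only m x m solves
rhs = K1_star @ np.linalg.solve(lam**2 * K11 + K1.T @ K1, K1.T @ y)
print(np.max(np.abs(lhs - rhs)))   # agrees to rounding error
```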

SLIDE 24

COMPUTATIONAL CHALLENGES OVERCOME

ŷ* ≈ K1*(λ²K11 + K1ᵀK1)⁻¹K1ᵀy

Memory: storing the covariance matrix – O(nm).
Time: solving the linear system – O(nm²).
Numerical stability: ???

SLIDE 25

SR FORMULA AND LEAST SQUARES

In the SR formula consider the special case λ = 0:

ŷ* = K1*(K1ᵀK1)⁻¹K1ᵀy

This is exactly the normal-equations solution to the least squares prediction problem: min ‖y − K1x‖ with ŷ* = K1*x.
Note: easily extended to λ ≠ 0. Potential numerical instability.

SLIDE 26

CURES FOR NUMERICAL INSTABILITY

  • 1. Use a stable technique for the least squares problem: QR factorization, or the "V method".
  • 2. Make K1 as well conditioned as possible.

SLIDE 27

THE V METHOD

Factor K1 = V V11ᵀ, where V is n × m and V11 is m × m lower triangular. Then

ŷ* = K1* V11⁻ᵀ(λ²I + VᵀV)⁻¹Vᵀy

V is a rescaling of a well conditioned matrix, so the method is numerically stable; it can also be faster and need less memory. Related to [Peters and Wilkinson, 1970], [Wahba, 1990, p. 136].
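The V factorization and its agreement with the SR formula can be checked on toy data (assumptions: OU covariance, illustrative sizes; the Cholesky factor of K11 serves as V11):

```python
import numpy as np

# V method: K1 = V V11^T, then y* = K1* V11^{-T} (lam^2 I + V^T V)^{-1} V^T y
n, n_star, m, lam = 300, 40, 60, 0.5
x = np.linspace(0.0, 6.0, n)
x_star = np.linspace(0.05, 5.95, n_star)
k = lambda a, b: np.exp(-np.abs(a[:, None] - b[None, :]))

idx = np.linspace(0, n - 1, m).astype(int)
K1, K11, K1_star = k(x, x[idx]), k(x[idx], x[idx]), k(x_star, x[idx])
y = np.sin(x)

V11 = np.linalg.cholesky(K11)         # K11 = V11 V11^T, lower triangular
V = np.linalg.solve(V11, K1.T).T      # V = K1 V11^{-T}, so K1 = V V11^T

t = np.linalg.solve(lam**2 * np.eye(m) + V.T @ V, V.T @ y)
y_v = K1_star @ np.linalg.solve(V11.T, t)

# Same numbers as the SR formula, computed through the better-scaled V
y_sr = K1_star @ np.linalg.solve(lam**2 * K11 + K1.T @ K1, K1.T @ y)
print(np.max(np.abs(y_v - y_sr)))
```

The algebra behind the agreement: λ²K11 + K1ᵀK1 = V11(λ²I + VᵀV)V11ᵀ, so the two solves are the same up to the triangular rescaling.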

SLIDE 28

COLUMN SELECTION

Use a partial Cholesky factorization with pivoting to form V; this selects appropriate columns for K1. K1 will be well conditioned: cond(K1) is O(condition of the optimal low-rank approximation) [Higham, 2002, pp. 196-208].
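A partial Cholesky with diagonal pivoting is short to sketch (a textbook-style implementation, not the authors' code; the factor L plays the role of V, and piv records which columns form K1):

```python
import numpy as np

def partial_cholesky(K, m):
    """Rank-m Cholesky with diagonal pivoting: K ~ L @ L.T."""
    n = K.shape[0]
    L = np.zeros((n, m))
    d = np.diag(K).astype(float).copy()   # residual diagonal
    piv = []
    for j in range(m):
        p = int(np.argmax(d))             # pivot: largest residual variance
        piv.append(p)
        L[:, j] = (K[:, p] - L[:, :j] @ L[p, :j]) / np.sqrt(d[p])
        d = d - L[:, j] ** 2
        d[p] = 0.0                        # guard against rounding
    return L, piv

n, m = 300, 60
x = np.linspace(0.0, 6.0, n)
K = np.exp(-np.abs(x[:, None] - x[None, :]))    # illustrative OU covariance
L, piv = partial_cholesky(K, m)
rel_err = np.linalg.norm(K - L @ L.T) / np.linalg.norm(K)
print(rel_err, len(set(piv)))   # small error, m distinct pivots
```

Greedy max-diagonal pivoting automatically spreads the selected points out, which is why the chosen K1 ends up well conditioned.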

SLIDE 29

CHOICE OF RANK

For least squares problems there are efficient techniques to drop columns [Björck, 1996, p. 133], and these techniques are easily adapted: solve the GP problem with a rank-m approximation; then, at small additional cost, determine the accuracy of every lower-rank approximation of rank k = 1, ..., m.

SLIDE 30

COMPUTING TIMES

SLIDE 31

COMPUTING TIMES

SLIDE 32

ESTIMATING A METHOD’S ACCURACY: BOOTSTRAP

Bootstrap: a standard statistical resampling technique. Generate multiple (100) samples to test the methods; determine reliability and error bounds. Stable methods have a smaller range of error.
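The resampling loop is simple to sketch (synthetic predictions stand in for a real method's output; sizes are illustrative, not the talk's):

```python
import numpy as np

# Bootstrap the RMSE: resample the test set with replacement and look
# at the spread of the resulting error estimates.
rng = np.random.default_rng(0)
n_test = 2000
y_true = rng.normal(size=n_test)
y_pred = y_true + 0.1 * rng.normal(size=n_test)   # pretend predictions

B = 100                                           # number of resamples
rmses = np.empty(B)
for b in range(B):
    s = rng.integers(0, n_test, size=n_test)      # resampled indices
    rmses[b] = np.sqrt(np.mean((y_pred[s] - y_true[s]) ** 2))

# Mean ~ the RMSE itself; the spread measures the estimate's reliability
print(rmses.mean(), rmses.std())
```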

SLIDE 33

BOOTSTRAP RESAMPLING, n = 180045, m = 100

SLIDE 34

BOOTSTRAP RESAMPLING: V METHOD V method with pivoting, n = 36009, m = 1000

SLIDE 35

BOOTSTRAP RESAMPLING: V + SR METHOD V method with pivoting and SR method

SLIDE 36

BOOTSTRAP RESAMPLING: V, WITH AND W/O PIVOTING

SLIDE 37

RMSE ERROR VS. RANK

SLIDE 38

RMSE ERROR VS. NUMBER OF GALAXIES

SLIDE 39

GP VS. ALTERNATIVE METHODS Way and Srivastava, 2006:

SLIDE 40

GP VS. ALTERNATIVE METHODS Way and Srivastava, 2006 + our results:

SLIDE 41

SUMMARY OF RESULTS

Our code solves the linear algebra issues in the Gaussian process approach:
Fast – O(nm²), m << n.
Accurate – good predictions.
Stable – bootstrap error curves are flat.
General – works for any kernel.

SLIDE 42

FURTHER WORK

Outliers.
Hyperparameters using the low-rank approximation (we used minimize from [Rasmussen and Williams, 2006]).
Additional covariance functions.
Lower bound on errors (e.g., for redshift: .02?).

SLIDE 43

REFERENCES

  • Å. Björck, Numerical Methods for Least Squares Problems, SIAM, 1996.
  • N. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, 2002.
  • G. Peters and J. Wilkinson, The Computer Journal (13), pp. 309-316, 1970.
  • C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
  • G. Wahba, Spline Models for Observational Data, SIAM, 1990.
  • M. Way and A. Srivastava, Astrophysical Journal (647), pp. 102-115, 2006.

SLIDE 44

ACKNOWLEDGEMENT

We would like to thank the Woodward Fund for financial support, and the following people for their guidance:

  • Drs. Michael Way, Ashok Srivastava, Tim Lee, and Paul Gazis (NASA scientists)
  • Dr. Tim Hsu (CAMCOS director)
  • Drs. Bem Cayco, Wasin So, and Steve Crunk (SJSU faculty)
