SPECTRUM DATA - II Nabeela Aijaz, Michael Hurley, Apolo Luis, Joel - - PowerPoint PPT Presentation




SLIDE 1

IMPROVED LINEAR ALGEBRA METHODS FOR REDSHIFT COMPUTATION FROM LIMITED SPECTRUM DATA - II

Nabeela Aijaz, Michael Hurley, Apolo Luis, Joel Rinsky, Chandrika Satyavolu, Alex Waagen (leader) May 16, 2007


SLIDE 2

OUTLINE

  • I. The problem
  • II. Details on solving the problem
  • III. Results
  • IV. Conclusion and future directions

JOEL RINSKY CAMCOS REPORT DAY, MAY 16, 2007

SLIDE 3

WHAT IS A REDSHIFT? A redshift is the change in wavelength divided by the initial wavelength; it indicates that an object is moving away from you. For example, the sound from a train is shifted and changes pitch when the train moves away.
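The definition above can be sketched in a few lines. This is an illustrative example, not from the talk; the H-alpha emission wavelength of 656.3 nm is a standard value, and the observed wavelength below is made up.

```python
# Hypothetical example: computing a redshift z = Δλ / λ_emitted.
def redshift(observed, emitted):
    """Redshift: change in wavelength divided by the initial wavelength."""
    return (observed - emitted) / emitted

# H-alpha is emitted at 656.3 nm; if it is observed at 721.9 nm:
z = redshift(721.9, 656.3)
print(round(z, 3))  # 0.1 -- the galaxy is receding
```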


SLIDE 4

WHY IS IT IMPORTANT? Scientists want to determine the positions of galaxies in the universe, which is useful for understanding the structure of the universe.


SLIDE 5

THE REGRESSION PROBLEM Five photometric observations for each galaxy, denoted U, G, R, I, Z.


SLIDE 6

THE REGRESSION PROBLEM We have 180,000 examples with known U, G, R, I, Z values and redshifts. The goal is to predict the redshift of a new galaxy from its U, G, R, I, Z data.


SLIDE 7

LAST SEMESTER'S RESULTS Last semester's team predicted the redshift with an error of .0245 and was able to make efficient use of all 180,000 sets of data.


SLIDE 8

THIS SEMESTER'S RESULTS Completely solved the linear algebra! Fast, accurate, stable, and general.


SLIDE 9

MORE THOROUGH STATEMENT OF THE PROBLEM Matrix X: 180,000 × 5 training data (U, G, R, I, Z). Vector y: 180,000 × 1, the redshift values for the training data. Matrix X*: 20,000 × 5, testing data whose redshifts y* are unknown and must be predicted.

MICHAEL HURLEY CAMCOS REPORT DAY, MAY 16, 2007

SLIDE 10

COVARIANCE FUNCTIONS AND MATRICES Definition: a covariance function k(x, x′) is the measure of covariance between input points x and x′. Covariance matrix: Kᵢⱼ = k(xᵢ, xⱼ)
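The construction Kᵢⱼ = k(xᵢ, xⱼ) can be sketched as follows. The squared-exponential kernel is one of the second-semester kernels named on the next slide; the length scale `ell` and the data sizes are made-up assumptions.

```python
import numpy as np

# Illustrative sketch of building a covariance matrix K_ij = k(x_i, x_j)
# with a squared-exponential kernel (length scale ell is an assumption).
def sq_exp_kernel(A, B, ell=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 ell^2)) for all row pairs of A, B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * ell**2))

X = np.random.default_rng(0).normal(size=(6, 5))  # 6 galaxies, 5 bands (U,G,R,I,Z)
K = sq_exp_kernel(X, X)
print(K.shape, np.allclose(K, K.T))  # (6, 6) True -- symmetric, ones on the diagonal
```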


SLIDE 11

EXAMPLES OF COVARIANCE FUNCTIONS AND MATRICES 1st semester: polynomial kernel. 2nd semester: squared exponential, neural network, rational quadratic, and Matérn class.


SLIDE 12

TRADITIONAL GAUSSIAN PROCESS EQUATIONS Prediction requires solving ŷ* = K*(λ²I + K)⁻¹y, where K* is the covariance matrix formed using X*. This requires O(n³) operations, with n = 180,000.
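The computational pattern above can be sketched on a tiny synthetic problem; at n = 180,000 the n × n solve below is the O(n³) bottleneck. Sizes, λ, and the toy covariance are made up, not the talk's data.

```python
import numpy as np

# Sketch of the traditional prediction ŷ* = K*(λ²I + K)⁻¹ y.
rng = np.random.default_rng(1)
n, n_test, lam = 50, 5, 0.1
B = rng.normal(size=(n, 3))
K = B @ B.T                               # toy positive semi-definite covariance
K_star = rng.normal(size=(n_test, 3)) @ B.T
y = rng.normal(size=n)

alpha = np.linalg.solve(lam**2 * np.eye(n) + K, y)   # the O(n³) step
y_hat_star = K_star @ alpha
print(y_hat_star.shape)  # (5,)
```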


SLIDE 13

PREVIOUS METHODS USED: RR, CG, CU Reduced Rank (RR), Conjugate Gradient (CG), Cholesky Update (CU), and quadratic regression (the classic method).


SLIDE 14

GIBBS SAMPLER Take a representative random sample of vectors; the vectors' covariance approaches the inverse of the kernel. Polynomial kernels did not work well, but exponential kernels worked better. Slow.


SLIDE 15

COMPUTATIONAL DIFFICULTIES Limited computing resources. Primary computational issues: memory (storing the covariance matrix) and time (solving the linear system of equations).

ALEX WAAGEN CAMCOS REPORT DAY, MAY 16, 2007

SLIDE 16

LOW RANK APPROXIMATIONS Definition: a low rank matrix K̂ is a low rank approximation of K if ŷ*_K − ŷ*_K̂ (the difference between the predictions made with K and with K̂) is small.

If K̂ is positive semi-definite of rank m, an n × m matrix V exists such that K̂ = VVᵀ


SLIDE 17

PARTIAL CHOLESKY DECOMPOSITION The partial Cholesky decomposition allows us to calculate V such that K ≈ VVᵀ.


SLIDE 18

LOW RANK APPROXIMATION OF K

K̂ = VVᵀ: the n × m matrix V produced by the partial Cholesky decomposition multiplied by its transpose Vᵀ (the slide displayed the entries vᵢⱼ of the two factors).


SLIDE 19

PARTIAL CHOLESKY DECOMPOSITION (WITH PIVOTING) The partial Cholesky decomposition is O(nm²) and may be used to compute K̂ = VVᵀ. Pivoting can be used to improve numerical stability. If m is small, V may be stored in memory.
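The pivoted partial Cholesky step can be sketched as follows. This is a generic textbook-style implementation, not the team's code, and the rank-3 test matrix is synthetic. Note that each step touches only one column of K, so the full 180,000 × 180,000 matrix never needs to be formed.

```python
import numpy as np

# Illustrative pivoted partial Cholesky: returns an n x m matrix V with
# K ≈ V Vᵀ in O(n m²) operations.
def partial_cholesky(K, m):
    n = K.shape[0]
    V = np.zeros((n, m))
    d = np.diag(K).astype(float).copy()   # residual diagonal of K - V Vᵀ
    for j in range(m):
        i = int(np.argmax(d))             # pivot: largest residual diagonal
        V[:, j] = (K[:, i] - V @ V[i, :]) / np.sqrt(d[i])
        d -= V[:, j] ** 2                 # update residual diagonal
    return V

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 3))
K = A @ A.T                               # rank-3 positive semi-definite matrix
V = partial_cholesky(K, 3)
print(np.allclose(K, V @ V.T))            # True: exact once m reaches rank(K)
```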


SLIDE 20

ADDITIONAL METHODS The computation ŷ* = K*(λ²I + K)⁻¹y requires O(n³) operations. Subset of Regressors (SR), due to Wahba (1990). V Formulation, a new method. Both require only O(nm²) operations, where m = 100 ≪ 180,000 = n.

NABEELA AIJAZ CAMCOS REPORT DAY, MAY 16, 2007

SLIDE 21

SUBSET OF REGRESSORS (SR) Form K ≈ VVᵀ. K₁ and K₁₁ are submatrices of K.

ŷ* = K₁*(λ²K₁₁ + K₁ᵀK₁)⁻¹K₁ᵀy
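The SR prediction can be sketched as follows. The kernel, the subset choice (simply the first m points), and all sizes are illustrative assumptions, not the talk's settings.

```python
import numpy as np

# Sketch of the Subset of Regressors prediction
#   ŷ* = K₁*(λ²K₁₁ + K₁ᵀK₁)⁻¹ K₁ᵀ y
rng = np.random.default_rng(3)
n, n_test, m, lam = 200, 10, 8, 0.1

def kern(A, B):  # squared-exponential kernel, unit length scale (assumption)
    return np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2) / 2.0)

X = rng.normal(size=(n, 5))
X_star = rng.normal(size=(n_test, 5))
y = rng.normal(size=n)

S = X[:m]                    # the subset of m regressor points
K1 = kern(X, S)              # n x m:      K₁
K11 = kern(S, S)             # m x m:      K₁₁
K1_star = kern(X_star, S)    # n_test x m: K₁*

w = np.linalg.solve(lam**2 * K11 + K1.T @ K1, K1.T @ y)
y_hat_star = K1_star @ w
print(y_hat_star.shape)      # (10,)
```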


SLIDE 22

THE MAGIC LEMMA Form K ≈ VVᵀ, so ŷ* = V*Vᵀ(λ²I + VVᵀ)⁻¹y.

Magic Lemma: Vᵀ(λ²I + VVᵀ)⁻¹ = (λ²I + VᵀV)⁻¹Vᵀ

(λ²I + VVᵀ) is 180,000 × 180,000, while (λ²I + VᵀV) is only 100 × 100. Speed improved by a factor of MILLIONS!
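The lemma is easy to check numerically: the left side inverts an n × n matrix, while the right side solves only an m × m system. The sizes and λ below are arbitrary.

```python
import numpy as np

# Numerical check of Vᵀ(λ²Iₙ + VVᵀ)⁻¹ = (λ²Iₘ + VᵀV)⁻¹Vᵀ on random data.
rng = np.random.default_rng(4)
n, m, lam = 300, 10, 0.5
V = rng.normal(size=(n, m))

lhs = V.T @ np.linalg.inv(lam**2 * np.eye(n) + V @ V.T)    # m x n, via n x n inverse
rhs = np.linalg.solve(lam**2 * np.eye(m) + V.T @ V, V.T)   # m x n, via m x m solve
print(np.allclose(lhs, rhs))  # True
```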


SLIDE 23

V FORMULATION Using the Magic Lemma we can derive the V Formulation: ŷ* = V*(λ²I + VᵀV)⁻¹Vᵀy
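A sketch comparing the V formulation with the direct n × n solve on synthetic data. When K = VVᵀ and K* = V*Vᵀ exactly, the two predictions are mathematically identical; all sizes here are illustrative.

```python
import numpy as np

# Sketch of the V formulation ŷ* = V*(λ²I + VᵀV)⁻¹ Vᵀ y vs. the direct solve.
rng = np.random.default_rng(5)
n, n_test, m, lam = 400, 20, 10, 0.3
V = rng.normal(size=(n, m))
V_star = rng.normal(size=(n_test, m))
y = rng.normal(size=n)

# V formulation: one m x m solve, O(n m²) overall
y_v = V_star @ np.linalg.solve(lam**2 * np.eye(m) + V.T @ V, V.T @ y)

# Traditional formulation: one n x n solve, O(n³) overall
K, K_star = V @ V.T, V_star @ V.T
y_direct = K_star @ np.linalg.solve(lam**2 * np.eye(n) + K, y)

print(np.allclose(y_v, y_direct))  # True
```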


SLIDE 24

COMPUTING TIMES, NEW METHODS WITH PARTIAL CHOLESKY

APOLO LUIS CAMCOS REPORT DAY, MAY 16, 2007

SLIDE 25

ESTIMATING A METHOD'S ACCURACY: BOOTSTRAP The bootstrap is a standard statistical resampling technique: generate multiple (100) samples to test the methods and determine reliability and error bounds. Stable methods have a smaller range of error.
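The bootstrap procedure can be sketched as follows: resample the training set with replacement, refit, and examine the spread of test errors. The linear least-squares predictor and all sizes here are stand-ins, not the talk's Gaussian process methods.

```python
import numpy as np

# Sketch of bootstrap error estimation with a stand-in linear predictor.
rng = np.random.default_rng(6)
n, n_test, n_boot = 500, 100, 100
X = rng.normal(size=(n, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=n)
X_test = rng.normal(size=(n_test, 5))
y_test = X_test @ w_true + 0.1 * rng.normal(size=n_test)

errors = []
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)                 # resample with replacement
    w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    errors.append(np.sqrt(np.mean((X_test @ w - y_test) ** 2)))

# A stable method shows a narrow spread between the best and worst resamples.
print(round(min(errors), 3), round(max(errors), 3))
```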


SLIDE 26

V IS MORE STABLE THAN SR


SLIDE 27

V IS MORE ACCURATE THAN LAST SEMESTER


SLIDE 28

THIS SEMESTER'S RESULTS Completely solved the linear algebra for the Gaussian process approach! Fast, accurate, stable, and general.

CHANDRIKA SATYAVOLU CAMCOS REPORT DAY, MAY 16, 2007

SLIDE 29

CONCLUDING REMARKS ON OUR RESULTS For the Gaussian process approach: solving the entire system is not practical. The SR method is fast, but can be unstable and not as accurate. The V method is fast, stable, and accurate.


SLIDE 30

CONCLUDING REMARKS ON OUR RESULTS

             Semester 1          Semester 2
Speed        Fast                Faster
Accuracy     .0245               .0215
Generality   Polynomial kernels  Any kernel


SLIDE 31

FURTHER RESEARCH Is there a lower limit for the RMS error? .0215? .0202? Experiment with additional covariance functions. Exclude outliers.


SLIDE 32

GRAPH ILLUSTRATING OUTLIERS


SLIDE 33

ACKNOWLEDGEMENT We would like to thank the Woodward Fund for the financial support and the following people for their guidance.

  • Drs. Michael Way, Ashok Srivastava, Tim Lee, Paul Gazis (NASA scientists)
  • Dr. Tim Hsu (CAMCOS director), Dr. Leslie Foster (faculty advisor)
  • Drs. Bem Cayco, Wasin So, and last semester's team
  • Dr. Steve Crunk (SJSU faculty)


SLIDE 34

THE END! Questions?


SLIDE 35

DIRECTIONS TO LUNCH

Lunch will be at Café Pomegranate, 221 E. San Fernando St. (at 6th), (408) 271-8822 (* on map).
