
Spectrum Data - II



  1. Improved Linear Algebra Methods for Redshift Computation from Limited Spectrum Data - II. Nabeela Aijaz, Michael Hurley, Apolo Luis, Joel Rinsky, Chandrika Satyavolu, Alex Waagen (leader). May 16, 2007.

  2. Outline: I. The problem. II. Details on solving the problem. III. Results. IV. Conclusion and future directions. (Joel Rinsky, CAMCOS Report Day, May 16, 2007)

  3. What is a redshift? A redshift indicates that an object is moving away from you. It is the change in wavelength divided by the initial wavelength. For example, the sound from a train is shifted and changes pitch when the train is moving away.
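
The definition on this slide can be written as z = (observed wavelength - emitted wavelength) / emitted wavelength. A minimal sketch follows; the function name `redshift` and the example wavelengths are illustrative assumptions, not values from the presentation.

```python
def redshift(observed_wavelength, emitted_wavelength):
    """Return z = (observed - emitted) / emitted; both wavelengths in the same unit."""
    if emitted_wavelength <= 0:
        raise ValueError("emitted wavelength must be positive")
    return (observed_wavelength - emitted_wavelength) / emitted_wavelength

# Hypothetical example: a line emitted at 500.0 nm and observed at 550.0 nm.
z = redshift(550.0, 500.0)
print(z)
```

A positive z means the observed wavelength is longer than the emitted one, which is the "moving away" case the slide describes.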

  4. Why is it important? Scientists want to determine the position of galaxies in the universe. Redshifts are useful for understanding the structure of the universe.

  5. The regression problem: Five photometric observations for each galaxy, denoted U, G, R, I, Z.

  6. The regression problem: We have 180,000 examples with known U, G, R, I, Z and redshift. The goal is to predict the redshift of a new galaxy from its U, G, R, I, Z data.

  7. Last semester's results: Last semester, the redshift was predicted with an error of 0.0245, making efficient use of all 180,000 sets of data.

  8. This semester's results: Completely solved the linear algebra!!! Fast. Accurate. Stable. General.

  9. More thorough statement of problem: Matrix $X$: $180{,}000 \times 5$ (U, G, R, I, Z). Vector $y$: $180{,}000 \times 1$, redshift values for the training data. Matrix $X^*$: $20{,}000 \times 5$, testing data whose redshifts $y^*$ are unknown; $y^*$ must be predicted. (Michael Hurley, CAMCOS Report Day, May 16, 2007)

  10. Covariance functions and matrices: Definition: A covariance function $k(x, x')$ is the measure of covariance between input points $x$ and $x'$. Covariance matrix: $K_{ij} = k(x_i, x_j)$.
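
To make $K_{ij} = k(x_i, x_j)$ concrete, here is a minimal sketch that builds a covariance matrix from a squared exponential kernel, one of the kernels the talk lists for the second semester. The exact parameterization, the `length_scale` value, and the synthetic data are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def squared_exponential(A, B, length_scale=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 * length_scale^2)) for every row pair of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * length_scale**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))      # 6 synthetic "galaxies", 5 bands (U, G, R, I, Z)
K = squared_exponential(X, X)    # K[i, j] = k(x_i, x_j)
print(K.shape)
```

For the full training set the same construction would give a 180,000 x 180,000 matrix, which is why storing it becomes an issue later in the talk.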

  11. Examples of covariance functions and matrices: 1st semester: polynomial kernel. 2nd semester: squared exponential, neural network, rational quadratic, Matérn class.

  12. Traditional Gaussian process equations: Prediction requires solving $\hat{y}^* = K^* (\lambda^2 I + K)^{-1} y$, where $K^*$ is the covariance matrix formed using $X^*$ (its rows correspond to test points, its columns to training points). This requires $O(n^3)$ operations, with $n = 180{,}000$.
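
The direct computation $\hat{y}^* = K^* (\lambda^2 I + K)^{-1} y$ can be sketched on a small synthetic problem as follows. The kernel choice, the value of `lam`, the data, and the function names are assumptions for illustration; at $n = 180{,}000$ this dense solve is exactly the step that becomes impractical.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared exponential kernel between all rows of A and all rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(diff**2, axis=2) / (2.0 * length_scale**2))

def gp_predict_direct(X, y, X_star, lam, kernel=sq_exp_kernel):
    """Return K* (lam^2 I + K)^{-1} y using a dense O(n^3) solve."""
    n = X.shape[0]
    K = kernel(X, X)              # n x n training covariance
    K_star = kernel(X_star, X)    # n* x n test/training covariance
    alpha = np.linalg.solve(lam**2 * np.eye(n) + K, y)
    return K_star @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))          # synthetic stand-in for (U, G, R, I, Z)
y = np.sin(X.sum(axis=1))             # synthetic stand-in for redshifts
X_star = rng.normal(size=(4, 5))
y_star = gp_predict_direct(X, y, X_star, lam=0.1)
print(y_star.shape)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the usual choice for accuracy, but it does not change the cubic cost.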

  13. Previous methods used (RR, CG, CU): Reduced Rank, Conjugate Gradient, Cholesky Update, and quadratic regression (classic method).

  14. Gibbs sampler: Take a representative random sample of vectors. The vectors' covariance approaches the inverse of the kernel. Polynomial kernels did not work well; exponential kernels worked better. Slow.

  15. Computational difficulties: Limited computing resources. Primary computational issues: Memory (storing the covariance matrix) and time (solving the linear system of equations). (Alex Waagen, CAMCOS Report Day, May 16, 2007)

  16. Low rank approximations: Definition: A low rank matrix $\hat{K}$ is a low rank approximation of $K$ if $\lVert \hat{y}^*_{\hat{K}} - \hat{y}^* \rVert$ is small. If $\hat{K}$ is positive semi-definite of rank $m$, an $n \times m$ matrix $V$ exists such that $\hat{K} = VV^T$.

  17. Partial Cholesky decomposition: The partial Cholesky decomposition allows us to calculate V such that:

  18. Low rank approximation of K: [Matrix figure: the product of a matrix $V$ in lower-triangular form (rows $v_{11}, 0, 0, \dots$; $v_{21}, v_{22}, 0, \dots$; down to $v_{n1}, v_{n2}, \dots$) and its transpose $V^T$.]

  19. Partial Cholesky decomposition (with pivoting): Partial Cholesky decomposition is $O(nm^2)$. It may be used to compute $\hat{K} = VV^T$. Pivoting can be used to improve numerical stability. If $m$ is small, $V$ may be stored in memory.
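
A minimal sketch of a pivoted partial Cholesky decomposition is given below. It is a standard greedy variant (pick the largest remaining diagonal entry as pivot), not necessarily the exact routine used in the project, and the function name and tolerance are assumptions. For brevity it takes the full matrix `K`; the algorithm itself only touches the diagonal and the `m` pivot columns, which is what keeps the cost at $O(nm^2)$. Rows are not permuted here, so `V` is not displayed in triangular form, but `V @ V.T` still approximates `K`.

```python
import numpy as np

def partial_cholesky(K, m, tol=1e-12):
    """Greedy pivoted partial Cholesky: return V (n x r, r <= m) with K ~= V @ V.T."""
    n = K.shape[0]
    V = np.zeros((n, m))
    d = np.diag(K).astype(float).copy()   # diagonal of the residual K - V V^T
    for j in range(m):
        p = int(np.argmax(d))             # pivot: largest remaining diagonal entry
        if d[p] <= tol:                   # residual is numerically zero: stop early
            return V[:, :j]
        # Column p of the residual, scaled by the square root of the pivot.
        V[:, j] = (K[:, p] - V[:, :j] @ V[p, :j]) / np.sqrt(d[p])
        d -= V[:, j] ** 2                 # update the residual diagonal
        np.maximum(d, 0.0, out=d)         # clamp round-off negatives
    return V

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))
K = A @ A.T                               # synthetic PSD matrix of rank 3
V = partial_cholesky(K, m=3)
print(np.max(np.abs(K - V @ V.T)))
```

Because the synthetic `K` has rank 3, three columns reproduce it up to round-off; for a real kernel matrix the residual shrinks as `m` grows.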

  20. Additional methods: The computation $\hat{y}^* = K^* (\lambda^2 I + K)^{-1} y$ requires $O(n^3)$ operations. Two alternatives: Subset of Regressors (SR), due to Wahba (1990), and the V Formulation, a new method. Both require $O(nm^2)$ operations, where $m = 100 \ll 180{,}000 = n$. (Nabeela Aijaz, CAMCOS Report Day, May 16, 2007)
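
As a rough worked comparison of the two operation counts, ignoring constant factors and using the values $n = 180{,}000$ and $m = 100$ quoted on the slide:

```latex
\begin{align*}
n^3 &= (1.8 \times 10^{5})^{3} = 5.832 \times 10^{15}, \\
n m^2 &= (1.8 \times 10^{5}) \times (10^{2})^{2} = 1.8 \times 10^{9}, \\
\frac{n^3}{n m^2} &= \frac{n^2}{m^2} = 1800^{2} = 3.24 \times 10^{6}.
\end{align*}
```

So the low rank methods need on the order of a few million times fewer operations than the direct solve.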

  21. Subset of Regressors (SR): Form $K \approx VV^T$. $K_1$ and $K_{11}$ are submatrices of $K$. $\hat{y}^* = K^*_1 (\lambda^2 K_{11} + K_1^T K_1)^{-1} K_1^T y$.
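
The SR formula can be sketched as below, reading $K_1$ as the $n \times m$ block of columns of $K$ for a chosen subset of $m$ training points, $K_{11}$ as the $m \times m$ block among those points, and $K^*_1$ as the matching columns of $K^*$. How the subset is chosen is not stated on the slide; taking the first $m$ points, as well as the kernel, `lam`, and the data, are assumptions for illustration only.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared exponential kernel between all rows of A and all rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(diff**2, axis=2) / (2.0 * length_scale**2))

def sr_predict(X, y, X_star, subset, lam, kernel=sq_exp_kernel):
    """Return K*_1 (lam^2 K_11 + K_1^T K_1)^{-1} K_1^T y for the given subset."""
    X_m = X[subset]
    K_1 = kernel(X, X_m)            # n x m
    K_11 = kernel(X_m, X_m)         # m x m
    K_star_1 = kernel(X_star, X_m)  # n* x m
    A = lam**2 * K_11 + K_1.T @ K_1  # only an m x m system is solved
    return K_star_1 @ np.linalg.solve(A, K_1.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))       # synthetic stand-in for (U, G, R, I, Z)
y = np.sin(X.sum(axis=1))           # synthetic stand-in for redshifts
X_star = rng.normal(size=(3, 5))
subset = np.arange(20)              # assumed choice: the first 20 training points
y_star = sr_predict(X, y, X_star, subset, lam=0.1)
print(y_star.shape)
```

Forming `K_1.T @ K_1` costs $O(nm^2)$, which dominates the $O(m^3)$ solve when $m \ll n$.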

  22. The Magic Lemma: Form $K \approx VV^T$, so that $\hat{y}^* = V^* V^T (\lambda^2 I + VV^T)^{-1} y$. Magic Lemma: $V^T (\lambda^2 I + VV^T)^{-1} = (\lambda^2 I + V^T V)^{-1} V^T$. The matrix $(\lambda^2 I + VV^T)$ is $180{,}000 \times 180{,}000$; $(\lambda^2 I + V^T V)$ is only $100 \times 100$. Speed improved by a factor of MILLIONS!!
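
The lemma follows from $V^T(\lambda^2 I_n + VV^T) = \lambda^2 V^T + V^T V V^T = (\lambda^2 I_m + V^T V)V^T$, after multiplying on the left and right by the two inverses (both exist when $\lambda \neq 0$). The sketch below checks numerically that the large and small formulations give the same prediction; the sizes, `lam`, and the random `V`, `V_star`, and `y` are synthetic assumptions, not the project's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 200, 5, 0.5
V = rng.normal(size=(n, m))          # stand-in for the low rank factor, K ~= V V^T
V_star = rng.normal(size=(10, m))    # stand-in for the test-side factor V*
y = rng.normal(size=n)

# Left-hand form: solve with the n x n matrix (lam^2 I + V V^T).
big = lam**2 * np.eye(n) + V @ V.T
y_star_big = V_star @ (V.T @ np.linalg.solve(big, y))

# Right-hand form: solve with the m x m matrix (lam^2 I + V^T V).
small = lam**2 * np.eye(m) + V.T @ V
y_star_small = V_star @ np.linalg.solve(small, V.T @ y)

max_diff = np.max(np.abs(y_star_big - y_star_small))
print(max_diff)
```

The second form never builds an $n \times n$ matrix, so both the memory and the time issues raised earlier in the talk are avoided.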
