Sparse PCA / LDA - PowerPoint PPT Presentation




slide-1
SLIDE 1

Baback Moghaddam

Machine Learning Group
baback@jpl.nasa.gov
NASA Sounder Science Team Meeting
November 3rd - 5th, 2010

slide-2
SLIDE 2

Sparse PCA / LDA

General Problem Formulation

$$\max_{x} \; \frac{x^T A x}{x^T B x}$$

where $A$ and $B$ are symmetric matrices (given)

subject to: $\mathrm{card}(x) \le k$, i.e. $\|x\|_0 \le k$
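
The cardinality constraint makes this problem combinatorial, since in principle every support of size k is a candidate. As a minimal sketch of one simple heuristic (greedy forward selection, not necessarily the method used in this talk), the code below grows the support one index at a time, each time keeping the index that most increases the largest generalized eigenvalue of the selected submatrices. It assumes the selected submatrices of B are positive definite. The function names `top_generalized_eig` and `greedy_sparse_direction` are hypothetical.

```python
import numpy as np

def top_generalized_eig(A, B, idx):
    """Largest generalized eigenpair of (A, B) restricted to the index set idx.

    Assumes the submatrix of B is symmetric positive definite.
    """
    A_s = A[np.ix_(idx, idx)]
    B_s = B[np.ix_(idx, idx)]
    L = np.linalg.cholesky(B_s)            # B_s = L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ A_s @ L_inv.T              # symmetric, same eigenvalues as (A_s, B_s)
    vals, vecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    return vals[-1], L_inv.T @ vecs[:, -1]  # map back: x_s = L^{-T} v

def greedy_sparse_direction(A, B, k):
    """Greedy forward selection of a support of size k (a heuristic sketch)."""
    n = A.shape[0]
    chosen = []
    for _ in range(k):
        best_j, best_val = None, -np.inf
        for j in range(n):
            if j in chosen:
                continue
            val, _ = top_generalized_eig(A, B, chosen + [j])
            if val > best_val:
                best_j, best_val = j, val
        chosen.append(best_j)
    support = sorted(chosen)
    value, x_s = top_generalized_eig(A, B, support)
    x = np.zeros(n)
    x[support] = x_s                       # zero outside the selected support
    return support, value, x

# Toy usage: with B = I the objective reduces to sparse PCA on A.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0]])
support, value, x = greedy_sparse_direction(A, np.eye(3), k=2)
```

Setting B to the identity gives the sparse PCA case. Taking A as a between-class scatter matrix and B as a within-class scatter matrix gives the sparse LDA case.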

slide-3
SLIDE 3

subject to : card(x) = k

  • n ~ 10^3-10^4 genes
  • m ~ 10^2 samples
  • 2 classes: cancer vs. healthy

DNA Microarray

slide-4
SLIDE 4

x1 x2

slide-5
SLIDE 5

AQUA/AIRS radiance data (June 4th 2007)

98% of spectral variance is in 500 frequencies (total of 2500), hence yielding a 5:1 compression

slide-6
SLIDE 6

Detection Performance (ROC)

For FPR > 1% the 12-band detection rate is as good as using all 242 bands, yielding a 20:1 compression ratio.

1024-by-256 imagery of sulfur-rich Borup Fiord glacier, also measured by the 242-band Hyperion sensor during 2006-07

Hyperion Sulfur Detection

Sparse Classifier (best 12 of 242 channels)
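
A comparison like the one above reads the detection rate (true positive rate) off an ROC curve at a fixed false positive rate. The sketch below shows one way such a number could be computed from classifier scores. It is illustrative only and uses no data from the talk; the function name `tpr_at_fpr` and the toy scores are hypothetical.

```python
import numpy as np

def tpr_at_fpr(scores, labels, max_fpr):
    """Detection rate at the strictest threshold whose FPR does not exceed max_fpr.

    A sample is flagged as a detection when its score is strictly above the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    neg = np.sort(scores[~labels])[::-1]             # negative scores, descending
    n_allowed = int(np.floor(max_fpr * neg.size))    # false alarms we may tolerate
    # Threshold at the (n_allowed + 1)-th highest negative score, so at most
    # n_allowed negatives lie strictly above it.
    thr = neg[n_allowed] if n_allowed < neg.size else -np.inf
    return float(np.mean(scores[labels] > thr))

# Toy scores: 3 positives, 5 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
rate_strict = tpr_at_fpr(scores, labels, max_fpr=0.0)
rate_loose = tpr_at_fpr(scores, labels, max_fpr=0.25)
```

Running the same function on the scores of a 12-band classifier and of a full 242-band classifier, at the same `max_fpr`, would give the two detection rates being compared.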

slide-7
SLIDE 7

July, 2005 ~20,000 spectra

Section of Pacific with stratocumulus, cumulus and deep convective clouds

AIRS Dataset

slide-8
SLIDE 8

AIRS Spectrum

slide-9
SLIDE 9

Cloudy/Clear Classifier : 1-freq

H2O

slide-10
SLIDE 10

Cloudy/Clear Classifier : 2-freqs

H2O CO2

slide-11
SLIDE 11

Cloudy/Clear Classifier : 5-freqs

CO2 H2O O3

slide-12
SLIDE 12

Cloudy/Clear Classifier : 50-freqs

CO2 H2O O3

slide-13
SLIDE 13

Current Work

  • Algorithmic Enhancements
      • Formulated Sparse-LDA as a Sparse Regression problem
      • This speeds up optimization, reducing CPU time by a factor of ~10^3
  • Dataset Preparation
      • Selected suitable hyperspectral datasets from AIRS archive
      • IR spectra for a whole month (huge data matrix = 20000 x 1843)
      • Visual data in four frequency bands (from AIRS VIS instrument)
  • Demo of AIRS Cloudy/Clear sparse classifier
      • Separation of cloudy from clear data based on Level 1 data
      • Worked with Prof. Yung (Caltech) using their method of cloud separation
      • Tested 2 methods of cloud separation by AIRS Project Scientist G. Aumann
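
The slides do not spell out the regression reformulation. One standard route, assumed here, is that for two classes the least-squares fit to ±1 class labels is proportional to the Fisher LDA direction, so a sparse discriminant can be sought with sparse regression tools. The sketch below uses greedy forward selection on the residual sum of squares. It is a stand-in for illustration, not the algorithm from the talk, and the name `greedy_sparse_lda` and the synthetic data are hypothetical.

```python
import numpy as np

def greedy_sparse_lda(X, y, k):
    """Select k features by forward selection on a least-squares fit to +/-1 labels.

    X: (m samples, n features); y: labels in {0, 1} (assumed coding).
    Returns a length-n weight vector with at most k nonzero entries.
    """
    X = np.asarray(X, dtype=float)
    t = np.where(np.asarray(y) == 1, 1.0, -1.0)   # regression targets
    Xc = X - X.mean(axis=0)                       # centering absorbs the intercept
    tc = t - t.mean()
    chosen = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            w, *_ = np.linalg.lstsq(Xc[:, cols], tc, rcond=None)
            rss = float(np.sum((tc - Xc[:, cols] @ w) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
    w, *_ = np.linalg.lstsq(Xc[:, chosen], tc, rcond=None)
    x = np.zeros(X.shape[1])
    x[chosen] = w                                 # zero outside the selected features
    return x

# Synthetic usage: only feature 2 separates the two classes.
rng = np.random.default_rng(0)
y = np.arange(200) % 2
X = rng.normal(size=(200, 5))
X[:, 2] += 3.0 * y
x = greedy_sparse_lda(X, y, k=1)
```

Each candidate is scored with a small least-squares solve rather than a generalized eigenvalue computation, which is one plausible source of a speed-up. The factor quoted on the slide is the author's measurement and is not reproduced by this sketch.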
slide-14
SLIDE 14

Future Work

  • Methodology
      • Test current algorithm with a 3rd cloud separation criterion (based on CO2 retrievals) as suggested/used by Bill Irion (JPL)
      • Select more varied AIRS datasets (ocean and land regions separated) and perform cross-dataset validation
  • Missions
      • Meet with AIRS Science Team to discuss and propose new product (L1 D) based on our preliminary Sparse-LDA algorithm results

slide-15
SLIDE 15

Publications