Applications of Bayesian Classification to Data Management (PowerPoint presentation)



SLIDE 1

6/30/04

Applications of Bayesian Classification to Data Management

Christopher Lynnes, NASA/GSFC

Co-Authors:

  • S. Berrick, A. Gopalan, X. Hua, S. Shen, P. Smith, K. Yang (NASA/GSFC)
  • K. Wheeler, C. Curry (NASA/ARC)

SLIDE 2

Problem Statement

Science data volume keeps pace with technology.

SLIDE 3

Data demands are also increasing

  • Lower latency: driven by applications
  • Online access: driven by machine-to-machine interfaces (e.g., models)
  • Volume: driven by advances in computing and data mining
  • A solution is to manage data according to their “usefulness”.

SLIDE 4

Data Management Today: black-box paradigm

  • Data are managed as largely opaque objects
– albeit with labels (metadata) and “cover art” (browse)

[Figure: data flow among Process, Store, Cache, Archive, and Subscriptions]

SLIDE 5

Content-based Data Management

[Figure: the Process, Store, Cache, Archive, and Subscriptions data flow, extended with content-based functions (legend: clock icon = time-critical)]

  • Cache optimization: purge or keep data based on their content
  • Subscription: “send data when my study area is clear”
  • Subsetting: “give me just the clear pixels”
  • Automatic quality assessment
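Content-based cache optimization can be sketched as an eviction rule that purges the least useful data first. This is a minimal sketch under stated assumptions: the Granule record, the use of clear-pixel fraction as the usefulness score, and the purge-until-it-fits policy are all hypothetical illustrations, not the system described in the slides.

```python
from dataclasses import dataclass


@dataclass
class Granule:
    name: str
    size_mb: float
    clear_fraction: float  # hypothetical usefulness score: fraction of clear pixels


def choose_purges(granules, capacity_mb):
    """Return names of granules to purge so the cache fits within capacity_mb.

    Granules with the lowest usefulness score are purged first; the rest are kept.
    """
    total = sum(g.size_mb for g in granules)
    purged = []
    for g in sorted(granules, key=lambda g: g.clear_fraction):
        if total <= capacity_mb:
            break
        purged.append(g.name)
        total -= g.size_mb
    return purged
```

For example, with three 100 MB granules whose clear fractions are 0.9, 0.1, and 0.5, a 200 MB cache purges only the mostly cloudy granule. A real system would likely combine content with access history rather than rank on content alone.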

SLIDE 6

Usefulness is in the eye of the beholder

Table: Study Type vs. Pixel Characteristics (Clear-Sky, …)

  Study Type            Pixel Characteristics
  Cloud Properties      X
  Aerosols              X (X) X X
  Ocean Color           X
  Land Vegetation       X
  Snow Cover/Sea Ice    X
  Wildfires             X

SLIDE 7

Characterization of MODIS Calibrated Radiance

  • Most popular product at the Goddard DAAC
  • Train an algorithm to classify pixels
– Cloud, glint, land, water, etc.
  • Speed of the forward (classification) algorithm is critical.
– However, we can afford the time and CPU for training.
  • Products from science algorithms train machine learning algorithms
– Products serve as a proxy for domain experts
– Nearly unlimited supply of training and test data
– Circular logic if we were making science products…
– …but in the decision support domain, the classifier serves as a high-speed approximator to the science algorithm.

SLIDE 8

Bayesian Classification Applied to MODIS Calibrated Radiance

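  • Bayesian classification:
– $\Pr(C \mid E) = \dfrac{\Pr(C)\,\prod_i \Pr(E_i \mid C)}{\prod_i \Pr(E_i)}$
– where C is a class,
– E_i are measurements of independent variables (the evidence), treated as independent so that the joint probability factors into a product (the “naive” Bayes assumption), and
– Pr(C) is the prior probability of class C.
  • Training: compute frequency histograms of each E_i for each class C; these estimate Pr(E_i | C)
– MODIS cloudmask and ocean color products supply the class labels that “train” the classifier.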

[Figure: Frequency Histogram for Band 1; x-axis: Log(Calibrated Radiance); y-axis: Percentage of Points in Class; one curve per class: Cloud, Desert, Glint, Ice, Land, Water, Coast]
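The training step can be sketched as counting, for each class and band, how many labeled pixels fall into each log-radiance bin. This is a minimal sketch assuming the labels come from a science product as integer class codes and that all bands share one set of bin edges; the add-one smoothing (so an empty bin never yields a zero probability) is an added assumption, not something the slides specify.

```python
import numpy as np


def train_histograms(log_radiance, labels, n_classes, bin_edges):
    """Per-class, per-band frequency histograms, used as estimates of Pr(E_i | C).

    log_radiance: (n_pixels, n_bands) array of log10 calibrated radiance
    labels:       (n_pixels,) integer class codes taken from a science product
    bin_edges:    1-D array of histogram bin edges shared by all bands
    Returns an array of shape (n_classes, n_bands, n_bins) whose last axis sums to 1.
    """
    n_bands = log_radiance.shape[1]
    n_bins = len(bin_edges) - 1
    hist = np.zeros((n_classes, n_bands, n_bins))
    for c in range(n_classes):
        members = log_radiance[labels == c]  # training pixels labeled as class c
        for b in range(n_bands):
            counts, _ = np.histogram(members[:, b], bins=bin_edges)
            hist[c, b] = counts
    # Add-one smoothing (assumption): avoids zero probabilities for unseen bins
    hist += 1.0
    return hist / hist.sum(axis=2, keepdims=True)
```

Because training runs offline, its cost matters little; the slides stress that only the forward classification needs to be fast.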

SLIDE 9

Prior Probabilities

  • Prior probabilities are “known” statistics for the earth
– Regional and seasonal variations
– Derived from MODIS Level 3 gridded products

[Figure: prior probabilities, December to February]
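Regional and seasonal priors can be sketched as a lookup into a gridded table of class frequencies indexed by season and grid cell. This is a minimal sketch assuming a hypothetical 1-degree global grid with four meteorological seasons (December to February, March to May, and so on); the actual grid resolution and layout used to store the Level 3-derived priors are not given in the slides.

```python
import numpy as np


def season_index(month):
    """0 = Dec-Feb, 1 = Mar-May, 2 = Jun-Aug, 3 = Sep-Nov."""
    return (month % 12) // 3


def lookup_prior(prior_grid, lat, lon, month):
    """Return the class prior vector Pr(C) for one pixel.

    prior_grid: hypothetical array of shape (4, 180, 360, n_classes) holding class
    frequencies per season on a 1-degree grid (row 0 starts at 90N, column 0 at 180W).
    """
    row = min(int(np.floor(90.0 - lat)), 179)  # clamp lat = -90 into the last row
    col = int(np.floor(lon + 180.0)) % 360     # wrap lon = +180 back to column 0
    return prior_grid[season_index(month), row, col]
```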

SLIDE 10

Practical Classification - Application

  • For each class:
– Look up the probability of each band measurement in that class's frequency histograms
– Multiply these probabilities together with the class prior to get the overall probability of membership in that class
– Choose the class with the highest overall probability (the denominator Π Pr(E_i) is the same for every class, so it can be skipped)

[Figure: Frequency Distribution of Band 1; x-axis: Log10(Radiance); y-axis: percentage of points; one curve per class: Cloud, Desert, Glint, Snow/Ice, Land, Water, Coast]
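The lookup-multiply-choose procedure can be sketched with array operations. This is a minimal sketch assuming histograms shaped (classes, bands, bins) with shared bin edges, as a training step like the one on the previous slide would produce; summing logarithms instead of multiplying probabilities is a standard numerical substitution to avoid underflow, and it does not change which class wins.

```python
import numpy as np


def classify(log_radiance, hist, bin_edges, prior):
    """Assign each pixel the class with the highest Pr(C) * prod_i Pr(E_i | C).

    log_radiance: (n_pixels, n_bands) log10 radiances
    hist:         (n_classes, n_bands, n_bins) per-class frequency histograms, Pr(E_i | C)
    bin_edges:    shared histogram bin edges (length n_bins + 1)
    prior:        (n_classes,) prior probabilities Pr(C)
    """
    n_pixels, n_bands = log_radiance.shape
    n_bins = hist.shape[2]
    # Histogram bin of every measurement; values outside the edges go to the end bins
    bins = np.clip(np.digitize(log_radiance, bin_edges) - 1, 0, n_bins - 1)
    # Sums of logs rather than products, to avoid floating-point underflow
    log_score = np.tile(np.log(prior), (n_pixels, 1))
    for b in range(n_bands):
        # hist[:, b, bins[:, b]] has shape (n_classes, n_pixels); transpose to match log_score
        log_score += np.log(hist[:, b, bins[:, b]]).T
    # prod_i Pr(E_i) is identical for all classes, so it cannot change the argmax
    return np.argmax(log_score, axis=1)
```

Each pixel costs one table lookup per band per class, which fits the slides' emphasis on a fast forward algorithm.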

SLIDE 11

Bayesian Classification Example

[Figure: Terra/MODIS scene for 16:20-16:25Z, 2003-10-16, shown as (a) a Bayesian classification using bands 1, 2, 2/1 (the band 2 / band 1 ratio), 31, 32 and (b) the MODIS Cloudmask Product; legend for both panels: Cloud, Desert, Water, Ice, Glint, Land, Coast]
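The five inputs in this example include a derived feature, the band 2 / band 1 ratio. The feature stack can be sketched as below, assuming a hypothetical dictionary of per-band radiance arrays and assuming the ratio is log-scaled like the radiances; the slides do not say how the ratio was scaled.

```python
import numpy as np


def build_features(bands):
    """Return an (n_pixels, 5) array of log10 features in the order 1, 2, 2/1, 31, 32.

    bands: hypothetical dict mapping MODIS band number to a 2-D radiance array.
    Assumes all radiances are positive; zero or fill values would need masking first.
    """
    b1 = bands[1].astype(float)
    b2 = bands[2].astype(float)
    layers = [b1, b2, b2 / b1, bands[31].astype(float), bands[32].astype(float)]
    # Flatten each 2-D layer to one column per feature, then take log10 of everything
    return np.log10(np.stack([layer.ravel() for layer in layers], axis=1))
```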

SLIDE 12

Timing Results

[Figure: Algorithm Timing for 300 s of Data; x-axis: Number of Bands (1 to 7); y-axis: Secs; Bayesian classification* compared with the science algorithm (Sci. Alg.)]

*Bayesian classification on a 250 MHz SGI, as a function of the number of bands used

SLIDE 13

Exploitation of Classification Results

  • Add algorithm to Direct Broadcast processing stream

[Figure: Direct Readout Laboratory processing stream: Terra → 3-meter X-band antenna → Level 0 raw data → Level 1a processor (L0 reformatted in HDF) → Geolocation → Calibration → Calibrated radiances → Classification → Calibrated radiances w/supplement (band 1, band 2, …, band 36, class layer)]

SLIDE 14

Content-Based Subsetting

Deliver just the pixels likely to be useful, e.g., cloud-free pixels:

  1. Classify using the Bayesian classifier
  2. Zero out pixels classified as cloud
  3. Apply lossless compression (the zeroed pixels compress to almost nothing)

Currently implemented as an on-the-fly conversion in the WUSTL FTP server (wu-ftpd), triggered by requesting a file name with the .clr suffix, e.g.:

    ftp g0dug03u.ecs.nasa.gov
    > cd /datapool/OPS/user/MODB/RMT021KM.001
    > ls
    > cd 2004.06.13
    > ls *.hdf
    > get RMT021KM.A2004165.1843.001.2004166072602.hdf.clr
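The three subsetting steps can be sketched on an in-memory array. This is a minimal sketch assuming a (bands, rows, cols) integer radiance array, a class layer from the classifier, and a hypothetical cloud class code of 0; zlib stands in for whatever lossless compression the operational conversion applied, which the slides do not name.

```python
import zlib

import numpy as np

CLOUD = 0  # hypothetical class code for "cloud" in the class layer


def clear_subset(radiance, classes, level=9):
    """Zero out pixels classified as cloud, then compress losslessly.

    radiance: (n_bands, rows, cols) integer array of scaled radiances
    classes:  (rows, cols) class layer from the Bayesian classifier
    Returns the zlib-compressed bytes of the masked array.
    """
    masked = radiance.copy()
    masked[:, classes == CLOUD] = 0  # same cloud mask applied to every band
    return zlib.compress(masked.tobytes(), level)
```

Because compression is lossless, the clear pixels come back bit-for-bit after decompression; only the cloudy ones are replaced by zeros.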

SLIDE 15

Content-Based Data Selection

  • Today: “select scenes where cloud cover < 50%”
– Less than foolproof: scene-wide cloud cover says nothing about whether a particular study area is clear
  • Tomorrow: “select scenes where Lake Winnebago is visible”
  • Ad hoc indexing / queries are difficult, but...
  • ...subscription queries should be tractable
– “Is anyone looking for data that are clear for a particular area in this scene?”

[Figure: scene with a study area marked]
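A subscription query can be sketched as a check run once per incoming scene against a list of standing requests. This is a minimal sketch assuming each subscription is a latitude/longitude box with a minimum clear fraction, per-pixel geolocation arrays, and hypothetical class codes for "clear"; none of these names come from the slides.

```python
from dataclasses import dataclass

import numpy as np

CLEAR_CLASSES = (1, 2)  # hypothetical codes for clear land and clear water


@dataclass
class Subscription:
    user: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    min_clear_fraction: float = 0.9


def matching_subscriptions(classes, lat, lon, subscriptions):
    """Return users whose study area appears in this scene and is (mostly) clear.

    classes, lat, lon: 2-D arrays of the same shape (class layer and geolocation).
    """
    clear = np.isin(classes, CLEAR_CLASSES)
    users = []
    for s in subscriptions:
        in_area = ((lat >= s.lat_min) & (lat <= s.lat_max)
                   & (lon >= s.lon_min) & (lon <= s.lon_max))
        # Skip study areas outside the scene; otherwise test the clear fraction
        if in_area.any() and clear[in_area].mean() >= s.min_clear_fraction:
            users.append(s.user)
    return users
```

This only touches pixels inside each subscriber's box, which is one reason standing subscriptions are easier to serve than arbitrary ad hoc queries over an archive.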

SLIDE 16

Automated Quality Assessment of Geolocation

  • Compare the observed land-water pattern with a land-sea mask based on geolocation
– Systematic geolocation error ⇒ systematic shift in pattern
  • Technique:
– Classify land/water/cloud from geolocated radiance
– Assign +1.0 to land, -1.0 to water
– Assign “unknown” classes a random number in the interval (-1.0, +1.0):
    – Cloud, snow/ice in the classification
    – Ephemeral water in the land-sea mask
– Compute cross-correlation using a 2-D FFT; the offset of the correlation peak from zero lag estimates the geolocation shift
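The encoding and FFT cross-correlation steps can be sketched in NumPy. This is a minimal sketch assuming boolean land and unknown masks on a common grid; the function names are hypothetical, the random values are drawn from [-1, 1) as a close stand-in for the open interval, and the correlation is circular (wraps around the edges), so a real implementation would likely pad or window the arrays first.

```python
import numpy as np


def encode(is_land, is_unknown, rng):
    """+1.0 for land, -1.0 for water, uniform random in [-1, 1) for unknown pixels."""
    values = np.where(is_land, 1.0, -1.0)
    values[is_unknown] = rng.uniform(-1.0, 1.0, size=int(is_unknown.sum()))
    return values


def estimate_shift(observed, reference):
    """Return the (row, col) shift that best aligns reference with observed.

    A positive shift means the observed pattern lies further along that axis,
    i.e. observed ≈ np.roll(reference, shift, axis=(0, 1)).
    """
    a = observed - observed.mean()
    b = reference - reference.mean()
    # Circular cross-correlation via the 2-D FFT: ifft2(FFT(a) * conj(FFT(b)))
    corr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrapped peak indices to signed offsets
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, corr.shape))
```

The random values for unknown pixels average out in the correlation sum, so clouds and ephemeral water dilute the peak rather than pulling it toward a false offset.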

SLIDE 17

Geolocation Case Study

  • Terra/MODIS data for 19 June 2002 reprocessed with the usual onboard attitude and ephemeris
  • But: a spacecraft maneuver made the onboard data inaccurate
– Typically, definitive attitude/ephemeris are used in the vicinity of maneuvers
  • Several months later… a group studying land cover change identified errors in geolocation

SLIDE 18

Geolocation Shift Effect

[Figure panels: Land-sea mask; Bayesian classification; Geolocation shift; Cross-Correlation]