
Applications of Bayesian Classification to Data Management



  1. Applications of Bayesian Classification to Data Management
     Christopher Lynnes, NASA/GSFC
     Co-Authors: S. Berrick, A. Gopalan, X. Hua, S. Shen, P. Smith, K. Yang (NASA/GSFC); K. Wheeler, C. Curry (NASA/ARC)
     6/30/04

  2. Problem Statement
     Science data volume keeps pace with technology.

  3. Data demands are also increasing
     • Lower latency: driven by applications
     • Online access: driven by machine-to-machine interfaces (e.g., models)
     • Volume: driven by advances in computing and data mining
     • A solution is to manage data according to their “usefulness”.

  4. Data Management Today: black-box paradigm
     • Data are managed as largely opaque objects
       – albeit with labels (metadata) and “cover art” (browse)
     [Diagram: Subscriptions, Cache, Process, Store, Archive]

  5. Content-based Data Management
     • Subscription: “send data when my study area is clear”
     • Subsetting: “give me just the clear pixels”
     • Cache optimization: purge or keep
     • Automatic quality assessment
     [Diagram: Subscriptions, Cache, Process, Store, Archive; a clock icon marks time-critical steps]

  6. Usefulness is in the eye of the beholder
     [Table: pixel characteristics (e.g., Clear-Sky) marked against each study type]
     Study Type: marks
     • Cloud Properties: X
     • Aerosols: X (X) X X
     • Ocean Color: X
     • Land Vegetation: X
     • Snow Cover/Sea Ice: X
     • Wildfires: X

  7. Characterization of MODIS Calibrated Radiance
     • Most popular product at Goddard DAAC
     • Train algorithm to classify pixels
       – Cloud, glint, land, water, etc.
     • Speed of the forward algorithm is critical.
       – However, we can afford time and CPU for training.
     • Products from science algorithms train machine learning algorithms
       – Products as proxy for domain experts
       – Nearly unlimited supply of training and test data
       – Circular logic if we were making science products…
       – …but in the decision support domain, it serves as a high-speed approximator to the science algorithm.

  8. Bayesian Classification Applied to MODIS Calibrated Radiance
     • Bayesian classification:
       – $\Pr(C \mid E) = \dfrac{\prod_i \Pr(E_i \mid C) \times \Pr(C)}{\prod_i \Pr(E_i)}$
       – Where $C$ is a class
       – And $E_i$ are measurements of independent variables (evidence).
       – $\Pr(C)$ is the prior probability
     • Training: Compute frequency histograms for $E_i \mid C$
       – MODIS cloudmask and ocean color products “train” the classifier.
     [Figure: Frequency Histogram for Band 1. x-axis: Log(Calibrated Radiance); y-axis: Percentage of Points in Class; classes: Cloud, Desert, Glint, Ice, Land, Water, Coast]
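The training step on this slide (frequency histograms of each band's measurement, per class) might be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function name `train_histograms`, the bin edges, and the add-one smoothing are all assumptions, and the class labels are taken to come from a reference product such as the cloud mask.

```python
import numpy as np

def train_histograms(values, labels, n_classes, bin_edges):
    """Estimate Pr(bin | class) for one band from labelled training pixels.

    values:    1-D array of measurements (e.g. log calibrated radiance)
    labels:    1-D integer array of class indices from a reference product
    bin_edges: 1-D array of histogram bin edges (hypothetical choice)
    Returns an (n_classes, n_bins) array whose rows each sum to 1.
    """
    n_bins = len(bin_edges) - 1
    # Start every count at 1 (add-one smoothing, an assumption made here)
    # so that no bin ends up with probability exactly zero.
    counts = np.ones((n_classes, n_bins))
    # Map each value to a bin index; out-of-range values go to the end bins.
    bins = np.clip(np.digitize(values, bin_edges) - 1, 0, n_bins - 1)
    # Accumulate one count per training pixel in its (class, bin) cell.
    np.add.at(counts, (labels, bins), 1)
    return counts / counts.sum(axis=1, keepdims=True)
```

One such table would be built per band; the tables are computed once, offline, which fits the earlier point that time and CPU are affordable for training.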

  9. Prior Probabilities
     • Prior probabilities are “known” statistics for the earth
       – Regional and seasonal variations
       – Derived from MODIS Level 3 gridded products
     [Figure: December to February]

  10. Practical Classification - Application
     • For each class:
       – Look up the probability for each band measurement in frequency histograms
       – Compute product to get the overall probability for membership in that class
       – Choose the class with the highest overall probability
     [Figure: Frequency Distribution of Band 1. x-axis: Log10(Radiance); y-axis: percentage (0% to 20%); classes: Cloud, Desert, Glint, Snow/Ice, Land, Water, Coast]

  11. Bayesian Classification Example
     Terra/MODIS scene for 16:20-16:25Z, 2003-10-16
     [Figure, two panels: Bayesian classification using bands 1, 2, 2/1, 31, 32; MODIS Cloudmask Product. Legend for both: Cloud, Desert, Land, Water, Coast, Glint, Ice]

  12. Timing Results
     [Figure: Algorithm Timing for 300 s of Data. x-axis: Number of Bands (1 to 7), plus Sci. Alg.; y-axis: Secs (0 to 2000)]
     *Bayesian classification on 250 MHz SGI, as a function of number of bands used

  13. Exploitation of Classification Results
     • Add algorithm to Direct Broadcast processing stream
     [Diagram: Direct Readout Laboratory processing stream. Elements: Terra; 3-meter X-band Antenna; Raw Data; Level 0; L0 Reformatted in HDF; Level 1a Processor; Geolocation; Calibration; Calibrated Radiances (band 1, band 2, … band 36); Classification; Calibrated Radiances w/supplement class layer]

  14. Content-Based Subsetting
     Deliver just the pixels likely to be useful, e.g., cloud-free
     1. Classify using Bayesian classifier
     2. Zero out pixels classified as cloud
     3. Apply lossless compression
     Currently implemented as an on-the-fly conversion in WUSTL FTP, e.g.:
       ftp g0dug03u.ecs.nasa.gov
       >cd /datapool/OPS/user/MODB/RMT021KM.001
       >ls
       >cd 2004.06.13
       >ls *.hdf
       >get RMT021KM.A2004165.1843.001.2004166072602.hdf.clr
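Steps 2 and 3 of the subsetting recipe can be illustrated with a short sketch. The class code `CLOUD`, the function name `subset_clear`, and the use of `zlib` as the lossless compressor are assumptions for illustration; the slides do not say which compressor or file layout the FTP conversion uses.

```python
import zlib
import numpy as np

CLOUD = 0  # hypothetical integer code for the "cloud" class

def subset_clear(radiance, class_map, cloud_class=CLOUD):
    """Zero out cloudy pixels, then compress the result losslessly.

    radiance:  2-D array of calibrated radiances (any numeric dtype)
    class_map: 2-D integer array of class indices, same shape
    Returns (masked array, compressed bytes).
    """
    # Step 2: replace every pixel classified as cloud with zero.
    clear = np.where(class_map == cloud_class, 0, radiance).astype(radiance.dtype)
    # Step 3: long runs of zeros compress very well under a lossless codec.
    packed = zlib.compress(clear.tobytes(), 6)
    return clear, packed
```

The saving comes entirely from the zeroed regions, so a mostly cloudy scene shrinks far more than a mostly clear one.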

  15. Content-Based Data Selection
     • Today: “select scenes where cloud cover < 50%”
       – Less than foolproof
     • Tomorrow: “select scenes where Lake Winnebago is visible”
     • Ad hoc indexing / queries are difficult, but...
     • ...subscription queries should be tractable
       – “Is anyone looking for data that are clear for a particular area in this scene?”
     [Figure: study area]
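A subscription query of the kind quoted above might be evaluated per incoming scene along these lines. Everything here is a hypothetical sketch: the bounding-box representation of a study area, the `min_clear` threshold, and the function name `matching_subscriptions` are assumptions, not part of the presented system.

```python
import numpy as np

def matching_subscriptions(class_map, lat, lon, subscriptions,
                           clear_classes, min_clear=0.9):
    """Return the names of subscriptions whose study area is clear.

    class_map:     2-D integer array of per-pixel class indices
    lat, lon:      2-D arrays of pixel geolocation, same shape
    subscriptions: dict of name -> (south, north, west, east) box
    clear_classes: class indices that count as "clear" for the user
    min_clear:     required clear fraction inside the box (assumed)
    """
    clear = np.isin(class_map, clear_classes)
    hits = []
    for name, (south, north, west, east) in subscriptions.items():
        # Pixels of this scene that fall inside the study area.
        inside = (lat >= south) & (lat <= north) & (lon >= west) & (lon <= east)
        # Match only if the scene covers the area and enough of it is clear.
        if inside.any() and clear[inside].mean() >= min_clear:
            hits.append(name)
    return hits
```

The question is asked once per scene against a known list of areas, which is why it is more tractable than indexing every scene for arbitrary later queries.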

  16. Automated Quality Assessment of Geolocation
     • Compare observed land-water pattern with land-sea mask based on geolocation
       – Systematic geolocation error ⇒ systematic shift in pattern
     • Technique:
       – Classify land/water/cloud from geolocated radiance
       – Assign +1.0 to land, -1.0 to water
         • Assign “unknown” classes a random number in the interval (-1.0, +1.0)
           – Cloud, snow/ice in classification
           – Ephemeral water in land-sea mask
       – Compute cross-correlation using 2-D FFT
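The cross-correlation step can be sketched with NumPy's FFT routines. This is an illustrative version under assumptions: unknown pixels are marked with NaN on input, both grids have the same shape, and the correlation is circular (so shifts are assumed small relative to the grid size). The function name `pattern_shift` is hypothetical.

```python
import numpy as np

def pattern_shift(classified, mask, rng=None):
    """Estimate the (row, col) shift of `classified` relative to `mask`.

    Both inputs are 2-D float arrays: +1.0 land, -1.0 water, NaN unknown.
    """
    rng = np.random.default_rng() if rng is None else rng

    def fill_unknown(a):
        # Replace unknown pixels with random values in [-1, 1) so that
        # they contribute no systematic correlation.
        a = a.copy()
        unknown = np.isnan(a)
        a[unknown] = rng.uniform(-1.0, 1.0, size=unknown.sum())
        return a

    a, b = fill_unknown(classified), fill_unknown(mask)
    # Circular cross-correlation via the correlation theorem.
    xcorr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Peak indices past the midpoint wrap around to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, xcorr.shape))
```

A result of (0, 0) indicates that the observed pattern lines up with the land-sea mask; a consistent nonzero result across scenes would point to a systematic geolocation error.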

  17. Geolocation Case Study
     • Terra/MODIS data for 19 June 2002 reprocessed with the usual onboard attitude and ephemeris
     • But: a spacecraft maneuver made the onboard data inaccurate
       – Typically, definitive attitude/ephemeris are used in the vicinity of maneuvers
     • Several months later…a group studying land cover change identified errors in geolocation

  18. Geolocation Shift Effect
     [Figure panels: Bayesian classification; Land-sea mask; Cross-Correlation; Geolocation shift]
