Pangeo: A Community-Driven Effort for Big Data Geoscience


  1. Pangeo: A Community-Driven Effort for Big Data Geoscience

  2. Global warming is happening!

  3. What Drives Progress in Geoscience?
     • New Ideas
       E = \frac{\rho_0 |U|}{\pi} \int_{|f|/|U|}^{N/|U|} \sqrt{N^2 - |U|^2 k^2} \, \sqrt{|U|^2 k^2 - f^2} \; P_{1D}(k) \, dk
     • New Observations
     • New Simulations

  12. [Image] Credit: NASA JPL / Dimitris Menemenlis

  14. Major Science Questions
     • How is energy transferred across scales and dissipated in the ocean?
     • How do mesoscales / submesoscales / tides / internal waves contribute to the transport of heat / salt / dissolved tracers vertically and horizontally?
     • How does abyssal flow navigate complex small-scale topography (e.g. shelf overflows, Indonesian Throughflow, abyssal canyons)?
     • How should we represent these processes in coarse-resolution climate models?
     Dozens of high-impact papers are waiting to be written!

  15. My Big Data Journey
     [Timeline, 2013 to 2018] started at Columbia; discovered Big Data; wandered the desert

  17. My Big Data Journey
     [Timeline, 2013 to 2018] started at Columbia; discovered Big Data; wandered the desert; discovered xarray!

  18. Scientific Python for Data Science
     [Chart] source: stackoverflow.com

  19. Scientific Python for Data Science
     [Figure; legible labels: aospy, SciPy] Credit: Stephan Hoyer, Jake Vanderplas (SciPy 2015)

  21. Xarray Dataset: Multidimensional Variables with Coordinates and Metadata
     [Diagram; labels: land_cover, elevation, latitude, longitude, time]
     • Data variables: used for computation
     • Coordinates: describe data
     • Indexes: align data
     • Attributes: metadata ignored by operations
     "netCDF meets pandas.DataFrame"
     Credit: Stephan Hoyer
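The four parts named on this slide can be put together by hand. The following is a minimal sketch on a toy grid, not the dataset from the talk: the variable `temperature`, the grid values, and the `title` attribute are all illustrative assumptions. It also shows what "indexes align data" means in practice: arithmetic matches on coordinate labels, so only the shared latitude survives.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy grid; every name and value here is illustrative.
time = pd.date_range("2000-01-01", periods=3, freq="D")
lat = [-10.0, 0.0, 10.0]
lon = [0.0, 90.0, 180.0, 270.0]

ds = xr.Dataset(
    # Data variables: the arrays used for computation
    data_vars={"temperature": (("time", "lat", "lon"), np.zeros((3, 3, 4)))},
    # Coordinates: labels that describe the data
    coords={
        "time": time,
        "lat": lat,
        "lon": lon,
        "elevation": (("lat", "lon"), np.ones((3, 4))),  # non-index coordinate
    },
    # Attributes: free-form metadata
    attrs={"title": "toy dataset"},
)

# Indexes align data: the two slices overlap only at lat = 0.0
a = ds.temperature.isel(lat=slice(0, 2))  # lat = -10, 0
b = ds.temperature.isel(lat=slice(1, 3))  # lat = 0, 10
overlap = a + b
```

Here `overlap` keeps only the latitude present in both operands, because xarray's default arithmetic uses an inner join on the indexes.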

  22. xarray makes science easy
         import xarray as xr
         ds = xr.open_dataset('NOAA_NCDC_ERSST_v3b_SST.nc')
         ds
     Output:
         <xarray.Dataset>
         Dimensions:  (lat: 89, lon: 180, time: 684)
         Coordinates:
           * lat      (lat) float32 -88.0 -86.0 -84.0 -82.0 -80.0 -78.0 -76.0 -74.0 ...
           * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 10.0 12.0 14.0 16.0 18.0 20.0 ...
           * time     (time) datetime64[ns] 1960-01-15 1960-02-15 1960-03-15 ...
         Data variables:
             sst      (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
         Attributes:
             Conventions:  IRIDL
             source:       https://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCDC/.ERSST/...

  23. xarray: label-based selection
         # select and plot data from my birthday
         ds.sst.sel(time='1982-08-07', method='nearest').plot()

  24. xarray: label-based operations
         # zonal and time mean temperature
         ds.sst.mean(dim=('time', 'lon')).plot()

  25. xarray: grouping and aggregation
         sst = ds.sst
         # monthly climatology and anomalies relative to it
         sst_clim = sst.groupby('time.month').mean(dim='time')
         sst_anom = sst.groupby('time.month') - sst_clim
         # Nino 3.4 index: area-mean anomaly, smoothed with a 3-month running mean
         nino34_index = (sst_anom.sel(lat=slice(-5, 5), lon=slice(190, 240))
                         .mean(dim=('lon', 'lat'))
                         .rolling(time=3).mean())
         nino34_index.plot()
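The recipe on this slide needs the ERSST file, so it cannot be checked in isolation. The sketch below applies the same three steps (monthly climatology, anomaly, 3-point running mean) to a synthetic one-dimensional series. The seasonal-cycle and trend values are made up for illustration and stand in for real SST data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly series (assumed values): seasonal cycle plus a slow trend
time = pd.date_range("2000-01-01", periods=48, freq="MS")
month = time.month.values
trend = np.linspace(0.0, 1.0, time.size)
values = 10.0 + 5.0 * np.cos(2 * np.pi * (month - 1) / 12) + trend
sst = xr.DataArray(values, coords={"time": time}, dims="time", name="sst")

# Climatology: the mean over all Januaries, all Februaries, and so on
sst_clim = sst.groupby("time.month").mean(dim="time")

# Anomaly: each value minus the climatological mean for its calendar month
sst_anom = sst.groupby("time.month") - sst_clim

# 3-point running mean; the first two entries are NaN (incomplete windows)
smoothed = sst_anom.rolling(time=3).mean()
```

By construction the anomalies average to zero within each calendar month, which is a quick sanity check on any climatology calculation.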

  26. xarray
     https://github.com/pydata/xarray
     • label-based indexing and arithmetic
     • interoperability with the core scientific Python packages (e.g., pandas, NumPy, Matplotlib)
     • out-of-core computation on datasets that don't fit into memory (thanks dask!)
     • wide range of input/output (I/O) options: netCDF, HDF, geoTIFF, zarr
     • advanced multi-dimensional data manipulation tools such as group-by and resampling

  27. Legacy software
     NASA Panoply, INGRID

  28. dask
     https://github.com/dask/dask/
     • ND-Arrays are split into chunks that comfortably fit in memory.
     • Complex computations are represented as a graph of individual tasks.
     • Scheduler optimizes execution of graph.
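A minimal sketch of the three points on the dask slide, using `dask.array`. The array shape and chunk size are arbitrary choices for illustration, not values from the talk.

```python
import dask.array as da

# A 2000 x 2000 array of ones, split into 500 x 500 chunks (16 chunks in total)
x = da.ones((2000, 2000), chunks=(500, 500))

# Building the expression is lazy: it only records tasks in a graph
y = (x + x.T).mean()

# Nothing is computed until .compute() hands the graph to a scheduler
result = y.compute()
```

Until `.compute()` is called, `y` is only a description of the work to do, so the same expression could in principle be applied to an array far larger than memory.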

  30. Example Calculation: Take the Mean!
     [Diagram] Serial execution (a loop) over a multidimensional array: each chunk in turn is read from disk, reduced, and stored; a final reduce combines the stored results.

  31. Example Calculation: Take the Mean!
     [Diagram] Parallel execution (dask graph) over a multidimensional array: separate branches each read a chunk from disk, reduce it, and store the result; a final reduce combines them.
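The two diagrams describe the same computation with different scheduling. The sketch below uses only the Python standard library, not dask itself. `read_chunk` is a hypothetical stand-in for reading from disk, and the thread pool is an assumed substitute for a dask scheduler, chosen only to show that the per-chunk tasks are independent.

```python
from concurrent.futures import ThreadPoolExecutor

def read_chunk(i, size=1000):
    """Hypothetical stand-in for reading chunk i of an array from disk."""
    start = i * size
    return [float(v) for v in range(start, start + size)]

def reduce_chunk(chunk):
    """Per-chunk reduction: keep the partial sum and the element count."""
    return sum(chunk), len(chunk)

def combine(partials):
    """Final reduction: merge (sum, count) pairs into one global mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

n_chunks = 8

# Serial execution (a loop): read, reduce, store, one chunk at a time
stored = []
for i in range(n_chunks):
    stored.append(reduce_chunk(read_chunk(i)))
serial_mean = combine(stored)

# Parallel execution: each read-and-reduce task is independent of the others,
# so they can run concurrently; only the final combine waits for all of them
def task(i):
    return reduce_chunk(read_chunk(i))

with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_mean = combine(list(pool.map(task, range(n_chunks))))
```

Keeping a (sum, count) pair per chunk, rather than a per-chunk mean, is what makes the final reduce exact even when chunks differ in size.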

  33. My Big Data Journey
     [Timeline, 2013 to 2018] started at Columbia; discovered Big Data; wandered the desert; discovered xarray!; used xarray on datasets up to ~200 GB; connected with xarray community; first Pangeo workshop

  34. Pangeo Project Goals
     • Foster collaboration around the open source scientific Python ecosystem for ocean / atmosphere / land / climate science.
     • Support the development of domain-specific geoscience packages.
     • Improve scalability of these tools to handle petabyte-scale datasets on HPC and cloud platforms.

  35. My Big Data Journey
     [Timeline, 2013 to 2018] started at Columbia; discovered Big Data; wandered the desert; discovered xarray!; used xarray on datasets up to ~200 GB; connected with fantastic xarray community; first Pangeo workshop; Earthcube proposal awarded; pangeo.pydata.org

  36. Earthcube Award Team
     • Ryan Abernathey, Chiara Lepore, Michael Tippet, Naomi Henderson, Richard Seager
     • Kevin Paul, Joe Hamman, Ryan May, Davide Del Vento
     • Matthew Rocklin

  37. Other Contributors
     • Jacob Tomlinson, Niall Roberts, Alberto Arribas: Developing and operating Pangeo environment to support analysis of UK Met Office products
     • Rich Signell: Deploying Pangeo on AWS to support analysis of coastal ocean modeling
     • Justin Simcock: Operating Pangeo in the cloud to support Climate Impact Lab research and analysis
     • Supporting Pangeo via SWOT mission and recently funded ACCESS award to UW / NCAR 🎊
     • Yuvi Panda, Chris Holdgraf: Spending lots of time helping us make things work on the cloud
