Pangeo
A community-driven effort for Big Data geoscience
Global warming is happening!
What Drives Progress in GEOScience?
• New Ideas
  $$E = \frac{\rho_0 |U|}{\pi} \int_{|f|/|U|}^{N/|U|} \sqrt{N^2 - |U|^2 k^2}\,\sqrt{|U|^2 k^2 - f^2}\,P_{1D}(k)\,dk$$
• New Observations
• New Simulations
Credit: NASA JPL / Dimitris Menemenlis
Major Science Questions
• How is energy transferred across scales and dissipated in the ocean?
• How do mesoscales / submesoscales / tides / internal waves contribute to the vertical and horizontal transport of heat / salt / dissolved tracers?
• How does abyssal flow navigate complex small-scale topography (e.g. shelf overflows, Indonesian Throughflow, abyssal canyons)?
• How should we represent these processes in coarse-resolution climate models?
Dozens of high-impact papers are waiting to be written!
My Big Data Journey (2013 to 2018)
• started at Columbia
• discovered Big Data
• wandered the desert
• discovered xarray!
Scientific Python for Data Science
Source: stackoverflow.com
Scientific Python for Data Science
[Figure: the scientific Python ecosystem; labels include aospy and SciPy]
Credit: Stephan Hoyer, Jake Vanderplas (SciPy 2015)
Xarray Dataset: Multidimensional Variables with coordinates and metadata
"netCDF meets pandas.DataFrame"
• Data variables (used for computation): elevation, land_cover
• Coordinates (describe data): latitude, longitude, time
• Indexes (align data)
• Attributes (metadata ignored by operations)
Credit: Stephan Hoyer
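The data model above can be sketched with a tiny in-memory Dataset. This is a hypothetical toy example (the grid, values, and the "toy example" title are invented for illustration): data variables hold the arrays, coordinates label each dimension, attributes carry free-form metadata, and the coordinate indexes are what align two arrays during arithmetic.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical toy grid: 3 latitudes x 4 longitudes, 2 daily time steps.
lat = [-10.0, 0.0, 10.0]
lon = [0.0, 90.0, 180.0, 270.0]
time = pd.date_range("2000-01-01", periods=2, freq="D")

ds = xr.Dataset(
    data_vars={
        # Data variables: the arrays used for computation.
        "elevation": (("lat", "lon"), rng.uniform(0.0, 3000.0, size=(3, 4)), {"units": "m"}),
        "land_cover": (("time", "lat", "lon"), rng.integers(0, 5, size=(2, 3, 4))),
    },
    # Coordinates: labels for each dimension; their indexes drive alignment.
    coords={"lat": lat, "lon": lon, "time": time},
    # Attributes: metadata carried along with the dataset.
    attrs={"title": "toy example"},
)

# Selection by coordinate label rather than by integer position.
equator_elevation = ds.elevation.sel(lat=0.0)

# Alignment: the two arrays share only the labels x=1 and x=2,
# so the sum is computed on that overlap.
a = xr.DataArray([1, 2, 3], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10, 20, 30], dims="x", coords={"x": [1, 2, 3]})
c = a + b
```

The overlap-only result comes from xarray's default inner join for arithmetic; it can be changed with `xr.set_options(arithmetic_join=...)`.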
xarray makes science easy

    import xarray as xr
    ds = xr.open_dataset('NOAA_NCDC_ERSST_v3b_SST.nc')
    ds

    <xarray.Dataset>
    Dimensions:  (lat: 89, lon: 180, time: 684)
    Coordinates:
      * lat      (lat) float32 -88.0 -86.0 -84.0 -82.0 -80.0 -78.0 -76.0 -74.0 ...
      * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 10.0 12.0 14.0 16.0 18.0 20.0 ...
      * time     (time) datetime64[ns] 1960-01-15 1960-02-15 1960-03-15 ...
    Data variables:
        sst      (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
    Attributes:
        Conventions:  IRIDL
        source:       https://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCDC/.ERSST/...
xarray: label-based selection

    # select and plot data from my birthday
    ds.sst.sel(time='1982-08-07', method='nearest').plot()
xarray: label-based operations

    # zonal and time mean temperature
    ds.sst.mean(dim=('time', 'lon')).plot()
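Because the two slides above depend on the ERSST file, here is a self-contained sketch of the same two operations on a hypothetical stand-in dataset: a 2-degree grid with mid-month time stamps for 1982 and an idealized SST that depends only on latitude (all values invented). The `.plot()` calls are left out so the sketch runs without Matplotlib.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the ERSST file: same 2-degree grid shape,
# mid-month time stamps for 1982, and an SST that varies only with latitude.
lat = np.arange(-88.0, 90.0, 2.0)   # 89 latitudes
lon = np.arange(0.0, 360.0, 2.0)    # 180 longitudes
time = pd.date_range("1982-01-01", periods=12, freq="MS") + pd.Timedelta(days=14)

sst_values = 28.0 - 0.3 * np.abs(lat)[None, :, None] + np.zeros((time.size, 1, lon.size))
ds = xr.Dataset(
    {"sst": (("time", "lat", "lon"), sst_values)},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Label-based selection: the time step closest to 7 August 1982.
snapshot = ds.sst.sel(time=pd.Timestamp("1982-08-07"), method="nearest")

# Label-based reduction: average over named dimensions, leaving a
# zonal-and-time-mean profile as a function of latitude.
zonal_mean = ds.sst.mean(dim=("time", "lon"))
```

Naming dimensions in `mean(dim=...)` is what makes the operation label-based: the same call works regardless of the order in which the dimensions are stored.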
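xarray: grouping and aggregation

    sst = ds.sst  # SST DataArray from the dataset opened above
    # monthly climatology, and anomalies relative to it
    sst_clim = sst.groupby('time.month').mean(dim='time')
    sst_anom = sst.groupby('time.month') - sst_clim
    # Niño 3.4 index: anomaly averaged over 5°S-5°N, 190-240°E, then a 3-month running mean
    nino34_index = (sst_anom.sel(lat=slice(-5, 5), lon=slice(190, 240))
                    .mean(dim=('lon', 'lat'))
                    .rolling(time=3).mean())
    nino34_index.plot()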
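The climatology/anomaly pattern above can be checked end to end on synthetic data. This is a sketch under stated assumptions: the "SST" below is hypothetical, built as a fixed seasonal cycle plus a slow linear trend over 10 years, so removing the monthly climatology should leave only the trend, which makes the expected anomalies easy to reason about.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical synthetic SST over the tropical Pacific, 10 years of monthly data:
# a fixed seasonal cycle plus a linear warming trend, identical at every grid point.
time = pd.date_range("1960-01-01", periods=120, freq="MS")
lat = np.arange(-10.0, 12.0, 2.0)    # 10S..10N
lon = np.arange(180.0, 252.0, 2.0)   # 180E..250E
seasonal = 2.0 * np.cos(2 * np.pi * (time.month.values - 1) / 12)
trend = 0.01 * np.arange(time.size)
values = (27.0 + seasonal + trend)[:, None, None] + np.zeros((1, lat.size, lon.size))
sst = xr.DataArray(
    values, dims=("time", "lat", "lon"), coords={"time": time, "lat": lat, "lon": lon}
)

# Same steps as the slide: climatology, anomaly, regional mean, running mean.
sst_clim = sst.groupby("time.month").mean(dim="time")
sst_anom = sst.groupby("time.month") - sst_clim
nino34_index = (
    sst_anom.sel(lat=slice(-5, 5), lon=slice(190, 240))
    .mean(dim=("lon", "lat"))
    .rolling(time=3).mean()   # trailing 3-step window; first two values are NaN
)
```

With this construction the seasonal cycle cancels exactly, and each anomaly equals 0.01 × (12 × year − 54), so the index runs from −0.54 in the first year to +0.54 in the last.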
xarray https://github.com/pydata/xarray
• label-based indexing and arithmetic
• interoperability with the core scientific Python packages (e.g., pandas, NumPy, Matplotlib)
• out-of-core computation on datasets that don't fit into memory (thanks dask!)
• wide range of input/output (I/O) options: netCDF, HDF, geoTIFF, zarr
• advanced multi-dimensional data manipulation tools such as group-by and resampling
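The group-by and resampling tools in the last bullet can be illustrated on a hypothetical daily series (the values are simply the day index, chosen so the monthly means are easy to verify by hand). Resampling bins along time into calendar periods; group-by collects values that share a label, here the day of the week.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily series for 2001 (not a leap year): value = day index 0..364.
time = pd.date_range("2001-01-01", "2001-12-31", freq="D")
daily = xr.DataArray(np.arange(time.size, dtype=float), dims="time", coords={"time": time})

# Resampling: aggregate daily values into calendar-month means,
# labelled by the first day of each month.
monthly = daily.resample(time="MS").mean()

# Group-by: average all days that share the same day of the week (0 = Monday).
by_weekday = daily.groupby("time.dayofweek").mean()
```

January averages days 0 to 30 (mean 15.0) and February days 31 to 58 (mean 44.5).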
Legacy software
• NASA Panoply
• INGRID
dask https://github.com/dask/dask/
• Complex computations are represented as a graph of individual tasks.
• ND-Arrays are split into chunks that comfortably fit in memory.
• The scheduler optimizes execution of the graph.
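A minimal sketch of the three ideas above, assuming the `dask` package is installed: a chunked array, a lazy expression that only builds a task graph, and `compute()`, which hands that graph to a scheduler. The array sizes are arbitrary choices for illustration.

```python
import dask.array as da
import numpy as np

# A 4,000 x 4,000 array of ones, split into a 4 x 4 grid of 1,000 x 1,000 chunks;
# each chunk is small enough to fit comfortably in memory.
x = da.ones((4_000, 4_000), chunks=(1_000, 1_000))

# Building the expression only records tasks in a graph; nothing is computed yet.
lazy_mean = x.mean()

# compute() hands the graph to a scheduler, which runs the per-chunk tasks
# (in parallel where it can) and combines their partial results.
result = lazy_mean.compute()

# The same machinery wraps an existing NumPy array split into chunks.
arr = np.arange(16.0).reshape(4, 4)
darr = da.from_array(arr, chunks=(2, 2))
darr_mean = float(darr.mean().compute())
```

The full 4,000 × 4,000 array is never held in memory at once; only individual chunks and their partial results are.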
Example Calculation: Take the Mean!
[Diagram: a multidimensional array processed by serial execution (a loop): for each chunk in turn, read chunk from disk → reduce → store; the stored results are then reduced once more.]
Example Calculation: Take the Mean!
[Diagram: the same multidimensional array processed by parallel execution (a dask graph): each chunk is read from disk, reduced, and stored independently, and a final reduce combines the results.]
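The serial and parallel diagrams can be mimicked in plain Python. This is a hypothetical sketch, not dask's implementation: `read_chunk` is an invented stand-in that slices an in-memory NumPy array instead of reading from disk, and a thread pool plays the role of the parallel scheduler. Each chunk is reduced to a (sum, count) pair rather than a mean, so chunks of unequal size still combine into the exact global mean.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def read_chunk(data, start, stop):
    """Hypothetical stand-in for 'read chunk from disk': rows start:stop."""
    return data[start:stop]


def reduce_chunk(chunk):
    """Per-chunk reduce: keep (sum, count) so partial results combine exactly."""
    return chunk.sum(), chunk.size


def combine(partials):
    """Final reduce: total sum divided by total count."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def chunk_bounds(n_rows, chunk_rows):
    """Row ranges for each chunk; the last chunk may be shorter."""
    return [(i, min(i + chunk_rows, n_rows)) for i in range(0, n_rows, chunk_rows)]


def mean_serial(data, chunk_rows):
    # Serial execution: a loop that reads and reduces one chunk at a time.
    partials = []
    for start, stop in chunk_bounds(len(data), chunk_rows):
        partials.append(reduce_chunk(read_chunk(data, start, stop)))
    return combine(partials)


def mean_parallel(data, chunk_rows, workers=4):
    # Parallel execution: the per-chunk tasks are independent, so they can run concurrently.
    def task(bounds):
        start, stop = bounds
        return reduce_chunk(read_chunk(data, start, stop))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(task, chunk_bounds(len(data), chunk_rows)))
    return combine(partials)


data = np.arange(1000, dtype=float).reshape(100, 10)
```

Averaging per-chunk means directly would weight the short final chunk too heavily; carrying counts avoids that.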
My Big Data Journey (2013 to 2018)
• started at Columbia
• discovered Big Data
• wandered the desert
• discovered xarray!
• used xarray on datasets up to ~200 GB
• connected with the xarray community
• first Pangeo workshop
Pangeo Project goals
• Foster collaboration around the open source scientific Python ecosystem for ocean / atmosphere / land / climate science.
• Support the development of domain-specific geoscience packages.
• Improve the scalability of these tools to handle petabyte-scale datasets on HPC and cloud platforms.
My Big Data Journey (2013 to 2018)
• started at Columbia
• discovered Big Data
• wandered the desert
• discovered xarray!
• used xarray on datasets up to ~200 GB
• connected with the fantastic xarray community
• first Pangeo workshop
• EarthCube proposal awarded
• pangeo.pydata.org
EarthCube Award Team
Ryan Abernathey, Chiara Lepore, Michael Tippett, Naomi Henderson, Richard Seager
Kevin Paul, Joe Hamman, Ryan May, Davide Del Vento
Matthew Rocklin
Other Contributors
• Jacob Tomlinson, Niall Roberts, Alberto Arribas: developing and operating a Pangeo environment to support analysis of UK Met Office products
• Rich Signell: deploying Pangeo on AWS to support analysis of coastal ocean modeling
• Justin Simcock: operating Pangeo in the cloud to support Climate Impact Lab research and analysis
• Supporting Pangeo via the SWOT mission and a recently funded ACCESS award to UW / NCAR 🎊
• Yuvi Panda, Chris Holdgraf: spending lots of time helping us make things work on the cloud