Data Science for scaling water research, Jordan S Read, USGS Office of Water Information (PowerPoint presentation)

SLIDE 1

U.S. Department of the Interior U.S. Geological Survey

Data Science for scaling water research

Jordan S Read, USGS Office of Water Information

SLIDE 2
  • Water quality and quantity are changing at a scale never seen before
  • Global food and energy networks place demands on water resources, and we are only beginning to understand the implications

Societal need: Understand water use trade-offs for energy, the environment, and human health

Water Issues at Continental Scales

SLIDE 3

Existing USGS resources and strengths

Data People Science

A network of… Integrity

SLIDE 4

Existing USGS resources and strengths: Data

  • 6.7M lakes, ponds, and impoundments
  • 2.6M stream reaches

The National Hydrography Dataset

https://nhd.usgs.gov

SLIDE 5
  • > 850,000 station-years of stream levels, discharge, reservoir and lake levels, surface-water quality, and rainfall

National Water Information System

Existing USGS resources and strengths: Data

https://waterdata.usgs.gov

SLIDE 6
  • Water use data: How and where we use water (1950-2010)
  • Various categories and spatial resolution of reporting

National Water Information System

Existing USGS resources and strengths: Data

https://water.usgs.gov/watuse

SLIDE 7

Existing USGS resources and strengths: Data

  • > 450 monitoring groups
  • 2.7M sites, ~300M records
  • Upstream/downstream queries

The Water Quality Portal

https://www.waterqualitydata.us

SLIDE 8

Existing USGS resources and strengths: Data

  • > 40 years of moderate-resolution multispectral data

  • 51M+ scenes downloaded

USGS Landsat

https://landsat.usgs.gov/

SLIDE 9

Emerging challenges

  • A new observation paradigm
  • Shifts in the design of research collaborations
  • Declining research budgets

Understand water use trade-offs for energy, the environment, and human health

SLIDE 11

Emerging challenges: A new observation paradigm

Understand water use trade-offs for energy, the environment, and human health

Familiar datasets for water resources:

  • Continuous or discrete measurements from a site (e.g., gage height)
  • Uniform grids (e.g., images; climate data)

Familiar data exchange formats:

  • WaterML 2.0; *.csv
  • GeoTIFF; netCDF

With familiar tools
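As a rough illustration of the "familiar" case, the sketch below reads a small site time series in CSV form and computes daily mean gage height using only the Python standard library. The column names (`site_id`, `datetime`, `gage_height_ft`), the site label, and the sample values are hypothetical and do not reflect the layout of any particular USGS service.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: column names and values are made up for illustration.
SAMPLE_CSV = """site_id,datetime,gage_height_ft
SITE-A,2017-01-01T00:00,2.0
SITE-A,2017-01-01T12:00,3.0
SITE-A,2017-01-02T00:00,4.0
"""


def daily_mean_gage_height(csv_text):
    """Return {date string: mean gage height} for a single-site CSV time series."""
    readings_by_day = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = row["datetime"][:10]  # keep the YYYY-MM-DD part of the timestamp
        readings_by_day[day].append(float(row["gage_height_ft"]))
    return {day: mean(values) for day, values in sorted(readings_by_day.items())}


if __name__ == "__main__":
    for day, value in daily_mean_gage_height(SAMPLE_CSV).items():
        print(day, value)
```

A fixed-site, fixed-column table like this is easy to handle with general-purpose tools, which is what makes the newer observation types on the following slides harder by comparison.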

SLIDE 12

Emerging challenges: A new observation paradigm

  • Environmental DNA (eDNA)
      • “census the wild with a jar”
  • Moving sensors or frames of reference
      • Unmanned vehicles/sensors; structure from motion
  • Footprint or integrator measurements
      • Environmental exposure/duration; passive samplers
  • Data variety
      • Citizen; hyperspectral; WQ lab sample; RS image
  • Internet of things
      • Lab on a chip; personalized water use data; water infrastructure monitoring

Understand water use trade-offs for energy, the environment, and human health

SLIDE 17

Emerging challenges: A new observation paradigm

Understand water use trade-offs for energy, the environment, and human health

  • Environmental DNA (eDNA)
      • “census the wild with a jar”
  • Moving sensors or frames of reference
      • Unmanned vehicles/sensors; structure from motion
  • Footprint or integrator measurements
      • Environmental exposure/duration; passive samplers
  • Data variety
      • Citizen; hyperspectral; WQ lab sample; RS image
  • Internet of things
      • Lab on a chip; personalized water use data; water infrastructure monitoring

Can we afford to leave data on the table?

SLIDE 18

Emerging challenges: Research collaboration shifts

Understand water use trade-offs for energy, the environment, and human health

  • Environmental DNA (eDNA)
      • “census the wild with a jar”
  • Moving sensors or frames of reference
      • Unmanned vehicles/sensors; structure from motion
  • Footprint or integrator measurements
      • Environmental exposure/duration; passive samplers
  • Data variety
      • Citizen; hyperspectral; WQ lab sample; RS image
  • Internet of things
      • Lab on a chip; personalized water use data; water infrastructure monitoring

Do we need more people at the table?

SLIDE 19

Emerging challenges: Research collaboration shifts

Understand water use trade-offs for energy, the environment, and human health

  • The size and makeup of our collaborations are changing
  • Larger and more diverse teams
  • New specializations
  • Increasing role for technologists

[Figure: Domain and Technology, shown for data-poor and data-rich cases]

SLIDE 20

Emerging challenges: Research collaboration shifts

Understand water use trade-offs for energy, the environment, and human health

[Figure: Domain and Technology, shown for data-poor and data-rich cases; labels: Domain Scientists, Data Scientists]

  • The size and makeup of our collaborations are changing
  • Larger and more diverse teams
  • New specializations
  • Increasing role for technologists
SLIDE 21

image: The Economist

How does Data Science function at a science agency?

SLIDE 22

Data Science for scaling water research

[Figure: axes are Use of Scientific Knowledge (High to Low) and Use of Data (Low to High); labels are Theory-based Models and Machine learning models]

Adapted from Karpatne et al. 2017

  • Domain research may leave information on the table
  • Business-oriented data science may ignore systems understanding

SLIDE 23

Data Science for scaling water research

[Figure: axes are Use of Scientific Knowledge (High to Low) and Use of Data (Low to High); labels are Theory-based Models, Machine learning models, and Theory-guided Data Science Models]

Adapted from Karpatne et al. 2017
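
One way to read "theory-guided data science" is as a model fit that is scored on both agreement with data and physical consistency. The sketch below is a minimal illustration under stated assumptions, not the method of Karpatne et al. or of any USGS model: it assumes a lake temperature profile ordered from surface to bottom, uses a commonly cited empirical approximation for freshwater density, and penalizes predictions in which density decreases with depth. All function names and the `weight` parameter are hypothetical.

```python
def water_density(temp_c):
    """Approximate freshwater density (kg/m^3) from temperature (deg C).

    Empirical approximation assumed here for illustration; it peaks near 4 deg C.
    """
    return 1000.0 * (
        1.0
        - (temp_c + 288.9414) * (temp_c - 3.9863) ** 2
        / (508929.2 * (temp_c + 68.12963))
    )


def mean_squared_error(predicted, observed):
    """Data term: average squared difference between prediction and observation."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def density_violation(profile_temps_c):
    """Theory term: mean amount by which density drops between adjacent depths.

    The profile is assumed to be ordered from surface to bottom, so a stable
    water column has non-decreasing density and scores 0.0.
    """
    densities = [water_density(t) for t in profile_temps_c]
    drops = [max(0.0, upper - lower) for upper, lower in zip(densities, densities[1:])]
    return sum(drops) / len(drops)


def theory_guided_loss(predicted, observed, weight=1.0):
    """Combine the data term with a weighted physical-consistency penalty."""
    return mean_squared_error(predicted, observed) + weight * density_violation(predicted)


if __name__ == "__main__":
    stable = [20.0, 15.0, 10.0, 5.0]    # warm surface over cold bottom
    inverted = [5.0, 10.0, 15.0, 20.0]  # cold surface over warm bottom
    print(theory_guided_loss(stable, stable))
    print(theory_guided_loss(inverted, inverted))
```

In this sketch a purely data-driven score (the mean squared error alone) cannot tell the two profiles apart when each matches its own observations, while the added penalty flags the inverted profile as physically implausible.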

SLIDE 24

Data Science for scaling water research

[Figure: axes are Use of Scientific Knowledge (High to Low) and Use of Data (Low to High); labels are Domain Scientist and Data Scientist]

How does Data Science function at a science agency?

SLIDE 25
  • Thinking across scales
  • Interdisciplinary research teams
  • “Macrosystems” science
  • Computational thinking and practice
  • Embedded Comp Sci concepts within collaborative teams
  • Prioritization of democratized data and technology
  • Building relevant tools and training scientists
  • Access to data and computing resources
  • Thoughtful data web-service APIs
  • Infrastructure and resources for HPC/HTC
  • Long-term sustainability

Data Science for scaling water research

Practical near-term limits to scaling

SLIDE 26
  • Thinking across scales
  • Interdisciplinary research teams
  • “Macrosystems” science
  • Computational thinking and practice
  • Embedded Comp Sci concepts within collaborative teams
  • Prioritization of democratized data and technology
  • Building relevant tools and training scientists
  • Access to data and computing resources
  • Thoughtful data web-service APIs
  • Infrastructure and resources for HPC/HTC
  • Long-term sustainability

Data Science for scaling water research

technology science

Practical near-term limits to scaling

SLIDE 27

Data Science for scaling water research

  • Thinking across scales
  • Interdisciplinary research teams
  • “Macrosystems” science
  • Computational thinking and practice
  • Embedded Comp Sci concepts within collaborative teams
  • Prioritization of democratized data and technology
  • Building relevant tools and training scientists
  • Access to data and computing resources
  • Thoughtful data web-service APIs
  • Infrastructure and resources for HPC/HTC
  • Long-term sustainability

USGS Water enterprise data systems emphasis

SLIDE 28

Data Science for scaling water research

USGS Water Data Science emphasis

  • Thinking across scales
  • Interdisciplinary research teams
  • “Macrosystems” science
  • Computational thinking and practice
  • Embedded Comp Sci concepts within collaborative teams
  • Prioritization of democratized data and technology
  • Building relevant tools and training scientists
  • Access to data and computing resources
  • Thoughtful data web-service APIs
  • Infrastructure and resources for HPC/HTC
  • Long-term sustainability
SLIDE 29
  • Computational thinking is imagination and engagement
  • Computational practice is reproducible, transparent, and efficient

Data Science for scaling water research

Data Science Advancing computational thinking and practice

SLIDE 30

Data Science for scaling water research

Data Science Advancing computational thinking and practice

Visualizations Tools Training Research

  • Imagination and engagement

Computational thinking Computational practice

  • Reproducible, transparent, and efficient
SLIDE 31

Imagination and engagement

Data Science for scaling water research

Visualizations Training Research Tools https://owi.usgs.gov/vizlab

SLIDE 32

Data Science for scaling water research

Visualizations Training Research Tools Outreach Instruction Advising Collaboration

Imagination and engagement

SLIDE 33

Data Science for scaling water research

Visualizations Training Research Tools

Reproducible, transparent, and efficient

SLIDE 34

Sustaining a community of tool builders

Data Science for scaling water research

Visualizations Training Research Tools

Reproducible, transparent, and efficient

SLIDE 35

Data Science for scaling water research

Visualizations Training Research Tools

Reproducible, transparent, and efficient

Process-based models Big compute Machine learning

Theory-guided Data Science Models

SLIDE 36

Climate change effects on lake temperature and fish habitat for ~11,000 temperate lakes

Data Science for scaling water research

Visualizations Training Research Tools

Reproducible, transparent, and efficient

SLIDE 37

Conclusions

  • Rapid growth in data is changing collaborations
  • Access to data and technology isn't enough
  • Recognizing technologists as a peer group is important
  • Being tech-friendly opens up a new recruitment pool for science agencies
  • Need a critical mass of technologists

Jordan Read jread@usgs.gov

Observations

Questions?