Transdisciplinary Foundations of Spatial Data Science, April 27th, 2018




slide-1
SLIDE 1

Transdisciplinary Foundations of Spatial Data Science

April 27th, 2018

Workshop on Illuminating Space and Time in Data Science, Center for Geographical Information Systems, Harvard University

Shashi Shekhar

McKnight Distinguished University Professor

  • Dept. of Computer Sc. and Eng., University of Minnesota

www.cs.umn.edu/~shekhar : shekhar@umn.edu

slide-2
SLIDE 2

NSF 1737633: Connecting the Smart-City Paradigm with a Sustainable Urban Infrastructure Systems Framework to Advance Equity in Communities (2017-2020)

  • S. Shekhar, A. Ramaswami, R. Feiock, V. Merwade, J. Marshall

Major Research Innovations

  • Comprehensive fine intra-urban scale data (SEIU-EHW parameters in Figure 1)
  • Spatial Data Science to understand relationships (Figure 2).
  • Model & visualize multi-infrastructure spatial smart city futures
  • Knowledge co-production theories, science and practice

Figure 1. Complex Interactions among SEIU and EHW parameters
Figure 2. Spatial Patterns

slide-3
SLIDE 3
  • Co-Visioning via meetings
  • Plan infrastructure for driver-less, post-carbon future with climate change
  • Advance Environment, Health, Wellbeing & Equity via infrastructure refinement
  • Co-select Questions

– Understand spatial equity in infrastructure & outcomes (wellbeing, health, environment)?
– How does an equity-first approach differ from average-outcome-based approaches?

  • Problem Co-Definition: How to measure spatial equity? Well-being?
  • Co-Discovery
  • Co-Evaluation
  • Details: University of Minnesota secures $2.5 million grant to improve quality of life in cities, October 20, 2017 (https://www.cs.umn.edu/news/filter/highlights/professor-shekhar-leads-u-m-team-granted-25-million-nsf-grant)

[Diagram labels: Social Equity; Research; Education; Community Partners & Outreach Diversity]

NSF 1737633: Connecting the Smart-City Paradigm with a Sustainable Urban Infrastructure Systems Framework to Advance Equity in Communities (2017-2020)

  • S. Shekhar, A. Ramaswami, R. Feiock, V. Merwade, J. Marshall
slide-4
SLIDE 4

History of Spatial Data Science in S&CC

1854: What causes Cholera?

  • Collect & Curate Data
  • Discover Patterns, Generate Hypothesis: water pump?
  • Test Hypothesis (Controlled Experiments): Remove pump handle
  • Develop Theory: Germ Theory
  • Impact on cities: Health & well-being, parks, sewage system, drinking water supply, …

Q? What are the Choleras of today?
Q? How may spatial data science help?

slide-5
SLIDE 5

Today’s Transdisciplinary Spatial Data Science

  • Spatial Statistics: Test to reduce spurious patterns
  • Computer Sc.: Algorithms for large (e.g., national) data
  • Mathematics: Reduce missed patterns
  • SaTScan enumerates only 2-point circles
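
As a rough illustration of how the three bullets above fit together, the sketch below enumerates candidate circles from pairs of points, scores each with a Kulldorff-style Poisson log likelihood ratio, and ranks the best score against Monte Carlo simulations of uniformly random points to guard against spurious hotspots. This is not SaTScan's implementation. Reading "2-point circle" as a circle centered at one point with its radius reaching a second point is an assumption, as are the unit-square study area, the neglect of edge effects, and all function names.

```python
import math
import random

def poisson_llr(observed, expected, total):
    """Log likelihood ratio of a zone; 0 unless the zone is denser than expected."""
    if observed <= expected:
        return 0.0
    llr = observed * math.log(observed / expected)
    if observed < total:
        llr += (total - observed) * math.log((total - observed) / (total - expected))
    return llr

def best_two_point_circle(points, max_radius=0.25):
    """Scan circles centered at one point with radius reaching a second point.

    Assumes points lie in the unit square and ignores edge effects, so the
    expected count inside a circle is total * (circle area).
    Returns (best_llr, center, radius).
    """
    total = len(points)
    best = (0.0, None, 0.0)
    for i, center in enumerate(points):
        for j, other in enumerate(points):
            radius = math.dist(center, other)
            if i == j or radius == 0.0 or radius > max_radius:
                continue
            # Small tolerance so the point that defines the radius is counted.
            observed = sum(1 for p in points if math.dist(center, p) <= radius + 1e-12)
            expected = total * math.pi * radius ** 2
            llr = poisson_llr(observed, expected, total)
            if llr > best[0]:
                best = (llr, center, radius)
    return best

def monte_carlo_p_value(points, trials=49, seed=0, max_radius=0.25):
    """Rank the observed best score among best scores of random point sets."""
    rng = random.Random(seed)
    observed_llr = best_two_point_circle(points, max_radius)[0]
    at_least_as_extreme = 0
    for _ in range(trials):
        simulated = [(rng.random(), rng.random()) for _ in points]
        if best_two_point_circle(simulated, max_radius)[0] >= observed_llr:
            at_least_as_extreme += 1
    return (1 + at_least_as_extreme) / (trials + 1)
```

Calling `monte_carlo_p_value(points)` on a list of `(x, y)` tuples returns a p-value for the best circle found. The enumeration is cubic in the number of points, which hints at why scalable algorithms matter for national-scale data, and restricting candidates to circles defined by two points shows how other shapes can be missed.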
slide-6
SLIDE 6

Theme 2: Spatial Data Analysis of SEIU-WHE Parameters

  • Task 2A: Develop algorithms to discover statistically significant linear and buffer hotspots, e.g., of income-poverty, consumption, pollution exposure, and low wellbeing
  • Task 2B: Discover co-location and teleconnection patterns: develop scalable algorithms for identifying correlations in SEIU-WHE parameters, e.g., hotspots and deprived areas
  • Task 2C: Data-driven and discipline-inspired hypotheses
slide-7
SLIDE 7

Task 2A: Discovering Linear and Buffer Hotspots

  • Hotspots often along a spatial network (e.g., air pollution hotspots along roads)
  • Preliminary results: linear hotspot detection that models linear semantics
  • However, it considers only shortest paths between end-points
  • It does not include information surrounding the network
  • Proposed approach:

– Novel notions: non-shortest simple paths, buffer hotspots
– Potential solution: graph-partitioning-based divide and conquer

(a) Circular hotspots for pedestrian fatalities (b) Linear hotspots for pedestrian fatalities (c) Example of non-shortest path
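
To make the shortest-path restriction on this slide concrete, the following sketch scores every shortest path in a small road network by the ratio of activity density inside the path to density outside it. It is a simplified stand-in, not the published linear hotspot algorithm: the density-ratio score, the graph layout and all names are assumptions, and no significance test is included.

```python
import heapq
from itertools import combinations

def build_graph(edges):
    """edges: iterable of (u, v, length, activity_count) for an undirected network."""
    graph = {}
    for u, v, length, count in edges:
        graph.setdefault(u, {})[v] = (length, count)
        graph.setdefault(v, {})[u] = (length, count)
    return graph

def shortest_path(graph, source, target):
    """Dijkstra on edge lengths; returns the node list or None if unreachable."""
    dist, prev, done = {source: 0.0}, {}, set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, (length, _) in graph[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def densest_shortest_path(graph):
    """Return (density_ratio, path) for the best-scoring shortest path.

    Node labels must be mutually comparable (e.g., all strings) and edge
    lengths positive.
    """
    total_len = sum(l for u in graph for v, (l, _) in graph[u].items() if u < v)
    total_act = sum(a for u in graph for v, (_, a) in graph[u].items() if u < v)
    best_ratio, best_path = 0.0, None
    for s, t in combinations(sorted(graph), 2):
        path = shortest_path(graph, s, t)
        if path is None:
            continue
        hops = list(zip(path, path[1:]))
        length = sum(graph[u][v][0] for u, v in hops)
        acts = sum(graph[u][v][1] for u, v in hops)
        out_len, out_act = total_len - length, total_act - acts
        if out_len <= 0 or acts == 0:
            continue  # nothing outside to compare with, or no activity inside
        outside = out_act / out_len
        ratio = float("inf") if outside == 0 else (acts / length) / outside
        if ratio > best_ratio:
            best_ratio, best_path = ratio, path
    return best_ratio, best_path
```

Because candidates come only from `shortest_path`, a dense route that detours between its end-points is never scored, which is the gap the proposed non-shortest simple paths are meant to close.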

slide-8
SLIDE 8

Task 2B: Discover co-location and teleconnection patterns

  • Challenge: Spatial partitioning distorts (& misses) spatial interactions!
  • Spatial Statistical Methods are computationally expensive
  • Prelim. Results: Fast algorithms for mining Co-location (& Teleconnection)
  • Proposed: address data with multiple levels of aggregation, e.g., areal summary

(a) A map of 3 features (b) Spatial partitions (c) Neighbor graph

Pearson’s Correlation    Ripley’s cross-K    Participation Index
0.90                     0.33                0.5
1                        0.5                 1
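
The contrast between partition-based and neighbor-based measures on this slide can be sketched as follows. The participation index here follows the usual co-location definition for a pair of features (the smaller of the two fractions of instances that have a neighbor of the other feature), while the Pearson correlation is computed over per-cell counts of a grid partition. The coordinates, the grid, the distance threshold and all function names are made up for illustration and are not the data behind the figure.

```python
import math

def participation_index(a_points, b_points, d):
    """Min over both features of the fraction of instances with a neighbor
    of the other feature within distance d."""
    def ratio(src, dst):
        hits = sum(1 for p in src if any(math.dist(p, q) <= d for q in dst))
        return hits / len(src)
    return min(ratio(a_points, b_points), ratio(b_points, a_points))

def grid_counts(points, cell):
    """Count points per square grid cell of side `cell`."""
    counts = {}
    for x, y in points:
        key = (int(x // cell), int(y // cell))
        counts[key] = counts.get(key, 0) + 1
    return counts

def pearson(xs, ys):
    """Pearson correlation; NaN when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)

def pearson_over_cells(a_points, b_points, cell, nx, ny):
    """Correlate per-cell counts of two features over an nx-by-ny grid."""
    ca, cb = grid_counts(a_points, cell), grid_counts(b_points, cell)
    cells = [(i, j) for i in range(nx) for j in range(ny)]
    return pearson([ca.get(c, 0) for c in cells], [cb.get(c, 0) for c in cells])
```

With feature A at `(0.9, 0.5)` and `(2.9, 0.5)` and feature B at `(1.1, 0.5)` and `(3.1, 0.5)`, every A has a B within 0.25 units, yet each pair straddles a cell boundary of a unit grid. The neighbor-based index treats the two features as fully co-located while the partition-based correlation is strongly negative, which is the distortion the first bullet warns about.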

slide-9
SLIDE 9

Data-Intensive Science of S&CC in 21st Century

  • Collect & Curate Big Data (Volume, Variety): SEIU, EHW
  • Spatial Patterns, Hypothesis Generation: hotspots of infrastructure deprivation, consumption, pollution, investment, disease & well-being. Correlates? Data-driven and discipline-inspired hypothesis generation
  • Test Hypothesis (Policy Intervention): equity-first policies
  • S&CC Theory: role of policies & urban forms

slide-10
SLIDE 10

Challenges Ahead

  • Non-stationarity
  • Change, e.g., climate, Web, …
  • Feedback Loops, e.g., Social
  • Fairness
  • Accountability
  • Transparency
slide-11
SLIDE 11

References: Surveys, Overviews

  • Spatial Computing, Communications of the ACM, 59(1):72-81, January 2016.
  • Transdisciplinary Foundations of Geospatial Data Science, ISPRS Intl. Jr. of Geo-Information, 6(12):395-429, 2017. (DOI: 10.3390/ijgi6120395).
  • Spatiotemporal Data Mining: A Computational Perspective, ISPRS Intl. Jr. of Geo-Information, 4(4):2306-2338, 2015. (DOI: 10.3390/ijgi4042306).
  • Identifying patterns in spatial information: a survey of methods, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3):193-214, May/June 2011. (DOI: 10.1002/widm.25).
  • Theory-Guided Data Science: A New Paradigm for Scientific Discovery from Data, IEEE Transactions on Knowledge and Data Engineering, 29(10):2318-2331, June 2017. (DOI: 10.1109/TKDE.2017.2720168).
  • Parallel Processing over Spatial-Temporal Datasets from Geo, Bio, Climate and Social Science Communities: A Research Roadmap, IEEE BigData Congress 2017: 232-250.
  • Spatial Databases: Accomplishments and Research Needs, IEEE Transactions on Knowledge and Data Engineering, 11(1):45-55, 1999.

slide-12
SLIDE 12

References: Details

Colocations

  • Discovering colocation patterns from spatial data sets: a general approach, IEEE Trans. on Know. and Data Eng., 16(12), 2004 (w/ Y. Huang et al.).
  • A join-less approach for mining spatial colocation patterns, IEEE Trans. on Know. and Data Eng., 18(10), 2006 (w/ J. Yoo).
  • Cascading Spatio-Temporal Pattern Discovery, IEEE Trans. on Know. and Data Eng., 24(11):1977-1992, 2012 (w/ P. Mohan et al.).

Spatial Outliers

  • Detecting graph-based spatial outliers: algorithms and applications (a summary of results), Proc. ACM Intl. Conf. on Knowledge Discovery & Data Mining, 2001 (w/ Q. Lu et al.).
  • A unified approach to detecting spatial outliers, Springer GeoInformatica, 7(2), 2003 (w/ C. Lu et al.).
  • Discovering Flow Anomalies: A SWEET Approach, IEEE Intl. Conf. on Data Mining, 2008 (w/ J. Kang).

Hot Spots

  • Discovering personally meaningful places: An interactive clustering approach, ACM Trans. on Info. Systems (TOIS), 25(3), 2007 (w/ C. Zhou et al.).
  • A K-Main Routes Approach to Spatial Network Activity Summarization, IEEE Trans. on Know. and Data Eng., 26(6), 2014 (w/ D. Oliver et al.).
  • Significant Linear Hotspot Discovery, IEEE Trans. Big Data, 3(2):140-153, 2017 (w/ X. Tang et al.).

Location Prediction

  • Spatial contextual classification and prediction models for mining geospatial data, IEEE Transactions on Multimedia, 4(2), 2002 (w/ P. Schrater et al.).
  • Focal-Test-Based Spatial Decision Tree Learning, IEEE Trans. on Know. and Data Eng., 27(6):1547-1559, 2015 (summary in Proc. IEEE Intl. Conf. on Data Mining, 2013) (w/ Z. Jiang et al.).

Change Detection

  • Spatiotemporal change footprint pattern discovery: an inter-disciplinary survey, Wiley Interdisc. Rev.: Data Mining and Know. Discovery, 4(1), 2014 (w/ X. Zhou et al.).

slide-13
SLIDE 13

Knowledge Co-Production:

NSF Smart & Connected Communities Grant 1737633 (2017-2020)

  • Co-Visioning via meetings
  • Plan infrastructure for driver-less, post-carbon future with climate change
  • Advance Environment, Health, Wellbeing & Equity via infrastructure refinement
  • Co-select Questions

– Understand spatial equity in infrastructure & outcomes (wellbeing, health, environment)?
– How does an equity-first approach differ from average-outcome-based approaches?

  • Problem Co-Definition: How to measure spatial equity? Well-being?
  • Co-Discovery
  • Co-Evaluation
  • Details: University of Minnesota secures $2.5 million grant to improve quality of life in cities, October 20, 2017 (https://www.cs.umn.edu/news/filter/highlights/professor-shekhar-leads-u-m-team-granted-25-million-nsf-grant)

[Diagram labels: Social Equity; Research; Education; Community Partners & Outreach Diversity]