FROM BALTIMORE TO THE STARS WITH DATA
Tamas Budavari / Applied Math & Stats, JHU



slide-1
SLIDE 1
slide-2
SLIDE 2

Tamas Budavari / Applied Math & Stats, JHU

FROM BALTIMORE TO THE STARS WITH DATA

slide-3
SLIDE 3
slide-4
SLIDE 4

Breaking the Divestment Cycle: Predicting Abandonment & Fostering Neighborhood Revitalization in Baltimore

Tamás Budavári

Applied Mathematics & Statistics – The Johns Hopkins University

slide-5
SLIDE 5

Baltimore overview

  • Baltimore has lost 1/3 of its population since 1950
  • Today, we have 16,500 boarded-up vacant buildings
  • Of these, 13,000 are in distressed markets

M. Braverman
slide-6
SLIDE 6

Boarded-up vacants

M. Braverman
slide-7
SLIDE 7

1. Data science

  • Flexible data platform
  • Predictive modeling & optimization
  • Data fusion
  • Geometry + history
  • Highly extensible

slide-8
SLIDE 8

2. Social science

  • Modeling transition
  • Estimating externalities
  • Evaluating policy

slide-10
SLIDE 10

3. Government

  • Rapid response queries
  • Assisting with strategic investments
  • Mapping “unoccupancy”

slide-11
SLIDE 11

Data in Baltimore

  • OpenBaltimore
  • Hundreds of public datasets online: http://data.baltimorecity.gov
  • Plus more administrative data

slide-12
SLIDE 12

DHCD’s Data Infrastructure

  • Dept. of Housing & Community Development (DHCD)
  • Study changes over time
  • Support decision making
  • Statistics to help?
  • Inference & prediction

M. Braverman, J. D. Evans
slide-13
SLIDE 13

Jim Gray’s 20 Questions

  • Data-driven studies
  • Low-level questions
  • What we see
  • High-level questions
  • Help hone policy making
  • Interventions

slide-14
SLIDE 14

Built a Unique Solution

  • Database of Baltimore City
  • Geospatial info for all parcels
  • Time history of real properties
  • Easily extensible
  • On the IDIES Data-Scope
  • Novel indexing for fast links

slide-15
SLIDE 15

Mapping Vacancy

2010 2015

Phil Garboden

slide-17
SLIDE 17

Clustering of Vacancy

  • Probability of finding a vacant next to another
  • Quantitative comparison
  • Over time
  • Across town

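The clustering statistic named on this slide can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: it assumes each blockface is a simple ordered list of 0/1 vacancy flags, and the function names and the sample data are hypothetical.

```python
def neighbor_vacancy_probability(blocks):
    """Estimate P(adjacent house is vacant | house is vacant).

    `blocks` is a list of blockfaces; each blockface is a list of 0/1 flags
    (1 = vacant) ordered along the street. Only side-by-side houses within
    the same blockface count as neighbors.
    """
    vacant_pairs = 0     # (vacant house, neighbor) pairs where the neighbor is vacant too
    vacant_with_nbr = 0  # all (vacant house, neighbor) pairs
    for row in blocks:
        for i, flag in enumerate(row):
            if not flag:
                continue
            for j in (i - 1, i + 1):
                if 0 <= j < len(row):
                    vacant_with_nbr += 1
                    vacant_pairs += row[j]
    return vacant_pairs / vacant_with_nbr if vacant_with_nbr else float("nan")


def vacancy_rate(blocks):
    """Overall fraction of vacant houses, the baseline to compare against."""
    houses = [flag for row in blocks for flag in row]
    return sum(houses) / len(houses)


# Made-up data: three blockfaces of eight houses each.
blocks_example = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
p_next = neighbor_vacancy_probability(blocks_example)
p_base = vacancy_rate(blocks_example)
print(f"P(vacant neighbor | vacant) = {p_next:.3f}, baseline = {p_base:.3f}")
```

A neighbor probability well above the baseline rate indicates clustering. Computing the same two numbers per year or per area gives the comparison over time and across town.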
slide-18
SLIDE 18

Similar Neighborhoods

  • Similarity graphs & eigenmaps

slide-19
SLIDE 19

What is a Neighborhood?

  • Are neighborhood boundaries meaningful?
  • Better grouping of houses?
  • Trends on a finer scale

slide-20
SLIDE 20

Collapsed Vacants

slide-21
SLIDE 21

Collapsed Vacants

  • Ends of contiguous blocks of rowhomes
  • Alleys, gaps and demolitions break rows
  • Need “sub-blockface” analysis
  • Time-dependent

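One way to picture the sub-blockface idea is to split a blockface into contiguous runs of rowhomes and pick out the units at the end of each run. The sketch below assumes a blockface can be written as an ordered list of labels; the labels ("house", "alley", "gap", "demo") and the function name are hypothetical stand-ins for whatever the parcel data actually records.

```python
def end_units(blockface):
    """Return indices of houses that sit at the end of a contiguous row.

    `blockface` is a list ordered along the street. Any entry other than
    "house" (for example "alley", "gap" or "demo") breaks the row.
    """
    ends = []
    for i, kind in enumerate(blockface):
        if kind != "house":
            continue
        left_open = i == 0 or blockface[i - 1] != "house"
        right_open = i == len(blockface) - 1 or blockface[i + 1] != "house"
        if left_open or right_open:
            ends.append(i)
    return ends


row = ["house", "house", "house", "alley", "house", "demo", "house", "house"]
exposed = end_units(row)
print(exposed)
```

Because demolitions change the labels, the same blockface has to be re-evaluated for each snapshot in time.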
slide-22
SLIDE 22

Neighborhood Revitalization

  • Modeling urban transitions
  • What factors catalyze reinvestment?
  • Disinvestment?
  • Innovative use of data
  • New sources of information
  • Zillow? Cell phone usage?

slide-24
SLIDE 24

Strategic Investments

  • Governor’s budget
  • Unprecedented $75M
  • City scheduling
  • Spring 2016
  • JHU map of targets!

slide-25
SLIDE 25

Strategic Investments

  • Combinatorial Optimization
  • Improve some objective, e.g., …
  • Within a limited budget
  • Best objective? How to solve?

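A textbook instance of "improve an objective within a limited budget" is the 0/1 knapsack problem. The sketch below is only an illustration of that general problem class, not the formulation used in the project: the candidate names, costs (in arbitrary integer units) and benefit scores are invented, and real objectives would also have to account for spatial interactions between targets.

```python
def best_selection(candidates, budget):
    """0/1 knapsack by dynamic programming.

    `candidates` is a list of (name, cost, benefit) with integer costs.
    Returns (total_benefit, chosen_names) for the best subset whose total
    cost does not exceed `budget`.
    """
    # best[b] = (benefit, names) achievable with total cost <= b
    best = [(0, [])] * (budget + 1)
    for name, cost, benefit in candidates:
        # Walk budgets downward so each candidate is used at most once.
        for b in range(budget, cost - 1, -1):
            value = best[b - cost][0] + benefit
            if value > best[b][0]:
                best[b] = (value, best[b - cost][1] + [name])
    return best[budget]


# Hypothetical targets: (name, cost, benefit score).
candidates = [
    ("block A", 40, 9),
    ("block B", 30, 7),
    ("block C", 30, 6),
    ("block D", 20, 3),
]
total_benefit, chosen = best_selection(candidates, budget=60)
print(total_benefit, chosen)
```

In this toy case, picking the single highest-benefit target first (block A, then block D) scores 12, while the optimal pair (blocks B and C) scores 13 for the same budget, which is why the choice of objective and solver matters.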
slide-26
SLIDE 26

Optimize the Impact

  • Different objectives
  • Same budget
  • Advanced tools
  • For decision makers

Lenny Fan, Amitabh Basu, Phil Garboden

slide-27
SLIDE 27
slide-28
SLIDE 28

Price

  • Longitudinal data
  • Environment
  • Prediction
  • Machine Learning

slide-29
SLIDE 29

Ambitious Next Steps

Ben Seigel (21CC), Katalin Szlavecz, Ben Zaitchik, Keeve Nachman, Katie O’Meara (MICA)

slide-30
SLIDE 30

Spatiotemporal Multi-Level Modeling

  • Hierarchical Bayesian statistics
  • Include all aggregated data
  • Joint inference for the individual houses and ensemble distributions

Mengyang Gu

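As a rough illustration of what joint inference for individual houses and ensemble distributions can mean, the simplest two-level normal model is written out below. The notation is assumed for this sketch and does not come from the slides: $y_{ij}$ is the $i$-th observation for house $j$, $\theta_j$ is the house-level parameter, and $\mu$, $\tau$ describe the ensemble.

```latex
\begin{align}
  y_{ij} \mid \theta_j &\sim \mathcal{N}(\theta_j, \sigma^2), & i &= 1,\dots,n_j \\
  \theta_j \mid \mu, \tau &\sim \mathcal{N}(\mu, \tau^2), & j &= 1,\dots,J \\
  \mathbb{E}[\theta_j \mid y, \mu, \sigma, \tau]
    &= \frac{\frac{n_j}{\sigma^2}\,\bar{y}_j + \frac{1}{\tau^2}\,\mu}
            {\frac{n_j}{\sigma^2} + \frac{1}{\tau^2}}
\end{align}
```

The last line shows the partial pooling: a house with few observations is pulled toward the ensemble mean $\mu$, while a house with many observations is dominated by its own average $\bar{y}_j$.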
slide-31
SLIDE 31

Predicting Unoccupancy

  • Time-series data
  • Water usage
  • BG&E usage
  • USPS
  • Proxy for occupancy

Phil Garboden, Hana Clemens
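To make the "usage as a proxy for occupancy" idea concrete, here is a minimal rule-based sketch. It is an assumption-laden toy, not the predictive model from the project: the function name, the three-month window and the 10% threshold are all illustrative, and the readings are made up.

```python
from statistics import median


def likely_unoccupied(monthly_usage, recent_months=3, fraction=0.1):
    """Flag a property whose latest readings collapse relative to its past.

    `monthly_usage` is a chronological list of meter readings (any unit).
    The property is flagged when every one of the last `recent_months`
    readings is below `fraction` times the median of the earlier readings.
    Both thresholds are illustrative, not calibrated values.
    """
    if len(monthly_usage) <= recent_months:
        return False  # not enough history to compare against
    history = monthly_usage[:-recent_months]
    recent = monthly_usage[-recent_months:]
    baseline = median(history)
    if baseline <= 0:
        return False  # no past usage, so no drop can be detected
    return all(reading < fraction * baseline for reading in recent)


occupied = [410, 395, 430, 402, 388, 415, 420, 399]
emptied = [410, 395, 430, 402, 388, 12, 0, 3]
flags = (likely_unoccupied(occupied), likely_unoccupied(emptied))
print(flags)
```

Combining several such signals (water, BG&E, USPS) would reduce false alarms from a single faulty meter or a seasonal absence.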

slide-32
SLIDE 32

Satellite View

  • Missing roof?
  • Blue tarp = holes?

slide-33
SLIDE 33

Image behind the Atmosphere

  • Looking up!
  • Astronomy images
  • Blurred exposures
  • We solve for it
  • For high-res details

Coadded Image

Matthias Lee, Charlie Gulian, Rick White

slide-35
SLIDE 35

Image behind the Atmosphere

  • Looking up!
  • Astronomy images
  • Blurred exposures
  • We solve for it
  • For high-res details

Deconvolved Image

Matthias Lee, Charlie Gulian, Rick White

slide-36
SLIDE 36

Image behind the Atmosphere

  • Looking up!
  • Astronomy images
  • Blurred exposures
  • We solve for it
  • For high-res details

Hubble Image

Matthias Lee, Charlie Gulian, Rick White
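The slides do not say which algorithm recovers the sharp image. As a stand-in, the sketch below uses the classic Richardson-Lucy iteration on a one-dimensional signal with a known blur kernel. This is a swapped-in textbook technique, much simpler than deconvolving many atmospheric exposures with unknown blur; the signal, kernel and iteration count are made up.

```python
def convolve_same(signal, kernel):
    """Discrete convolution with zero padding; output has len(signal) samples."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = i + half - k
            if 0 <= j < len(signal):
                total += weight * signal[j]
        out.append(total)
    return out


def richardson_lucy(observed, psf, iterations=50):
    """Richardson-Lucy deconvolution for a known, odd-length blur kernel."""
    flipped = psf[::-1]  # correlation with psf == convolution with flipped psf
    estimate = [sum(observed) / len(observed)] * len(observed)  # flat start
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [o / b if b > 0 else 0.0 for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Two nearby point sources, blurred by a small symmetric kernel.
truth = [0, 0, 0, 0, 0, 10, 0, 0, 6, 0, 0, 0, 0, 0, 0]
psf = [0.25, 0.5, 0.25]
observed = convolve_same(truth, psf)
restored = richardson_lucy(observed, psf, iterations=200)
print([round(v, 2) for v in restored])
```

Each iteration multiplies the current estimate by a correction that compares the observation with the re-blurred estimate, so the estimate stays non-negative and the peaks sharpen toward the original sources.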

slide-37
SLIDE 37

Differential Chromatic Refraction

  • Even colors!

Matthias Lee, Andy Connolly, Charlie Gulian

slide-39
SLIDE 39

At the Heart…

  • Applied Math & Stats: data mining, statistical modeling, machine learning, optimization, Bayesian inference
  • Data-Intensive Science: hardware platforms, software solutions, streaming algorithms, database technologies, GIS tools & indexing

slide-40
SLIDE 40

Limitations of Machine Learning

  • Many methods to choose from
  • And more knobs to tweak
  • Latching onto known features
  • Manual intervention to refine
  • What’s left in the data?

Missing the Human in the Loop!

slide-41
SLIDE 41

Use the Brain’s Detection Power

slide-42
SLIDE 42

Rapid Serial Visual Presentation

  • Current state of the art is binary classification
  • Target / Distractor
  • We look for the interesting
  • Dynamic behavior of brain: looking for new

Nick Carey

slide-46
SLIDE 46

Human-Machine Co-Learning

  • Hide wireframe of 3D cube in high-D
  • Looks like noise
  • Random projections
  • Trigger to explore locally
  • Converge on better view

Subconscious Navigation!

Nick Carey
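The data side of this experiment can be sketched as follows: sample points along the edges of a cube, embed them in a higher-dimensional space padded with noise, and look at a random 2-D projection. Everything here is an assumption for illustration (function names, 20 dimensions, Gaussian noise and projection directions), and the human-in-the-loop part that steers the view is not modeled.

```python
import itertools
import random


def cube_wireframe(per_edge=10):
    """Points sampled along the 12 edges of the unit cube in 3-D."""
    corners = list(itertools.product((0.0, 1.0), repeat=3))
    points = []
    for a, b in itertools.combinations(corners, 2):
        if sum(x != y for x, y in zip(a, b)) != 1:
            continue  # an edge joins corners that differ in exactly one coordinate
        for step in range(per_edge):
            t = step / (per_edge - 1)
            points.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return points


def hide_in_high_d(points3d, dim, rng, noise=1.0):
    """Embed 3-D points in `dim` dimensions; the extra axes hold Gaussian noise."""
    axes = rng.sample(range(dim), 3)
    hidden = []
    for p in points3d:
        row = [rng.gauss(0.0, noise) for _ in range(dim)]
        for axis, value in zip(axes, p):
            row[axis] = value
        hidden.append(row)
    return hidden, axes


def random_projection(rows, rng, out_dim=2):
    """Project each row onto `out_dim` random Gaussian directions."""
    dim = len(rows[0])
    directions = [[rng.gauss(0.0, 1.0) / dim ** 0.5 for _ in range(dim)]
                  for _ in range(out_dim)]
    return [[sum(d * x for d, x in zip(direction, row)) for direction in directions]
            for row in rows]


rng = random.Random(0)
wire = cube_wireframe()
hidden, cube_axes = hide_in_high_d(wire, dim=20, rng=rng)
view = random_projection(hidden, rng)  # 2-D points; mostly noise for a random view
```

Only projections aligned with the three cube axes reveal the wireframe, which is why most random views look like noise and a better view has to be found by local exploration.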

slide-48
SLIDE 48

Summary

  • Promising first steps
  • With direct applications already deployed
  • Common data infrastructure & approaches
  • Surprisingly similar, e.g., across astro/city
  • Ambitious future plans
  • Need help! And need more data…

slide-49
SLIDE 49