SLIDE 1

How to make R, PostGIS and QGis cooperate for statistical modelling duties

A case study on hedonic regressions

Olivier Bonin – UPE IFSTTAR LVMT

OGRS 2012

SLIDE 2

Modelling requirements

Hedonic models

In a hedonic model (Rosen, 1974), the price of a product depends on a vector of its characteristics.

When applied to housing, three kinds of characteristics must be taken into account (Kain and Quigley, 1970):

$$p_i = \sum_j \alpha_j x_{ij} + \sum_j \beta_j y_{ij} + \sum_j \gamma_j z_{ij} + \varepsilon_i$$

with $p_i$ the price of dwelling $i$, $x_{ij}$ the structural characteristics, $y_{ij}$ the neighbourhood characteristics, $z_{ij}$ the market location characteristics, and $\varepsilon_i$ a Gaussian error term.
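The following minimal Python sketch makes the additive structure of the model concrete. It is not the author's code, which used R. It simulates prices from one structural, one neighbourhood and one location variable, then recovers the coefficients by ordinary least squares with NumPy. The variable names, the "true" coefficients and the use of plain OLS are all assumptions for illustration; the case study itself fits a mixed-effects model in R.

```python
import numpy as np

def fit_hedonic(x, y, z, price):
    """Ordinary least squares for p_i = c + alpha*x_i + beta*y_i + gamma*z_i + eps_i.

    x, y, z stand for one structural, one neighbourhood and one location
    characteristic (hypothetical single-column versions of x_ij, y_ij, z_ij).
    Returns the estimated (intercept, alpha, beta, gamma).
    """
    design = np.column_stack([np.ones_like(x), x, y, z])
    coef, *_ = np.linalg.lstsq(design, price, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 20_000
surface = rng.uniform(20, 120, n)              # structural: floor area (hypothetical)
amenities = rng.poisson(5, n).astype(float)    # neighbourhood: nearby amenities (hypothetical)
dist_centre = rng.uniform(0, 30, n)            # location: km to the centre (hypothetical)

true_coef = np.array([50.0, 0.8, 1.5, -0.6])
price = (true_coef[0] + true_coef[1] * surface + true_coef[2] * amenities
         + true_coef[3] * dist_centre
         + rng.normal(0.0, 2.0, n))            # Gaussian error term eps_i

est = fit_hedonic(surface, amenities, dist_centre, price)
print(np.round(est, 2))
```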

SLIDE 3

Modelling requirements

Data

Both statistical (tabular) and geographical data are needed:

  • several hundred thousand records of residential property transactions;
  • the coordinates of housing locations;
  • several GIS layers to compute the spatial characteristics of the dwellings: road and public transit networks, locations of employment centers, of amenities, etc.

The structural characteristics $x_{ij}$ are constructed from the tabular data (the property transactions database); the neighbourhood and location characteristics $y_{ij}$ and $z_{ij}$ are constructed by spatial analysis.
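The slides do not detail how the spatial characteristics are derived. As an illustration only, the Python sketch below computes two typical location variables for each dwelling: the distance to the nearest transit station, and a boolean "access" flag within a radius. Several choices here are assumptions:

- the 500 m threshold;
- a projected coordinate system in metres;
- a brute-force search.

In the workflow described here, such computations would presumably run in PostGIS, for instance with ST_Distance or ST_DWithin.

```python
import math

def nearest_distance(point, targets):
    """Euclidean distance from `point` to the closest point in `targets`.

    Coordinates are assumed to be in a projected system in metres,
    so plain Euclidean distance is meaningful.
    """
    px, py = point
    return min(math.hypot(px - tx, py - ty) for tx, ty in targets)

def location_features(dwellings, stations, radius_m=500.0):
    """Two illustrative location variables per dwelling: distance to the
    nearest station (km) and a within-radius access flag.
    The 500 m default radius is hypothetical, not taken from the slides.
    """
    features = []
    for d in dwellings:
        dist = nearest_distance(d, stations)
        features.append({"dist_station_km": dist / 1000.0,
                         "access": dist <= radius_m})
    return features

stations = [(0.0, 0.0), (2000.0, 0.0)]
dwellings = [(300.0, 400.0), (1000.0, 1000.0), (2100.0, 0.0)]
for f in location_features(dwellings, stations):
    print(f)
```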

SLIDE 4

Modelling requirements

Spatial dimension of the problem

  • Main difficulty of the spatial analysis: the size of the property transaction database.
  • The error term of the models must be visualized to check for spatial autocorrelation.
  • Maps must be produced on zones rather than on the points representing dwelling locations.
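The slides check for spatial autocorrelation visually, by mapping the residuals. A common numeric complement, which the slides do not mention, is the global Moran's I statistic; in R, the spdep package provides moran.test. The minimal Python sketch below computes Moran's I from a dense weights matrix. The toy "zones in a row" contiguity structure is an assumption for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a dense spatial weights matrix.

    weights[i][j] > 0 when zones i and j are neighbours; the diagonal is 0.
    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2
    where m is the mean and W the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in dev)

def chain_weights(n):
    """Binary contiguity for n zones laid out in a row (zone i touches i-1 and i+1)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

w = chain_weights(4)
print(morans_i([1.0, 2.0, 3.0, 4.0], w))    # smooth gradient: positive I
print(morans_i([1.0, -1.0, 1.0, -1.0], w))  # checkerboard: negative I
```

Values near 0 suggest no spatial structure. Clearly positive values mean that similar residuals cluster in space, which is what a residual map would reveal visually.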

SLIDE 5

Software setup

Statistical software

R is the obvious choice (R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org, 2009):

  • an extensive set of libraries allowing advanced modelling techniques such as spatial regressions or multi-level modelling (used in Bonin, 2009);
  • existing connectors with PostGIS and QGis.

SLIDE 6

Software setup

Spatial analysis software

GIS or spatially-aware RDBMS? GIS and spatially-aware RDBMS? PostGIS proved necessary because of the large amount of data to handle and the need for a spatial index. GIS software seemed useful for data visualization; QGis was selected because of its native ability to connect both to PostGIS and to R.
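The spatial index is what makes PostGIS practical at this scale, because it discards distant candidates before any exact distance is computed. PostGIS uses GiST indexes for this. The toy uniform-grid index below is a deliberately simpler stand-in, sketched in Python to show the filter-then-refine idea behind radius queries such as ST_DWithin. The class name and cell size are hypothetical.

```python
import math
from collections import defaultdict

class GridIndex:
    """A toy uniform-grid spatial index (illustration only).

    PostGIS indexes geometries with GiST; a grid is used here only because
    it is short to write. Both serve the same purpose: skip far-away
    candidates before computing exact distances.
    """

    def __init__(self, points, cell_size):
        self.cell = cell_size
        self.buckets = defaultdict(list)
        for idx, (x, y) in enumerate(points):
            self.buckets[self._key(x, y)].append((idx, x, y))

    def _key(self, x, y):
        # Integer coordinates of the grid cell containing (x, y).
        return (math.floor(x / self.cell), math.floor(y / self.cell))

    def within(self, qx, qy, radius):
        """Sorted indices of points within `radius` of (qx, qy)."""
        kx0, ky0 = self._key(qx - radius, qy - radius)
        kx1, ky1 = self._key(qx + radius, qy + radius)
        hits = []
        for kx in range(kx0, kx1 + 1):          # filter: only nearby cells
            for ky in range(ky0, ky1 + 1):
                for idx, x, y in self.buckets.get((kx, ky), ()):
                    if math.hypot(x - qx, y - qy) <= radius:  # refine: exact test
                        hits.append(idx)
        return sorted(hits)

pts = [(0.0, 0.0), (120.0, 50.0), (900.0, 900.0), (480.0, 0.0)]
index = GridIndex(pts, cell_size=250.0)
print(index.within(0.0, 0.0, 500.0))
```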

SLIDE 7

Software setup

How to connect R, QGis and PostGIS?

It is theoretically possible to connect the three pieces of software with bi-directional connections. But is it simple? Efficient? Useful? Required?

Figure: connection diagram between R, PostGIS and QGis, with links labelled manageR, spqr, native "add PostGIS layer...", RODBC and Rdbi.

SLIDE 8

Software setup

R – PostGIS connection

  • PostGIS runs on a GNU/Linux server; R runs on GNU/Linux, Mac OS X and Windows clients.
  • RODBC: the “straightforward” solution (directly available on CRAN), but it depends on ODBC (on Mac OS X, the open-source option requires working in the Terminal and compiling libraries) and it is very slow.
  • Rdbi + RdbiPgSQL: hosted on Bioconductor, with an outdated website, but very good performance.

SLIDE 9

Example

Mapping data in QGis

Rapid transit network and housing locations in the Ile-de-France region (source: notaries and STIF).

SLIDE 10

Example

Modelling in R

Import from PostGIS into R: a few tens of seconds to load 125,000 records with 87 columns.

    Linear mixed-effects model fit by REML
      Data: bien
      Subset: condAP
           AIC      BIC    logLik
      568916.1 569206.3 -284427.1

    Random effects:
      Formula: ~1 | dep
              (Intercept)
      StdDev:    5.263306

      Formula: ~1 | code_commn %in% dep
              (Intercept) Residual
      StdDev:    4.043686 6.575219
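One way to read the random effects is to split the total variance across the nested levels. The short Python computation below does this from the three standard deviations in the output. Interpreting dep and code_commn as French départements and communes is an assumption based on the variable names.

```python
# Standard deviations read off the R output (nlme, REML fit).
sd = {
    "dep": 5.263306,                  # presumably between départements
    "code_commn %in% dep": 4.043686,  # presumably between communes within a département
    "Residual": 6.575219,             # within-commune (dwelling-level) noise
}

# In a nested random-intercept model the variance components add up,
# so each level's share of the total variance is sd^2 / sum(sd^2).
variances = {level: s ** 2 for level, s in sd.items()}
total = sum(variances.values())
shares = {level: v / total for level, v in variances.items()}

for level, share in shares.items():
    print(f"{level:>22}: {share:.1%}")
```

With these values, roughly 32% of the variance lies between départements and 19% between communes within a département. About half lies at the dwelling level.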

SLIDE 11

Example

    Fixed effects:
                             Value Std.Error    DF   t-value p-value
    (Intercept)       72.75860 1.9320485 85213  37.65878  0.0000
    surfh             -0.02580 0.0015843 85213 -16.28330  0.0000
    nbppr5            -6.25587 0.2036677 85213 -30.71605  0.0000
    nbppr10           -4.00707 1.0713927 85213  -3.74006  0.0002
    anc1              -1.80266 0.2610076 85213  -6.90653  0.0000
    bi_epoquB         -3.76924 0.1701087 85213 -22.15781  0.0000
    bi_epoquC         -3.64039 0.1726365 85213 -21.08701  0.0000
    bi_epoquD         -4.36626 0.1737003 85213 -25.13676  0.0000
    bi_epoquE         -4.68421 0.1801792 85213 -25.99747  0.0000
    bi_epoquF         -2.77668 0.1922702 85213 -14.44157  0.0000
    bi_epoquG         -0.23061 0.1946496 85213  -1.18472  0.2361
    bi_epoquH         -0.30314 0.3061475 85213  -0.99017  0.3221
    saldb1             1.54548 0.0752853 85213  20.52830  0.0000
    saldb2             2.48280 0.1328440 85213  18.68963  0.0000
    bi_ascenO          0.23147 0.0549959 85213   4.20877  0.0000
    etage1             0.11218 0.0739998 85213   1.51589  0.1296
    etage2             0.50147 0.0750172 85213   6.68471  0.0000
    etage3             0.51211 0.0788536 85213   6.49448  0.0000
    etage4             0.62466 0.0873488 85213   7.15138  0.0000
    etage5             0.43328 0.0804540 85213   5.38544  0.0000
    garag1             1.13419 0.0663346 85213  17.09804  0.0000
    garag2             1.41051 0.1170110 85213  12.05449  0.0000
    access_fer_n2     -1.30453 0.0573154 85213 -22.76062  0.0000
    access_fer_n4     -3.77274 0.0752602 85213 -50.12927  0.0000
    access_metroTRUE   0.45248 0.1287529 85213   3.51429  0.0004
    distc             -0.35556 0.0143153 85213 -24.83802  0.0000
    surfh:nbppr5       0.07497 0.0026233 85213  28.57851  0.0000
    surfh:nbppr10      0.05339 0.0050886 85213  10.49101  0.0000
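The columns of the fixed-effects table are linked. The t-value is Value / Std.Error, and with 85,213 degrees of freedom the two-sided p-value matches the normal tail probability to four decimals. The Python check below recomputes three rows. Using the normal approximation instead of the exact Student t distribution is a simplification made for this sketch.

```python
import math

def t_and_p(value, std_error):
    """t-statistic and two-sided p-value for one fixed effect.

    With 85,213 degrees of freedom the Student t distribution is
    practically normal, so the p-value uses the normal tail:
    p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    """
    t = value / std_error
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p

# (Value, Std.Error) pairs copied from the fixed-effects table.
rows = {
    "saldb1": (1.54548, 0.0752853),
    "etage1": (0.11218, 0.0739998),
    "bi_epoquG": (-0.23061, 0.1946496),
}
for name, (value, se) in rows.items():
    t, p = t_and_p(value, se)
    print(f"{name:>10}: t = {t:8.4f}, p = {p:.4f}")
```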

SLIDE 12

Example

Cartographic representation of the error term

The model seems correct. Is there any spatial structure in the error term? As the model is estimated on 250,000 property transactions (possibly with several transactions at the same location), the error terms must be aggregated on zones to be visualized. We choose here the finest available census level: the IRIS. We transfer the point dataset into PostGIS and then use SQL queries to compute the average error term on each IRIS area. It is easy, but somewhat slow: more than 4 minutes.
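The aggregation step amounts to a point-in-polygon join followed by a group mean. The Python sketch below spells it out on two toy square zones. The SQL in the comment only indicates the general shape such a PostGIS query could take; its table and column names are hypothetical. The loop tests every point against every zone, which is exactly the cost a spatial index avoids on 250,000 points.

```python
from collections import defaultdict

# In the slides this step runs in PostGIS. A query of roughly this shape
# would do it (table and column names are hypothetical):
#   SELECT z.iris_id, avg(p.resid)
#   FROM iris z JOIN residuals p ON ST_Contains(z.geom, p.geom)
#   GROUP BY z.iris_id;
# Below, the same aggregation in plain Python, for illustration only.

def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon `ring`
    (a list of (x, y) vertices, first vertex not repeated at the end)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def mean_residual_by_zone(points, zones):
    """points: iterable of (x, y, residual); zones: dict zone_id -> ring.
    Returns zone_id -> mean residual of the points falling in that zone."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, resid in points:
        for zone_id, ring in zones.items():
            if point_in_polygon(x, y, ring):
                sums[zone_id] += resid
                counts[zone_id] += 1
                break  # zones are assumed not to overlap
    return {z: sums[z] / counts[z] for z in counts}

zones = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = [(2, 3, 1.5), (5, 5, -0.5), (15, 5, 2.0), (25, 5, 9.9)]
print(mean_residual_by_zone(points, zones))
```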

SLIDE 13

Example

Residuals of the hedonic model on housing, aggregated at the IRIS level in Ile-de-France

SLIDE 14

Example

R mapping capabilities

R can in fact make acceptable maps, with the help of RColorBrewer (for the color palettes) and of the sp and maptools libraries. To tell the truth, it is much more complicated than the PostGIS and QGis route, but a lot quicker.
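A choropleth map of residuals needs two ingredients: class breaks and a palette. The Python sketch below computes nearest-rank quantile breaks and maps each value to the 5-class diverging RdBu scheme from ColorBrewer, the palette family that RColorBrewer exposes in R. The choice of quantile classes is an assumption; the slides do not say which classification was used.

```python
import math
from bisect import bisect_left

# 5-class diverging "RdBu" scheme from ColorBrewer, ordered from red
# (most negative residuals) to blue (most positive residuals).
RDBU_5 = ["#CA0020", "#F4A582", "#F7F7F7", "#92C5DE", "#0571B0"]

def quantile_breaks(values, k):
    """Interior break points splitting `values` into k classes of roughly
    equal size (nearest-rank quantiles)."""
    s = sorted(values)
    n = len(s)
    return [s[math.ceil(i * n / k) - 1] for i in range(1, k)]

def classify(value, breaks):
    """Class index in 0..len(breaks); a value equal to a break goes to the lower class."""
    return bisect_left(breaks, value)

residuals = [-3.1, -1.2, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9, 1.8, 4.0]
breaks = quantile_breaks(residuals, len(RDBU_5))
colours = [RDBU_5[classify(r, breaks)] for r in residuals]
print(breaks)
print(colours)
```

Quantile classes give every colour the same number of zones. A residual map centred on zero may instead favour symmetric breaks around 0, at the cost of unequal class sizes.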

SLIDE 15

Example

Residuals of the hedonic model on housing, aggregated at the IRIS level in Ile-de-France

SLIDE 16

Conclusion

Convergence

R, PostGIS and QGis can be connected, but they also share many capabilities (e.g. spatial queries through the GEOS library, Geometry Engine Open Source). RDBMS moved towards spatial data early (first version of PostGIS in 2001), and so did R, which is now able to perform most GIS tasks (except data acquisition). GIS platforms are moving slowly towards software or libraries that could enhance their data processing and modelling capabilities (or are turning into specialized platforms).

SLIDE 17

Conclusion

Many researchers in social science who use quantitative approaches (in my field: geography, regional science, transportation science) rely heavily on software with modelling and analysis capabilities: R for statistical modelling, NetLogo, Repast Simphony or GAMA for agent-based simulation, etc. All these platforms are moving towards GIS: many libraries in R, the NetLogo GIS extension, GIS support in Repast, built-in GIS capabilities in GAMA. Classical GIS platforms have to react if they want to remain attractive to these researchers. The USM OrbisGIS plugin (Rousseaux et al., 2012) is a very good step in this direction!