Rental Apartment Prices in the province of Zurich

SLIDE 1

Data Non-spatial Spatial Trend

Rental Apartment Prices in the province of Zurich

Assignment 1 for Spatial Statistics (STAT 946) Adrian Waddell

University of Waterloo

October 9, 2008

Adrian Waddell (University of Waterloo) Rent October 9, 2008 1 / 34

SLIDE 2

Goal

  • Overview of the real estate market in Zurich
  • Fit a model: price ∼ location + other covariates + error
  • Which apartments have large residuals?
  • Can the model be used to classify good and bad deals?
  • Automate the process, with a daily update

SLIDE 3

Data Sources

Final Data: 3088 apartments for rent in the province of Zurich (Switzerland), collected on Friday, October 3, 2008.

Variables: street, nr, postal code, city, longitude, latitude, number of rooms, living area, apartment style, floor, price

  • Real Estate Data
  • Geocoding API
  • GIS Data: http://www.giszh.zh.ch (CH1903)

SLIDE 4

Data Collection

  • Perl Script 1: Search for all apartments in Zurich and save the HTML page source of each result list → 165 *.txt files.
  • Perl Script 2: Information extraction from the HTML sources (parsing).
  • Look up longitude and latitude with the Google API (geocoding), using the library Geo::Coder::Google.
  • Books on this topic: (all O’Reilly)

SLIDE 5

Data Processing

  • All data imported into R.
  • Coordinate Reference System chosen to be the “Swiss coordinate system”.
  • Transformation of housing data.
  • Outlier detection (in location and price) and deletion: 3144 − 3088 = 56 outliers.
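The coordinate step can be illustrated with the approximate conversion formulas that swisstopo publishes for going from WGS84 latitude/longitude (what a geocoder returns) to CH1903 coordinates. This is only a sketch under that assumption and not the R code used for the talk; the names `wgs84_to_ch1903` and `dms` are made up for illustration, and the approximation is good to roughly a meter.

```python
def wgs84_to_ch1903(lat_deg, lon_deg):
    """Approximate WGS84 (decimal degrees) -> CH1903 (y = easting, x = northing, in meters)."""
    # Work in arc seconds, shifted and scaled relative to the reference point in Bern.
    phi = (lat_deg * 3600.0 - 169028.66) / 10000.0
    lam = (lon_deg * 3600.0 - 26782.5) / 10000.0

    y = (600072.37
         + 211455.93 * lam
         - 10938.51 * lam * phi
         - 0.36 * lam * phi ** 2
         - 44.54 * lam ** 3)
    x = (200147.07
         + 308807.95 * phi
         + 3745.25 * lam ** 2
         + 76.63 * phi ** 2
         - 194.56 * lam ** 2 * phi
         + 119.79 * phi ** 3)
    return y, x


def dms(degrees, minutes, seconds):
    """Degrees/minutes/seconds -> decimal degrees."""
    return degrees + minutes / 60.0 + seconds / 3600.0


# Example point 46°02'38.87" N, 8°43'49.79" E, which lies close to (700000, 100000).
y, x = wgs84_to_ch1903(dms(46, 2, 38.87), dms(8, 43, 49.79))
print(f"y = {y:.1f} m, x = {x:.1f} m")
```

Working in a projected system such as CH1903 means that distances h between apartments can be computed directly in meters with the Euclidean formula.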

SLIDE 6

All available apartments for rent (n = 3088)

SLIDE 7

Price vs. number of rooms

SLIDE 8

Price distribution for Nr. of Rooms 6.5 and price < 6700

SLIDE 9

Price vs. number of Rooms

SLIDE 10

Is the location sufficient to explain the monthly rent?

SLIDE 11

Model

Location is not sufficient to describe price. Use the model

    log(price) = m(·) + e(s)
    e(s) = f(s) + ε

  • non-spatial trend: m(area, nrRooms, ...) is chosen to be a linear model → variable selection
  • spatial trend: e(s); model the variogram, then krige
  • residuals: ε

SLIDE 12

Variable selection: apartment style

Number of apartments by style and number of rooms. Room classes (columns): [1,2), [2,3), [3,4), [4,5), [5,6), [6,12), Not Avail. Each row lists the non-empty counts in column order.

  * Apartment:        114 228 750 873 201 26 24
    Attic:            1
  * Attic flat:       5 8 27 36 17 3
    Bachelor flat:    2
    Bifamiliar house: 2 3 3 4
  * Duplex:           1 14 40 101 51 14 2
    Farm house:       1 1 1 4
  * Furnished flat:   67 59 62 22 5 3 13
    Loft:             5 1 2 2 10
  * Roof flat:        4 25 55 44 15 2 2
  * Row house:        1 1 15 16 14 1
  * Single house:     1 9 11 31
    Single room:      10 1 1 2
    Studio:           4 1
    Terrace flat:     2 3 4
    Terrace house:    1
    Villa:            1 3

SLIDE 13

Variable selection: apartment area

             area available
nr Room        YES     NO
[1,2)          163     49
[2,3)          275     63
[3,4)          791    152
[4,5)          937    173
[5,6)          283     42
[6,12)          96      9
Not Avail       37     18
total         2582    506

  • Only use apartments with styles marked with * (n = 3013)
  • Only use apartments with available living area data

SLIDE 14

Variable selection summary

SLIDE 15

Model fitting

  • Use area, style and nrRoom as covariates
  • Omit NA’s and nrRoom > 6.5, area > 5 → n = 2464
  • Fit the linear model

        log(price) = β0 + β1·area + β2·nrRooms + β3·style + e(s)

    where nrRooms and style are factor variables.

SLIDE 16

Fitted Model

Coefficients:
                        Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)            6.6575610   0.0313623  212.279   < 2e-16  ***
area                   0.0075245   0.0002692   27.951   < 2e-16  ***
nrRoom:1.5             0.1374210   0.0419900    3.273   0.00108  **
nrRoom:2               0.2065559   0.0413576    4.994  6.32e-07  ***
nrRoom:2.5             0.2818575   0.0365278    7.716  1.73e-14  ***
nrRoom:3               0.2314567   0.0372024    6.222  5.77e-10  ***
nrRoom:3.5             0.2923112   0.0353915    8.259  2.37e-16  ***
nrRoom:4               0.2188876   0.0401093    5.457  5.32e-08  ***
nrRoom:4.5             0.2421336   0.0381684    6.344  2.66e-10  ***
nrRoom:5               0.2953283   0.0511765    5.771  8.89e-09  ***
nrRoom:5.5             0.2279178   0.0450000    5.065  4.39e-07  ***
nrRoom:6               0.4685403   0.0738201    6.347  2.61e-10  ***
nrRoom:6.5             0.2776106   0.0624401    4.446  9.14e-06  ***
style:Attic flat       0.2061413   0.0288673    7.141  1.22e-12  ***
style:Duplex           0.0008961   0.0204669    0.044   0.96508
style:Furnished flat   0.5765866   0.0217763   26.478   < 2e-16  ***
style:Roof flat       -0.0006020   0.0236714   -0.025   0.97971
style:Row house       -0.1118195   0.0509342   -2.195   0.02823  *
style:Single house     0.1376427   0.0504790    2.727   0.00644  **
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.254 on 2445 degrees of freedom
Multiple R-squared: 0.5796, Adjusted R-squared: 0.5765
F-statistic: 187.3 on 18 and 2445 DF, p-value: < 2.2e-16
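Because the response is log(price), each coefficient acts multiplicatively on the price scale. The sketch below reads a few estimates from the coefficient table above and converts them into price factors. It assumes that the omitted factor levels (1 room, style Apartment) are the reference levels, that area is measured in m², and it sets the spatial term e(s) to 0; the helper names are made up for illustration.

```python
import math

# Estimates copied from the coefficient table (log-price scale).
INTERCEPT = 6.6575610
BETA_AREA = 0.0075245
BETA_ROOMS_3_5 = 0.2923112     # nrRoom:3.5
BETA_FURNISHED = 0.5765866     # style:Furnished flat


def multiplicative_effect(beta, delta=1.0):
    """Factor by which the predicted price changes when a covariate moves by delta."""
    return math.exp(beta * delta)


def predict_price(area, room_beta=0.0, style_beta=0.0):
    """Predicted price from the non-spatial part of the model only (e(s) assumed 0)."""
    return math.exp(INTERCEPT + BETA_AREA * area + room_beta + style_beta)


furnished_factor = multiplicative_effect(BETA_FURNISHED)        # roughly 1.78
ten_units_area = multiplicative_effect(BETA_AREA, delta=10.0)   # roughly 1.08
example_price = predict_price(area=80, room_beta=BETA_ROOMS_3_5)

print(f"furnished vs. reference style: x{furnished_factor:.2f}")
print(f"10 more units of area:         x{ten_units_area:.3f}")
print(f"3.5 rooms, area 80, reference style: {example_price:.0f}")
```

Read this way, a furnished flat is predicted to cost about 78% more than an otherwise identical apartment of the reference style, while the Duplex and Roof flat coefficients are close to 0 and correspond to factors close to 1.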

SLIDE 17

Spatial trend: e(s) & exp{e(s)}

SLIDE 18

Distribution of e(s)

[Figure: histogram and kernel density estimate of e(s); x-axis: e(s) from −1.5 to 1.0, y-axis: density from 0.0 to 2.5]

SLIDE 19

Omnidirectional Variogram (MoM) for e(s)

SLIDE 20

Robust Variogram estimates

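\[ \mathrm{MoM}(h) = \frac{1}{2} \cdot \frac{1}{|N(h)|} \sum_{(s_i, s_j) \in N(h)} \{e(s_i) - e(s_j)\}^2 \]

\[ \mathrm{CRESS}(h) = \frac{1}{2} \cdot \frac{1}{0.457 + 0.494/|N(h)|} \left\{ \frac{1}{|N(h)|} \sum_{(s_i, s_j) \in N(h)} |e(s_i) - e(s_j)|^{1/2} \right\}^4 \]

\[ \mathrm{ROB1}(h) = \frac{1}{2} \cdot \frac{\operatorname{Median}\left[ \{e(s_i) - e(s_j)\}^2 : (s_i, s_j) \in N(h) \right]}{0.457} \]

\[ \mathrm{ROB2}(h) = \frac{1}{2} \cdot \frac{\operatorname{Median}\left[ |e(s_i) - e(s_j)|^{1/2} : (s_i, s_j) \in N(h) \right]^4}{0.457} \]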

as defined in the course notes.
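The four estimators can be sketched in a few lines of Python. This is an illustration of the formulas on this slide, not the code used for the talk: N(h) is approximated by all pairs whose distance falls into a bin [h_lo, h_hi), the function names are made up, and the toy data at the end are hypothetical.

```python
import math
from statistics import median


def pair_differences(coords, values, h_lo, h_hi):
    """Differences e(s_i) - e(s_j) for all pairs whose distance lies in [h_lo, h_hi)."""
    diffs = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if h_lo <= math.dist(coords[i], coords[j]) < h_hi:
                diffs.append(values[i] - values[j])
    return diffs


def mom(diffs):
    """Method-of-moments estimate: half the mean squared difference."""
    return 0.5 * sum(d * d for d in diffs) / len(diffs)


def cress(diffs):
    """Estimate based on the mean of square-rooted absolute differences."""
    n = len(diffs)
    mean_root = sum(math.sqrt(abs(d)) for d in diffs) / n
    return 0.5 * mean_root ** 4 / (0.457 + 0.494 / n)


def rob1(diffs):
    """Median of squared differences, scaled by 0.457."""
    return 0.5 * median(d * d for d in diffs) / 0.457


def rob2(diffs):
    """Fourth power of the median square-rooted absolute difference, scaled by 0.457."""
    return 0.5 * median(math.sqrt(abs(d)) for d in diffs) ** 4 / 0.457


# Hypothetical toy data: four locations on a line, 1 unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 0.0, 1.0]
diffs = pair_differences(coords, values, 0.5, 1.5)   # the three neighboring pairs
print(mom(diffs), cress(diffs), rob1(diffs), rob2(diffs))
```

The median-based versions ignore the size of the largest differences in a bin, which is why a single extreme residual moves ROB1 and ROB2 far less than MoM.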

SLIDE 21

Robust Variogram estimates

[Figure: scatterplot matrix of the four variogram estimates MoM, CRESS, ROB1 and ROB2 (each on a scale from 0.00 to 0.08), compared pairwise and each plotted against the lag distance h]

SLIDE 22

Variogram Modeling: up to h = 40km

[Figure: Cressie estimate of γ(h)/2 against h in meters, for h up to 40000; choosing an exponential-power model by eye]

SLIDE 23

Variogram Modeling: Nugget?

[Figure: Cressie estimate of γ(h)/2 against h in meters, for h up to 200]

nugget = 0.005

SLIDE 24

Variogram Modeling: Fitting by eye up to h = 8km

[Figure: Cressie estimate of γ(h)/2 against h in meters, for h up to 8000; choosing an exponential model by eye]

0.005+0.041*(1−exp{−h/600})
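To get a feel for the exponential model fitted by eye, the curve can be evaluated at a few distances. The sketch below uses the three numbers from this slide (nugget 0.005, partial sill 0.041, range parameter 600 m); the function names are made up, and the 95% rule for the practical range is a common convention assumed here rather than something stated in the talk.

```python
import math

# Parameters read off the slide; h is in meters.
NUGGET, PARTIAL_SILL, RANGE_PARAM = 0.005, 0.041, 600.0


def gamma_exp(h):
    """Exponential model: nugget + partial sill * (1 - exp(-h / range parameter))."""
    return NUGGET + PARTIAL_SILL * (1.0 - math.exp(-h / RANGE_PARAM))


def practical_range(fraction=0.95):
    """Distance at which the structured part reaches `fraction` of the partial sill."""
    return -RANGE_PARAM * math.log(1.0 - fraction)


for h in (0, 600, 2000, 8000):
    print(f"h = {h:5d} m  model value = {gamma_exp(h):.4f}")
print(f"practical range (95% of partial sill): {practical_range():.0f} m")
```

With a range parameter of 600 m the curve is essentially flat beyond about 1.8 km, so the total sill of 0.005 + 0.041 = 0.046 is reached well inside the 8 km window.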

SLIDE 25

Variogram Modeling: Fitting by eye up to h = 20km

[Figure: Cressie estimate of γ(h)/2 against h in meters, for h up to 20000; choosing an exponential-power model by eye]

0.005+0.045*(1−exp{−|h/400|^(0.4)})

SLIDE 26

Variogram Modeling: Fitting by eye up to h = 20km

[Figure: Cressie estimate of γ(h)/2 against h in meters, for h up to 20000; choosing a Matérn model by eye]

0.005+0.047*(1−m(h, θ1, θ2)) with θ1 = 3910 and θ2 = 0.1044

SLIDE 27

Intrinsically Stationary? Weakly Stationary?

  • γ(h) flattens as h gets larger, so Cov(e(s + h), e(s)) goes to 0 as h becomes large.
  • If the data are intrinsically stationary, the bounded variogram suggests that they are also weakly stationary.
  • However, the mean does not look constant over all locations s.
  • The data may be weakly stationary; more investigation has to be done.
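The link between a flattening variogram and a vanishing covariance can be written out explicitly. Assuming weak stationarity, with C(h) introduced here as shorthand for Cov(e(s + h), e(s)):

```latex
\[
  \tfrac{1}{2}\,\operatorname{Var}\{e(s+h) - e(s)\}
  = \tfrac{1}{2}\bigl[\operatorname{Var}\{e(s+h)\} + \operatorname{Var}\{e(s)\}\bigr] - \operatorname{Cov}\{e(s+h), e(s)\}
  = C(0) - C(h).
\]
```

So when C(h) goes to 0 for large h, half the variance of the increments levels off at C(0), the sill. Weak stationarity always implies intrinsic stationarity; the reverse direction needs an extra condition such as a bounded variogram.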

[Figure: variogram after trend removal (2nd-order polynomial); x-axis: distance up to 20000, y-axis: semivariance from 0.00 to 0.05]

SLIDE 28

Directional Variograms

[Figure: directional variograms, semivariance (0.02 to 0.08) against distance (up to 20000), one panel per direction; panel labels include 45, 90 and 135]

SLIDE 29

Directional Variograms: Variomap

[Figure: variogram map of var1 over the lag components dx and dy, each from −5000 to 5000; color scale from 0.05 to 0.25]

SLIDE 30

Fit of empirical variogram with OLS

Model chosen: Matérn, nugget = 0.005 fixed, θ2 variable, initial values: σ² = 0.05 and φ = 2000

  • OLS fit: γ_OLS(h) = 0.005 + 0.0442·(1 − matern(h, θ1 = 440.656, θ2 = 1))
  • WLS fit: γ_WLS(h) = 0.005 + 0.047·(1 − matern(h, θ1 = 1999, θ2 = 1))
  • Sum of squares: 0.00344686 (OLS) and 54.92099 (WLS)
  • Practical range: 1761.974 (OLS) and 7997.04 (WLS)
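The two sums of squares are not on the same scale, which is expected if the WLS criterion multiplies each squared deviation by a weight. The sketch below contrasts the two criteria. It makes several assumptions that are not taken from the talk: the weights are the number of pairs per bin, an exponential model stands in for the Matérn to keep the example short, and the empirical bins are hypothetical values.

```python
import math


def exp_model(h, nugget, sill, phi):
    """Stand-in variogram model (exponential), used here instead of the Matérn."""
    return nugget + sill * (1.0 - math.exp(-h / phi))


def ols_loss(bins, model):
    """bins: list of (h, empirical value, number of pairs). Unweighted sum of squares."""
    return sum((g - model(h)) ** 2 for h, g, _ in bins)


def wls_loss(bins, model):
    """Same deviations, each weighted by the number of pairs in its bin (assumed weights)."""
    return sum(n * (g - model(h)) ** 2 for h, g, n in bins)


def model(h):
    """Candidate model with nugget 0.005, partial sill 0.041 and range parameter 600 m."""
    return exp_model(h, 0.005, 0.041, 600.0)


# Hypothetical empirical variogram bins, not the values from the talk.
bins = [(500, 0.030, 12000), (1500, 0.042, 30000), (4000, 0.047, 55000)]
print(f"OLS criterion: {ols_loss(bins, model):.6f}")
print(f"WLS criterion: {wls_loss(bins, model):.3f}")
```

Since bins at larger distances typically contain many more pairs, weighting by the pair count shifts the fit toward the long-range behavior, which may explain why the WLS fit ends up with a larger θ1 than the OLS fit.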

SLIDE 31

Fit of empirical variogram with OLS and WLS

[Figure: empirical variogram, semivariance (0.00 to 0.06) against distance (up to 20000), with the OLS and WLS fits overlaid]

SLIDE 32

ML and REML

  • Data set too large to run ML and REML
  • Sampling doesn’t yield good results
  • Cutoff can’t be specified

SLIDE 33

Discussion

Results:
  • Data may be weakly stationary
  • Data is likely to be isotropic
  • Data may be homogeneous
  • Variogram model fit by eye, Matérn looks best
  • Range of 1.5 km to 5 km makes sense (size of a township)

Todo:
  • More detailed analysis of the trend
  • Maybe a more complex non-spatial model (with postal code as covariate)

SLIDE 34

End

THANK YOU
