Confident Spatial Analysis and Statistics in R & GeoDa

Dr Nick Bearman (@nickbearmanuk), Fri 24th May, 10am – 4:00pm/4:30pm


  1. Confident Spatial Analysis and Statistics in R & GeoDa Fri 24th May Dr Nick Bearman @nickbearmanuk 10am – 4:00pm/4:30pm

  2. What will you get from the course? • Using a range of GIS software to perform a range of spatial analysis – RStudio and GeoDa • Develop your confidence in using RStudio – Data handling – Scripts – Functions

  3. Course Outline • Intro • Spatial Analysis (GeoDa) – Spatial Autocorrelation – Clustering, Regression • RStudio – Mapping Recap • Decision Making (RStudio) – Buffers, Overlays – Spatial Joins

  4. Housekeeping • Log on! • Toilets • Fire Alarm • Breaks • Delegate Info Form • Reminder emails • Photos • Mailing list • Presentations, handouts and data online • bit.ly/csa-r

  5. Outline of the day • 10:10am – 10:30am – Spatial Analysis • 10:30am – 11:00am – P1: Spatial Analysis • 11:00am – 11:15am - Mapping & R Recap • 11:15am – 11:40am - P2: Mapping and R/RStudio • 11:40am – 12:00pm – Spatial Analysis • 12:00pm – 12:30pm – P3: Spatial Decision Making • 12:30pm – 1:30pm – Lunch • 1:30pm – 3:00pm – P3 ctd: Spatial Decision Making • 3:00pm – 3:15pm – Tea/Coffee • 3:15pm – 4pm/4:30pm – P4: Using your own data

  6. CDRC • ESRC: Getting more Information from Retail Data • Outreach – Stakeholder Users • Engagement – PhD / MSc / Training • Data and Services • Open, Safeguarded, Controlled

  7. Exploratory Data Analysis • First, a step back: • Data Analysis that is Exploratory • No formal hypothesis testing • Playing with and exploring the data • Usually start with descriptive statistics – Mean, St Dev, etc. • Usually first step to develop formal hypothesis

  8. Three levels of data analysis 1. Numeric 2. Descriptive Maps https://en.wikipedia.org/wiki/File:Passenger_numbers_for_London_Airports_in_a_bar_graph.png 3. Spatial Analytics

  9. Spatial Analysis • Why are we interested in space? • Where things are has an impact – Sometimes just because of space and what they are next to – More often, because of some other variable that varies over space • Spatial Data – Many Advantages – Also Limitations

  10. Challenges • Geocoding: locating individuals and events • Modifiable Areal Unit Problem • Spatial Dependence • Spatial Heterogeneity Solutions • Diagnosing spatial autocorrelation – Global and local indicators – Mapping • Spatial regression techniques

  11. Locating individuals and events • Location depends on research question and the availability of data – place of living or work? – geographic area of daily / yearly routines? – long-term longitudinal perspective? – at higher scale (e.g. neighbourhood, local authority) • To protect privacy geographical scale is often limited

  12. Modifiable areal unit problem • “a problem arising from the imposition of artificial units of spatial reporting on continuous geographical phenomenon resulting in the generation of artificial spatial patterns” (Heywood, 1998) • “States and other forms of socio-political organization […] exercise their power in part through the ability to draw and redraw boundaries inside and around their territories” (Agnew, 2005) • Where (you) draw the boundaries is important • MPs in row over new seats set to push the boundaries – http://www.bbc.co.uk/news/uk-england-34701595 • Gerrymandering (US) / How to Lie with Maps

  13. … an example …

  14. … and the potential for error

  15. Ecological & individualistic fallacy • An ecological fallacy, or ecological inference fallacy, is an error in the interpretation of statistical data in a study with macro data, whereby inferences about the nature of specific individuals are based solely upon aggregate statistics collected for the group to which those individuals belong. • Assuming every individual in the group has the group characteristics (e.g. IMD) • Example: assuming everyone in this LSOA has a high IMD score (is highly deprived)

  16. Ecological & individualistic fallacy • An individualistic fallacy, or individualistic inference fallacy, is an error in the interpretation of statistical data in a study with micro data, whereby inferences about the nature of specific individuals are based solely upon individual statistics, without taking the influence of higher-level contexts into account. • Assuming everyone is an individual, and no one is impacted by the group characteristics (e.g. IMD impacts) • We have lots of individual-level data • Example: Mr Money Bags is a wealthy individual living in a highly deprived area; the fallacy is assuming that the area deprivation has NO impact on him

  17. Spatial Dependence (space is important) • Anselin (1988): “the existence of a functional relationship between what happens at one point in space and what happens elsewhere.” • There is an important link between spaces [Figure: two 4 × 4 grids of areas numbered 1 to 16, “Prevalence of burglary” and “Income per household”, each shaded low / medium / high]

  18. Spatial Heterogeneity (space is not important) • Not generated by spatial interaction. It refers to variation in relationships over space caused by the uniqueness of location or by spatially autocorrelated omitted variables. • The spatial variation is driven by an omitted variable, not by space itself. [Figure: two 4 × 4 grids of areas numbered 1 to 16, “Female labour force participation” and “Fertility levels”, each shaded low / medium / high]

  19. Spatial autocorrelation • A measure of similarity (correlation) over space • positive: high (or low) values in one area sit next to high (or low) values in neighbouring areas • negative: high & low next to each other (chessboard pattern)

  20. Defining neighbours To calculate spatial autocorrelation, we need to calculate a neighbourhood weight. This can be based on: • Contiguity (adjacent spatial units) • Distance (between centroids of polygons) • Limited number of nearest neighbours For the examples today we are using measures of contiguity.

  21. Contiguity measures • First order queen • First order rook [Figure: two 4 × 4 grids of cells numbered 1 to 16, one for each neighbourhood definition]
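The two contiguity definitions can be tried out without any spatial packages. The following is a minimal base R sketch, assuming a 4 × 4 grid with cells numbered 1 to 16 row by row as in the slide figure; the function name `grid_neighbours` is made up for illustration and is not part of GeoDa or any R package.

```r
# Binary contiguity matrix for a square grid, cells numbered row by row.
# grid_neighbours() is a hypothetical helper written for this example.
grid_neighbours <- function(n_side = 4, type = c("queen", "rook")) {
  type <- match.arg(type)
  n <- n_side^2
  row_id <- (seq_len(n) - 1) %/% n_side   # 0-based row of each cell
  col_id <- (seq_len(n) - 1) %% n_side    # 0-based column of each cell
  d_row <- abs(outer(row_id, row_id, "-"))
  d_col <- abs(outer(col_id, col_id, "-"))
  if (type == "rook") {
    w <- (d_row + d_col) == 1             # cells sharing an edge
  } else {
    w <- pmax(d_row, d_col) == 1          # cells sharing an edge or a corner
  }
  w * 1                                   # logical matrix -> 0/1 matrix
}

rook  <- grid_neighbours(4, "rook")
queen <- grid_neighbours(4, "queen")

which(rook[6, ] == 1)    # cell 6 under rook contiguity: 2 5 7 10
which(queen[6, ] == 1)   # cell 6 under queen contiguity: 1 2 3 5 7 9 10 11
```

Queen contiguity always gives at least as many neighbours as rook, because it adds the diagonal cells.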

  22. Global measure of spatial autocorrelation Returns values between -1 and 1: • 1: positive spatial autocorrelation • 0: no spatial autocorrelation • -1: negative spatial autocorrelation
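The slide does not name the global measure; assuming it is Moran's I (the statistic GeoDa reports for this kind of analysis), the range of values can be checked by hand. This base R sketch uses rook contiguity on a 4 × 4 grid and two made-up patterns; `morans_i`, `chessboard` and `halves` are illustrative names only.

```r
# Rook contiguity weights for a 4 x 4 grid, cells numbered row by row
idx    <- seq_len(16) - 1
row_id <- idx %/% 4
col_id <- idx %% 4
w <- (abs(outer(row_id, row_id, "-")) + abs(outer(col_id, col_id, "-")) == 1) * 1

# Moran's I: I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)
morans_i <- function(x, w) {
  z <- x - mean(x)                        # deviations from the mean
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

chessboard <- (row_id + col_id) %% 2      # alternating 0/1 cells
halves     <- as.numeric(col_id >= 2)     # left half 0, right half 1

morans_i(chessboard, w)   # -1 (up to floating point): perfect negative autocorrelation
morans_i(halves, w)       # about 0.67: similar values sit next to each other
```

The chessboard pattern reaches exactly -1 here because every rook neighbour has the opposite value. Real data rarely gets close to either extreme.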

  23. Scatter plot of spatial autocorrelation IMD in Greater Manchester

  24. Clustering • Can look for unusual clusters • Where values are higher (or lower) than expected • Use univariate LISA: Local Indicators of Spatial Autocorrelation • To identify statistically significant clusters (Anselin, 1995)

  25. Local measures of spatial autocorrelation

  26. Local indicators of spatial autocorrelation (LISA maps) IMD Score in Greater Manchester • White: not significant • Red: High surrounded by High • Blue: Low surrounded by Low • Light blue: Low surrounded by High • Pink: High surrounded by Low
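The four coloured categories in the LISA legend come from comparing each area's value with the average of its neighbours. A rough base R sketch of that classification, assuming row-standardised queen contiguity on a 4 × 4 grid and invented scores in `x`; it leaves out the permutation test that GeoDa uses to decide which areas are significant, so no area here is ever marked "not significant".

```r
# Queen contiguity weights for a 4 x 4 grid, row-standardised
idx    <- seq_len(16) - 1
row_id <- idx %/% 4
col_id <- idx %% 4
w <- (pmax(abs(outer(row_id, row_id, "-")),
           abs(outer(col_id, col_id, "-"))) == 1) * 1
w <- w / rowSums(w)                       # each row now sums to 1

# Hypothetical scores: high in the top-left corner, low in the bottom-right
x <- c(9, 8, 5, 4,
       8, 7, 4, 3,
       5, 4, 3, 2,
       4, 3, 2, 1)

z     <- x - mean(x)                      # own value relative to the mean
lag_z <- as.vector(w %*% z)               # neighbours' average, relative to the mean

# Local Moran's I for each cell
local_i <- z * lag_z / (sum(z^2) / length(x))

# Quadrant labels matching the legend (significance is NOT tested here)
quadrant <- ifelse(z >= 0 & lag_z >= 0, "High-High",
            ifelse(z <  0 & lag_z <  0, "Low-Low",
            ifelse(z >= 0,              "High-Low", "Low-High")))

matrix(quadrant, nrow = 4, byrow = TRUE)  # labels laid out like the grid
```

With row-standardised weights, the mean of the local values equals the global Moran's I, which is one way to see how the local and global measures relate.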

  27. Practical 1: Spatial Analysis • Using GeoDa & R for some analysis • Please experiment as we go through • IMDscore: • High score is more deprived • Low score is less deprived • Orange post-its if I am busy

  28. Mapping and R/RStudio Recap • Previously / yesterday: • What is GIS / spatial data? • What are coordinate systems? (WGS84 / BNG) • Using RStudio – Read CSV / shapefiles, Manage data – Loops & Scripts, Libraries – (Spatial) Data Structure in R • Will do a quick recap of RStudio & SP vs SF

  29. Coordinate Systems • Latitude and Longitude (WGS 1984), EPSG = 4326 – 52° 37′ 30.32″ N (52.6250), 1° 14′ 2.05″ E (1.2339) • British National Grid (Eastings & Northings), EPSG = 27700 – Easting: 619301, Northing: 307416 • UTM (Universal Transverse Mercator), EPSG depends on zone – 621160.98, 3349893.53 metres, Zone 14R • Why is it important? – Some data uses WGS84, some BNG, some UTM – LSOAs use BNG (Eastings/Northings) – Need to convert between them
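Converting between these systems in R is usually a single call. A short sketch, assuming the `sf` package is installed, that takes the WGS84 point quoted on the slide and reprojects it to British National Grid; the object names `pt_wgs84` and `pt_bng` are made up for this example.

```r
library(sf)

# The WGS84 point from the slide; sf expects x = longitude, y = latitude
pt_wgs84 <- st_sfc(st_point(c(1.2339, 52.6250)), crs = 4326)

# Reproject to British National Grid (EPSG 27700)
pt_bng <- st_transform(pt_wgs84, 27700)

st_coordinates(pt_bng)   # Easting (X) and Northing (Y) in metres
```

The result can be compared against the Easting and Northing given on the slide. The same `st_transform()` call works on a whole layer read with `st_read()`, not just a single point.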

  30. • This is where you can write scripts • This lists the variables you have • This is the console where you can type in commands • Here will show either your files (the files tab) or your plots (the plots tab) • Link to survey:

  31. Shape Files • .shp the geometry (polygons) themselves • .shx shape index (position of each geometry record in the .shp) • .dbf attribute information (dBase IV format) • .prj projection & coordinate system info • .cpg text encoding of the .dbf https://en.wikipedia.org/wiki/Shapefile

  32. • Some code wraps: Above is an unfortunate file name – rename if you can. imd.csv is much easier!

  33. RStudio Notes • Use a script – Highlight, Ctrl-Enter or click Run – Save your scripts – Add comments (#) to your script • Also use projects (if you wish) • Beware of typos • Tab completion

  34. Spatial Data in R: sf & sp • sp – Developed 2005 – Uses ‘S4’ data types – Often see @data – In use for a long time – More advanced analysis only possible in sp (currently) – https://cran.r-project.org/doc/Rnews/Rnews_2005-2.pdf • sf – Developed Oct 2016 – Uses ‘S3’ data types – Extends data frames to include spatial data – Simple Features standard ISO 19125-1:2004 – Will replace sp long term – https://cran.r-project.org/web/packages/sf/

  35. Spatial Data in R: sf & sp • sf: sthelens <- st_read("sthelens.shp") • sp: sthelens <- readOGR(".", "sthelens")
