1st Edition

Applied Spatial Statistics and Econometrics Data Analysis in R

By Katarzyna Kopczewska Copyright 2021
    620 Pages 200 B/W Illustrations
    by Routledge

    This textbook is a comprehensive introduction to applied spatial data analysis using R. Each chapter walks the reader through a different method, explaining how to interpret the results and what conclusions can be drawn. The author team showcases key topics, including unsupervised learning, causal inference, spatial weight matrices, spatial econometrics, heterogeneity and bootstrapping. It is accompanied by a suite of data and R code on GitHub to help readers practise techniques via replication and exercises.

    This text will be a valuable resource for advanced students of econometrics, spatial planning and regional science. It will also be suitable for researchers and data scientists working with spatial data.

    Introduction

    Statement by the American Statistical Association on statistical significance and p-values, used in the book

    Acknowledgments

    Chapter 1: Basic operations in the R software (Mateusz Kopyt)
    1.1 About the R software
    1.2 The R software interface
    1.2.1 R Commander
    1.2.2 RStudio
    1.3 Using help
    1.4 Additional packages
    1.5 R Language - basic features
    1.6 Defining and loading data
    1.7 Basic operations on objects
    1.8 Basic statistics of the data set
    1.9 Basic visualizations
    1.9.1 Scatterplot and line chart
    1.9.2 Column chart
    1.9.3 Pie chart
    1.9.4 Boxplot
    1.10 Regression in examples

    Chapter 2: Spatial data, R classes and basic graphics (Katarzyna Kopczewska)
    2.1 Loading and basic operations on spatial vector data
    2.2 Creating, checking and converting spatial classes
    2.3 Selected color palettes
    2.4 Basic contour maps with a color layer
    Scheme 1 - with colorRampPalette() from the grDevices:: package
    Scheme 2 - with choropleth() from the GISTools:: package
    Scheme 3 - with findInterval() from the base:: package
    Scheme 4 - with findColours() from the classInt:: package
    Scheme 5 - with spplot() from the sp:: package
    2.5 Basic operations and graphs for point data
    Scheme 1 - with points() from the graphics:: package – locations only
    Scheme 2 - with spplot() from the sp:: package - locations and values
Scheme 3 - with findInterval() from the base:: package - locations, values, different symbol sizes
    2.6 Basic operations on rasters
    2.7 Basic operations on grids
    2.8 Spatial geometries

    Chapter 3: Spatial data from the Web API (Mateusz Kopyt, Katarzyna Kopczewska)
    3.1 What is an API?
    3.2 Creating contextual maps using APIs
    3.3 Ways to visualize spatial data - maps for point and regional data
    Scheme 1 - with bubbleMap() from the RgoogleMaps:: package
    Scheme 2 - with ggmap() from the ggmap:: package
Scheme 3 - with PlotOnStaticMap() from the RgoogleMaps:: package
Scheme 4 - with GetMap() from the RgoogleMaps:: package and conversion of the staticMap into a raster
    3.4 Spatial data in vector format - example of the OSM database
    3.5 Access to non-spatial internet databases and resources via API - examples
    3.6 Geocoding of data

    Chapter 4: Spatial weight matrices, distance measurement, tessellation, spatial statistics (Katarzyna Kopczewska, Maria Kubara)
    4.1 Introduction to spatial data analysis
    4.2 Spatial weights matrix
    4.2.1 General framework for creating spatial weights matrices
    4.2.2 Selection of a neighborhood matrix
    4.2.3 Neighborhood matrices according to the contiguity criterion
    4.2.4 Matrix of k nearest neighbors (knn)
    4.2.5 Matrix based on distance criterion (neighbors within a radius of d km)
    4.2.6 Inverse distance matrix
    4.2.7 Summarizing and editing the spatial weights matrix
    4.2.8 Spatial lags and higher-order neighborhoods
    4.2.9 Creating a weights matrix based on group membership
    4.3 Distance measurement and spatial aggregation
    4.4 Tessellation
    4.5 Spatial statistics
    4.5.1 Global statistics
    4.5.1.1 Global Moran's I statistic
    4.5.1.2 Global Geary's C statistic
    4.5.1.3 Join-count statistics
    4.5.2 Local spatial autocorrelation statistics
    4.5.2.1 Local Moran's I statistic (LISA)
    4.5.2.2 Local Geary's C statistic
    4.5.2.3 Local Getis-Ord Gi statistic
    4.5.2.4 Local spatial heteroscedasticity (LOSH)
    4.6 Spatial cross-correlations for two variables
    4.7 Correlogram

    Chapter 5: Applied spatial econometrics (Katarzyna Kopczewska)
    5.1 Value added from spatial modelling and classes of models
    5.2 Basic cross-sectional models
    5.2.1 Estimation
    5.2.2 Quality assessment of spatial models
    5.2.2.1 Information criteria and pseudo-R² in assessing model fit
    5.2.2.2 Tests for heteroskedasticity of model residuals
    5.2.2.3 Residual autocorrelation tests
    5.2.2.4 LM tests for model type selection
    5.2.2.5 LR and Wald tests for model restrictions
    5.2.3 Selection of the spatial weights matrix and modelling of diffusion strength
    5.2.4 Forecasts in spatial models
    5.2.5 Causality
    5.3 Selected specifications of cross-sectional spatial models
    5.3.1 Unidirectional spatial interaction models
    5.3.2 Cumulative models
    5.3.3 Bootstrapped models for big data
    5.3.4 Models for grid data
    5.4 Spatial panel models

    Chapter 6: Geographically Weighted Regression - modelling spatial heterogeneity (Piotr Ćwiakowski)
    6.1 Geographically weighted regression
    6.2 Basic estimation of GWR model
    6.2.1 Estimation of the reference OLS model
    6.2.2 Choosing the optimal bandwidth for a dataset
    6.2.3 Local geographically weighted statistics
    6.2.4 Geographically weighted regression estimation
    6.2.5 Basic diagnostic tests of the GWR model
    6.2.6 Testing the significance of parameters in GWR
    6.2.7 Selection of the optimal functional form of the model
    6.2.8 GWR with heteroskedastic random error
    6.3 The problem of collinearity in GWR models
    6.3.1 Diagnosing collinearity in GWR
    6.4 Mixed GWR
    6.5 Robust regression in the GWR model
    6.6 Geographically and Temporally Weighted Regression (GTWR)

    Chapter 7: Unsupervised spatial learning (Katarzyna Kopczewska)
    7.1 Clustering of spatial points with k-means, PAM and CLARA algorithms
    7.2 Clustering with the DBSCAN algorithm
    7.3 Spatial Principal Component Analysis
    7.4 Spatial Drift
    7.5 Spatial hierarchical clustering
    7.6 Spatial oblique decision tree

    Chapter 8: Spatial point pattern analysis and spatial interpolation (Kateryna Zabarina)
    8.1. Introduction and main definitions
    8.1.1. Dataset
    8.1.2. Creation of window and point pattern
    8.1.3. Marks
    8.1.4. Covariates
    8.1.5. Duplicated points
    8.1.6. Projection and rescaling
    8.2. Intensity-based analysis of the unmarked point pattern
    8.2.1. Quadrat test
    8.2.2. Tests with spatial covariates
    8.3. Distance-based analysis of the unmarked point pattern
    8.3.1. Distance-based measures
    8.3.1.1. Ripley’s K function
    8.3.1.2. F function
    8.3.1.3. G function
    8.3.1.4. J function
    8.3.1.5. Distance-based CSR tests
    8.3.2. Monte-Carlo tests
    8.3.3. Envelopes
    8.3.4. Non-graphical tests
    8.4. Selection and estimation of a proper model for the unmarked point pattern
    8.4.1. Theoretical note
    8.4.2. Choice of parameters
    8.4.3. Estimation and results
    8.4.4. Conclusions
    8.5. Intensity-based analysis of the marked point pattern
    8.5.1. Segregation test
    8.6. Correlation and spacing analysis of the marked point pattern
    8.6.1. Analysis under the assumption of stationarity
    8.6.1.1. K function variations for a multitype pattern
    8.6.1.2. Mark connection function
    8.6.1.3. Analysis of within-type and between-type dependence
    8.6.1.4. Randomisation test of components’ independence
    8.6.2. Analysis under the assumption of non-stationarity
    8.6.2.1. Inhomogeneous K function variations for a multitype pattern
    8.7. Selection and estimation of a proper model for the marked point pattern
    8.7.1. Theoretical note
    8.7.2. Choice of optimal radius
    8.7.3. Within-industry interaction radius
    8.7.4. Between-industry interaction radius
    8.7.5. Estimation and results
    8.7.6. Model with no between-industry interaction
    8.7.7. Model with all possible interactions
    8.8. Spatial interpolation methods - kriging
    8.8.1. Basic definitions
    8.8.2. Description of chosen kriging methods
    8.8.3. Data preparation for the study
    8.8.4. Estimation and discussion

    Chapter 9: Spatial Sampling and Bootstrap (Katarzyna Kopczewska, Piotr Ćwiakowski)
    9.1 Spatial point data - object classes and spatial aggregation
    9.2 Spatial sampling - randomization / generation of new points on the surface
    9.3 Spatial sampling - sampling of sub-samples from existing points
    9.3.1 Simple sampling
    9.3.2 The options of the sperrorest:: package
    9.3.3 Sampling points from areas determined by the k-means algorithm - block bootstrap
    9.3.4 Sampling points from moving blocks (moving block bootstrap, MBB)
    9.4 The use of spatial sampling and bootstrap in cross-validation of models

    Chapter 10: Spatial Big Data (Piotr Wójcik)
    10.1. Examples of big data usage
    10.2. Spatial big data
    10.2.1. Spatial data types
    10.2.2. Challenges related to the use of spatial big data
    10.2.2.1. Processing of large data sets
    10.2.2.2. Mapping and reduction
    10.2.2.3. Spatial data indexing
    10.3. The sf:: package - simple features
    10.3.1. sf class – a special data frame
    10.3.2. Data with POLYGON geometry
    10.3.3. Data with POINT geometry
    10.3.4. Visualization using the ggplot2:: package
    10.3.5. Selected functions for spatial analysis
    10.4. Using the dplyr:: package functions
    10.5. Example analysis of large raster data
    10.5.1. Measurement of economic inequalities from space
    10.5.2. Analysis using the raster:: package functions
    10.5.3. Other functions of the raster:: package
    10.5.4. Potential alternative – stars:: package

    Chapter 11: Spatial unsupervised learning – applications of market basket analysis in geomarketing (Alessandro Festi)
    11.1 Introduction to market basket analysis
    11.2 Data needed in spatial market basket analysis
    11.3 Simulation of data
    11.4 The market basket analysis technique applied to geolocation data
    11.5 Spatial association rules
    11.6 Applications to geomarketing
    11.6.1 Finding the best location for a business
    11.6.2 Targeting
    11.6.3 Discovery of competitors
    11.7 Conclusions and further approaches

    Appendix 1: Data used in the examples
    A1. Dataset no. 1 /dataset1/ – poviat panel data with many variables
    A2. Dataset no. 2 /dataset2/ – geolocated point data
    A3. Dataset no. 3 /dataset3/ – monthly unemployment rate in poviats (NTS4)
    A4. Dataset no. 4 /dataset4/ – grid data for population
    A5. Shapefiles of contour maps – for poviats (NTS4), regions (NTS2), country (NTS0) and registration areas
    A6. Raster data on night light intensity on Earth in 2013
    A7. Population in cities in Poland
    Appendix 2: Links between packages
    Appendix 3: Spatial data sets in R packages
    References
    Index of terms
    Index of R packages
    Index of R commands

    Biography

    Katarzyna Kopczewska is an associate professor at the University of Warsaw, Faculty of Economic Sciences. As a quantitative economist, she deals with the spatial modelling of geolocated economic processes (location and co-location, agglomeration, concentration, diffusion and spatial interactions) relating to economic phenomena, companies and real estate, as well as to regional policy and public-sector activities. She conducts methodological research on implementing data science methods for spatial analysis and combining them with classical spatial statistics and econometrics in R. She combines quantitative solutions with the theory and problems of regional science and economic geography. She serves in the European Regional Science Association (ERSA).