1st Edition
Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny
Geospatial health data are essential to inform public health and policy. These data can be used to quantify disease burden, understand geographic and temporal patterns, identify risk factors, and measure inequalities. Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny describes spatial and spatio-temporal statistical methods and visualization techniques to analyze georeferenced health data in R. The book covers the following topics:
- Manipulating and transforming point, areal, and raster data,
- Bayesian hierarchical models for disease mapping using areal and geostatistical data,
- Fitting and interpreting spatial and spatio-temporal models with the integrated nested Laplace approximation (INLA) and the stochastic partial differential equation (SPDE) approaches,
- Creating interactive and static visualizations such as disease maps and time plots,
- Reproducible R Markdown reports, interactive dashboards, and Shiny web applications that facilitate the communication of insights to collaborators and policymakers.
The book features fully reproducible examples of several disease and environmental applications using real-world data such as malaria in The Gambia, cancer in Scotland and the USA, and air pollution in Spain. Examples in the book focus on health applications, but the approaches covered are also applicable to other fields that use georeferenced data, including epidemiology, ecology, demography, and criminology. The book provides clear descriptions of the R code for data importing, manipulation, modeling, and visualization, as well as the interpretation of the results. This ensures that the contents are fully reproducible and accessible to students, researchers, and practitioners.
I Geospatial health data and INLA
1. Geospatial health
Geospatial health data
Disease mapping
Communication of results
2. Spatial data and R packages for mapping
Types of spatial data
Areal data
Geostatistical data
Point patterns
Coordinate Reference Systems (CRS)
Geographic coordinate systems
Projected coordinate systems
Setting Coordinate Reference Systems in R
Shapefiles
Making maps with R
ggplot2
leaflet
mapview
tmap
3. Bayesian inference and INLA
Bayesian inference
Integrated Nested Laplace Approximations (INLA)
4. The R-INLA package
Linear predictor
The inla() function
Priors specification
Example
Data
Model
Results
Control variables to compute approximations
II Modeling and visualization
5. Areal data
Spatial neighborhood matrices
Standardized Incidence Ratio (SIR)
Spatial small area disease risk estimation
Spatial modeling of lung cancer in Pennsylvania
Spatio-temporal small area disease risk estimation
Issues with areal data
6. Spatial modeling of areal data. Lip cancer in Scotland
Data and map
Data preparation
Adding data to map
Mapping SIRs
Modeling
Model
Neighborhood matrix
Inference using INLA
Results
Mapping relative risks
Exceedance probabilities
7. Spatio-temporal modeling of areal data. Lung cancer in Ohio
Data and map
Data preparation
Observed cases
Expected cases
SIRs
Adding data to map
Mapping SIRs
Time plots of SIRs
Modeling
Model
Neighborhood matrix
Inference using INLA
Mapping relative risks
8. Geostatistical data
Gaussian random fields
Stochastic Partial Differential Equation approach (SPDE)
Spatial modeling of rainfall in Paraná, Brazil
Model
Mesh construction
Building the SPDE model on the mesh
Index set
Projection matrix
Prediction data
Stack with data for estimation and prediction
Model formula
inla() call
Results
Projecting the spatial field
Disease mapping with geostatistical data
9. Spatial modeling of geostatistical data. Malaria in The Gambia
Data
Data preparation
Prevalence
Transforming coordinates
Mapping prevalence
Environmental covariates
Modeling
Model
Mesh construction
Building the SPDE model on the mesh
Index set
Projection matrix
Prediction data
Stack with data for estimation and prediction
Model formula
inla() call
Mapping malaria prevalence
Mapping exceedance probabilities
10. Spatio-temporal modeling of geostatistical data. Air pollution in Spain
Map
Data
Modeling
Model
Mesh construction
Building the SPDE model on the mesh
Index set
Projection matrix
Prediction data
Stack with data for estimation and prediction
Model formula
inla() call
Results
Mapping air pollution predictions
III Communication of results
11. Introduction to R Markdown
R Markdown
YAML
Markdown syntax
R code chunks
Figures
Tables
Example
12. Building a dashboard to visualize spatial data with flexdashboard
The R package flexdashboard
R Markdown
Layout
Dashboard components
A dashboard to visualize global air pollution
Data
Table using DT
Map using leaflet
Histogram using ggplot2
R Markdown structure. YAML header and layout
R code to obtain the data and create the visualizations
13. Introduction to Shiny
Examples of Shiny apps
Structure of a Shiny app
Inputs
Outputs
Inputs, outputs and reactivity
Examples of Shiny apps
Example 1
Example 2
HTML Content
Layouts
Sharing Shiny apps
14. Interactive dashboards with flexdashboard and Shiny
An interactive dashboard to visualize global air pollution
15. Building a Shiny app to upload and visualize spatio-temporal data
Shiny
Setup
Structure of app.R
Layout
HTML content
Read data
Adding outputs
Table using DT
Time plot using dygraphs
Map using leaflet
Adding reactivity
Reactivity in dygraphs
Reactivity in leaflet
Uploading data
Inputs in ui to upload a CSV file and a shapefile
Uploading CSV file in server()
Uploading shapefile in server()
Accessing the data and the map
Handling missing inputs
Requiring input files to be available using req()
Checking data are uploaded before creating the map
Conclusion
16. Disease surveillance with SpatialEpiApp
Installation
Use of SpatialEpiApp
‘Inputs’ page
‘Analysis’ page
‘Help’ page
Appendix
A R installation and packages used in the book
A.1 Installing R and RStudio
A.2 Installing R packages
A.3 Packages used in the book
Biography
Paula Moraga is a Lecturer in the Department of Mathematical Sciences at the University of Bath. She received her Master’s in Biostatistics from Harvard University and her Ph.D. in Statistics from the University of Valencia. Dr. Moraga develops innovative statistical methods and open-source software for disease surveillance including R packages for spatio-temporal modeling, detection of clusters, and travel-related spread of disease. Her work has directly informed strategic policy in reducing the burden of diseases such as malaria and cancer in several countries.
"The stress is on practical usage of INLA modelling in a spatial context and hence the author shows the full code for several carefully selected examples. Essentially all the steps from the beginning (necessary data manipulation and preparation) via INLA analysis itself (often in several alternatives) to the results (plots and maps) are explained carefully and commented. This is very useful for anybody who wants to start with the powerful INLA but did not dare to go through the very powerful but notalways- fully-documented environment." ~Marek Brabec, ISCB News