This vignette describes how to visualise data with maps using UK Statistical Geography resources from the Office for National Statistics (ONS).

What is a Statistical Geography?

The ONS maintains a registry of Statistical Geographies. These are areas in the UK, as defined by various geographic hierarchies (e.g. Council areas in an Administrative hierarchy or NHS Regions in the Health hierarchy). Each area is issued with a code and an official name along with other information (e.g. what the parent area is or the geometry of the boundary).

The data comes from the ONS Geoportal, which also has an RDF representation available from ONS Geography Linked Data.

Downloading data

We’ll start by downloading some data. We’ll be looking at deaths in Welsh care homes.

We’re not going to use get_cube() but will instead use SPARQL to extract a slice from the cube. Our query filters the observations to deaths from any cause that took place in any location (e.g. in an ambulance or a hospice). The GROUP BY clause groups the observations by area, and SUM adds up the count of deaths across all notification dates. This gives us, for each area in Wales, the total number of deaths among care home users since records began.

wales_ch_deaths <- "
PREFIX wgd: <http://gss-data.org.uk/data/gss_data/covid-19/wg-notifications-of-deaths-of-residents-related-to-covid-19-in-adult-care-homes#dimension/>
PREFIX cause: <http://gss-data.org.uk/data/gss_data/covid-19/wg-notifications-of-deaths-of-residents-related-to-covid-19-in-adult-care-homes#concept/cause-of-death/>
PREFIX location: <http://gss-data.org.uk/data/gss_data/covid-19/wg-notifications-of-deaths-of-residents-related-to-covid-19-in-adult-care-homes#concept/location-of-death/>

SELECT ?geo (SUM(?deaths) AS ?total_deaths)
WHERE {
  ?obs wgd:cause-of-death cause:total ;
       wgd:location-of-death location:total ;
       <http://gss-data.org.uk/def/measure/count> ?deaths ;
       wgd:notification-date ?date ;
       wgd:area-code ?geo .
  ?geo <http://statistics.data.gov.uk/def/statistical-entity#code> <http://statistics.data.gov.uk/id/statistical-entity/W06> .

}

GROUP BY ?geo
"
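To see what the GROUP BY and SUM in this query do, here is a rough local analogue in base R. It is only a sketch: the obs data frame and its values are made up for illustration and are not taken from the dataset.

```r
# Toy observations (invented values, not real figures): one row per
# area and notification date, mirroring ?geo, ?date and ?deaths.
obs <- data.frame(
  geo    = c("A", "A", "B", "B", "B"),
  date   = as.Date(c("2020-04-01", "2020-04-02",
                     "2020-04-01", "2020-04-02", "2020-04-03")),
  deaths = c(2, 3, 1, 0, 6),
  stringsAsFactors = FALSE
)

# Local analogue of:
#   SELECT ?geo (SUM(?deaths) AS ?total_deaths) ... GROUP BY ?geo
totals <- aggregate(deaths ~ geo, data = obs, FUN = sum)
names(totals)[names(totals) == "deaths"] <- "total_deaths"

totals
```

The date column drops out of the result because it is neither grouped on nor aggregated, just as ?date is bound in the WHERE clause but absent from the SELECT.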

Then we can execute the query against the COGS SPARQL endpoint:

deaths <- query(wales_ch_deaths, "https://staging.gss-data.org.uk/sparql")

This is what the response looks like:

knitr::kable(head(deaths))
|geo                                                              | total_deaths|
|:----------------------------------------------------------------|------------:|
|http://statistics.data.gov.uk/id/statistical-geography/W06000009 |          172|
|http://statistics.data.gov.uk/id/statistical-geography/W06000010 |          336|
|http://statistics.data.gov.uk/id/statistical-geography/W06000011 |          350|
|http://statistics.data.gov.uk/id/statistical-geography/W06000012 |          222|
|http://statistics.data.gov.uk/id/statistical-geography/W06000013 |          235|
|http://statistics.data.gov.uk/id/statistical-geography/W06000014 |          158|

The geo column is a character vector of URIs for statistical geographies. We can pass these URIs to get_geography() to download descriptions of the geographies from the ONS Geography Linked Data endpoint. Note that we’re setting a flag to download the boundaries as well.

geo <- get_geography(deaths$geo, include_geometry = TRUE)
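The URIs themselves embed the area codes, which can be handy for labelling or sanity checks. The sketch below uses base R only and copies two URIs from the response above. It assumes the code is always the last path segment and that its first three characters match the entity used in the query filter (W06).

```r
# Two URIs copied from the query response shown earlier.
uris <- c(
  "http://statistics.data.gov.uk/id/statistical-geography/W06000009",
  "http://statistics.data.gov.uk/id/statistical-geography/W06000010"
)

# Assumption: the area code is the final path segment of the URI.
codes <- sub(".*/", "", uris)

# Assumption: the first three characters give the entity prefix,
# matching the statistical-entity/W06 filter in the SPARQL query.
entity <- substr(codes, 1, 3)

# Every area returned by the query should belong to the same entity.
all(entity == "W06")
```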

Creating a thematic map

We can use these boundaries to visualise the data on a map.

The modern approach to spatial data in R is to create a simple features (sf) object. This is a data frame in which one column is designated as the active geometry.

library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.3, PROJ 4.9.3
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

geo_sf <- st_as_sf(geo, wkt="boundary", crs="WGS84") %>%
  left_join(mutate(deaths, geo=uri(geo)), by=c("uri"="geo"))
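To see the same two steps in isolation, here is a small sketch on toy data. The URIs, square boundaries and count are all invented stand-ins for the real output of get_geography() and the query, and the join keys are plain character vectors rather than the URI type produced by uri(). It also shows one way to spot areas that found no match in the join.

```r
library(sf)
library(dplyr)

# Hypothetical stand-in for the geography descriptions: a URI column
# plus boundaries held as WKT text. The squares are invented shapes.
toy_geo <- data.frame(
  uri = c("http://example.org/area/1", "http://example.org/area/2"),
  boundary = c(
    "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))",
    "POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the query result; area 2 is left out on purpose.
toy_deaths <- data.frame(
  geo = "http://example.org/area/1",
  total_deaths = 10,
  stringsAsFactors = FALSE
)

# Parse the WKT column into a geometry column (EPSG:4326 is WGS 84),
# then attach the counts by URI.
toy_sf <- st_as_sf(toy_geo, wkt = "boundary", crs = 4326) %>%
  left_join(toy_deaths, by = c("uri" = "geo"))

# Areas without a matching observation are kept, with NA for the count.
unmatched <- toy_sf$uri[is.na(toy_sf$total_deaths)]
unmatched
```

Because left_join() keeps every row of the left-hand table, unmatched areas would still be drawn on the map, only without a fill value, so checking for NA before plotting can save some confusion.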

Finally, we can use this sf object in ggplot2 to create a choropleth:

library(ggplot2)

ggplot(geo_sf) + 
  geom_sf(aes(fill=total_deaths), colour="white") + 
  scale_fill_viridis_c("Total Deaths") + 
  labs(title="Deaths among Care Home Users in Wales", 
       subtitle="Since the outbreak of COVID-19") +
  theme_minimal()