Techniques we’ll cover in this part:
In this section we’ll cover some of the fundamental concepts and operations when working with geospatial data in R. For now, we’ll be using non-linguistic data as examples, looking mainly at things like population statistics and other random things.
The sf
package, which stands for ‘simple features’, is used for lots of spatial operations in R and is now the recommended way of performing GIS in R. The spData
package contains lots of pre-built datasets for spatial analysis. We’ll also load in the tidyverse
set of packages, which we use for general data processing/plotting in R. Remember to install all of these packages first if you haven’t already:
install.packages('sf')
install.packages('spData')
install.packages('tidyverse')
Now load them in:
library(sf)
library(spData)
library(tidyverse)
Let’s look at the us_states
dataset that comes with the spData
package - use head()
to print out the first 6 rows:
head(us_states)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -114.8136 ymin: 24.55868 xmax: -71.78699 ymax: 42.04964
## Geodetic CRS: NAD83
## GEOID NAME REGION AREA total_pop_10 total_pop_15
## 1 01 Alabama South 133709.27 [km^2] 4712651 4830620
## 2 04 Arizona West 295281.25 [km^2] 6246816 6641928
## 3 08 Colorado West 269573.06 [km^2] 4887061 5278906
## 4 09 Connecticut Norteast 12976.59 [km^2] 3545837 3593222
## 5 12 Florida South 151052.01 [km^2] 18511620 19645772
## 6 13 Georgia South 152725.21 [km^2] 9468815 10006693
## geometry
## 1 MULTIPOLYGON (((-88.20006 3...
## 2 MULTIPOLYGON (((-114.7196 3...
## 3 MULTIPOLYGON (((-109.0501 4...
## 4 MULTIPOLYGON (((-73.48731 4...
## 5 MULTIPOLYGON (((-81.81169 2...
## 6 MULTIPOLYGON (((-85.60516 3...
Each row is a different US state, and it contains lots of different columns:
What kind of object type is this in R?
class(us_states)
## [1] "sf" "data.frame"
As you can see, its class is a combination of ‘sf’ and ‘data.frame’, so while it has some special status containing spatial information, you can also treat it like a normal dataframe and apply the same kind of data processing/wrangling operations. For example, you can run the table()
function on a column to tabulate how many observations are in each level of that variable
table(us_states$REGION)
##
## Norteast Midwest South West
## 9 12 17 11
So what kind of special status does it have as a sf
class?
It has things like a CRS (coordinate reference system - more on that later)
It also has a $geometry
column, which contains all of the coordinates for the shapes. This is where all the important geographic information lies - it’s what gives points their location, and shapes their… shape.
We can input the whole dataset into the plot()
function, which will plot all the polygons and colour-code them by each column of metadata.
plot(us_states)
If you want to just plot a single set of the geometries with no colour-coding, you can directly reference the $geometry
column:
plot(us_states$geometry)
Or you can plot one state/polygon at a time by subsetting the dataframe as you normally would in R. For example, here’s the 21st row/state in the dataset:
plot(us_states$geometry[[21]])
Exercise
Keep running the code below to plot a random US state each time and guess which it is (let’s test our geography knowledge!)
<- us_states %>%
random_state sample_n(1)
plot(random_state$geometry)
$NAME random_state
## [1] "Florida"
You can also plot these using the regular ggplot
syntax. You don’t even need to specify columns for the X/Y values - R is clever and knows that it’s a spatial dataframe with a geometry column for the shape. I like to use the theme_void()
option when plotting maps since we don’t really need all the axes and gridlines etc.
ggplot(random_state) +
geom_sf(fill = 'blue') +
theme_void()
You can apply all the usual ggplot
customisation options. For example, we can try to make a really garish-looking map in hot pink with a thick, lime green, dashed border.
ggplot(random_state) +
geom_sf(fill = 'hotpink', colour = 'green', lwd = 5, lty = 'dashed') +
theme_void()
Nice.
You can even do cool things like st_buffer()
, which is basically like inflating the polygon like a balloon. Here’s what Florida looks like with st_buffer()
set to 30000:
%>%
random_state st_buffer(30000) %>%
ggplot() +
geom_sf() +
theme_void()
And here’s it set to 90000:
%>%
random_state st_buffer(90000) %>%
ggplot() +
geom_sf() +
theme_void()
This is quite useful when plotting things like rivers, which can be quite thin and hard to see on maps. For example, spData
comes with a built-in shapefile for the River Seine in France:
%>%
seine plot()
Plotting with a buffer:
%>%
seine st_buffer(5000) %>%
plot()
Like with any type of dataset in R, you can also colour-code your ggplot by the values in some column of data. Let’s colour-code the states by one of the metadata columns, the total population in 2010:
ggplot(us_states) +
geom_sf(aes(fill = total_pop_10)) +
theme_void()
Exercise
It’s time for some interactive exercises now:
AREA
and total_pop_10
and total_pop_15
columnsAnswers
Calculating population density and population change:
<- us_states %>%
us_states mutate(pop_dens_10 = total_pop_10 / as.numeric(AREA),
pop_dens_15 = total_pop_15 / as.numeric(AREA),
pop_dens_change = pop_dens_15 - pop_dens_10)
Plotting by population density:
#plot by population density in 2010
ggplot(us_states) +
geom_sf(aes(fill = pop_dens_10)) +
theme_void()
#this isn't very informative because District of Colombia is a massive outlier
%>%
us_states select(NAME, pop_dens_10) %>%
arrange(desc(pop_dens_10))
## Simple feature collection with 49 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7042 ymin: 24.55868 xmax: -66.9824 ymax: 49.38436
## Geodetic CRS: NAD83
## First 10 features:
## NAME pop_dens_10 geometry
## 1 District of Columbia 3279.2769 MULTIPOLYGON (((-77.11976 3...
## 2 New Jersey 430.1923 MULTIPOLYGON (((-75.55945 3...
## 3 Rhode Island 385.1138 MULTIPOLYGON (((-71.19564 4...
## 4 Massachusetts 309.7496 MULTIPOLYGON (((-70.27553 4...
## 5 Connecticut 273.2488 MULTIPOLYGON (((-73.48731 4...
## 6 Maryland 212.1674 MULTIPOLYGON (((-79.47666 3...
## 7 Delaware 170.0704 MULTIPOLYGON (((-75.7886 39...
## 8 New York 151.1749 MULTIPOLYGON (((-74.0702 40...
## 9 Florida 122.5513 MULTIPOLYGON (((-81.81169 2...
## 10 Pennsylvania 107.5780 MULTIPOLYGON (((-80.51942 4...
#so either filter it out:
%>%
us_states filter(NAME != 'District of Columbia') %>%
ggplot() +
geom_sf(aes(fill = pop_dens_10))
#or log-transform the values:
ggplot(us_states) +
geom_sf(aes(fill = log(pop_dens_10))) +
theme_void()
Plotting by population change:
ggplot(us_states) +
geom_sf(aes(fill = pop_dens_change)) +
theme_void()
#again, not very useful because District of Colombia is a massive outlier
%>%
us_states select(NAME, pop_dens_change) %>%
arrange(desc(pop_dens_change))
## Simple feature collection with 49 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7042 ymin: 24.55868 xmax: -66.9824 ymax: 49.38436
## Geodetic CRS: NAD83
## First 10 features:
## NAME pop_dens_change geometry
## 1 District of Columbia 353.986830 MULTIPOLYGON (((-77.11976 3...
## 2 Massachusetts 10.926915 MULTIPOLYGON (((-70.27553 4...
## 3 New Jersey 9.018396 MULTIPOLYGON (((-75.55945 3...
## 4 Maryland 8.719782 MULTIPOLYGON (((-79.47666 3...
## 5 Delaware 8.718132 MULTIPOLYGON (((-75.7886 39...
## 6 Florida 7.508354 MULTIPOLYGON (((-81.81169 2...
## 7 North Carolina 4.442798 MULTIPOLYGON (((-83.10023 3...
## 8 California 4.354330 MULTIPOLYGON (((-118.6034 3...
## 9 Virginia 3.936031 MULTIPOLYGON (((-75.66971 3...
## 10 Connecticut 3.651576 MULTIPOLYGON (((-73.48731 4...
#so either filter it out
%>%
us_states filter(NAME != 'District of Columbia') %>%
ggplot() +
geom_sf(aes(fill = pop_dens_change)) +
theme_void()
#or log-transform the values
%>%
us_states ggplot() +
geom_sf(aes(fill = log(pop_dens_change))) +
theme_void()
Let’s fly over the Atlantic and look at the UK. As well as us_states
, the spData
package also contains a dataset called world
- let’s take a look:
head(world)
## Simple feature collection with 6 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -18.28799 xmax: 180 ymax: 83.23324
## Geodetic CRS: WGS 84
## # A tibble: 6 × 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove… 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern … Sove… 9.33e5 5.22e7 64.2
## 3 EH Western S… Africa Africa Northern… Inde… 9.63e4 NA NA
## 4 CA Canada North Am… Americas Northern… Sove… 1.00e7 3.55e7 82.0
## 5 US United St… North Am… Americas Northern… Coun… 9.51e6 3.19e8 78.8
## 6 KZ Kazakhstan Asia Asia Central … Sove… 2.73e6 1.73e7 71.6
## # ℹ 2 more variables: gdpPercap <dbl>, geom <MULTIPOLYGON [°]>
Let’s filter it down to just the UK and plot it - remember we can treat this as a normal dataframe still so we can use the filter()
function as we would for any other dataframe.
%>%
world filter(name_long == 'United Kingdom') %>%
ggplot() +
geom_sf()
It looks like a child has drawn it! Clearly, the set of world
polygons is fairly low resolution.
Luckily, I’ve come prepared with my own UK geospatial dataset. We call these shapefiles. We can load in external shapefiles by using the read_sf()
function, similar to how you might read in a csv file using read_csv()
. I’ve saved this shapefile inside a subfolder called shapefiles
within the directory containing this script, and I recommend you do the same.
NOTE on files: to open up a shapefile, you just need its .shp
and .shx
files, but ideally you want the extra files too: the .dbf
file contains info on labels, and the .prj
file contains the CRS (again - more on that later). These should all be inside the same directory, with matching filenames, but when loading in the shapefile you only need to refer to the .shp file like so:
<- read_sf('shapefiles/uk-outline.shp') uk
Let’s plot it to see if the quality is better
ggplot(uk) +
geom_sf()
This isn’t very interesting on its own, because it’s all just one polygon. Let’s load in a different shapefile, of England only, which is split by region instead.
<- read_sf('shapefiles/uk-regions.shp') uk.regions
As we can see, this splits England up by its 9 NUTS1 regions (NUTS stands for the Nomenclature of Territorial Units for Statistics), i.e. things like North East, Yorkshire and The Humber, West Midlands, East of England etc.
uk.regions
## Simple feature collection with 9 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 82668.52 ymin: 5336.966 xmax: 655653.8 ymax: 657536
## Projected CRS: OSGB36 / British National Grid
## # A tibble: 9 × 8
## RGN22CD RGN22NM BNG_E BNG_N LONG LAT GlobalID geometry
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <MULTIPOLYGON [m]>
## 1 E120000… North … 417314 600356 -1.73 55.3 5a23ed0… (((450254 525947.3, 4502…
## 2 E120000… North … 350014 506279 -2.77 54.4 86f3c3e… (((327866.9 373722.3, 32…
## 3 E120000… Yorksh… 446902 448736 -1.29 53.9 e98894d… (((495603.1 422511.5, 49…
## 4 E120000… East M… 477659 322635 -0.850 52.8 94937b2… (((552751.4 326659.3, 55…
## 5 E120000… West M… 386294 295477 -2.20 52.6 1b27107… (((402840.5 368845.6, 40…
## 6 E120000… East o… 571078 263235 0.504 52.2 ab4f9e4… (((570605.1 181382.2, 57…
## 7 E120000… London 517515 178392 -0.309 51.5 acffd85… (((516122.7 172421.3, 51…
## 8 E120000… South … 470062 172924 -0.993 51.5 0ac18c7… (((429038.4 84843.5, 429…
## 9 E120000… South … 285013 102567 -3.63 50.8 72d5f65… (((83962.84 5401.15, 839…
We can plot this using plot()
or ggplot()
as we saw earlier, but a word of warning first: this takes a long time (at least on my ancient 2014 MacBook!) because the shapefile is so detailed and complex.
%>%
uk.regions ggplot() +
geom_sf()
We can quantify this using the object_size()
function from the pryr
package:
library(pryr)
object_size(uk.regions)
## 23.51 MB
23MB is pretty big! (for comparison, the size of the world
shapefile we were plotting earlier is just 0.3MB, despite containing over 170 polygons/countries)
We can simplify the complexity of the UK region shapefile to reduce its file size (and therefore the time taken to plot it). For this, we need to load in the rmapshaper
package and use the ms_simplify()
function. We provide a proportion value (from 0 to 1) to the keep
argument, which corresponds to the proportion of points we want to keep, i.e. the lower the value, the simpler the resulting shapefile. Polygons are essentially just sets of points joined up like in the dot-to-dot books we all had as children! Naturally, the fewer of these points you keep, the simpler and more ‘jaggedy’ the resulting shape will be.
Let’s try 0.02, keeping just 2% of the original points.
library(rmapshaper)
<- uk.regions %>%
uk.regions ms_simplify(keep = 0.02)
This is now just over 600KB:
object_size(uk.regions)
## 612.14 kB
This means it’s much quicker to plot, and quality is still absolutely fine - in fact it doesn’t even look any different! (note: RGN22NM
is the name of the column containing the label for each region - talk about unintuitive column names…)
%>%
uk.regions ggplot() +
geom_sf(aes(fill = RGN22NM)) +
theme_void()
Exercise
How far can you push it? Reload the uk.regions
shapefile in and try re-simplifying with different values of keep
inside ms_simplify()
to see just how small/simple you can make it without visibly losing significant quality. Or approach this from a more fun angle: how ridiculous can you make the UK shapefile look with more extreme simplification? (hint: you might want to go really small with the value of keep
)
There’s another method you can use to simplify spatial polygons, using st_simplify()
instead and specifying a dTolerance
argument, e.g. uk.small <- uk.regions %>% st_simplify(dTolerance = 500)
. Try this with different values of dTolerance
(higher values = more simplified) - how does the result compare with the output of ms_simplify()
?
Answers
#reload in to undo our previous simplification
<- read_sf('shapefiles/uk-regions.shp')
uk.regions
#using ms_simplify, keeping just 0.01% of points (shrinks to 17kb)
.1 <- uk.regions %>% ms_simplify(keep = 0.0001)
uk.small
#still looks decent!
ggplot(uk.small.1) +
geom_sf()
#using ms_simplify, keeping just 0.0001% of points (shrinks to 14kb)
.2 <- uk.regions %>% ms_simplify(keep = 0.000001)
uk.small
#not so decent...
ggplot(uk.small.2) +
geom_sf()
#using st_simplify instead, setting tolerance to 500m (shrinks to 140kb)
.3 <- uk.regions %>% st_simplify(dTolerance = 500)
uk.small
#dosn't look that different to the output of ms_simplify()
ggplot(uk.small.3) +
geom_sf()
#using st_simplify instead, setting tolerance to 10,000m (shrinks to 18kb)
.4 <- uk.regions %>% st_simplify(dTolerance = 10000)
uk.small
#notice how the borders between polygons aren't preserved here - gaps emerge!
#(this wasn't an issue with ms_simplify)
ggplot(uk.small.4) +
geom_sf()
Bonus: we can also use the smoothr
package to smooth polygons and produce what I like to call the ‘melted ice sculpture’ effect!
#starting with uk.small.1, which is the one we shrunk using ms_simplify(keep = 0.0001)
.6 <- uk.small.1 %>% smoothr::smooth(method = "ksmooth", smoothness = 3)
uk.small
ggplot(uk.small.6) +
geom_sf()
Ok that’s enough messing aroud - before we carry on, let’s revert back to our original simplification of the uk.regions
shapefile:
<- read_sf('shapefiles/uk-regions.shp') %>%
uk.regions ms_simplify(keep = 0.02)
One thing we might want to do is plot two separate sets of polygon geometries and calculate the degree of overlap between them.
Let’s use this new simplified version of the England regions shapefile alongside a shapefile denoting the Areas of Outstanding Natural Beauty (AONB) in England, downloaded from the government’s open data site. We’ll be asking which region of England is the most ‘beautiful’, and we’ll be answering this scientifically based on the relative proportion of each region’s area that’s classified as an ‘area of outstanding natural beauty’.
<- read_sf('shapefiles/AONB.shp') aonb
This is another large shapefile with unncessary detail:
object_size(aonb)
## 32.72 MB
So let’s simpify it to the same degree as we did for the England regions:
<- aonb %>%
aonb ms_simplify(keep = 0.02)
And here’s what it looks like:
ggplot(aonb) +
geom_sf()
Before we plot these two shapefiles over the top of each other, we need to check their coordinate reference systems. This topic is way beyond the scope of this workshop, but essentially the CRS is used to precisely locate points on the earth’s surface. It combines a specific projection, origin point, and unit of measurement (e.g. are the points specified in latitude/longitude, or easting/northing?). Every shapefile has a CRS, or if it doesn’t then R will only be able to plot the shapefile in an empty space.
There is a whole science behind this topic (it isn’t helped by the wonky shape of the Earth and how it’s not a perfect sphere) but the specifics aren’t important: all we need to know is that if we want to do any computation involving two different shapefiles (e.g. measuring overlap, plotting one over the other etc), then they must be projected in the same way with identical CRSs.
We can check this with the st_crs()
function:
st_crs(uk.regions)
## Coordinate Reference System:
## User input: OSGB36 / British National Grid
## wkt:
## PROJCRS["OSGB36 / British National Grid",
## BASEGEOGCRS["OSGB36",
## DATUM["Ordnance Survey of Great Britain 1936",
## ELLIPSOID["Airy 1830",6377563.396,299.3249646,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4277]],
## CONVERSION["British National Grid",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",49,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-2,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996012717,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",400000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",-100000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["Engineering survey, topographic mapping."],
## AREA["United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore."],
## BBOX[49.75,-9,61.01,2.01]],
## ID["EPSG",27700]]
That’s quite a lot of reader-unfriendly jargon!
Let’s run a comparison between the CRS of our two shapefiles:
st_crs(uk.regions) == st_crs(aonb)
## [1] TRUE
Great, they’re identical! We don’t have to mess around with anything then (spoiler alert: we’ll see how to change a CRS and reproject spatial data in the next part of this workshop).
Now that we know our two shapefiles have the same projection, we can plot them together by simply adding another geom_sf()
layer and specifying the second shapefile:
ggplot(uk.regions) +
geom_sf(aes(fill = RGN22NM)) +
geom_sf(data = aonb, fill = 'red', alpha = 0.5) +
theme_void()
Notice how some of the AONBs actually straddle two different regions (e.g. the North Pennines overlap with both the North West and the North East). This makes our life a bit more difficult, because it’s not just a simple case of working out which areas ‘belong’ to which regions and then calculating their total area. Instead, we’ll have to calculate how much of each region of England is covered by an AONB.
To do this, we need to first find the total area of each UK region. This is nice and easy: we can do this by applying the st_area()
function to our shapefile, which returns a set of area values for each unique geometry in the shapefile (i.e. each region of the UK).
st_area(uk.regions)
## Units: [m^2]
## [1] 8599596685 14170875783 15410922881 15645545178 13003754980 19133442615
## [7] 1573905346 19090850368 23856297826
Let’s add this back to our shapefile in a new column called total_area
:
$total_area <- st_area(uk.regions) uk.regions
Then find the area of overlap. At the moment, there are 34 rows (i.e. 34 unique geometries) in our AONB shapefile, one for each area of outstanding natural beauty in England. We want to collapse this down to a single multipolygon, i.e. a single row. We can do this by using the st_union()
function - let’s do this, and save it a new shapefile called aonb.single
:
<- aonb %>% st_union() aonb.single
Now we can use the st_intersection()
function to only return the parts of uk.regions
that are overlapped with our AONB shapefile
<- st_intersection(uk.regions, aonb.single) overlapped
Let’s plot this intersection and colour-code it by region:
ggplot(overlapped) +
geom_sf(aes(fill = RGN22NM)) +
theme_void()
On the surface it might just look like the original AONB shapefile we loaded in, so you might be thinking what was the point in everything we just did? Importantly, this is actually the UK region shapefile but only those parts that are overlapped with the AONB shapefile, i.e. it’s still split by region with 9 rows/geometries:
overlapped
## Simple feature collection with 9 features and 8 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 85816.02 ymin: 7101.81 xmax: 653767.1 ymax: 650535.1
## Projected CRS: OSGB36 / British National Grid
## # A tibble: 9 × 9
## RGN22CD RGN22NM BNG_E BNG_N LONG LAT GlobalID total_area
## * <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> [m^2]
## 1 E12000001 North East 417314 600356 -1.73 55.3 5a23ed0… 8.60e 9
## 2 E12000002 North West 350014 506279 -2.77 54.4 86f3c3e… 1.42e10
## 3 E12000003 Yorkshire and The Hu… 446902 448736 -1.29 53.9 e98894d… 1.54e10
## 4 E12000004 East Midlands 477659 322635 -0.850 52.8 94937b2… 1.56e10
## 5 E12000005 West Midlands 386294 295477 -2.20 52.6 1b27107… 1.30e10
## 6 E12000006 East of England 571078 263235 0.504 52.2 ab4f9e4… 1.91e10
## 7 E12000007 London 517515 178392 -0.309 51.5 acffd85… 1.57e 9
## 8 E12000008 South East 470062 172924 -0.993 51.5 0ac18c7… 1.91e10
## 9 E12000009 South West 285013 102567 -3.63 50.8 72d5f65… 2.39e10
## # ℹ 1 more variable: geometry <GEOMETRY [m]>
This means we can easily calculate the area of this overlapped
shapefile using the same method as before, i.e. using the st_area()
function. Let’s do this and save it back to the shapefile in a column simply called beaut_area
:
$beaut_area <- st_area(overlapped) overlapped
We now have two area measures for each English region: its total area in the total_area
column (in m2) and how much of it is an area of outstanding natural beauty in the beaut_area
column (also measured in m2).
Let’s do a few things here to finish off: take our overlapped
shapefile, add a column called beaut_prop
which is the proportion of each region that is an AONB (simply divide beaut_area
by total_area
), then drop the geometry column to turn it into a simple dataframe, select only the columns containing the region name (RGN22NM
) and this proportion (beaut_prop
), then sort by these values:
%>%
overlapped mutate(beaut_prop = beaut_area / total_area) %>%
st_drop_geometry() %>%
select(RGN22NM, beaut_prop) %>%
arrange(beaut_prop)
## # A tibble: 9 × 2
## RGN22NM beaut_prop
## <chr> [1]
## 1 London 0.00184
## 2 East Midlands 0.0332
## 3 East of England 0.0549
## 4 Yorkshire and The Humber 0.0599
## 5 West Midlands 0.0983
## 6 North West 0.110
## 7 North East 0.165
## 8 South East 0.259
## 9 South West 0.294
There you have it: the South West is officially the most beautiful region, with almost 30% of its area classified as an area of outstanding natural beauty! Yorkshire and The Humber is lagging behind at almost 6%, but spare a thought for poor Londoners where the value is just 0.18%!
They do say beauty is in the eye of the beholder though…
Exercise
Now it’s your turn: search the web for some shapefiles - you’d be surprised at how many there are! Try downloading them, messing around with them, plotting them etc (remember to resize them first if they’re large and you don’t want to risk waiting an age for R to do its thing)
Here are some random examples:
form == 'canal'
or form == 'lake'
first.Example
Here’s one I prepared earlier, plotting canals and inland rivers in the North West (in blue) alongside the locations of Japanese knotweed (in red), all overlaid on a map of UK postcode areas:
<- read_sf("shapefiles/uk-areas.shp") %>%
uk.areas ms_simplify(keep = 0.1)
<- read_sf("shapefiles/NBN_Atlas_occurrences_-_Japanese_Knotweed_(View_Layer).shp")
knotweed
<- read_sf("~/Downloads/oprvrs_essh_gb/data/WatercourseLink.shp")
uk.water
%>%
uk.areas ggplot() +
geom_sf() +
geom_sf(data = filter(uk.water, form %in% c('canal', 'inlandRiver')),
colour = 'blue', alpha = 0.4) +
geom_sf(data = knotweed, colour = 'red', alpha = 0.3, size = 0.5) +
coord_sf(xlim = c(-3, -2), ylim = c(53.2, 53.7)) +
theme_void()