Introduction

There are lots of ways to produce maps in R. But, however, they are drawn, two things are usually needed to produce a choropleth map of the sort seen in previous sessions:

  • first, some data;
  • second, a map to join the data to.

The join is usually made possible by the data and the map containing the same variable; for example, using the same ID codes for all the places in the data as in the map. This implies that what is contained in the data set are measurements of the places shown in the map. Sometimes maps will come pre-bundled with the data of interest so, in effect, the join has already been made.

Once we have the data ready to map then R offers plenty of options to produce quick or publication quality maps, which may have either static or dynamic content. The two packages we shall focus on are:

  • ggplot2 (mainly in Part 1) and
  • tmap (mainly in Part 2).

However, there are others.

Another package we shall be using in this session is sf, which provides “support for simple features, a standardized way to encode spatial vector data.” What we are mapping in this exercise are vector data – data for which the geography is ultimately defined by a series of points (two points define a line, a series of lines define a boundary, a boundary demarcates an area). The other common geographic representation used in Geographic Information Systems is raster, which is a grid representation of a study region, which each cell in the grid given one or more values to represent what is found there. For raster data, the stars and raster packages are useful.

Part 1

Getting Started

As in previous sessions, if you are keeping all the files and outputs from these exercises together in an R Project (which is a good idea) then open that Project now.

Load the data

Let’s begin with the easy bit and load the data, which are from http://superweb.statssa.gov.za. These includes the variable No_schooling which is the percentage of the population without schooling per South African municipality in 2011.

Code
# A quick check to see if the Tidyverse packages are installed...
installed <- installed.packages()[,1]
if(!("tidyverse" %in% installed)) install.packages("tidyverse",
                                                   dependencies = TRUE)
require(tidyverse)

# Read-in the data directly from a web location. The data are in .csv format
education <- read_csv("https://github.com/profrichharris/profrichharris.github.io/raw/main/MandM/data/education.csv")
# Quick check of the data by looking at the top three rows
slice_head(education, n = 3)
# A tibble: 3 × 8
  LocalMunicipalityCode LocalMunicipalityName No_schooling Some_primary
  <chr>                 <chr>                        <dbl>        <dbl>
1 EC101                 Camdeboo                      13.4         34.5
2 EC102                 Blue Crane Route              16.0         36.2
3 EC103                 Ikwezi                        18.4         35.4
# ℹ 4 more variables: Complete_primary <dbl>, Some_secondary <dbl>,
#   Grade_12_Std_10 <dbl>, Higher <dbl>

Loading the map

Next we need a ‘blank map’ of the same South African municipalities that are included in the data above. It is read-in below in geoJSON format but it would not have been unusual if it had been in .shp (shapefile) or .kml format, instead. The source of the data is https://dataportal-mdb-sa.opendata.arcgis.com/. There are several ways of reading this file into R but it is better to use the sf package because older options such as maptools::readShapePoly() (which was for reading shapefiles) or rgdal::readOGR are either deprecated already or in the process of being retired.

Code
# Another quick check to make sure that various libraries have been installed
if(!("proxy" %in% installed)) install.packages("proxy")
if(!("sf" %in% installed)) install.packages("sf", dependencies = TRUE)
require(sf)

# Use the read_sf() function from the sf library to read in a digital map of the study area
municipal <- read_sf("https://github.com/profrichharris/profrichharris.github.io/raw/main/MandM/boundary%20files/MDB_Local_Municipal_Boundary_2011.geojson")


If we now look at the top of the municipal object then we find it is of class sf, which is short for simple features. It has a vector geometry (it is of type multipolygon) and has its coordinate reference system (CRS) set as WGS 84. It also contains some attribute data, although not the schooling data we are looking to map.

Code
slice_head(municipal, n = 1)
Simple feature collection with 1 feature and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 27.15761 ymin: -33.28488 xmax: 28.0811 ymax: -32.67573
Geodetic CRS:  WGS 84
# A tibble: 1 × 11
  OBJECTID ProvinceCode ProvinceName LocalMunicipalityCode LocalMunicipalityName
     <int> <chr>        <chr>        <chr>                 <chr>                
1        1 EC           Eastern Cape BUF                   Buffalo City         
# ℹ 6 more variables: DistrictMunicipalityCode <chr>,
#   DistrictMunicipalityName <chr>, Year <int>, Shape__Area <dbl>,
#   Shape__Length <dbl>, geometry <MULTIPOLYGON [°]>

Here are just the attribute data

Code
st_drop_geometry(municipal) |>
  slice_head(n = 5)
# A tibble: 5 × 10
  OBJECTID ProvinceCode ProvinceName LocalMunicipalityCode LocalMunicipalityName
     <int> <chr>        <chr>        <chr>                 <chr>                
1        1 EC           Eastern Cape BUF                   Buffalo City         
2        2 EC           Eastern Cape EC101                 Camdeboo             
3        3 EC           Eastern Cape EC102                 Blue Crane Route     
4        4 EC           Eastern Cape EC103                 Ikwezi               
5        5 EC           Eastern Cape EC104                 Makana               
# ℹ 5 more variables: DistrictMunicipalityCode <chr>,
#   DistrictMunicipalityName <chr>, Year <int>, Shape__Area <dbl>,
#   Shape__Length <dbl>

And here is the ‘blank’ map, drawn using plot{sf}, which is the ‘base’ way of plotting sf objects.

Code
par(mai=c(0, 0, 0, 0))  # Removes the plot margins
municipal |>
  st_geometry() |>
  plot()

Had it been necessary to set the coordinate reference system (CRS) then the function st_set_crs() would be used. Instead, and just for fun, we will change the existing CRS: here is the map transformed on to a ‘south up’ coordinate reference system, achieved by changing its EPSG code to 2050 with the function st_transform().

Code
par(mai=c(0, 0, 0, 0))
municipal |>
  st_transform(2050) |>
  st_geometry() |>
  plot()

Note how functions with the sf library tend to start with st_. Personally, I find this slightly confusing and I am not sure it doesn’t make it harder to find what I looking for in the package’s help pages but it is consistent for the functions and methods that operate on spatial data and is, I believe, short for spatial type.

sf and sp

At the risk of over-simplification, sf (simple features) can be viewed as a successor to the earlier sp (spatial) and related packages, which are well documented in the book Applied Spatial Data Analysis with R. Sometimes other packages are still reliant on sp and so the spatial objects need to be changed into sp’s native format prior to use.

Code
# From sf to sp
municipal_sp <- as(municipal, "Spatial")
class(municipal_sp)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
Code
# From sp to sf
municipal_sf <- st_as_sf(municipal_sp)
class(municipal_sf)
[1] "sf"         "data.frame"

Joining the attribute data to the map

If we look again at the map and schooling data, we find that they have two variables in common which suggests a means to join them together based on a common variable.

Code
# The variables that they appear to have in common...
intersect(names(municipal), names(education))
[1] "LocalMunicipalityCode" "LocalMunicipalityName"

This is encouraging but, in this example, we need to be careful using the municipal names because not all of those in the map are in the education data or vice versa. Although the variable LocalMunicipalityName is present in both the map and education data, the variable is not actually the same in both because of different ways of spelling the names of places.

Code
# anti_join() returns all rows from x without a match in y
anti_join(municipal, education, by = "LocalMunicipalityName")
Simple feature collection with 7 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 27.42423 ymin: -32.05935 xmax: 32.05014 ymax: -24.96652
Geodetic CRS:  WGS 84
# A tibble: 7 × 11
  OBJECTID ProvinceCode ProvinceName LocalMunicipalityCode LocalMunicipalityName
     <int> <chr>        <chr>        <chr>                 <chr>                
1       34 EC           Eastern Cape EC157                 King Sabata Dalindye 
2       74 KZN          KwaZulu-Nat… KZN214                uMuziwabantu         
3       75 KZN          KwaZulu-Nat… KZN215                Ezinqoleni           
4       97 KZN          KwaZulu-Nat… KZN262                uPhongolo            
5      146 MP           Mpumalanga   MP301                 Chief Albert Luthuli 
6      149 MP           Mpumalanga   MP304                 Dr Pixley Ka Isaka S 
7      165 NW           North West   NW372                 Local Municipality o 
# ℹ 6 more variables: DistrictMunicipalityCode <chr>,
#   DistrictMunicipalityName <chr>, Year <int>, Shape__Area <dbl>,
#   Shape__Length <dbl>, geometry <MULTIPOLYGON [°]>
Code
anti_join(education, municipal, by = "LocalMunicipalityName")
# A tibble: 7 × 8
  LocalMunicipalityCode LocalMunicipalityName  No_schooling Some_primary
  <chr>                 <chr>                         <dbl>        <dbl>
1 EC157                 King Sabata Dalindyebo         21.7         33.2
2 KZN214                UMuziwabantu                   20.7         42.1
3 KZN215                Ezingoleni                     21.9         40.1
4 KZN262                UPhongolo                      22.5         35.1
5 MP301                 Albert Luthuli                 23.8         31.1
6 MP304                 Pixley Ka Seme                 24.0         31.9
7 NW372                 Madibeng                       12.8         27.1
# ℹ 4 more variables: Complete_primary <dbl>, Some_secondary <dbl>,
#   Grade_12_Std_10 <dbl>, Higher <dbl>

Fortunately, the municipal codes are consistent even where the names are not, which is why place codes and IDs are generally better than using names when joining data.

Code
# The municipality codes are consistent in the map and data; there are none that do not match.=
anti_join(municipal, education, by = "LocalMunicipalityCode")
Simple feature collection with 0 features and 10 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
# A tibble: 0 × 11
# ℹ 11 variables: OBJECTID <int>, ProvinceCode <chr>, ProvinceName <chr>,
#   LocalMunicipalityCode <chr>, LocalMunicipalityName <chr>,
#   DistrictMunicipalityCode <chr>, DistrictMunicipalityName <chr>, Year <int>,
#   Shape__Area <dbl>, Shape__Length <dbl>, geometry <GEOMETRY [°]>
Code
anti_join(education, municipal, by = "LocalMunicipalityCode")
# A tibble: 0 × 8
# ℹ 8 variables: LocalMunicipalityCode <chr>, LocalMunicipalityName <chr>,
#   No_schooling <dbl>, Some_primary <dbl>, Complete_primary <dbl>,
#   Some_secondary <dbl>, Grade_12_Std_10 <dbl>, Higher <dbl>

We therefore join the data to the map using the variable LocalMunicipalityCode and check that the schooling data are now attached to the map.

Code
municipal <- left_join(municipal, education, by = "LocalMunicipalityCode")
names(municipal)
 [1] "OBJECTID"                 "ProvinceCode"            
 [3] "ProvinceName"             "LocalMunicipalityCode"   
 [5] "LocalMunicipalityName.x"  "DistrictMunicipalityCode"
 [7] "DistrictMunicipalityName" "Year"                    
 [9] "Shape__Area"              "Shape__Length"           
[11] "geometry"                 "LocalMunicipalityName.y" 
[13] "No_schooling"             "Some_primary"            
[15] "Complete_primary"         "Some_secondary"          
[17] "Grade_12_Std_10"          "Higher"                  

Note that the variables LocalMunicipalityName.x and LocalMunicipalityName.y have been created in the process of the join. This is because we did not use LocalMunicipalityName for the join but that variable name is present in both the map and the data and is therefore duplicated when they are joined together – municipal$LocalMunicipalityName becomes municipal$LocalMunicipalityName.x and education$LocalMunicipalityName creates municipal$LocalMunicipalityName.y.

Mapping the data

Using plot{sf}

The ‘one line’ way of plotting the data is to use the in-built plot() function for sf.

Code
plot(municipal["No_schooling"])

As a ‘rough and ready’ way to check for spatial variation and patterns in the data, it is quick and easy. It is important to specify the variable(s) you wish to include in the plot or else it will plot them all up to the value specified by the argument max.plot, which has a default of nine.

Code
plot(municipal)


The map can be customised. For example,

Code
if(!("RColorBrewer" %in% installed)) install.packages("RColorBrewer",
                                                      dependencies = TRUE)
require(RColorBrewer)

plot(municipal["No_schooling"], key.pos = 1, breaks = "jenks", nbreaks = 7,
     pal = rev(brewer.pal(7, "RdYlBu")),
     graticule = TRUE, axes = TRUE,
     main = "Percentage of Population with No Schooling")

Have a read through the documentation at ?sf::plot to get a sense of what the various arguments do. Try changing them and see if you can produce a map with six equal interval breaks, for example.

Note the use of the RColorBrewer package and its brewer.pal() function. RColorBrewer provides colour palettes based on https://colorbrewer2.org/ and has been used to create a diverging red-yellow-blue colour palette that is reversed using the function rev() so that red is assigned to the highest values, not lowest. A ‘natural breaks’ (jenks) classification with 7 colours has been used (breaks = "jenks", nbreaks = 7).

Thinking about the map classes

Here is the same underlying map but with equal interval breaks instead:

Code
plot(municipal["No_schooling"], key.pos = 1, breaks = "equal", nbreaks = 7,
     pal = rev(brewer.pal(7, "RdYlBu")),
     graticule = TRUE, axes = TRUE,
     main = "Percentage of Population with No Schooling")

… and here with quantile breaks:

Code
plot(municipal["No_schooling"], key.pos = 1, breaks = "quantile", nbreaks = 7,
     pal = rev(brewer.pal(7, "RdYlBu")),
     graticule = TRUE, axes = TRUE,
     main = "% of Population with No Schooling")

Clearly the maps above do not appear exactly the same. This because the geographical patterns and therefore the geographical information that we view in the map are a function of how the map is constructed, including the number, colouring and widths (ranges) of the map classes. Ideally, these should be set to reflect the distribution of the data and what is being look for in it.

The following histograms show the break points in the distributions used in the various maps. The code works by creating a list of plots (specifically, a list of ggplots, see below) – one plot each for the jenks, equal and quantile styles – and then, using a package called gridExtra, to arrange those plots into a single grid. However, the code matters less than what it reveals, which is that Jenks or other ‘natural breaks’ classifications are reasonably good for identifying break points that reflect the distribution of the data in the absence of the user having cause to set those break points in some other way.

Thinking about the distribution of the data is important to avoid creating maps that give the wrong impression of the prevalence or otherwise of values in the data. For example, if you use a quantile classification with, say, 4 breaks, it will create four map classes, each containing approximately one quarter of the data, even if some of the values in those classes are rarely found and unusual when compared to others in the same class. An equal interval classification will not have this problem.

Code
if(!("gridExtra" %in% installed)) install.packages("gridExtra",
                                                   dependencies = TRUE)
if(!("classInt" %in% installed)) install.packages("classInt",
                                                  dependencies = TRUE)

require(gridExtra)
require(classInt)

styles <- c("jenks", "equal", "quantile")
g <- lapply(styles, \(x) {
        ggplot(municipal, aes(x = No_schooling)) +
        geom_histogram(fill = "light grey") +
        xlab("% of Population with No Schooling") +
        geom_vline(xintercept = classIntervals(municipal$No_schooling,
                                               n = 7, style = x)$brks,
                   col = "dark red") +
        geom_rug() +
        theme_minimal() +
        ggtitle(paste(x,"classification"))
    })
# The step below brings together, in a grid, the list of three different plots
# created above
grid.arrange(grobs = g)


For further information on using plot{sf} see here and look at the help menu, ?sf::plot.

Using ggplot2

Whilst the plot() function for sf objects is useful for producing quick maps, I tend to prefer ggplot2 for better quality ones that I am wanting to customise or annotate in particular ways. We already have seen examples of ggplot2 output in earlier sessions and also in the histograms above.

ggplot2 is based on The Grammar of Graphics. I find it easiest to think of it, initially, in four stages:

  1. Say which data are to be plotted;
  2. Say which aesthetics of the chart (e.g. colour, line type, point size) will vary with the data;
  3. Say which types of plots (which ‘geoms’) are to feature in the chart;
  4. (Optional) change other attributes of the chart to add titles, rename the axis labels, and so forth.

As an example, in the code chunk below, those four stages are applied to a boxplot showing the distribution of the no schooling variable by South African Provinces.

First, the data = municipal. Second, consulting with the ggplot2 cheatsheet, I find that the aesthetics, aes(), for the boxplot, require a discrete x and a continuous y, which are provided by ProvinceName and No_schooling, respectively. ProvinceName has also been used to assign a fill colour to each box. Third, the optional changes arise from me preferring theme_minimal() to the default style, although I have then modified it to remove the legend, change the angle of the text on the x-axis, remove the x-axis label and change the y-axis label.

Code
require(ggplot2)
ggplot(data = municipal, aes(x = ProvinceName, y = No_schooling,
                             fill = ProvinceName)) +
  geom_boxplot() +
  theme_minimal() +
  theme(legend.position = "none", axis.text.x = element_text(angle = 45)) +
  xlab(element_blank()) +
  ylab("% No schooling within municipalities")


Let’s now take that process and apply it to create a map, using the same RColorBrewer colour palette as previously and adding the map using geom_sf (for a full list of geoms available for ggplot2 see here). The line scale_fill_distiller is an easy way to shade the map using the colour palette from RColorBrewer andlabs() adds labelling.

Code
ggplot(municipal, aes(fill = No_schooling)) +
  geom_sf() +
  scale_fill_distiller("%", palette = "RdYlBu") +
  theme_minimal() +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  )  


Presently the map has a continuous shading scheme – you can see it in the map’s legend. This can be changed to discrete map classes and colours by converting the continuous municipal$No_schooling variable to a factor, using the cut() function, here with break points found using ClassIntervals(style = "jenks"). Because we have then changed from continuous to discrete (categorised) data but still want to use an RColorBrewer palette, so scale_fill_brewer() replaces scale_fill_distiller(), wherein the argument direction = -1 reverses the RdYlBu palette so that the highest values are coloured red. Adding guides(fill = guide_legend(reverse = TRUE)) reverses the legend so that the highest values are on top in the legend, which is another preference of mine.

Code
# Find the break points in the distribution using a Jenks classification
brks <- classIntervals(municipal$No_schooling, n = 7, style = "jenks")$brks

# Factor the No_schooling variable using those break points
municipal$No_schooling_gp <- cut(municipal$No_schooling, brks,
                                 include.lowest = TRUE)

ggplot(municipal, aes(fill = No_schooling_gp)) +
  geom_sf() +
  scale_fill_brewer("%", palette = "RdYlBu", direction = -1) +
  theme_minimal() +
  guides(fill = guide_legend(reverse = TRUE)) +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  ) 

At this point, we may note that the four stages of the map production that I referred to earlier was an over-simplification and we can add a fifth:

  1. Say which data are to be plotted;
  2. Say which aesthetics of the chart (e.g. colour, line type, point size) will vary with the data;
  3. Say which types of plots (which ‘geoms’) are to feature in the chart – geom_sf for mapping;
  4. ‘Scale’ the data – in the above examples, link the mapped variables to map classes and colour codes. The scaling is what scale_fill_... was doing;
  5. (Optional) change other attributes of the chart to add titles, rename the axis labels, and so forth.

Annotating the map with ggspatial

Having created the basic map using ggplot2, we can add some additional map elements using ggspatial, which provides some extra cartographic functions. The following code chunk adds a backdrop to the map. Different backgrounds (alternative map tiles) can be chosen from the list at rosm::osm.types(); see here for what they look like.

Before running the code, we may note a change from the previous code chunk (above) which is in addition to installing and requiring ggspatial and adding the map tile as a backdrop. Specifically, if you look at the previous code chunk you will find that the data, municipal are handed-to ggplot in the top line ggplot(municipal, ...) where the first argument is the data one. In other words, ggplot(municipal, ...) is the same as, ggplot(data = municipal, ...) (check the help file, ?ggplot2::ggplot to confirm this). By specifying the data in the top line, in essence this sets data = municipal as a global parameter for the plot: it is where ggplot will, by default, now look for the variable called for in aes(fill = No_schooling_gp) and where it will look for other variables too. Whilst this would work just fine in the code immediately below, a little later I introduce a second geom_sf into the plot and no longer want municipal to be the default choice for all the aesthetics of the chart, just some of them. To pre-empt any problems that might otherwise arise, I no longer specify municipal as the default dataset in my ggplot() but, instead, specifically name municipal where I want to use it – as the fill data, in the line geom_sf(data = municipal, aes(fill = No_schooling_gp)) – which will leave me free to associate other data with different aesthetics in due course. More simply, if you set the data = argument and also any aesthetics, mapping = aes() then you are basically saying “use these for everything that follows” unless you override them by saying what you want for each specific geom.

Code
if(!("ggspatial" %in% installed)) install.packages("ggspatial",
                                                   dependencies = TRUE)
require(ggspatial)

ggplot() +
  annotation_map_tile(type = "cartolight", progress = "none") +
  geom_sf(data = municipal, aes(fill = No_schooling_gp)) +
  scale_fill_brewer("%", palette = "RdYlBu", direction = -1) +
  theme_minimal() +
  guides(fill = guide_legend(reverse = TRUE)) +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  ) 


A north arrow and a scale bar can also be added, although including the scale bar generates a warning because the map to true life distance ratio is not actually constant across the map but varies with longitude and latitude. The argument, location = "tl" is short for top left; location = "br" for bottom right. See ?annotation_north_arrow and ?annotation_scale for further details and options. Note also the use of the last_plot() function to more easily add content to the last ggplot.

Code
last_plot() +
  annotation_north_arrow(location = "tl",
                         style = north_arrow_minimal(text_size = 14)) +
  annotation_scale(location = "br", style = "ticks")

Adding point symbols to the map

In the next example, the locations of South African cities are added to the map, with a symbol drawn in proportion to their population size. The source of the data is a shapefile from https://data.humdata.org/dataset/hotosm_zaf_populated_places. The symbol shape that is specified by pch = 3 has the same numeric coding as those in ?graphics::points (i.e. 0 is a square, 1, is a circle, 2 is a triangle, and so forth).

The function scales::label_comma() forces decimal display of numbers to avoid displaying scientific notation.

The notation [pagkage_name]::[function] is a way of saying in which library/package a specific function is found - the label_comma() function is in the scales library. Sometimes this can be useful to run a function from a package/library without first requiring (loading) it. It can also be useful to avoid the problem when two packages contain a function with the same name. For example, there is a select() function in both the ’raster and dplyr libraries. If you require (load) both of these libraries then the select function in whichever is loaded second will mask the select function in the one loaded first, potentially causing errors or confusions. A way around the problem is to use the :: notation to be specific about which package’s select you wish to use, i.e. raster::select() or dplyr::select().

Code
if(!("scales" %in% installed)) install.packages("scales", dependencies = TRUE)

download.file("https://github.com/profrichharris/profrichharris.github.io/blob/main/MandM/boundary%20files/hotosm_zaf_populated_places_points_shp.zip?raw=true",
              "cities.zip", mode = "wb", quiet = TRUE)
unzip("cities.zip")

read_sf("hotosm_zaf_populated_places_points.shp") |>
  filter(place == "city") |>
  mutate(population = as.numeric(population)) ->
  cities

last_plot() +
  geom_sf(data = cities, aes(size = population), pch = 3) +
  scale_size("Population", labels = scales::label_comma())

Slightly confusingly, a shapefile actually consists of at least three files, one with the extension .shp (the coordinate/shape data), one .shx (an index file) and one .dbf (the attribute data). If you use a shapefile you need to make sure you download all of them and keep them together in the same folder.

Labelling using ggsflabel

Nice labelling of the cities is provided by ggsflabel, in this example using a pipe |> to filter and only label cities with over a million population. The function geom_sf_label_repel() is designed to stop labels from being placed over each other. We could use last_plot() again to create this map but, instead, here is the code in full:

Code
if(!("remotes" %in% installed)) install.packages("remotes", dependencies = TRUE)
if(!("ggsflabel" %in% installed)) remotes::install_github("yutannihilation/ggsflabel")
require(ggsflabel)

ggplot() +
  annotation_map_tile(type = "cartolight", progress = "none") +
  geom_sf(data = municipal, aes(fill = No_schooling_gp)) +
  scale_fill_brewer("%", palette = "RdYlBu", direction = -1) +
    geom_sf(data = cities, aes(size = population), pch = 3) +
  scale_size("Population", labels = scales::label_comma()) +
  geom_sf_label_repel(data = cities |> filter(population > 1e6),
                    aes(label = name), alpha = 0.7, size = 3) +
  theme_minimal() +
  # The line below removes some annoying axis titles that otherwise appear
  theme(axis.title.x = element_blank(), axis.title.y = element_blank()) +
  guides(fill = guide_legend(reverse = TRUE)) +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  ) 

Saving the map as a graphic file

Having created the map, it can now be saved as a graphic. If you are not using an R Project to save your files into, you may wish to change your working directory before saving the graphic, using setwd(dir) and substituting dir with the pathname to the preferred directory, or by using Session -> Set Working Directory -> Choose Directory from the dropdown menus. Once you have done so, the last_plot() is easily saved using the function ggsave(). For example, in .pdf format, to a print quality,

Code
ggsave("no_schooling.pdf", device = "pdf", width = 6, units = "in",
       dpi = "print")

Alternatively, we can write directly to a graphics device, using one of the functions bmp(), jpeg(), png(), tiff() or pdf(). For instance,

Code
jpeg("no_schooling.jpg", res = 72)
last_plot()
dev.off()

Creating an interactive map using ggiraph

So far all the maps we have created have been static. This is obviously better for anything that will be printed but, for a website or similar, we may wish to include more ‘interaction’. The package ggiraph package creates dynamic ggplot2 graphs and we can use it to create an interactive map where information about the areas appears as we brush over those areas on the map with the mouse pointer. This is achieved by replacing, in the code, geom_sf() with the geom_sf_interactive() function from ggiraph, specifying the text to show with the tooltip (the example below pastes a number of character elements together without a space between them, hence paste0() but does include a carriage return, \n) and rendering the resulting ggplot2 object with girafe().

Code
if(!("ggiraph" %in% installed)) install.packages("ggiraph", dependencies = TRUE)
require(ggiraph)

g <- ggplot() +
  annotation_map_tile(type = "cartolight", progress = "none") +
  geom_sf_interactive(data = municipal,
                      aes(tooltip = paste0(LocalMunicipalityName.x, "\n",
                                           round(No_schooling,1), "%"),
                             fill = No_schooling_gp)) +
  scale_fill_brewer("%", palette = "RdYlBu", direction = -1) +
  theme_minimal() +
  guides(fill = guide_legend(reverse = TRUE)) +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  ) +
  annotation_north_arrow(location = "tl",
                         style = north_arrow_minimal(text_size = 14)) +
  annotation_scale(location = "br", style = "ticks")

girafe(ggobj = g)

Save your workspace

This is the end of Part 1 and may be a good place to stop or, at least, take a break. If you won’t be continuing immediately to Part 2 then don’t forget to save your workspace. For example, by using,

Code
save.image("making_maps.RData")

Part 2

To begin, you may need to open your Project then load the workspace from Part 1. For example,

Code
load("making_maps.RData")

You may also need to reload all the previously used packages/libraries. I think this includes all the ones that you still need:

Code
# Just a way of requiring more than one package at the same time:
pkgs <- c("tidyverse", "sf", "ggplot2", "RColorBrewer", "ggspatial", "ggsflabel")
invisible(lapply(pkgs, require, character.only = TRUE))

Using tmap

Clearly there is a lot of scope to produce high quality maps using ggplot2 and various associated packages. However, it has a rival, in tmap, which is arguably easier to use. Like ggplot2, tmap adopts the Grammar of Graphics but approaches it in a slightly different way that uses layers: it builds-up the layers of the graphic by first specifying a spatial object or background, then doing things to the map based on it, then specifying another spatial object and/or other map elements to do things with, and so forth. The types of layers available can be viewed here and here. A brief introduction to tmap is available here.

The code chunk a little further below builds-up the layers of the map to produce one quite like that previously created in ggplot2. First, however, there is a error in the geometry of the underlying municipal map file to deal with:

Code
all(st_is_valid(municipal))
[1] FALSE

The problem lies in the 128th area, where one edge crosses another,

Code
st_is_valid(municipal[128,], reason = TRUE)
[1] "Edge 25 crosses edge 27"

tmap is less forgiving of this error than ggplot2 is. A temporary ‘fix’ – more of a side-step really – is achieved by changing the coordinate reference system, which presently is using EPSG: 4326 (you can see this with st_crs(municipal)), to EPSG: 3857.

Code
municipal <- st_transform(municipal, 3857)
all(st_is_valid(municipal))
[1] TRUE

Now we can produce the plot, to give an output similar to one of the earlier ggplots:

Code
if(!("tmap" %in% installed)) install.packages("tmap", dependencies = TRUE)
require(tmap)

tmap_mode("plot")

tm_graticules(col = "light grey") +
  tm_shape(municipal, is.master = TRUE) +
  tm_fill("No_schooling", palette = "-RdYlBu", title = "%", style = "jenks",
          n = 7) +
  tm_borders(col = "black") +
  tm_shape(cities) +
  tm_dots(size = "population", shape = 3) +
  tm_shape(cities |> filter(population > 1e6)) + 
  tm_text("name", bg.color = "white", auto.placement = TRUE, bg.alpha = 0.6) +
  tm_legend(title = "Percentage of Population with No Schooling",
            bg.color = "white", bg.alpha = 0.7) +
  tm_compass(type = "arrow", position = c("right", "top")) +
  tm_scale_bar(position = c("right", "bottom"), bg.color = "white") +
  tm_credits("Source: 2011 Census / Statistics South Africa",
             bg.color = "white")


The map looks pretty good and can be saved using the function tmap_save. For example,

Code
tmap_save(tmap_last(), "no_schooling2.jpg", width = 7, units = "in")

However, there is a cartographic/mathematical irritation that might bug a reviewer if you were to submit the map as part of an academic journal (or a marker if you were to submit the map for assessment!). If you look at the map classes, they are non-unique: e.g. 5.64 to 9.87, 9.87 to 13.38, 13.38 to 17.19, and so forth. Which category would a value of 9.87 (or 13.38, etc.) fall into?

To solve this problem, we can do what we did for the ggplots, which is to create a factor from the municipal$No_schooling variable, which is what the first two lines of code below do. The third line reverses the order of the factors, so that the highest and not lowest valued group is treated as the first level, and so forth. The reason I have added this is because of my preference for the highest values to appear top in the legend.

Code
brks <- classIntervals(municipal$No_schooling, n = 7, style = "jenks")$brks
municipal$No_schooling_gp <- cut(municipal$No_schooling, brks,
                                 include.lowest = TRUE)
municipal$No_schooling_gp <- factor(municipal$No_schooling_gp,
                                levels = rev(levels(municipal$No_schooling_gp)))

tm_graticules(col = "lightgrey") +
  tm_shape(municipal) +
  tm_fill("No_schooling_gp", palette = "RdYlBu", title = "%") +
  tm_borders(col = "black") +
  tm_shape(cities) +
  tm_dots(size = "population", shape = 3) +
  tm_shape(cities %>% filter(population > 1e6)) + 
  tm_text("name", bg.color = "white", auto.placement = TRUE, bg.alpha = 0.6) +
  tm_legend(title = "Percentage of Population with No Schooling",
            bg.color = "white", bg.alpha = 0.7) +
  tm_compass(type = "arrow", position = c("right", "top")) +
  tm_scale_bar(position = c("right", "bottom"), bg.color = "white") +
  tm_credits("Source: 2011 Census / Statistics South Africa",
             bg.color = "white")


Where tmap really excels is in rendering interactive maps to leaflet by changing the tmap_mode from tmap_mode("plot")to tmap_mode("view"). The following allows panning and can be zoomed in and out of. Unfortunately, it also reveals that the earlier ‘fix’ to the municipal object doesn’t work here so I have omitted the problem area, although that isn’t much of a solution.

Code
tmap_mode("view")

tm_basemap("OpenStreetMap.HOT") +
  tm_shape(municipal[-128,], name = "municipalities") +
  tm_fill("No_schooling_gp", palette = "RdYlBu", title = "%") +
  tm_borders(col = "black") +
  tm_shape(cities) +
  tm_dots(size = "population") +
  tm_legend(title = "Percentage of Population with No Schooling",
            bg.color = "white", bg.alpha = 0.7) +
  tm_scale_bar(position = c("right", "bottom"), bg.color = "white")

The following version adds further functionality. It allows different map layers to be displayed (move your mouse cursor over the map layers icon to do so) and, if you right click on any of the areas shown, will bring-up information about them.

Code
tm_basemap(c(OSM = "OpenStreetMap",
             OSMHot = "OpenStreetMap.HOT",
             Carto = "CartoDB")) +
  tm_shape(municipal[-128,], name = "municipalities") +
  tm_fill("No_schooling_gp", palette = "RdYlBu", title = "%",
          id = "LocalMunicipalityName.x",
          popup.vars = c("% No schooling:" = "No_schooling",
                         "Province: " = "ProvinceName"),
          popup.format = list(digits = 1)) +
  tm_borders(col = "black") +
  tm_shape(cities) +
  tm_dots(size = "population",
          id = "name",
          popup.vars = c("Population: " = "population")) +
  tm_legend(title = "Percentage of Population with No Schooling",
            bg.color = "white", bg.alpha = 0.7) +
  tm_scale_bar(position = c("right", "bottom"), bg.color = "white") +
  tm_view(view.legend.position = c("right", "bottom"))


Different layers are available dependent upon the tmap_mode. For example, there is no straightforward way of adding a maptile (tm_basemap) as the backdrop to a map in tmap’s "plot" mode, or a compass (tm_compass) or a scale bar (tm_scale_bar) in "view" mode.


We can also have some fun! Here is an animated map where the animation is produced from a combination of the tm_facets(along = "ProvinceName", free.coords = FALSE) layer and the use of the tmap_animation() function. Notice how I add municipal to the map twice, as two separate layers. The first is to provide a general backdrop to the map with all the municipalities shaded grey. They second is linked to the animation with the municipalities shaded by the percentage of their population without schooling.

Code
if(!("gifski" %in% installed)) install.packages("gifski", dependencies = TRUE)

tmap_mode("plot")

t <- tm_graticules(col = "light grey") +
  tm_shape(municipal) +
  tm_polygons(col = "grey", border.col = "black") +
  tm_shape(municipal) +
  tm_fill("No_schooling_gp", palette = "RdYlBu", title = "%") +
  tm_borders(col = "white") +
  tm_facets(along = "ProvinceName", free.coords = FALSE) +
  tm_legend(title = "Percentage of Population with No Schooling",
            bg.color = "white", bg.alpha = 0.7) +
  tm_compass(type = "arrow", position = c("right", "top")) +
  tm_scale_bar(position = c("right", "bottom"), bg.color = "white") +
  tm_credits("Source: 2011 Census / Statistics South Africa",
             bg.color = "white")

tmap_animation(t, delay = 100)


The animation may be saved as a .gif file by including the argument filename (see ?tmap_animation).


Faceting can also be used on static maps, as in the following example, where tm_facets(by = "ProvinceName", free.coords = TRUE) creates a choropleth map for each Province, with a common legend, positioned outside each provincial map through the use of tm_layout(legend.outside.position = "bottom").

Code
tmap_mode("plot")

tm_graticules(col = "light grey") +
  tm_shape(municipal) +
  tm_fill("No_schooling_gp", palette = "RdYlBu",
          title = "% Population with No Schooling",
          legend.is.portrait = FALSE) +
  tm_borders(col = "white") +
  tm_facets(by = "ProvinceName", free.coords = TRUE) +
  tm_compass(type = "arrow", position = c("right", "top")) +
  tm_scale_bar(position = c("right", "bottom")) +
  tm_layout(legend.outside.position = "bottom")

Geofacets

To this point of the session we have been using maps to represent the spatial distribution of one or more variables whose values are plotted as an aesthetic of the map such as colour or shape size. A different approach is offered by the geofacet package which uses geography as a ‘placeholder’ to position graphical summaries of data for different parts of the map. The idea is to flexibly visualise data for different geographical regions by providing a ggplot2 faceting function facet_geo() that works just like ggplot2’s built-in faceting, except that the resulting arrangement of panels follows a grid that mimics the original geographic topology as closely as possible.

Here is an example of it in use, showing the distribution of municipalities within South African provinces in terms of the percentage of the population with higher education (university) qualifications.

Code
if(!("geofacet" %in% installed)) install.packages("geofacet")
require(geofacet)

# Define a grid that mimics the geographical distribution of the provinces
mygrid <- data.frame(
  code = c("LIM", "GT", "NW", "MP", "NC", "FS", "KZN", "EC", "WC"),
  name = c("Limpopo", "Gauteng", "North West", "Mpumalanga", "Northern Cape",
           "Free State", "KwaZulu-Natal", "Eastern Cape", "Western Cape"),
  row = c(1, 2, 2, 2, 3, 3, 3, 4, 4),
  col = c(3, 3, 2, 4, 1, 2, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Plot the data with the geofaceting
ggplot(municipal, aes(Higher)) +
  geom_boxplot(col = "dark grey") +
  geom_density() +
  geom_rug() +
  facet_geo(~ ProvinceName, grid = mygrid) +
  scale_y_continuous(breaks = c(0, 0.2, 0.4)) +
  theme_bw() +
  labs(
    title = "Percentage of Population with higher education",
    caption = "Source: 2011 Census / Statistics South Africa"
  ) +
  xlab("% per municipality")

Bivarite Mapping with ggplot2

Sometimes we want to the colours on the map to reflect the combination of two variables’ values, not just one. We used to be able to use the bivariate package to do this. Unfortunately, this does not appear to be available any more and whilst it is possible to download achieved versions, the most recent was not working for me and could create conflicts with newer packages. We can, however, undertake the process ‘by hand’.

To illustrate, let’s use the two variables, municipal$No_schooling and municipal$Higher. One measures the percentage of the population without schooling per South African municipality in 2011 and we shall use it to create a new variable, the percentage with schooling; the other is the percentage with higher education. The following code creates the bivariate map. The process is broadly that described in this tutorial but I have changed and simplified it a little. It may not look especially simple but there is not much to it that we have not done already. The main difference is that we are cutting (categorising) the data along two variables and then creating a colour scheme and a legend to represent the resulting groups.

Code
# First, split the municipalities into three groups (categories) based on the % Schooling
tertiles_schooling <- municipal |>
  # Create the new schooling variable from no_schooling:
  mutate(Schooling = 100 - No_schooling) |>
  # Take the Schooling variable and cut it into the groups:
  pull(Schooling) %>%
  cut(., breaks = quantile(., probs = seq(0, 1, length.out = 4)),
                           labels = FALSE,
                           include.lowest = TRUE)

# Second, split the municipalities into three groups based on the % higher education
tertiles_higher <- municipal |>
  pull(Higher) %>%
  cut(., breaks = quantile(., probs = seq(0, 1, length.out = 4)),
                           labels = FALSE,
                           include.lowest = TRUE)

# Third, add those groups to the map's attribute data and also combine them
municipal$schooling_gp  <- tertiles_schooling
municipal$higher_gp  <- tertiles_higher
municipal$combined_gp <- paste(tertiles_schooling, tertiles_higher, sep = " - ")

# Here is what has been created
municipal |> 
  st_drop_geometry() |>
  select(schooling_gp, higher_gp, combined_gp) |>
  slice_head(n = 5)
# A tibble: 5 × 3
  schooling_gp higher_gp combined_gp
         <int>     <int> <chr>      
1            3         3 3 - 3      
2            3         3 3 - 3      
3            2         2 2 - 2      
4            2         2 2 - 2      
5            3         3 3 - 3      
Code
# Fourth, define a colour scheme, here using hex colour codes
# https://www.joshuastevens.net/cartography/make-a-bivariate-choropleth-map/ is helpful
cols <- c(
  "3 - 3" = "#e8e8e8", # highest primary and secondary
  "2 - 3" = "#cbb8d7",
  "1 - 3" = "#9972af", # lowest primary, highest secondary
  "3 - 2" = "#e4d9ac",
  "2 - 2" = "#c8ada0", # medium primary, medium secondary
  "1 - 2" = "#976b82",
  "3 - 1" = "#c8b35a", # highest primary, lowest secondary
  "2 - 1" = "#8f8e53",
  "1 - 1" = "#804d36" # low primary, lowest secondary
)

# Now create the map, omitting the legend
map <- ggplot() +
  geom_sf(data = municipal, aes(fill = combined_gp)) +
  scale_fill_manual(values = cols, guide = FALSE) +
  geom_sf(data = cities, pch = 21, bg = "light grey") +
  geom_sf_label_repel(data = cities |> filter(population > 0.25e6),
                    aes(label = name), alpha = 0.7, size = 3) +
  theme_minimal() +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank()) +
  labs(
    title = "Levels of education in South African municipalities",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  ) +
  annotation_north_arrow(location = "tl",
                         style = north_arrow_minimal(text_size = 14))

# Create the map's legend
# It should be more obvious what this does once it is plotted
legend <- ggplot() +
  geom_tile(
    data = municipal,
    mapping = aes(
      x = schooling_gp,
      y = higher_gp,
      fill = combined_gp)) +
  scale_fill_manual(values = cols, guide = FALSE) +
  labs(x = "← ← Lower % Schooling",
       y = "← ← Lower % HE") +
  coord_fixed() +
  theme(axis.title = element_text(size = 7.5),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.background = element_blank())

if(!("cowplot" %in% installed)) install.packages("ggiraph", dependencies = TRUE)
require(cowplot)

# Use the ggdraw() function in cowplot to combine the map and legend together
ggdraw() +
  draw_plot(map, 0, 0, 1, 1) +
  draw_plot(legend, 0.75, 0.1, 0.25, 0.25)

Maps in Shiny

Shiny is described as “an open source R package that provides an elegant and powerful web framework for building web applications using R” [or Python]. At the time of writing, there is a nice example of a mapping application on the Shiny homepage.

Teaching Shiny in detail is beyond the scope of this course but there are Getting Started guides (for both R and Python) and a gallery of applications which can be viewed here.

The following code creates an app that allows variables from the South African municipality data to be selected and mapped, with various choices for the maps classes and colour palette. Don’t worry if not all of the code makes sense to you but hopefully you will recognise some parts from this session and will see that the basis of the app is to define a user interface and then to define tasks on the server side that will take some inputs from selections made by the user in the interface.

Code
# This checks that the required packages are installed and then loaded
installed <- installed.packages()[,1]
pkgs <- c("shiny", "sf", "tidyverse", "ggplot2", "sf", "classInt", "RColorBrewer",
          "ggspatial")
install <- pkgs[!(pkgs %in% installed)]
if(length(install)) install.packages(install, dependencies = TRUE)
invisible(lapply(pkgs, require, character.only = TRUE))

# This downloads the map and the data and merges them
map <- read_sf("https://github.com/profrichharris/profrichharris.github.io/raw/main/MandM/boundary%20files/MDB_Local_Municipal_Boundary_2011.geojson")
mapping_data <- read_csv("https://github.com/profrichharris/profrichharris.github.io/raw/main/MandM/data/education.csv")
map <- left_join(map, mapping_data, by = "LocalMunicipalityCode")

# This selects out from the data the variables of interest
df <- map |>
  st_drop_geometry() |>
  select(where(is.double), -starts_with("Shape"))

# This defines the user interface
ui <- fluidPage(

    # Application title
    titlePanel("Educational geographies for South African municipalities"),

    # Sidebar with various types of input
    sidebarLayout(
        sidebarPanel(
            varSelectInput("var", "Mapping variable", df),
            sliderInput("brks", "Classes", min = 3, max = 8, value = 5, step = 1),
            selectInput("type", "Classification", c("equal", "quantile", "jenks")),
            selectInput("pal", "Palette", rownames(brewer.pal.info)),
            checkboxInput("rev", "Reverse palette"),
            checkboxInput("north", "North arrow")
          ),

        # The main panel with contain the map plot
        mainPanel(
          plotOutput("map")
        )
    )
)

# This defines the server side of the app, taking various inputs
# from the user interface
server <- function(input, output, session) {
  
  output$map <- renderPlot({
    
    vals <- map |>
      pull(!!input$var)
    
    brks <- classIntervals(vals, n = input$brks, style = input$type)$brks
    map$gp <- cut(vals, brks, include.lowest = TRUE)
    
    
    p <- ggplot(map, aes(fill = gp)) +
      geom_sf() +
      scale_fill_brewer("%", palette = input$pal, direction = ifelse(input$rev, 1, -1)) +
      theme_minimal() +
      guides(fill = guide_legend(reverse = TRUE)) +
      labs(caption = "Source: 2011 Census, Statistics South Africa")
    
    if(input$north) p <- p + annotation_north_arrow(location = "tl",
                                              style = north_arrow_minimal(text_size = 14))
  
    p
  
  }, res = 100)}

# This loads and runs the app
shinyApp(ui, server)

You can play with this app to see how changing the design of the map can easily affect your interpretation of the geography of what it shows. The classic book about this is Mark Monmonier’s How to Lie with Maps.

Saving the map and attribute data

That’s almost it for now! However, before finishing, we will save the map with the joined attribute data to the working directory as an R object.

Code
save(municipal, file = "municipal.RData")

Summary

This session has demonstrated that R is a powerful tool for drawing publication quality maps. The native plot functions for sf objects are useful as a quick way to draw a map and both ggplot2 and tmap offer a range of functionality to customise their cartographic outputs to produce really nice looking maps. I tend to use ggplot2 but that is really more out of habit than anything else as tmap might actually be the easier to use. It depends a bit on whether I am drawing static maps (usually in ggplot2) or interactive ones (probably better in tmap). There are other packages available, too, including mapview, which the following code chunk uses (see also, here), and Leaflet to R. Perhaps the key take-home point is that these maps can look at lot better than those produced by some conventional GIS and have the advantage that they can be linked to other analytically processes in R, as future sessions will demonstrate.

Code
if(!("mapview" %in% installed)) install.packages("mapview", dependencies = TRUE)
require(mapview)

mapview(municipal %>% mutate(No_schooling = round(No_schooling, 1)), 
        zcol = "No_schooling",
        layer.name = "% No Schooling",
        map.types = "OpenStreetMap.HOT",
        col.regions = colorRampPalette(rev(brewer.pal(9, "RdYlBu"))))

Further reading

Chapter 2 on Spatial data and R packages for mapping from Geospatial Health Data by Paula Morga.

Chapter 9 on Making maps with R from Geocomputation with R by Robin Lovelace, Jakub Nawosad & Jannes Muenchow.

Chapter 8 on Plotting spatial data from Spatial Data Science with Applications in R by Edzer Pebesma and Roger Bivand.

See also: Elegant and informative maps with tmap, which is a work in progress by Martijn Tennekes and Jakub Nowosad.