Thematic maps in R (Part 1)

Introduction

There are lots of ways to produce maps in R. But, however, they are drawn, two things are usually needed to produce a choropleth map of the sort seen in previous sessions:

  • first, some data;
  • second, a map to join the data to.

The join is usually made possible by the data and the map containing the same variable; for example, by using the same ID codes for all the places in the data as are in the map. This implies that what is contained in the data set are measurements of the places shown in the map. Sometimes maps will come pre-bundled with the data of interest so, in effect, the join has already been made.

Once we have the data ready to map (sometimes by ‘cleaning’ the data before linking them to the map) then R offers plenty of options to produce quick or publication quality maps, which may have either static or dynamic content. The two packages we shall focus on are:

  • ggplot2 (mainly in Part 1) and
  • tmap (mainly in Part 2).

However, there are others.

Another package we shall be using in this session is sf, which provides “support for simple features, a standardized way to encode spatial vector data.” What we are mapping in this exercise are vector data – data for which the geography is ultimately defined by a series of points (two points define a line, a series of lines define a boundary, a boundary demarcates an area, and so forth). The other common geographic representation used in Geographic Information Systems is raster, which is a grid representation of a study region, with each cell in the grid given one or more values to represent what is found there. For raster data, the stars and raster packages are useful.

Part 1

Getting Started

As in previous sessions, if you are keeping all the files and outputs from these exercises together in an R Project (which is a good idea) then open that Project now.

Load the data

Let’s begin with the easy bit and load the data, which are from http://superweb.statssa.gov.za. These includes the variable No_schooling which is the percentage of the population without schooling per South African municipality in 2011.

Code
# A quick check to see if the Tidyverse packages are installed...
installed <- installed.packages()[,1]
if(!("tidyverse" %in% installed)) install.packages("tidyverse",
                                                   dependencies = TRUE)
require(tidyverse)

# Read-in the data directly from a web location. The data are in .csv format
education <- read_csv("https://github.com/profrichharris/profrichharris.github.io/raw/main/MandM/data/education.csv")
# Quick check of the data by looking at the top three rows
slice_head(education, n = 3)
# A tibble: 3 × 8
  LocalMunicipalityCode LocalMunicipalityName No_schooling Some_primary
  <chr>                 <chr>                        <dbl>        <dbl>
1 EC101                 Camdeboo                      13.4         34.5
2 EC102                 Blue Crane Route              16.0         36.2
3 EC103                 Ikwezi                        18.4         35.4
# ℹ 4 more variables: Complete_primary <dbl>, Some_secondary <dbl>,
#   Grade_12_Std_10 <dbl>, Higher <dbl>

Loading the map

We have the data but now we need a ‘blank map’ of the same South African municipalities to map it. It is read-in below in geoJSON format but it would not have been unusual if it had been in .shp (shapefile) or .kml format, instead. The source of the data is https://dataportal-mdb-sa.opendata.arcgis.com/. There are several ways of reading this file into R but it is better to use the sf package because older options such as maptools::readShapePoly() (which was for reading shapefiles) or rgdal::readOGR are either deprecated already or in the process of being retired.

Code
# Another quick check to make sure that various libraries have been installed
if(!("proxy" %in% installed)) install.packages("proxy")
if(!("sf" %in% installed)) install.packages("sf", dependencies = TRUE)
require(sf)

# Use the read_sf() function from the sf library to read in a digital map of the study area
municipal <- read_sf("https://github.com/profrichharris/profrichharris.github.io/raw/main/MandM/boundary%20files/MDB_Local_Municipal_Boundary_2011.geojson")


If we now look at the top of the municipal object then we find it is of class sf, which is short for simple features. It has a vector geometry (it is of type multipolygon) and has its coordinate reference system (CRS) set as WGS 84. It also contains some attribute data, although not the schooling data we are looking to map.

Code
slice_head(municipal, n = 1)
Simple feature collection with 1 feature and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 27.15761 ymin: -33.28488 xmax: 28.0811 ymax: -32.67573
Geodetic CRS:  WGS 84
# A tibble: 1 × 11
  OBJECTID ProvinceCode ProvinceName LocalMunicipalityCode LocalMunicipalityName
     <int> <chr>        <chr>        <chr>                 <chr>                
1        1 EC           Eastern Cape BUF                   Buffalo City         
# ℹ 6 more variables: DistrictMunicipalityCode <chr>,
#   DistrictMunicipalityName <chr>, Year <int>, Shape__Area <dbl>,
#   Shape__Length <dbl>, geometry <MULTIPOLYGON [°]>

Here are just the attribute data

Code
st_drop_geometry(municipal) |>
  slice_head(n = 5)
# A tibble: 5 × 10
  OBJECTID ProvinceCode ProvinceName LocalMunicipalityCode LocalMunicipalityName
     <int> <chr>        <chr>        <chr>                 <chr>                
1        1 EC           Eastern Cape BUF                   Buffalo City         
2        2 EC           Eastern Cape EC101                 Camdeboo             
3        3 EC           Eastern Cape EC102                 Blue Crane Route     
4        4 EC           Eastern Cape EC103                 Ikwezi               
5        5 EC           Eastern Cape EC104                 Makana               
# ℹ 5 more variables: DistrictMunicipalityCode <chr>,
#   DistrictMunicipalityName <chr>, Year <int>, Shape__Area <dbl>,
#   Shape__Length <dbl>

And here is the ‘blank’ map, drawn using plot{sf}, which is the ‘base’ way of plotting sf objects.

Code
par(mai=c(0, 0, 0, 0))  # Removes the plot margins
municipal |>
  st_geometry() |>
  plot()

Had it been necessary to set the coordinate reference system (CRS) then the function st_set_crs() would be used. Instead, and just for fun, we will change the existing CRS: here is the map transformed on to a ‘south up’ coordinate reference system, achieved by changing its EPSG code to 2050 with the function st_transform(). I have no idea why you would want to do this or what the projection is actually used for but, more generally, the st_transform is useful for, say (not here, but in other cases), converting between the UK National Grid system (EPSG:27700) and longitude and latitude (EPSG:4326), or making sure the maps you are working with are all using the same CRS, which would be necessary for various GIS (Geographic Information Systems) type operations.

Code
par(mai=c(0, 0, 0, 0))
municipal |>
  st_transform(2050) |>
  st_geometry() |>
  plot()

Note how functions with the sf library tend to start with st_. Personally, I find this slightly confusing and I am not sure it makes it easier to find what I looking for in the package’s help pages but it is consistent for the functions and methods that operate on spatial data and is, I believe, short for spatial type.

sf and sp

At the risk of over-simplification, sf (simple features) can be viewed as a successor to the earlier sp (spatial) and related packages, which are well documented in the book Applied Spatial Data Analysis with R. Sometimes other packages are still reliant on sp and so the spatial objects need to be changed into sp’s native format prior to use.

Code
# From sf to sp
municipal_sp <- as(municipal, "Spatial")
class(municipal_sp)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
Code
# From sp to sf
municipal_sf <- st_as_sf(municipal_sp)
class(municipal_sf)
[1] "sf"         "data.frame"

The need for this ought to be less frequent now that sf is well established and sp deprecated.

Joining the attribute data to the map

If we look again at the map and schooling data, we find that they have two variables in common which suggests a means to join them together based on a common variable.

Code
# The variables that they appear to have in common...
intersect(names(municipal), names(education))
[1] "LocalMunicipalityCode" "LocalMunicipalityName"

This is encouraging but, in this example, we need to be careful using the municipal names because not all of those in the map are in the education data or vice versa. Although the variable LocalMunicipalityName is present in both the map and education data, the variable is not actually the same in both because of different ways of spelling the names of places. The following code shows where one is not the same as the other.

Code
# anti_join() returns all rows from x without a match in y
anti_join(municipal, education, by = "LocalMunicipalityName")
Simple feature collection with 7 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 27.42423 ymin: -32.05935 xmax: 32.05014 ymax: -24.96652
Geodetic CRS:  WGS 84
# A tibble: 7 × 11
  OBJECTID ProvinceCode ProvinceName LocalMunicipalityCode LocalMunicipalityName
     <int> <chr>        <chr>        <chr>                 <chr>                
1       34 EC           Eastern Cape EC157                 King Sabata Dalindye 
2       74 KZN          KwaZulu-Nat… KZN214                uMuziwabantu         
3       75 KZN          KwaZulu-Nat… KZN215                Ezinqoleni           
4       97 KZN          KwaZulu-Nat… KZN262                uPhongolo            
5      146 MP           Mpumalanga   MP301                 Chief Albert Luthuli 
6      149 MP           Mpumalanga   MP304                 Dr Pixley Ka Isaka S 
7      165 NW           North West   NW372                 Local Municipality o 
# ℹ 6 more variables: DistrictMunicipalityCode <chr>,
#   DistrictMunicipalityName <chr>, Year <int>, Shape__Area <dbl>,
#   Shape__Length <dbl>, geometry <MULTIPOLYGON [°]>
Code
anti_join(education, municipal, by = "LocalMunicipalityName")
# A tibble: 7 × 8
  LocalMunicipalityCode LocalMunicipalityName  No_schooling Some_primary
  <chr>                 <chr>                         <dbl>        <dbl>
1 EC157                 King Sabata Dalindyebo         21.7         33.2
2 KZN214                UMuziwabantu                   20.7         42.1
3 KZN215                Ezingoleni                     21.9         40.1
4 KZN262                UPhongolo                      22.5         35.1
5 MP301                 Albert Luthuli                 23.8         31.1
6 MP304                 Pixley Ka Seme                 24.0         31.9
7 NW372                 Madibeng                       12.8         27.1
# ℹ 4 more variables: Complete_primary <dbl>, Some_secondary <dbl>,
#   Grade_12_Std_10 <dbl>, Higher <dbl>

Fortunately, the municipal codes are consistent even where the names are not, which is why place codes and IDs are generally better than using names when joining data. Names are often subject to sometimes subtle variations that can upset the intended join.

Code
# The municipality codes are consistent in the map and data; there are none that do not match.
anti_join(municipal, education, by = "LocalMunicipalityCode")
Simple feature collection with 0 features and 10 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
# A tibble: 0 × 11
# ℹ 11 variables: OBJECTID <int>, ProvinceCode <chr>, ProvinceName <chr>,
#   LocalMunicipalityCode <chr>, LocalMunicipalityName <chr>,
#   DistrictMunicipalityCode <chr>, DistrictMunicipalityName <chr>, Year <int>,
#   Shape__Area <dbl>, Shape__Length <dbl>, geometry <GEOMETRY [°]>
Code
anti_join(education, municipal, by = "LocalMunicipalityCode")
# A tibble: 0 × 8
# ℹ 8 variables: LocalMunicipalityCode <chr>, LocalMunicipalityName <chr>,
#   No_schooling <dbl>, Some_primary <dbl>, Complete_primary <dbl>,
#   Some_secondary <dbl>, Grade_12_Std_10 <dbl>, Higher <dbl>

We therefore join the data to the map using the variable LocalMunicipalityCode and check that the schooling data are now attached to the map.

Code
municipal <- left_join(municipal, education, by = "LocalMunicipalityCode")
names(municipal)
 [1] "OBJECTID"                 "ProvinceCode"            
 [3] "ProvinceName"             "LocalMunicipalityCode"   
 [5] "LocalMunicipalityName.x"  "DistrictMunicipalityCode"
 [7] "DistrictMunicipalityName" "Year"                    
 [9] "Shape__Area"              "Shape__Length"           
[11] "geometry"                 "LocalMunicipalityName.y" 
[13] "No_schooling"             "Some_primary"            
[15] "Complete_primary"         "Some_secondary"          
[17] "Grade_12_Std_10"          "Higher"                  

Note that the variables LocalMunicipalityName.x and LocalMunicipalityName.y have been created in the process of the join. This is because although we did not use LocalMunicipalityName for the join, that variable name is present in both the map and the data and is therefore duplicated when they are joined together – municipal$LocalMunicipalityName becomes municipal$LocalMunicipalityName.x and education$LocalMunicipalityName creates municipal$LocalMunicipalityName.y.

Mapping the data

Using plot{sf}

The ‘one line’ way of plotting the data is to use the in-built plot() function for sf.

Code
plot(municipal["No_schooling"])

As a ‘rough and ready’ way to check for spatial variation and patterns in the data, it is quick and easy. It is important to specify the variable(s) you wish to include in the plot – plot(municipal["No_schooling"]) in the example above – or else it will try and plot all of them in the map’s data, up to the value specified by the argument max.plot, which has a default of nine.

Code
plot(municipal)


The map can be customised. For example,

Code
if(!("RColorBrewer" %in% installed)) install.packages("RColorBrewer",
                                                      dependencies = TRUE)
require(RColorBrewer)

plot(municipal["No_schooling"], key.pos = 1, breaks = "jenks", nbreaks = 7,
     pal = rev(brewer.pal(7, "RdYlBu")),
     graticule = TRUE, axes = TRUE,
     main = "Percentage of Population with No Schooling")

Have a read through the documentation at ?sf::plot to get a sense of what the various arguments do. Try changing them and see if you can produce a map with six equal interval breaks, for example.

Note the use of the RColorBrewer package and its brewer.pal() function. RColorBrewer provides colour palettes based on https://colorbrewer2.org/ and has been used to create a diverging red-yellow-blue colour palette that is reversed using the function rev() so that red is assigned to the highest values, not lowest. A ‘natural breaks’ (jenks) classification with 7 colours has been used (breaks = "jenks", nbreaks = 7).

Thinking about the map classes

It is obvious, but under-appreciated, that what we see in a map in a function of how the map is constructed, including the map classes used. Here is the same underlying map but with equal interval breaks instead:

Code
plot(municipal["No_schooling"], key.pos = 1, breaks = "equal", nbreaks = 7,
     pal = rev(brewer.pal(7, "RdYlBu")),
     graticule = TRUE, axes = TRUE,
     main = "Percentage of Population with No Schooling")

… and here it is with quantile breaks:

Code
plot(municipal["No_schooling"], key.pos = 1, breaks = "quantile", nbreaks = 7,
     pal = rev(brewer.pal(7, "RdYlBu")),
     graticule = TRUE, axes = TRUE,
     main = "% of Population with No Schooling")

Clearly the maps above do not appear exactly the same. The geographical patterns and therefore the geographical information that we view in the map change with the number, colouring and widths (ranges) of the map classes. Ideally, these should be set to reflect the distribution of the data and what is being look for in it.

Let’s explore that distribution for a moment and how the various map classes used above end up dividing it up. The following histograms show the break points in the distributions used in the various maps. The code works by creating a list of plots (specifically, a list of ggplots, see below) – one plot each for the jenks, equal and quantile styles – and then, using a package called gridExtra, to arrange those plots into a single grid. However, the code matters less than what it reveals, which is that Jenks or other ‘natural breaks’ classifications are reasonably good for identifying break points that reflect the distribution of the data in the absence of the user having cause to set those break points in some other way.

I am not suggesting you need to follow these exact steps every time you want to map but some data. BUT thinking about the distribution of the data is always important to avoid creating maps that give the wrong impression of the prevalence or otherwise of values in the data. For example, if you use a quantile classification with, say, 4 breaks, it will create four map classes, each containing approximately one quarter of the data, even if some of the values in those classes are rarely found and are unusual when compared to others in the same class. An equal interval classification will not have this problem.

Code
if(!("gridExtra" %in% installed)) install.packages("gridExtra",
                                                   dependencies = TRUE)
if(!("classInt" %in% installed)) install.packages("classInt",
                                                  dependencies = TRUE)

require(gridExtra)
require(classInt)

styles <- c("jenks", "equal", "quantile")
g <- lapply(styles, \(x) {
        ggplot(municipal, aes(x = No_schooling)) +
        geom_histogram(fill = "light grey") +
        xlab("% of Population with No Schooling") +
        geom_vline(xintercept = classIntervals(municipal$No_schooling,
                                               n = 7, style = x)$brks,
                   col = "dark red") +
        geom_rug() +
        theme_minimal() +
        ggtitle(paste(x,"classification"))
    })
# The step below brings together, in a grid, the list of three different plots
# created above
grid.arrange(grobs = g)


For further information on using plot{sf} see here and look at the help menu, ?sf::plot.

Using ggplot2

Whilst the plot() function for sf objects is useful for producing quick maps, I prefer ggplot2 for better quality ones that I am wanting to customise or annotate in particular ways, or to include in a research publication or report. We already have seen examples of ggplot2 output in earlier sessions and also in the histograms above.

ggplot2 is based on The Grammar of Graphics. It’s a simplification of the ‘grammar’ but I find it easiest to think of it, initially, in four stages:

  1. Say which data are to be plotted;
  2. Say which aesthetics of the chart (e.g. colour, line type, point size) will vary with the data (or just be constant);
  3. Say which types of plots (which ‘geoms’) are to feature in the chart;
  4. (Optional customisation) change other attributes of the chart to add titles, rename the axis labels, and so forth.

As an example, and in the code chunk below, those four stages are applied to a boxplot showing the distribution of the no schooling variable by South African Provinces.

First, the data = municipal. Second, consulting with the ggplot2 cheatsheet, I find that the aesthetics, aes(), for the boxplot, require a discrete x and a continuous y, which are provided by ProvinceName and No_schooling, respectively. ProvinceName has also been used to assign a fill colour to each box. Third, the optional changes arise from me preferring theme_minimal() to the default style, although I have then modified it to remove the legend, change the angle of the text on the x-axis, remove the x-axis label and change the y-axis label. I did not have to do any of this; I just preferred it.

Code
require(ggplot2)
ggplot(data = municipal, aes(x = ProvinceName, y = No_schooling,
                             fill = ProvinceName)) +
  geom_boxplot() +
  theme_minimal() +
  theme(legend.position = "none", axis.text.x = element_text(angle = 45)) +
  xlab(element_blank()) +
  ylab("% No schooling within municipalities")


Let’s now take that same general process and apply it to create a map, using the same RColorBrewer colour palette as previously and adding the map using geom_sf (for a full list of geoms available for ggplot2 see here). The line scale_fill_distiller is an easy way to shade the map using the colour palette from RColorBrewer andlabs() adds labelling. The steps are the same: specify the data; specify the aesthetics; specify the geom (now of type sf, because it is a spatial feature); add the customisation.

Code
ggplot(municipal, aes(fill = No_schooling)) +
  geom_sf() +
  scale_fill_distiller("%", palette = "RdYlBu") +
  theme_minimal() +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  )  


Presently the map has a continuous shading scheme – you can see it in the map’s legend. This can be changed to discrete map classes and colours by converting the continuous municipal$No_schooling variable to a factor, using the cut() function, here with break points found using ClassIntervals(style = "jenks"). Because we have then changed from continuous to discrete (categorised) data but still want to use an RColorBrewer palette, so scale_fill_brewer() replaces scale_fill_distiller(), wherein the argument direction = -1 reverses the RdYlBu palette so that the highest values are coloured red. Adding guides(fill = guide_legend(reverse = TRUE)) reverses the legend so that the highest values are on top in the legend, which is another preference of mine. It just seems kind of logical to me to have the highest values highest in the legend.

Code
# Find the break points in the distribution using a Jenks classification
brks <- classIntervals(municipal$No_schooling, n = 7, style = "jenks")$brks

# Factor the No_schooling variable using those break points
municipal$No_schooling_gp <- cut(municipal$No_schooling, brks,
                                 include.lowest = TRUE)

ggplot(municipal, aes(fill = No_schooling_gp)) +
  geom_sf() +
  scale_fill_brewer("%", palette = "RdYlBu", direction = -1) +
  theme_minimal() +
  guides(fill = guide_legend(reverse = TRUE)) +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  ) 

At this point, we may note that the four stages of the map production that I referred to earlier was an over-simplification and we can add a fifth (but it is the fourth below; sorry to be confusing):

  1. Say which data are to be plotted;
  2. Say which aesthetics of the chart (e.g. colour, line type, point size) will vary with the data;
  3. Say which types of plots (which ‘geoms’) are to feature in the chart – geom_sf for mapping;
  4. (the new one) ‘Scale’ the data – in the above examples, link the mapped variables to map classes and colour codes. The scaling is what scale_fill_... was doing;
  5. (Optional customisation) change other attributes of the chart to add titles, rename the axis labels, and so forth.

We could also use the scaling to more literally scale (or transform) the data, in a mathematical sense, by using a log scale to plot the data. This will require the original data, before we converted it to factors:

Code
ggplot(municipal, aes(fill = No_schooling)) +
  geom_sf() +
  scale_fill_distiller("%", palette = "RdYlBu", trans = "log") +
  theme_minimal() +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  )  

The legend of that map would look nicer with the values formatted to a more sensible number of decimal places:

Code
ggplot(municipal, aes(fill = No_schooling)) +
  geom_sf() +
  scale_fill_distiller("%", palette = "RdYlBu", trans = "log", label = \(x) sprintf("%.1f", x)) +
  theme_minimal() +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  )  

Annotating the map with ggspatial

Having created the basic map using ggplot2, we can add some additional map elements using ggspatial – a package that provides some extra cartographic functions. The following code chunk adds a backdrop to the map. Different backgrounds (alternative map tiles) can be chosen from the list at rosm::osm.types(); see here for what they look like.

Before running the code, we may note a change from the previous code chunk (above) which is in addition to installing and requiring ggspatial and adding the map tile as a backdrop. Specifically, if you look at the previous code chunk you will find that the data, municipal are handed to ggplot in the top line, ggplot(municipal, ...), where the first argument is the data one. In other words, ggplot(municipal, ...) is the same as, ggplot(data = municipal, ...) (check the help file, ?ggplot2::ggplot to confirm this). By specifying the data in the top line, in essence this sets data = municipal as a global parameter for the plot: it is where ggplot will, by default, now look for the variable called for in aes(fill = No_schooling_gp) and where it will look for other variables too. Whilst this would work just fine in the code immediately below, a little later I introduce a second geom_sf into the plot and no longer want municipal to be the default choice for all the aesthetics of the chart, just some of them. To pre-empt any problems that might otherwise arise, I no longer specify municipal as the default dataset in my ggplot() but, instead, specifically name municipal where I want to use it, which is as the fill data, in the line geom_sf(data = municipal, aes(fill = No_schooling_gp)). This will leave me free to associate other data with different aesthetics in due course. More simply, if you set the data = argument and also any aesthetics, mapping = aes() in the ggplot(...) line of the code, then you are basically saying “use these for everything that follows” unless you override them by saying what you want for each specific geom.

Code
if(!("ggspatial" %in% installed)) install.packages("ggspatial",
                                                   dependencies = TRUE)
require(ggspatial)

ggplot() +
  annotation_map_tile(type = "cartolight", progress = "none") +
  geom_sf(data = municipal, aes(fill = No_schooling_gp)) +
  scale_fill_brewer("%", palette = "RdYlBu", direction = -1) +
  theme_minimal() +
  guides(fill = guide_legend(reverse = TRUE)) +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  ) 


Or, using our previous log scale,

Code
ggplot() +
  annotation_map_tile(type = "cartolight", progress = "none") +
  geom_sf(data = municipal, aes(fill = No_schooling)) +
  scale_fill_distiller("%", palette = "RdYlBu", trans = "log", label = \(x) sprintf("%.1f", x)) +
  theme_minimal() +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  ) 


A north arrow and a scale bar can also be added, although including the scale bar generates a warning because the map-to-true-life distance ratio is not actually constant across the map but varies with longitude and latitude (so maybe we should not be adding a scale bar at all?). The argument, location = "tl" is short for top left; location = "br" for bottom right. See ?annotation_north_arrow and ?annotation_scale for further details and options. Note also the use of the last_plot() function to more easily add content to the last ggplot.

Code
last_plot() +
  annotation_north_arrow(location = "tl",
                         style = north_arrow_minimal(text_size = 14)) +
  annotation_scale(location = "br", style = "ticks")

Adding point symbols to the map

In the next example, the locations of South African cities are added to the map, with a symbol drawn in proportion to their population size. The source of the data is a shapefile from https://data.humdata.org/dataset/hotosm_zaf_populated_places. The symbol shape that is specified by pch = 3 has the same numeric coding as those in ?graphics::points (i.e. 0 is a square, 1, is a circle, 2 is a triangle, and so forth).

The function scales::label_comma() forces decimal display of numbers to avoid displaying scientific notation.

The notation [pagkage_name]::[function] is a way of saying in which library/package a specific function is found – the label_comma() function is in the scales library. Sometimes this can be useful to run a function from a package/library without first requiring (loading) it. It can also be useful to avoid the problem when two packages contain a function with the same name. For example, there is a select() function in both the ’raster and dplyr libraries. If you require (load) both of these libraries then the select function in whichever is loaded second will mask the select function in the one loaded first, potentially causing errors or confusions. A way around the problem is to use the :: notation to be specific about which package’s select you wish to use, i.e. raster::select() or dplyr::select().

Code
if(!("scales" %in% installed)) install.packages("scales", dependencies = TRUE)

download.file("https://github.com/profrichharris/profrichharris.github.io/blob/main/MandM/boundary%20files/hotosm_zaf_populated_places_points_shp.zip?raw=true",
              "cities.zip", mode = "wb", quiet = TRUE)
unzip("cities.zip")

read_sf("hotosm_zaf_populated_places_points.shp") |>
  filter(place == "city") |>
  mutate(population = as.numeric(population)) ->
  cities

last_plot() +
  geom_sf(data = cities, aes(size = population), pch = 3) +
  scale_size("Population", labels = scales::label_comma())

Slightly confusingly, a shapefile actually consists of at least three files, one with the extension .shp (the coordinate/shape data), one .shx (an index file) and one .dbf (the attribute data). If you use a shapefile you need to make sure you download all of them and keep them together in the same folder.

Labelling using ggsflabel

Nice labelling of the cities is provided by ggsflabel, in this example using a pipe |> to filter and only label cities with over a million population. The function geom_sf_label_repel() is designed to stop labels from being placed over each other. We could use last_plot() again but, instead, here is the code in full, going back to using the grouped data, rather than the continuous, log scale, and a different colour palette (just for a bit of variety)

Code
if(!("remotes" %in% installed)) install.packages("remotes", dependencies = TRUE)
if(!("ggsflabel" %in% installed)) remotes::install_github("yutannihilation/ggsflabel")
require(ggsflabel)

ggplot() +
  annotation_map_tile(type = "thunderforestoutdoors", progress = "none") +
  geom_sf(data = municipal, aes(fill = No_schooling_gp)) +
  scale_fill_brewer("%", palette = "BuPu") +
    geom_sf(data = cities, aes(size = population), pch = 3) +
  scale_size("Population", labels = scales::label_comma()) +
  geom_sf_label_repel(data = cities |> filter(population > 1e6),
                    aes(label = name), alpha = 0.7, size = 3) +
  theme_minimal() +
  # The line below removes some annoying axis titles that otherwise appear
  theme(axis.title.x = element_blank(), axis.title.y = element_blank()) +
  guides(fill = guide_legend(reverse = TRUE)) +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  ) 

Saving the map as a graphic file

Having created the map, it can now be saved as a graphic. If you are not using an R Project to save your files into, you may wish to change your working directory before saving the graphic, using setwd(dir) and substituting dir with the pathname to the preferred directory, or by using Session -> Set Working Directory -> Choose Directory from the dropdown menus. Once you have done so, the last_plot() is easily saved using the function ggsave(). For example, in .pdf format, to a print quality,

Code
ggsave("no_schooling.pdf", device = "pdf", width = 6, units = "in",
       dpi = "print")

Alternatively, we can write directly to a graphics device, using one of the functions bmp(), jpeg(), png(), tiff() or pdf(). For instance,

Code
jpeg("no_schooling.jpg", res = 150)
last_plot()
dev.off()

Creating an interactive map using ggiraph

So far all the maps we have created have been static. This is obviously better for anything that will be printed but, for a website or similar, we may wish to include more ‘interaction’. The package ggiraph package creates dynamic ggplot2 graphs and we can use it to create an interactive map where information about the areas appears as we brush over those areas on the map with the mouse pointer. This is achieved by replacing, in the code, geom_sf() with the geom_sf_interactive() function from ggiraph, specifying the text to show with the tooltip (the example below pastes a number of character elements together without a space between them, hence paste0() but does include a carriage return, \n) and rendering the resulting ggplot2 object with girafe().

Code
if(!("ggiraph" %in% installed)) install.packages("ggiraph", dependencies = TRUE)
require(ggiraph)

g <- ggplot() +
  annotation_map_tile(type = "stamenwatercolor", progress = "none") +
  geom_sf_interactive(data = municipal,
                      aes(tooltip = paste0(LocalMunicipalityName.x, "\n",
                                           round(No_schooling,1), "%"),
                             fill = No_schooling_gp)) +
  scale_fill_brewer("%", palette = "BuPu") +
  theme_minimal() +
  guides(fill = guide_legend(reverse = TRUE)) +
  labs(
    title = "Percentage of Population with No Schooling",
    subtitle = "2011 South African Census Data",
    caption = "Source: Statistics South Africa"
  ) +
  annotation_north_arrow(location = "tl",
                         style = north_arrow_minimal(text_size = 14)) +
  annotation_scale(location = "br", style = "ticks")

girafe(ggobj = g)

Save your workspace

This is the end of Part 1 and is a good place to stop or, at least, take a break. If you won’t be continuing immediately to Part 2 then don’t forget to save your workspace. For example, by using,

Code
save.image("making_maps.RData")