A Brief Guide

Introduction

Base R is what you download from CRAN. You might think of it as classic R. A short introduction to ‘the basics’ is provided below. For a fuller introduction, see the software manual, An Introduction to R. It is worth reading even if you end up regularly using the Tidyverse variant of R described in the next session, as there are some tasks that are (in my opinion) easier to do using Base R or by mixing the two.

Functions

R is a functional programming language where functions ‘do things’ to objects. What they do is dependent upon the class/type and attributes of the objects that go into the function, and also on the arguments of the function.

For example, try typing the following into the R Console, which is the bottom left panel of RStudio. Type it alongside the prompt symbol, >, then hit Enter/Return.

Code
round(10.32, digits = 0)
[1] 10

This calls the function round(), which operates on the numeric object 10.32. The argument digits specifies the number of decimal places to round to. It is set to zero in the example above.

Because digits = 0 is the default value for the function, we could just write

Code
round(10.32)
[1] 10

and obtain the same answer as before. I know that digits = 0 is the default value because, as I type the name of the function into the R Console, I see the arguments of the function and any default values appear.

We can also find out more about the function, including some examples of its use, by opening its help file.

Code
?round
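
The arguments and their defaults can also be inspected from the Console itself. The following is a small sketch rather than part of the original walk-through; it assumes a reasonably recent version of R, and the object name default_digits is made up for the illustration.

```r
# Show the arguments of round() together with any default values
args(round)

# Extract the default for digits programmatically.
# round() is a primitive function, so formals() is applied to the
# result of args() rather than to round() directly.
default_digits <- formals(args(round))$digits
default_digits
```

The help file remains the better source because it also explains what each argument means.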


Should we wish to round 10.32 to one decimal place, the default of zero is no longer what we want and the argument must be specified explicitly.

Code
round(10.32, digits = 1)
[1] 10.3

The following also works because it preserves the order of the arguments in the function.

Code
round(10.32, 1)
[1] 10.3

In other words, if we do not explicitly state x = 10.32 (where x is a numeric vector; here, 10.32) and digits = 1, then the values are matched to the first and second arguments of the function in the order in which they are given. This requires care to make sure they genuinely are in the right order. If you aren’t certain, then name the arguments explicitly because they will then work in any order.

Code
round(digits = 1, x = 10.32)
[1] 10.3
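
To see why the order matters, here is a short illustrative sketch (the object names same and swapped are invented for the example). It compares the named and unnamed calls, then shows what happens if the two unnamed values are accidentally swapped.

```r
# Named arguments give the same answer whatever order they are written in
same <- identical(round(10.32, 1), round(digits = 1, x = 10.32))
same

# Unnamed arguments are matched by position. Swapping them changes the
# meaning: here 1 is the number to be rounded and 10.32 is taken as digits.
swapped <- round(1, 10.32)
swapped
```

R does not complain about the swapped call because it is still a valid request; it is simply not the one that was intended.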


In the examples above, both the input to and output from the function are a numeric vector of type double. The input is:

Code
class(10.32)
[1] "numeric"
Code
typeof(10.32)
[1] "double"

The output is:

Code
class(round(10.32, digits = 1))
[1] "numeric"
Code
typeof(round(10.32, digits = 1))
[1] "double"

Note how a function can be wrapped within a function, as in the example above: class(round(...)).


At the moment we are using x = 10.32, which is a numeric vector of length 1,

Code
length(10.32)
[1] 1

However, the round() function can operate on numeric vectors of other lengths too.

Code
round(c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7))
[1] 1 2 3 4 6 7 8

Here the combine function, c(), is used to create a vector of length 7, which is the input into round(). The output is of length 7 too.

Code
length(round(c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7)))
[1] 7
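
One detail worth knowing when rounding vectors is how R treats values that end in exactly .5. The sketch below is an addition to the original examples (halves and rounded are illustrative names). As documented in the help file for round(), R follows the IEC 60559 convention of rounding such values to the nearest even number.

```r
halves <- c(0.5, 1.5, 2.5, 3.5)

# Values exactly half way between two whole numbers go to the even one
rounded <- round(halves)
rounded   # 0 2 2 4
```

This is why 5.5 became 6 in the example above, whereas 4.5 would have become 4.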

There are lots of functions for R and I often forget which one I need. Fortunately, there is a large user community too, so a quick web search usually turns up the answer. Don’t be afraid to search the web for what you need.

Writing a new function

We can write our own functions. The following will take a number and report whether it is a prime number or not.

Code
is.prime <- function(x) {
  if(x == 2) return(TRUE)
  if(x < 2 | x %% floor(x) != 0) {
    warning("Please enter an integer number above 1")
    return(NA)
  }
  y <- 2:(x-1)
  ifelse(all(x%%y > 0), return(TRUE), return(FALSE))
}

Let’s try it.

Code
is.prime(2)
[1] TRUE
Code
is.prime(10)
[1] FALSE
Code
is.prime(13)
[1] TRUE
Code
is.prime(3.3)
[1] NA


There is quite a lot to unpack about the function. It is not all immediately relevant but it is instructive to have an overview of what it is doing. First of all, the function takes the form

f <- function(x) {
  ...
}

where x is the input into the function in much the same way that x is the number to be rounded in round(x = ...). It is a ‘place holder’ for the input into the function.

Statements such as if(x == 2) are conditional statements: if the condition in the brackets is true then do whatever follows. If what is to be done spans multiple lines, those lines are enclosed by ‘curly brackets’, {...}.

The statement if(x < 2 | x %% floor(x) != 0) in the function is also a conditional statement, this time including an or, denoted by |. It checks whether x < 2 or whether x is not a whole number: %% gives the remainder after dividing x by floor(x), and that remainder is non-zero when x has a fractional part. Had we needed both conditions to be met, an and would be used, denoted by & instead of |. Note that ! means not, so != tests for not equal to and is the opposite of ==, which tests for equality.

Where it says 2:(x-1), this is equivalent to seq(from = 2, to = (x-1), by = 1). It generates a sequence of integers from \(2\) to \((x-1)\).

Code
x <- 10
2 : (x - 1)
[1] 2 3 4 5 6 7 8 9
Code
seq(from = 2, to = (x-1), by = 1)
[1] 2 3 4 5 6 7 8 9

ifelse() is another conditional construct. It takes the form ifelse(condition, a, b): if the condition is met then do a, else do b. In the prime number function the condition all(x %% y > 0) checks that dividing \(x\) by each of the numbers from \(2\) to \((x-1)\) leaves a remainder. If every division leaves a remainder then \(x\) is prime; if any of them generates a whole number then it is not.

Finally, the function return() returns an output from the function; here, a logical vector of length 1 that is TRUE, FALSE or NA dependent upon whether \(x\) is or is not a prime number, or if it is not a whole number above \(1\).
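
For comparison, the same idea can be written a little more compactly. This is only a sketch of one alternative, not a replacement for the function above: the name is_prime_sketch is made up, the use of || for single values and the decision to test divisors only up to the square root of x are choices made for this illustration, and the final line relies on R returning the last value evaluated in a function.

```r
is_prime_sketch <- function(x) {
  # || compares single values and stops as soon as the answer is known
  if (x < 2 || x != floor(x)) {
    warning("Please enter an integer number above 1")
    return(NA)
  }
  if (x < 4) return(TRUE)      # 2 and 3 are prime
  y <- 2:floor(sqrt(x))        # candidate divisors (an assumption of this sketch)
  all(x %% y != 0)             # the last value evaluated is what is returned
}

# Apply the function to several numbers, one at a time
checks <- vapply(c(2, 10, 13, 25), is_prime_sketch, logical(1))
checks   # TRUE FALSE TRUE FALSE
```

vapply() is used because the function, like the original, expects one number at a time.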

Note that from R version 4.1.0 onwards, functions can also take the shorthand form,

f <- \(x) {
  ...
}

Therefore the following is exactly equivalent to before.

Code
is.prime <- \(x) {
  if(x == 2) return(TRUE)
  if(x < 2 | x %% floor(x) != 0) {
    warning("Please enter an integer number above 1")
    return(NA)
  }
  y <- 2:(x-1)
  ifelse(all(x%%y > 0), return(TRUE), return(FALSE))
}
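
The shorthand is most often seen where a small, unnamed function is passed straight to another function. A minimal sketch, assuming R 4.1.0 or later (squares and same_as_long_form are illustrative names):

```r
# Square each of the numbers 1 to 5 using an unnamed function
squares <- sapply(1:5, \(i) i^2)
squares   # 1 4 9 16 25

# The long form gives an identical result
same_as_long_form <- identical(squares, sapply(1:5, function(i) i^2))
same_as_long_form
```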

Objects and Classes

Our function that checks for a prime number is stored in the object is.prime.

Code
class(is.prime)
[1] "function"

There are other classes of object in R. Some of the most common are listed below.

Logical

The output from the is.prime() function is an example of an object of class logical because the answer is TRUE or FALSE (or NA, not available).

Code
x <- is.prime(10)
print(x)
[1] FALSE
Code
class(x)
[1] "logical"

Some other examples:

Code
y <- 10 > 5
print(y)
[1] TRUE
Code
class(y)
[1] "logical"
Code
z <- 2 == 5   # is 2 equal to 5?
print(z)
[1] FALSE

Numeric

We have already seen that some objects are numeric.

Code
x <- mean(0:100)
print(x)
[1] 50
Code
class(x)
[1] "numeric"

This presently is of type double; i.e. it allows for decimal places even where they are not required.

Code
typeof(x)
[1] "double"

but it could be converted to class integer (a whole number with no decimal places).

Code
x <- as.integer(x)
class(x)
[1] "integer"
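
A few further cases may help to fix the difference between double and integer. This sketch is an addition to the original examples, and truncated is an illustrative name. Note in particular that as.integer() discards the decimal part rather than rounding it.

```r
typeof(1)     # "double": a number typed on its own is stored as a double
typeof(1L)    # "integer": the L suffix asks for an integer
typeof(1:3)   # "integer": the colon operator produces integers

# Conversion drops everything after the decimal point
truncated <- as.integer(3.9)
truncated     # 3, not 4
```

If rounding is what is wanted, apply round() first and then convert.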

Character

Other classes include character. Note the difference between the length() of a character vector and the number of characters, nchar(), that any element of that vector contains.

Code
x <- "Mapping and Modelling in R"
print(x)
[1] "Mapping and Modelling in R"
Code
length(x)   # There is only one element in this vector
[1] 1
Code
nchar(x)    # And that element contains 26 characters (spaces included)
[1] 26
Code
class(x)
[1] "character"
Code
y <- paste(x, "with Richard Harris")
print(y)
[1] "Mapping and Modelling in R with Richard Harris"
Code
length(y)   # There is still only one element
[1] 1
Code
nchar(y)    # But now it contains more characters
[1] 46
Code
class(y)
[1] "character"
Code
z <- unlist(strsplit(x, " "))
print(z)
[1] "Mapping"   "and"       "Modelling" "in"        "R"        
Code
length(z)   # The initial vector has been split into 5 parts
[1] 5
Code
nchar(z)
[1] 7 3 9 2 1
Code
class(z)
[1] "character"

As the name suggests, print is a function that prints its argument to the screen. Often it can be omitted in favour of referencing the object directly. For instance, in the example above, rather than typing print(z) it would be sufficient just to type z. Just occasionally, though, you will find that an object does not print as you intended when the function is omitted. If this happens, try putting print back in.
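
One common situation where print cannot be omitted is inside a loop. The following sketch is an illustration added here (loops are not otherwise covered in this section): typing an object's name inside for() displays nothing, whereas print() does.

```r
# Nothing appears: automatic printing only happens at the top level
for (i in 1:3) i

# Each value is shown because print() is called explicitly
for (i in 1:3) print(i)
```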

Matrix

An example of a matrix is

Code
x <- matrix(1:9, ncol = 3)
x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
Code
ncol(x)   # Number of columns
[1] 3
Code
nrow(x)   # Number of rows
[1] 3
Code
class(x)
[1] "matrix" "array" 

Here the argument byrow is changed from its default value of FALSE to be TRUE:

Code
y <- matrix(1:9, ncol = 3, byrow = TRUE)

This result is equivalent to the transpose of the original matrix.

Code
y
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
Code
t(x)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
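
Rather than comparing the two printed matrices by eye, the equivalence can be checked directly. A small sketch that recreates both matrices so that it runs on its own (same_matrix is an illustrative name):

```r
x <- matrix(1:9, ncol = 3)                  # filled column by column
y <- matrix(1:9, ncol = 3, byrow = TRUE)    # filled row by row

# TRUE if the two objects are exactly the same
same_matrix <- identical(y, t(x))
same_matrix

dim(x)   # the number of rows and columns together
```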

Data frame

A data.frame is a table of data, such as,

Code
df <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:26,
                 Month = "June",
                 Year = 2022)
df
    Day Date Month Year
1   Mon   20  June 2022
2  Tues   21  June 2022
3   Wed   22  June 2022
4 Thurs   23  June 2022
5   Fri   24  June 2022
6   Sat   25  June 2022
7   Sun   26  June 2022
Code
class(df)
[1] "data.frame"
Code
ncol(df)    # Number of columns
[1] 4
Code
nrow(df)    # Number of rows
[1] 7
Code
length(df)  # The length is also the number of columns
[1] 4
Code
names(df)   # The names of the variables in the data frame
[1] "Day"   "Date"  "Month" "Year" 

Note that the length of each column should be equal in the specification of the data frame. You might wonder why the Month and Year columns were fine previously when they were given only one value each, whereas there are 7 days and 7 dates. It is because R recycled them the requisite number of times (i.e. it gave all the rows the same value for Month and Year: it recycled June and 2022 seven times). That option isn’t available for the example below, which will generate an error because the Date column is now too short: there are 7 days but only 6 dates.

# This will generate an error
df <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:25,
                 Month = "June",
                 Year = 2022)
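
If you would like to see the error without it interrupting a longer script, it can be caught with tryCatch(). This is a sketch added for illustration; result and ok are made-up names, and the exact wording of the message may differ between versions of R, so it is printed rather than assumed. The second part shows that recycling does work when the shorter column fits a whole number of times.

```r
# Catch the error and keep its message as text
result <- tryCatch(
  data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
             Date = 20:25),
  error = function(e) conditionMessage(e)
)
result

# Two values fit into six rows exactly three times, so they are recycled
ok <- data.frame(id = 1:6, group = c("a", "b"))
ok
```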

Factors

Before version 4.0.0, R would, by default, convert character fields in a data frame into factors. The equivalent operation now is,

Code
df2 <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:26,
                 Month = "June",
                 Year = 2022, stringsAsFactors = TRUE)

Treating character fields as factors was clever but frustrating if you didn’t realise it was happening and wanted the characters to remain as characters. The difference is not immediately obvious,

Code
head(df, n = 2)   # with stringsAsFactors = FALSE (the current default)
   Day Date Month Year
1  Mon   20  June 2022
2 Tues   21  June 2022
Code
head(df2, n = 2)  # with stringsAsFactors = TRUE  (the historic default)
   Day Date Month Year
1  Mon   20  June 2022
2 Tues   21  June 2022

These appear to be the same but differences begin to be apparent in the following:

Code
df$Day
[1] "Mon"   "Tues"  "Wed"   "Thurs" "Fri"   "Sat"   "Sun"  
Code
df2$Day
[1] Mon   Tues  Wed   Thurs Fri   Sat   Sun  
Levels: Fri Mon Sat Sun Thurs Tues Wed
Code
df$Month
[1] "June" "June" "June" "June" "June" "June" "June"
Code
df2$Month
[1] June June June June June June June
Levels: June

Basically, a factor is a categorical variable: it encodes which groups or categories (which levels) are to be found in the variable. Knowing this, it is possible to count the number of each group, as in,

Code
summary(df2)
    Day         Date       Month        Year     
 Fri  :1   Min.   :20.0   June:7   Min.   :2022  
 Mon  :1   1st Qu.:21.5            1st Qu.:2022  
 Sat  :1   Median :23.0            Median :2022  
 Sun  :1   Mean   :23.0            Mean   :2022  
 Thurs:1   3rd Qu.:24.5            3rd Qu.:2022  
 Tues :1   Max.   :26.0            Max.   :2022  
 Wed  :1                                         

but not

Code
summary(df)
     Day                 Date         Month                Year     
 Length:7           Min.   :20.0   Length:7           Min.   :2022  
 Class :character   1st Qu.:21.5   Class :character   1st Qu.:2022  
 Mode  :character   Median :23.0   Mode  :character   Median :2022  
                    Mean   :23.0                      Mean   :2022  
                    3rd Qu.:24.5                      3rd Qu.:2022  
                    Max.   :26.0                      Max.   :2022  


Factors can be useful but do not always behave as you might anticipate. For example,

Code
x <- c("2021", "2022")
as.numeric(x)
[1] 2021 2022

is different from,

Code
x <- factor(c("2021", "2022"))
as.numeric(x)
[1] 1 2
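
The reason is that a factor stores each value as an integer code that points to one of its levels, and as.numeric() returns those codes. A common way to recover the original numbers is to convert to character first. A minimal sketch (years is an illustrative name):

```r
x <- factor(c("2021", "2022"))

levels(x)       # the labels: "2021" "2022"
as.integer(x)   # the underlying codes: 1 2

# Convert to character first, then to numeric, to get the years back
years <- as.numeric(as.character(x))
years           # 2021 2022
```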

These days the default is stringsAsFactors = FALSE, so functions such as read.csv(), which read a .csv file into a data.frame, leave text columns as characters unless you ask otherwise.
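
When you do want a factor, it is often worth setting the order of its levels yourself, because by default they are sorted alphabetically, which is why Fri came first in the output above. The sketch below is an added illustration; alphabetical and calendar are made-up names.

```r
days <- c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun")

alphabetical <- factor(days)                 # levels are sorted
calendar     <- factor(days, levels = days)  # levels follow the order given

levels(alphabetical)
levels(calendar)
```

The order of the levels affects how summaries and plots are arranged, so the calendar version is usually the more readable one.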

Lists

A list is a more flexible class that can hold together other types of object. Without a list, the following only works because the numbers 1 to 3 in x are coerced to characters in y: note the " " that appear around them, which shows they are now text.

Code
x <- as.integer(1:3)
class(x)
[1] "integer"
Code
y <- c("a", x)
y
[1] "a" "1" "2" "3"
Code
class(y)
[1] "character"

On the other hand,

Code
y <- list("a", x)

creates a ragged list of two parts:

Code
class(y)
[1] "list"
Code
y
[[1]]
[1] "a"

[[2]]
[1] 1 2 3

The first part has the character "a" in it.

Code
y[[1]]
[1] "a"
Code
class(y[[1]])
[1] "character"

The second has the numbers 1 to 3 in it.

Code
y[[2]]
[1] 1 2 3
Code
class(y[[2]])
[1] "integer"

Note that the length of the list is the number of parts it contains. Presently it is 2 but the following example has a length of three.

Code
y <- list("a", x, df)
y
[[1]]
[1] "a"

[[2]]
[1] 1 2 3

[[3]]
    Day Date Month Year
1   Mon   20  June 2022
2  Tues   21  June 2022
3   Wed   22  June 2022
4 Thurs   23  June 2022
5   Fri   24  June 2022
6   Sat   25  June 2022
7   Sun   26  June 2022
Code
length(y)
[1] 3


This should not be confused with the length of any one part.

Code
length(y[[1]])
[1] 1
Code
length(y[[2]])
[1] 3
Code
length(y[[3]])
[1] 4
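
The lengths of all the parts can be obtained in one step with lengths(), and the parts of a list can be given names so that they are easier to refer to. This is a small sketch added for illustration; the names letter and numbers are invented.

```r
x <- 1:3
y <- list(letter = "a", numbers = x)

length(y)        # 2: the number of parts
lengths(y)       # the length of each part: 1 and 3

y$numbers        # a named part can be reached with $
y[["letter"]]    # or with double square brackets and the name in quotes
```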

Assignments

Throughout this document I have used the assignment operator <- to store the output of a function, as in x <- as.integer(1:3) and y <- list("a", x, df), and so forth. The <- is used to assign the result of a function to an object. You can, if you prefer, use =. For example, all the following make the same assignment, which is to give x the value of 1.

Code
x <- 1
x = 1
1 -> x

Personally, I avoid using = as an assignment for the following reasons.
First, not to confuse assignments with arguments,

Code
x <- round(10.32, digits = 1)   # I think this is a bit clearer
x = round(10.32, digits = 1)    # and this a bit less so

Second, not to confuse assignments with logical statements,

Code
x <- 1
y <- 2
z <- x == y   # Again, this is a bit clearer
z = x == y    # and this not so much

Third – but this is pedantic – to avoid the following sort of situation which makes no sense mathematically…

Code
x = 1
y = 2
x = y

… but does in terms of what it really means:

Code
x <- 1
y <- 2
x <- y # Assign the value of y to x, overwriting its previous value


Which you use is a matter of personal preference and, of course, = has one less character than <- to worry about. However, this course is written with the following conventions:

<- (or ->) is an assignment, as in x <- 1;

= is the value of an argument, as in round(x, digits = 1); and

== is a logical test for equality, as in x == y.
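
There is also one place where = and <- genuinely behave differently, which is inside a function call. The sketch below is an added illustration (created_by_equals and created_by_arrow are made-up names): within the brackets, = names an argument and creates nothing, whereas <- creates an object as a side effect.

```r
# Start without an object called x
if (exists("x", inherits = FALSE)) rm(x)

mean(x = 1:10)    # = supplies the argument x; no object is created
created_by_equals <- exists("x", inherits = FALSE)

mean(x <- 1:10)   # <- first creates x, then passes its value to mean()
created_by_arrow <- exists("x", inherits = FALSE)

created_by_equals   # FALSE
created_by_arrow    # TRUE
```

Keeping = for arguments and <- for assignments avoids creating objects by accident in this way.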

It is important to remember that R is case sensitive. An object called x is different from one called X; y is not the same as Y and so forth.

Manipulating objects

In addition to passing objects to functions such as…

Code
x <- 0:100
mean(x)
[1] 50
Code
sum(x)
[1] 5050
Code
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0      25      50      50      75     100 
Code
median(x)
[1] 50
Code
quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
  0%  25%  50%  75% 100% 
   0   25   50   75  100 
Code
head(sqrt(x)) # The square roots of the first six values of x
[1] 0.000000 1.000000 1.414214 1.732051 2.000000 2.236068
Code
tail(x^2)     # The squares of the last six values of x
[1]  9025  9216  9409  9604  9801 10000
Code
sd(x)         # The standard deviation of x
[1] 29.30017

…there are other ways we may wish to interact with objects.

Mathematical operations

Mathematical operations generally work element by element, pairing up the corresponding elements of two vectors. For example,

Code
x <- 1
y <- 3
x + y
[1] 4
Code
x <- 1:5
y <- 6:10
x + y
[1]  7  9 11 13 15
Code
x * y   # Multiplication
[1]  6 14 24 36 50
Code
x / y   # Division
[1] 0.1666667 0.2857143 0.3750000 0.4444444 0.5000000

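If one vector is shorter than the other, its values will be recycled. In the following example the results are \(1\times6\), \(2\times7\), \(3\times8\), \(4\times9\) and then \(5\times6\) as y is recycled. Because the length of x is not a whole multiple of the length of y, R also issues a warning, which is not shown in the output below.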

Code
x <- 1:5  # This is a vector of length 5
y <- 6:9  # This is a vector of length 4
x * y     # A vector of length 5 but some of y is recycled
[1]  6 14 24 36 30
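
The contrast between recycling that fits exactly and recycling that does not can be sketched as follows. This is an added illustration; clean and msg are made-up names, and the warning text is printed rather than quoted because its wording belongs to R.

```r
# Length 6 and length 2: the shorter vector is reused exactly three times
clean <- 1:6 * 1:2
clean   # 1 4 3 8 5 12

# Length 5 and length 4: the fit is not exact, so R warns.
# tryCatch() is used here to keep the warning message as text.
msg <- tryCatch(1:5 * 6:9, warning = function(w) conditionMessage(w))
msg
```

A warning of this kind is often a sign that two vectors were meant to be the same length, so it is worth checking when it appears.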

Subsets of objects

Vectors

If x is a vector then x[n] is the nth element in the vector (the nth position, the nth item). To illustrate,

Code
x <- c("a", "b", "c", "d", "e", "f")
x[1]
[1] "a"
Code
x[3]
[1] "c"
Code
x[c(1, 3, 5)]
[1] "a" "c" "e"
Code
x[length(x)]
[1] "f"

The notation -n can be used to exclude elements.

Code
x[-3]   # All of x except the 3rd element
[1] "a" "b" "d" "e" "f"
Code
x[c(-1, -3, -5)]    # x without the 1st, 3rd and 5th elements
[1] "b" "d" "f"
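
Elements can also be selected by a condition rather than by position. The following sketch is an addition to the examples above, with v and keep as illustrative names: a logical vector inside the square brackets keeps the elements where it is TRUE.

```r
v <- c(3, 8, 1, 9)

keep <- v > 2    # TRUE TRUE FALSE TRUE
v[keep]          # 3 8 9: the values that meet the condition
which(keep)      # 1 2 4: the positions of those values
```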

Matrices

If x is a matrix then x[i, j] is the value in the ith row and jth column:

Code
x <- matrix(1:10, ncol = 2)
x
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
Code
x[1, 1]     # row 1, column 1
[1] 1
Code
x[2, 1]     # row 2, column 1
[1] 2
Code
x[c(3, 5), 2]   # rows 3 and 5 of column 2
[1]  8 10
Code
x[nrow(x), ncol(x)]   # the final entry in the matrix
[1] 10


All of the values in the ith row can be selected using the form x[i, ]

Code
x[1, ]    # row 1
[1] 1 6
Code
x[3, ]    # row 3
[1] 3 8
Code
x[c(1, 5), ]  # rows 1 and 5
     [,1] [,2]
[1,]    1    6
[2,]    5   10
Code
x[c(-1, -3), ]  # All except the 1st and 3rd rows
     [,1] [,2]
[1,]    2    7
[2,]    4    9
[3,]    5   10

Similarly, all of the values in the jth column can be selected using the form x[, j]

Code
x[ ,1]    # column 1
[1] 1 2 3 4 5
Code
x[ ,2]    # column 2
[1]  6  7  8  9 10
Code
x[ , 1:2]   # columns 1 and 2
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
Code
x[-3 , 1:2]   # columns 1 and 2 except row 3
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    4    9
[4,]    5   10
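
Notice in the output above that selecting a single row or column returns a plain vector, whereas selecting several returns a matrix. If the result needs to stay a matrix, the argument drop = FALSE can be added. A minimal sketch (as_vector and as_matrix are illustrative names):

```r
x <- matrix(1:10, ncol = 2)

as_vector <- x[1, ]                  # the matrix structure is dropped
as_matrix <- x[1, , drop = FALSE]    # a matrix with 1 row and 2 columns

is.matrix(as_vector)   # FALSE
dim(as_matrix)         # 1 2
```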

Data frames

Data frames are not unlike a matrix.

Code
df <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:26,
                 Month = "June",
                 Year = 2022)
df[, 1]   # The first column
[1] "Mon"   "Tues"  "Wed"   "Thurs" "Fri"   "Sat"   "Sun"  
Code
df[1, 1]  # The first row of the first column (Day)
[1] "Mon"
Code
df[2, 2]  # The second row of the second column (Date)
[1] 21

However, you can also reference the variable name directly, through the x$variable style notation,

Code
df$Day
[1] "Mon"   "Tues"  "Wed"   "Thurs" "Fri"   "Sat"   "Sun"  
Code
df$Day[1]
[1] "Mon"
Code
df$Date[2]
[1] 21

Alternatively, if you wish, use the square brackets with the [, "variable"] format.

Code
df[, "Day"]
[1] "Mon"   "Tues"  "Wed"   "Thurs" "Fri"   "Sat"   "Sun"  
Code
df[1, "Day"]
[1] "Mon"
Code
df[2, "Date"]
[1] 21
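
The row position can also be a condition, which gives a simple way to filter a data frame. The sketch below recreates a cut-down version of the data frame so that it runs on its own; late is an illustrative name.

```r
df <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:26)

# Keep the rows where Date is greater than 24, and all of the columns
late <- df[df$Date > 24, ]
late

df[["Day"]]   # double square brackets return the column as a vector
df["Day"]     # single brackets with no comma return a one-column data frame
```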

Lists

We have already seen the use of double square brackets, [[...]] to refer to a part of a list:

Code
x <- 1:3
y <- list("a", x, df)
y[[1]]
[1] "a"
Code
y[[2]]
[1] 1 2 3
Code
y[[3]]
    Day Date Month Year
1   Mon   20  June 2022
2  Tues   21  June 2022
3   Wed   22  June 2022
4 Thurs   23  June 2022
5   Fri   24  June 2022
6   Sat   25  June 2022
7   Sun   26  June 2022

The extension to this is to be able to refer to a specific element within a part of the list by combining it with the other notation. Some examples are:

Code
y[[1]][1]
[1] "a"
Code
y[[2]][3]
[1] 3
Code
y[[3]]$Day
[1] "Mon"   "Tues"  "Wed"   "Thurs" "Fri"   "Sat"   "Sun"  
Code
y[[3]]$Day[1]
[1] "Mon"
Code
y[[3]][2, "Date"]
[1] 21

The way to remember the difference between [[...]] and [...] is that the double square brackets reference a specific part of a list, for example [[3]], the third part; the single square brackets reference a position or element in a vector, such as [4], the fourth. Combining them, [[3]][4] is the 4th element of a vector where that vector forms the 3rd part of a list.
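
One further point, sketched here as an added illustration: single square brackets can be applied to the list itself, in which case the result is still a list (a shorter one), not the part it contains.

```r
y <- list("a", 1:3)

class(y[2])      # "list": a list containing only the second part
class(y[[2]])    # "integer": the second part itself
y[[2]][3]        # 3: the third element of the second part
```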

Deleting objects and saving the workspace

My current working directory is,

Code
getwd()
[1] "/Users/ggrjh/Dropbox/github/MandM"

and it contains the following objects:

Code
ls()
[1] "df"       "df2"      "is.prime" "x"        "y"        "z"       

Yours will be different. Remember, it can be useful to create a new project for each new collection of work that you are doing in R. Opening that project each time you start R will ensure that the working directory is that of the project.

To delete a specific object, use rm(),

Code
rm(z)

Or, more than one,

Code
rm(df, df2, is.prime)

To save the workspace and all the objects it now contains, use the save.image() function.

Code
save.image("workspace1.RData")

To delete all the objects created in your workspace, use

Code
rm(list=ls())

It is a good idea to save a workspace with a new filename before deleting too much from your workspace to allow you to recover it if necessary. Be especially careful if you use rm(list=ls()) as there is no undo function. The best you can do is load the workspace as it was the last time that you saved it.

To (re)load a workspace, use load().

Code
load("workspace1.RData")
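
If only one object needs to be kept rather than the whole workspace, base R also provides saveRDS() and readRDS(). These are not used elsewhere in this section, so treat the following as an optional sketch; it writes to a temporary file, created with tempfile(), to avoid cluttering the working directory, and path and restored are illustrative names.

```r
path <- tempfile(fileext = ".rds")   # a throwaway file location

x <- 1:3
saveRDS(x, path)                     # save a single object to the file

restored <- readRDS(path)            # read it back, under any name you like
identical(x, restored)               # TRUE

unlink(path)                         # delete the temporary file
```

Unlike load(), readRDS() returns the object instead of restoring it under its original name, which makes it harder to overwrite something in the workspace by accident.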

Further reading

This short introduction to base R has really only scratched the surface. There are many books about R that provide a lot more detail but, to remind you, the manual that comes with the software is worth reading and probably the best place to start – An Introduction to R. It is thorough but also relatively short.

Don’t worry if not everything makes sense at this stage. The best way to learn R is to put it into practice and that is what we shall be doing in later sessions.