Base R

A Brief Guide

Introduction

Base R is what you download from CRAN. You might think of it as classic R. A short introduction to ‘the basics’ is provided below. For a fuller introduction, see the software manual, An Introduction to R. It is worth reading even if you end up regularly using the Tidyverse variant of R described in the next session, as there are some tasks that are (in my opinion) easier to do using Base R or by mixing it up a little.

Functions

R is a functional programming language where functions ‘do things’ to objects. What they do is dependent upon the class/type and attributes of the objects that go into the function, and also on the arguments of the function.

For example, try typing the following into the R Console, which is the bottom-left panel of RStudio. Type it after the prompt symbol, >, and then press Enter/Return.

Code

round(10.32, digits = 0)

[1] 10

This calls the function round(), which is operating on the numeric object, 10.32. The argument digits specifies the number of digits to round to. It is set to zero in the example above.

Because digits = 0 is the default value for the function, we could just write

Code

round(10.32)

[1] 10

and obtain the same answer as before. I know that digits = 0 is the default value because, as I type the name of the function into the R Console, I see the arguments of the function and any default values appear.

We can also find out more about the function, including some examples of its use, by opening its help file.

Code

?round

Should we wish to round 10.32 to one decimal place, we are no longer using the default of zero decimal places and must therefore specify the argument explicitly.

Code

round(10.32, digits = 1)

[1] 10.3

The following also works because the arguments are given in the order that the function expects them.

Code

round(10.32, 1)

[1] 10.3

In other words, if we do not name the arguments explicitly, as in x = 10.32 (where x is the numeric vector to be rounded; here, 10.32) and digits = 1, then they are matched by position: the first value is taken as the first argument of the function and the second value as the second argument. This requires care to make sure they genuinely are in the right order. If you aren’t certain, name the arguments explicitly, because named arguments can be given in any order.

Code

round(digits = 1, x = 10.32)

[1] 10.3

In the examples above, both the input to and output from the function are a numeric vector of type double. The input is:

Code

class(10.32)

[1] "numeric"

Code

typeof(10.32)

[1] "double"

The output is:

Code

class(round(10.32, digits = 1))

[1] "numeric"

Code

typeof(round(10.32, digits = 1))

[1] "double"

Note how a function can be wrapped within a function, as in the example above: class(round(...)).

At the moment we are using x = 10.32, which is a numeric vector of length 1,

Code

length(10.32)

[1] 1

However, the round() function can operate on numeric vectors of other lengths too.

Code

round(c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7))

[1] 1 2 3 4 6 7 8

Here the combine function, c(), is used to create a vector of length 7, which is the input into round(). The output is of length 7 too.

Code

length(round(c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7)))

[1] 7

There are lots of functions in R and I often forget which one I need. Fortunately, there is also a large user community, so a quick web search often helps me find what I need. Don’t be afraid to do a Google search for what you need.

Writing a new function

We can write our own functions. The following will take a number and report whether it is a prime number or not.

Code

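is.prime <- function(x) {
  # 2 is prime; test it first, since 2:(x-1) below would be 2:1
  if(x == 2) return(TRUE)
  # Reject numbers below 2 and numbers that are not whole
  if(x < 2 | x %% floor(x) != 0) {
    warning("Please enter an integer number above 1")
    return(NA)
  }
  # Candidate divisors from 2 up to x - 1
  y <- 2:(x-1)
  # Prime only if no candidate divides x without a remainder
  ifelse(all(x%%y > 0), return(TRUE), return(FALSE))
}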

Let’s try it.

Code

is.prime(2)

[1] TRUE

Code

is.prime(10)

[1] FALSE

Code

is.prime(13)

[1] TRUE

Code

is.prime(3.3)

[1] NA
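As written, is.prime() handles one number at a time, because the if() statements expect a single TRUE or FALSE. The sketch below shows one way to apply it to a range of numbers. It uses vapply(), which calls the function once for each element and checks that every result is a single logical value. The function is repeated so that the example runs on its own. The names candidates, is_prime and primes are illustrative only.

```r
# The is.prime() function from above, repeated so this example is self-contained
is.prime <- function(x) {
  if(x == 2) return(TRUE)
  if(x < 2 | x %% floor(x) != 0) {
    warning("Please enter an integer number above 1")
    return(NA)
  }
  y <- 2:(x-1)
  ifelse(all(x%%y > 0), return(TRUE), return(FALSE))
}

candidates <- 2:20
# Apply is.prime() to each number in turn; logical(1) says that each
# result should be a single TRUE/FALSE/NA value
is_prime <- vapply(candidates, is.prime, logical(1))
primes <- candidates[is_prime]   # keep only the numbers flagged TRUE
primes
# [1]  2  3  5  7 11 13 17 19
```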

There is quite a lot to unpack about the function. Not all of it is immediately relevant, but it is instructive to have an overview of what it is doing. First of all, the function takes the form

f <- function(x) {
  ...
}

where x is the input into the function in much the same way that x is the number to be rounded in round(x = ...). It is a ‘placeholder’ for the input into the function.

Statements such as if(x == 2) are conditional statements: if the condition inside if(...) is TRUE, then whatever follows is done. If what is to be done spans multiple lines, it is enclosed in ‘curly brackets’, {...}.

The statement if(x < 2 | x %% floor(x) != 0) in the function is also a conditional statement, this time including an or, denoted by |. It checks whether x < 2 or whether x is not a whole number. Had we needed both conditions to be met, an and, denoted by &, would be used instead of |. Note that ! means not, so != tests for not equal to and is the opposite of ==, which tests for equality.

The expression 2:(x-1) gives the same values as the function seq(from = 2, to = (x-1), by = 1): a sequence of whole numbers from $2$ to $(x-1)$.

Code

x <- 10
2 : (x - 1)

[1] 2 3 4 5 6 7 8 9

Code

seq(from = 2, to = (x-1), by = 1)

[1] 2 3 4 5 6 7 8 9
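One quirk of : is worth seeing because it seems to explain why is.prime() tests if(x == 2) before anything else. When x is 2, the expression 2:(x-1) becomes 2:1. This counts downwards instead of giving an empty sequence, and 2 divides exactly by both of its values:

```r
x <- 2
2:(x - 1)                    # 2:1 counts down, giving 2 1
2 %% (2:(x - 1))             # the remainders are 0 0
all(2 %% (2:(x - 1)) > 0)    # FALSE, so 2 would wrongly be reported as not prime
```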

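ifelse() is another conditional statement. It takes the form ifelse(condition, a, b): if the condition is met then do a, else do b. In the prime number function, %% is the modulo operator, which returns the remainder after division. So x%%y gives the remainder when $x$ is divided by each of the numbers from $2$ to $(x-1)$, and all(x%%y > 0) checks that none of those divisions produces a whole number.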
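To see the pieces of that test separately, here is a short sketch using 13 and 10 as arbitrary examples. Note also that ifelse() is vectorised, so it can test every element of a vector at once.

```r
13 %% 5                # the remainder of 13 / 5 is 3
13 %% 2:12             # none of these remainders is zero...
all(13 %% 2:12 > 0)    # ...so this is TRUE: 13 is prime
10 %% 2:9              # 10 / 2 and 10 / 5 leave no remainder...
all(10 %% 2:9 > 0)     # ...so this is FALSE: 10 is not prime

# ifelse() applied to a vector of conditions, one result per element
ifelse(c(3, 8, 12) > 5, "above 5", "5 or below")
# [1] "5 or below" "above 5"    "above 5"
```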

Finally, the function return() returns an output from the function; here, a logical vector of length 1 that is TRUE, FALSE or NA depending on whether $x$ is or is not a prime number, or is not a whole number above $1$.

Note that since R 4.1.0, functions can also take the shorter form,

f <- \(x) {
  ...
}

Therefore the following is exactly equivalent to the earlier definition.

Code

is.prime <- \(x) {
  if(x == 2) return(TRUE)
  if(x < 2 | x %% floor(x) != 0) {
    warning("Please enter an integer number above 1")
    return(NA)
  }
  y <- 2:(x-1)
  ifelse(all(x%%y > 0), return(TRUE), return(FALSE))
}
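The shorter \(x) form is perhaps most convenient for small, one-off functions that are passed straight to another function and never given a name. Here is a minimal sketch, assuming R 4.1.0 or later, using sapply() to apply an unnamed function to each number:

```r
# Square each number using an anonymous function written with \(x)
sapply(1:5, \(x) x^2)
# [1]  1  4  9 16 25

# The same, written with the longer function(x) form
sapply(1:5, function(x) x^2)
```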

Objects and Classes

Our function that checks for a prime number is stored in the object is.prime.

Code

class(is.prime)

[1] "function"

There are other classes of object in R. Some of the most common are listed below.

Logical

The output from the is.prime() function is an example of an object of class logical because the answer is TRUE or FALSE (or NA, meaning not available, i.e. missing).

Code

x <- is.prime(10)
print(x)

[1] FALSE

Code

class(x)

[1] "logical"

Some other examples:

Code

y <- 10 > 5
print(y)

[1] TRUE

Code

class(y)

[1] "logical"

Code

z <- 2 == 5   # is 2 equal to 5?
print(z)

[1] FALSE
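Logical values can be combined with the & (and), | (or) and ! (not) operators that appeared in is.prime(). They can also be counted, because arithmetic on a logical vector treats TRUE as 1 and FALSE as 0. The values below are arbitrary examples.

```r
a <- 10 > 5    # TRUE
b <- 2 == 5    # FALSE
a & b          # FALSE: both must be TRUE
a | b          # TRUE: at least one is TRUE
!b             # TRUE: 'not' FALSE

# Count how many elements meet a condition
sum(c(4, 9, 1, 7) > 5)   # 2, because each TRUE counts as 1
```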

Numeric

We have already seen that some objects are numeric.

Code

x <- mean(0:100)
print(x)

[1] 50

Code

class(x)

[1] "numeric"

It is presently of type double, i.e. it allows for decimal places even where they are not required.

Code

typeof(x)

[1] "double"

but it could be converted to class integer (a whole number with no decimal places).

Code

x <- as.integer(x)
class(x)

[1] "integer"
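Two further details about integers may be useful. You can enter a whole number directly as an integer by adding the suffix L. Also, as.integer() does not round: it drops the decimal part, truncating towards zero.

```r
typeof(5)          # "double"
typeof(5L)         # "integer"
as.integer(2.7)    # 2, not 3: the decimal part is dropped
round(2.7)         # 3, if rounding is what is wanted
```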

Character

Other classes include character. Note the difference between the length() of a character vector and the number of characters, nchar(), that any element of that vector contains.

Code

x <- "Mapping and Modelling in R"
print(x)

[1] "Mapping and Modelling in R"

Code

length(x)   # There is only one element in this vector

[1] 1

Code

nchar(x)    # And that element contains 26 characters (including spaces)

[1] 26

Code

class(x)

[1] "character"

Code

y <- paste(x, "with Richard Harris")
print(y)

[1] "Mapping and Modelling in R with Richard Harris"

Code

length(y)   # There is still only one element

[1] 1

Code

nchar(y)    # But now it contains more characters

[1] 46

Code

class(y)

[1] "character"

Code

z <- unlist(strsplit(x, " "))
print(z)

[1] "Mapping"   "and"       "Modelling" "in"        "R"

Code

length(z)   # The initial vector has been split into 5 parts

[1] 5

Code

nchar(z)

[1] 7 3 9 2 1

Code

class(z)

[1] "character"

As the name suggests, print() is a function that prints its contents to the screen. Often it can be omitted in favour of referencing the object directly. For instance, in the example above, rather than typing print(z) it would be sufficient just to type z. Occasionally, though, you will find that an object does not print as you intended when the function is omitted (inside a for loop, for example, objects are not printed automatically). If this happens, try putting print() back in.
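A common case where print() is needed is inside a for loop. There, simply naming an object does not show it on screen. A minimal illustration:

```r
for (i in 1:3) {
  i          # nothing is shown: objects are not auto-printed inside a loop
  print(i)   # each value is shown
}
# [1] 1
# [1] 2
# [1] 3
```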

Matrix

An example of a matrix is

Code

x <- matrix(1:9, ncol = 3)
x

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Code

ncol(x)   # Number of columns

[1] 3

Code

nrow(x)   # Number of rows

[1] 3

Code

class(x)

[1] "matrix" "array"

Here the argument byrow is changed from its default value of FALSE to TRUE:

Code

y <- matrix(1:9, ncol = 3, byrow = TRUE)

This result is equivalent to the transpose of the original matrix.

Code

y

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Code

t(x)

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
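Rather than comparing the two printouts by eye, you can check the equivalence directly with identical(), which returns TRUE only if two objects are exactly the same:

```r
x <- matrix(1:9, ncol = 3)
y <- matrix(1:9, ncol = 3, byrow = TRUE)
identical(y, t(x))   # TRUE: filling by row gives the transpose
```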

Data frame

A data.frame is a table of data, such as,

Code

df <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:26,
                 Month = "June",
                 Year = 2022)
df

    Day Date Month Year
1   Mon   20  June 2022
2  Tues   21  June 2022
3   Wed   22  June 2022
4 Thurs   23  June 2022
5   Fri   24  June 2022
6   Sat   25  June 2022
7   Sun   26  June 2022

Code

class(df)

[1] "data.frame"

Code

ncol(df)    # Number of columns

[1] 4

Code

nrow(df)    # Number of rows

[1] 7

Code

length(df)  # The length is also the number of columns

[1] 4

Code

names(df)   # The names of the variables in the data frame

[1] "Day"   "Date"  "Month" "Year"

Note that each column should be of equal length in the specification of the data frame. The following will generate an error because the Date column is now too short. You might wonder why the Month and Year columns were fine previously when they were given only one value each, whereas there are 7 days and 7 dates. It is because R recycled them the requisite number of times (i.e. it gave all the rows the same value for Month and Year, recycling June and 2022 seven times). That option isn’t available for the example below, where there are 7 days but only 6 dates: 6 values cannot be recycled evenly into 7 rows.

# This will generate an error
df <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:25,
                 Month = "June",
                 Year = 2022)
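To see the error without it stopping a script, one option is to catch it with tryCatch() and keep the message as text. This is only a sketch for exploring the error, and the object name msg is illustrative.

```r
msg <- tryCatch(
  data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
             Date = 20:25,
             Month = "June",
             Year = 2022),
  error = function(e) conditionMessage(e)   # return the error message as text
)
msg   # reports that the arguments imply differing numbers of rows (7 and 6)
```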

Factors

Earlier versions of R (before 4.0.0) would, by default, convert character fields in a data frame into factors. The equivalent operation now is,

Code

df2 <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:26,
                 Month = "June",
                 Year = 2022, stringsAsFactors = TRUE)

Treating character fields as factors was clever but frustrating if you didn’t realise it was happening and wanted the characters to remain as characters. The difference is not immediately obvious:

Code

head(df, n = 2)    # with stringsAsFactors = FALSE (the current default)

   Day Date Month Year
1  Mon   20  June 2022
2 Tues   21  June 2022

Code

head(df2, n = 2)  # with stringsAsFactors = TRUE  (the historic default)

   Day Date Month Year
1  Mon   20  June 2022
2 Tues   21  June 2022

These appear to be the same but differences begin to be apparent in the following:

Code

df$Day

[1] "Mon"   "Tues"  "Wed"   "Thurs" "Fri"   "Sat"   "Sun"

Code

df2$Day

[1] Mon   Tues  Wed   Thurs Fri   Sat   Sun  
Levels: Fri Mon Sat Sun Thurs Tues Wed

Code

df$Month

[1] "June" "June" "June" "June" "June" "June" "June"

Code

df2$Month

[1] June June June June June June June
Levels: June

A factor is a categorical variable: it records which groups or categories (which levels) are found in the variable. Because the levels are known, summary() can count the number of observations in each group, as in,

Code

summary(df2)

    Day         Date       Month        Year     
 Fri  :1   Min.   :20.0   June:7   Min.   :2022  
 Mon  :1   1st Qu.:21.5            1st Qu.:2022  
 Sat  :1   Median :23.0            Median :2022  
 Sun  :1   Mean   :23.0            Mean   :2022  
 Thurs:1   3rd Qu.:24.5            3rd Qu.:2022  
 Tues :1   Max.   :26.0            Max.   :2022  
 Wed  :1

but not

Code

summary(df)

     Day                 Date         Month                Year     
 Length:7           Min.   :20.0   Length:7           Min.   :2022  
 Class :character   1st Qu.:21.5   Class :character   1st Qu.:2022  
 Mode  :character   Median :23.0   Mode  :character   Median :2022  
                    Mean   :23.0                      Mean   :2022  
                    3rd Qu.:24.5                      3rd Qu.:2022  
                    Max.   :26.0                      Max.   :2022
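By default, the levels of a factor are sorted alphabetically, which is why Fri comes first in the output above. Here is a minimal sketch of setting the levels explicitly so that the days appear in calendar order, and of counting each level with table():

```r
days <- c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun")

f1 <- factor(days)
levels(f1)   # alphabetical: "Fri" "Mon" "Sat" "Sun" "Thurs" "Tues" "Wed"

f2 <- factor(days, levels = days)
levels(f2)   # in the order supplied: "Mon" "Tues" ... "Sun"

table(factor(c("June", "June", "July")))   # July appears once, June twice
```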

Factors can be useful but do not always behave as you might anticipate. For example,

Code

x <- c("2021", "2022")
as.numeric(x)

[1] 2021 2022

is different from,

Code

x <- factor(c("2021", "2022"))
as.numeric(x)

[1] 1 2
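Calling as.numeric() on a factor returns the internal codes of its levels (1, 2, ...) rather than the values you see printed. A common way to recover the numbers themselves is to convert to character first. Another is to convert the levels and look each element up among them:

```r
x <- factor(c("2021", "2022"))
as.numeric(as.character(x))   # 2021 2022
as.numeric(levels(x))[x]      # also 2021 2022: each element picks out its level
```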

These days the default is stringsAsFactors = FALSE, which is better when using functions such as read.csv() to read a .csv file into a data.frame in R.

Lists

A list is a more flexible class that can hold together different types of object. Without a list, the following only works because the values 1:3 are coerced from numbers in x to characters in y. Note the quotation marks (" ") that now appear around them, which show they are text.

Code

x <- as.integer(1:3)
class(x)

[1] "integer"

Code

y <- c("a", x)
y

[1] "a" "1" "2" "3"

Code

class(y)

[1] "character"

On the other hand,

Code

y <- list("a", x)

creates a ragged list of two parts:

Code

class(y)

[1] "list"

Code

y

[[1]]
[1] "a"

[[2]]
[1] 1 2 3

The first part has the character "a" in it.

Code

y[[1]]

[1] "a"

Code

class(y[[1]])

[1] "character"

The second has the numbers 1 to 3 in it.

Code

y[[2]]

[1] 1 2 3

Code

class(y[[2]])

[1] "integer"

Note that the length of a list is the number of parts it contains. Presently it is 2, but the following example has a length of three.

Code

y <- list("a", x, df)
y

[[1]]
[1] "a"

[[2]]
[1] 1 2 3

[[3]]
    Day Date Month Year
1   Mon   20  June 2022
2  Tues   21  June 2022
3   Wed   22  June 2022
4 Thurs   23  June 2022
5   Fri   24  June 2022
6   Sat   25  June 2022
7   Sun   26  June 2022

Code

length(y)

[1] 3

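This should not be confused with the length of any one part. (Recall that the length of a data frame is its number of columns, which is why the third part has a length of 4.)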

Code

length(y[[1]])

[1] 1

Code

length(y[[2]])

[1] 3

Code

length(y[[3]])

[1] 4
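The parts of a list can also be given names when the list is created. You can then reference them with the same $ notation used for data frames (a data frame is, in fact, a special kind of list). The names letter and numbers below are arbitrary.

```r
x <- 1:3
z <- list(letter = "a", numbers = x)
names(z)        # "letter" "numbers"
z$numbers       # 1 2 3, the same as z[[2]]
z[["letter"]]   # "a", the same as z[[1]]
length(z)       # 2 parts
```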

Assignments

Throughout this document I have used the assignment operator <- to store the output of a function, as in x <- as.integer(1:3) and y <- list("a", x, df), and so forth. The <- assigns the result of a function to an object. You can, if you prefer, use = instead. For example, all of the following make the same assignment, which is to give x the value of 1.

Code

x <- 1
x = 1
1 -> x

Personally, I avoid using = for assignment, for the following reasons. First, so as not to confuse assignments with arguments,

Code

x <- round(10.32, digits = 1)   # I think this is a bit clearer
x = round(10.32, digits = 1)    # and this a bit less so

Second, so as not to confuse assignments with logical statements,

Code

x <- 1
y <- 2
z <- x == y   # Again, this is a bit clearer
z = x == y    # and this not so much

Third – but this is pedantic – to avoid the following sort of situation which makes no sense mathematically…

Code

x = 1
y = 2
x = y

… but does in terms of what it really means:

Code

x <- 1
y <- 2
x <- y # Assign the value of y to x, overwriting its previous value

Which you use is a matter of personal preference and, of course, = has one fewer character than <- to worry about. However, this course is written with the following conventions:

<- (or ->) is an assignment, as in x <- 1;

= is the value of an argument, as in round(x, digits = 1); and

== is a logical test for equality, as in x == y.

It is important to remember that R is case sensitive. An object called x is different from one called X; y is not the same as Y and so forth.
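A two-line illustration, with arbitrary values:

```r
x <- 1
X <- 2
x == X   # FALSE: x and X are two different objects
```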

Manipulating objects

In addition to passing objects to functions such as…

Code

x <- 0:100
mean(x)

[1] 50

Code

sum(x)

[1] 5050

Code

summary(x)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0      25      50      50      75     100

Code

median(x)

[1] 50

Code

quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

  0%  25%  50%  75% 100% 
   0   25   50   75  100

Code

head(sqrt(x)) # The square roots of the first six values of x

[1] 0.000000 1.000000 1.414214 1.732051 2.000000 2.236068

Code

tail(x^2)     # The squares of the last six values of x

[1]  9025  9216  9409  9604  9801 10000

Code

sd(x)         # The standard deviation of x

[1] 29.30017

…there are other ways we may wish to interact with objects.

Mathematical operations

Mathematical operations generally work element-wise, pairing up the corresponding elements of two vectors. For example,

Code

x <- 1
y <- 3
x + y

[1] 4

Code

x <- 1:5
y <- 6:10
x + y

[1]  7  9 11 13 15

Code

x * y   # Multiplication

[1]  6 14 24 36 50

Code

x / y   # Division

[1] 0.1666667 0.2857143 0.3750000 0.4444444 0.5000000

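If one vector is shorter than the other, its values will be recycled. In the following example the results are $1\times6$, $2\times7$, $3\times8$, $4\times9$ and then $5\times6$ as y is recycled. Because 5 is not a multiple of 4, R also issues a warning that the longer object length is not a multiple of the shorter object length.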

Code

x <- 1:5  # This is a vector of length 5
y <- 6:9  # This is a vector of length 4
x * y     # A vector of length 5 but some of y is recycled

[1]  6 14 24 36 30
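Recycling is cleanest when the longer length is an exact multiple of the shorter one, in which case R recycles silently. The most common case is a single number, which is recycled across every element. The values here are arbitrary.

```r
x <- 1:6
y <- 1:2
x * y    # 1 4 3 8 5 12: y is reused three times, as 1 2 1 2 1 2
10 * x   # 10 20 30 40 50 60: the single value 10 is recycled
```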

Subsets of objects

Vectors

If x is a vector then x[n] is the nth element in the vector (the nth position, the nth item). To illustrate,

Code

x <- c("a", "b", "c", "d", "e", "f")
x[1]

[1] "a"

Code

x[3]

[1] "c"

Code

x[c(1, 3, 5)]

[1] "a" "c" "e"

Code

x[length(x)]

[1] "f"

The notation -n can be used to exclude elements.

Code

x[-3]   # All of x except the 3rd element

[1] "a" "b" "d" "e" "f"

Code

x[c(-1, -3, -5)]    # x without the 1st, 3rd and 5th elements

[1] "b" "d" "f"
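You can also select elements with a logical vector of the same length, in which case the positions that are TRUE are kept. This brief sketch uses arbitrary numbers. The function which() converts the TRUE positions into index numbers.

```r
v <- c(5, 12, 3, 20, 8)
v > 6          # FALSE TRUE FALSE TRUE TRUE
v[v > 6]       # 12 20 8
which(v > 6)   # 2 4 5
```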

Matrices

If x is a matrix then x[i, j] is the value in the ith row and jth column:

Code

x <- matrix(1:10, ncol = 2)
x

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

Code

x[1, 1]     # row 1, column 1

[1] 1

Code

x[2, 1]     # row 2, column 1

[1] 2

Code

x[c(3, 5), 2]   # rows 3 and 5 of column 2

[1]  8 10

Code

x[nrow(x), ncol(x)]   # the final entry in the matrix

[1] 10

All of the values in the ith row can be selected using the form x[i, ]

Code

x[1, ]    # row 1

[1] 1 6

Code

x[3, ]    # row 3

[1] 3 8

Code

x[c(1, 5), ]  # rows 1 and 5

     [,1] [,2]
[1,]    1    6
[2,]    5   10

Code

x[c(-1, -3), ]  # All except the 1st and 3rd rows

     [,1] [,2]
[1,]    2    7
[2,]    4    9
[3,]    5   10

Similarly, all of the values in the jth column can be selected using the form x[, j]

Code

x[ ,1]    # column 1

[1] 1 2 3 4 5

Code

x[ ,2]    # column 2

[1]  6  7  8  9 10

Code

x[ , 1:2]   # columns 1 and 2

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

Code

x[-3 , 1:2]   # columns 1 and 2 except row 3

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    4    9
[4,]    5   10
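One behaviour to watch for is that selecting a single row or column returns a plain vector, not a one-row or one-column matrix, as x[1, ] showed earlier. Adding the argument drop = FALSE keeps the matrix structure:

```r
x <- matrix(1:10, ncol = 2)
r1 <- x[1, ]                 # a vector: 1 6
r2 <- x[1, , drop = FALSE]   # a 1 x 2 matrix
is.matrix(r1)   # FALSE
dim(r2)         # 1 2
```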

Data frames

Data frames can be indexed in much the same way as a matrix.

Code

df <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:26,
                 Month = "June",
                 Year = 2022)
df[, 1]   # The first column

[1] "Mon"   "Tues"  "Wed"   "Thurs" "Fri"   "Sat"   "Sun"

Code

df[1, 1]  # The first row of the first column (Day)

[1] "Mon"

Code

df[2, 2]  # The second row of the second column (Date)

[1] 21

However, you can also reference a variable by name directly, using the df$variable notation,

Code

df$Day

[1] "Mon"   "Tues"  "Wed"   "Thurs" "Fri"   "Sat"   "Sun"

Code

df$Day[1]

[1] "Mon"

Code

df$Date[2]

[1] 21

Alternatively, you can use square brackets with the variable name, in the [, "variable"] format.

Code

df[, "Day"]

[1] "Mon"   "Tues"  "Wed"   "Thurs" "Fri"   "Sat"   "Sun"

Code

df[1, "Day"]

[1] "Mon"

Code

df[2, "Date"]

[1] 21
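Rows of a data frame can be selected with a logical condition, just as for a vector, and several columns can be selected by name at once. Here is a minimal sketch using the df defined above, repeated so the example runs on its own:

```r
df <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:26,
                 Month = "June",
                 Year = 2022)
df[df$Date > 23, ]        # the rows where Date is above 23 (Fri, Sat and Sun)
df[, c("Day", "Date")]    # just the Day and Date columns
```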

Lists

We have already seen the use of double square brackets, [[...]], to refer to a part of a list:

Code

x <- 1:3
y <- list("a", x, df)
y[[1]]

[1] "a"

Code

y[[2]]

[1] 1 2 3

Code

y[[3]]

    Day Date Month Year
1   Mon   20  June 2022
2  Tues   21  June 2022
3   Wed   22  June 2022
4 Thurs   23  June 2022
5   Fri   24  June 2022
6   Sat   25  June 2022
7   Sun   26  June 2022

The extension to this is to be able to refer to a specific element within a part of the list by combining it with the other notation. Some examples are:

Code

y[[1]][1]

[1] "a"

Code

y[[2]][3]

[1] 3

Code

y[[3]]$Day

[1] "Mon"   "Tues"  "Wed"   "Thurs" "Fri"   "Sat"   "Sun"

Code

y[[3]]$Day[1]

[1] "Mon"

Code

y[[3]][2, "Date"]

[1] 21

The way to remember the difference between [[...]] and [...] is that the double square brackets reference a specific part of a list, for example [[3]], the third part; the single square brackets reference a position or element in a vector, such as [4], the fourth. Combining them, [[3]][4] is the 4th element of a vector where that vector forms the 3rd part of a list.
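The distinction matters because single square brackets applied to a list return a smaller list, whereas double square brackets return the part itself:

```r
y <- list("a", 1:3)
class(y[1])     # "list": a list containing just the first part
class(y[[1]])   # "character": the first part itself
y[[2]][3]       # 3: the 3rd element of the vector in the 2nd part
```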

Deleting objects and saving the workspace

My current working directory is,

Code

getwd()

[1] "/Users/ggrjh/Dropbox/github/MandM"

and my workspace, which is not the same thing as the working directory, currently holds the following objects:

Code

ls()

[1] "df"       "df2"      "is.prime" "x"        "y"        "z"

Yours will be different. Remember, it can be useful to create a new RStudio project for each new collection of work that you are doing in R. Opening that project each time you start RStudio will ensure that the working directory is the project’s folder.

To delete a specific object, use rm(),

Code

rm(z)

Or, more than one,

Code

rm(df, df2, is.prime)

To save the workspace and all the objects it now contains, use the save.image() function.

Code

save.image("workspace1.RData")

To delete all the objects created in your workspace, use

Code

rm(list=ls())

It is a good idea to save a workspace with a new filename before deleting too much from your workspace to allow you to recover it if necessary. Be especially careful if you use rm(list=ls()) as there is no undo function. The best you can do is load the workspace as it was the last time that you saved it.

To (re)load a workspace, use load().

Code

load("workspace1.RData")
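save.image() stores everything in the workspace. To save only some objects, you can give save() their names. Below is a sketch of a save, delete and reload cycle. It writes to a temporary file (via tempfile()) so that nothing is left in your working directory. The object names a, b and f are illustrative.

```r
a <- 1:5
b <- "Mapping and Modelling in R"
f <- tempfile(fileext = ".RData")   # a temporary file path

save(a, b, file = f)   # save just a and b
rm(a, b)               # delete them from the workspace
exists("a")            # FALSE
load(f)                # restore them from the file
a                      # 1 2 3 4 5
```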
