---
title: "Base R"
subtitle: "A Brief Guide"
execute:
  warning: false
  message: false
---
![](Rlogo.svg){width=100}
## Introduction
Base R is what you download from [CRAN](https://cran.r-project.org/){target="_blank"}. You might think of it as classic R. A short introduction of 'the basics' is provided below. For a fuller introduction, see the software manual, [An Introduction to R](https://cran.r-project.org/manuals.html){target="_blank"}. It is worth reading even if you end up regularly using the Tidyverse variant of R described in the next session, as there are some tasks that are (in my opinion) easier to do using Base R or by mixing it up a little.
## Functions
R is a functional programming language where functions 'do things' to objects. What they do is dependent upon the class/type and attributes of the objects that go into the function, and also on the arguments of the function.
For example, try typing the following into the R Console, which is the bottom left panel of RStudio. Type it alongside the prompt symbol, `>`, then hit `Enter`/`Return`.
```{r}
round(10.32, digits = 0)
```
This calls the function `round()`, which is operating on the numeric object, `10.32`. The argument `digits` specifies the number of decimal places to round to. It is set to zero in the example above.
Because `digits = 0` is the default value for the function, we could just write
```{r}
round(10.32)
```
and obtain the same answer as before. I know that `digits = 0` is the default value because, as I type the name of the function into the R Console, I see the arguments of the function and any default values appear.
![](round.png)
We can also find out more about the function, including some examples of its use, by opening its help file.
```{r}
#| eval: false
?round
```
</br>
Should we wish to round 10.32 to one decimal place then the default of zero is no longer what we want and we must specify the argument explicitly.
```{r}
round(10.32, digits = 1)
```
The following also works because it preserves the order of the arguments in the function.
```{r}
round(10.32, 1)
```
In other words, if we do not explicitly name `x = 10.32` (where `x` is a numeric vector; here, 10.32) and `digits = 1`, they are taken as the first and second arguments of the function in the order given. This requires care to make sure they genuinely are in the right order. If you aren't certain, name the arguments explicitly because named arguments are matched by name and so work in any order.
```{r}
round(digits = 1, x = 10.32)
```
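As a small sketch of why the order matters (the values here are only for illustration), compare named arguments given in either order with unnamed arguments that have been accidentally swapped.

```{r}
# Named arguments are matched by name, so the order does not matter
named_a <- round(x = 10.32, digits = 1)
named_b <- round(digits = 1, x = 10.32)
identical(named_a, named_b)

# Unnamed arguments are matched by position, so swapping them changes
# the meaning: this rounds the number 1, using 10.32 as 'digits'
swapped <- round(1, 10.32)
swapped
```

The swapped call runs without complaint but rounds the wrong number, which is why naming the arguments is the safer habit when in doubt.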
</br>
In the examples above, both the input to and output from the function are a `numeric` vector of type `double`. The input is:
```{r}
class(10.32)
typeof(10.32)
```
The output is:
```{r}
class(round(10.32, digits = 1))
typeof(round(10.32, digits = 1))
```
Note how a function can be wrapped within a function, as in the example above: `class(round(...))`.
</br>
At the moment we are using `x = 10.32`, which is a numeric vector of `length` 1,
```{r}
length(10.32)
```
However, the `round()` function can operate on numeric vectors of other lengths too.
```{r}
round(c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7))
```
Here the combine function, `c()`, is used to create a vector of length 7, which is the input into `round()`. The output is of length 7 too.
```{r}
length(round(c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7)))
```
![](hazard.gif){width=75}
<font size = 3>There are **lots** of functions for R and I often forget what I need. Fortunately, there is a large user community too, so a web search often helps me quickly find it. Don't be afraid to do a Google search for what you need.</font>
### Writing a new function
We can write our own functions. The following will take a number and report whether it is a prime number or not.
```{r}
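is.prime <- function(x) {
  # Handle 2 separately because 2:(x-1) would count down from 2 to 1
  if(x == 2) return(TRUE)
  # Reject anything below 2 and anything that is not a whole number
  if(x < 2 | x %% floor(x) != 0) {
    warning("Please enter an integer number above 1")
    return(NA)
  }
  # x is prime if none of 2, 3, ..., x-1 divides it exactly
  y <- 2:(x-1)
  ifelse(all(x%%y > 0), return(TRUE), return(FALSE))
}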
```
Let's try it.
```{r}
is.prime(2)
is.prime(10)
is.prime(13)
is.prime(3.3)
```
</br>
There is quite a lot to unpack about the function. It is not all immediately relevant but it is instructive to have an overview of what it is doing. First of all the function takes the form
````{verbatim}
f <- function(x) {
...
}
````
where `x` is the input into the function in much the same way that `x` is the number to be rounded in `round(x = ...)`. It is a 'place holder' for the input into the function.
Statements such as `if(x == 2)` are conditional statements: if the condition inside `if(...)` is true then do whatever follows. If what is to be done spans multiple lines, those lines are enclosed by 'curly brackets', `{...}`.
The statement `if(x < 2 | x %% floor(x) != 0)` in the function is also a conditional statement, this time combining two conditions with the `or` operator, denoted by `|`. It checks whether `x < 2` **or** `x` is not a whole number (`x %% floor(x)` is the remainder left after dividing `x` by itself rounded down, which is zero only for whole numbers). Had we needed both conditions to be met, the `and` operator would be used, denoted by `&` instead of `|`. Note that `!` means not, so `!=` tests for not equal to and is the opposite of `==`, which tests for equality.
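To see these operators at work outside the function, here is a minimal sketch using an illustrative value, `a <- 3.3`, and two helper names (`is_small`, `is_fraction`) that I have made up for the purpose.

```{r}
a <- 3.3                            # an illustrative value
is_small <- a < 2                   # FALSE: 3.3 is not below 2
is_fraction <- a %% floor(a) != 0   # TRUE: 3.3 divided by 3 leaves a remainder
is_small | is_fraction              # TRUE: at least one condition holds
is_small & is_fraction              # FALSE: the two do not both hold
!is_small                           # TRUE: ! turns FALSE into TRUE
```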
Where it says, `2:(x-1)`, this is equivalent to the function, `seq(from = 2, to = (x-1), by = 1)`. It generates a sequence of integer numbers from $2$ to $(x-1)$.
```{r}
x <- 10
2 : (x - 1)
seq(from = 2, to = (x-1), by = 1)
```
`ifelse()` is another conditional function. It takes the form, `ifelse(condition, a, b)`: if the `condition` is met then do `a`, else do `b`. In the prime number function the condition `all(x%%y > 0)` checks that dividing $x$ by each of the numbers from $2$ to $(x-1)$ always leaves a remainder; if any division generates a whole number then $x$ is not prime.
Finally, the function `return()` returns an output from the function; here, a logical vector of length 1 that is `TRUE`, `FALSE` or `NA` dependent upon whether $x$ is or is not a prime number, or if it is not a whole number above $1$.
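It may help to see the intermediate steps of the divisibility check on their own. The following is a sketch using my own object names (`n`, `divisors`, `remainders`), which are not part of the function itself.

```{r}
n <- 10                        # an illustrative input
divisors <- 2:(n - 1)          # the candidate divisors 2, 3, ..., 9
remainders <- n %% divisors    # the remainder after dividing n by each one
remainders
all(remainders > 0)            # FALSE because 2 and 5 divide 10 exactly

all(13 %% (2:12) > 0)          # TRUE because nothing from 2 to 12 divides 13
```

Because `%%` works on a whole vector of divisors at once, no loop is needed: `all()` simply collapses the element-by-element comparisons into a single `TRUE` or `FALSE`.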
Note that in R 4.1.0 and later, functions can also take the shorthand form,
````{verbatim}
f <- \(x) {
...
}
````
Therefore the following is exactly equivalent to before.
```{r}
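is.prime <- \(x) {
  # Handle 2 separately because 2:(x-1) would count down from 2 to 1
  if(x == 2) return(TRUE)
  # Reject anything below 2 and anything that is not a whole number
  if(x < 2 | x %% floor(x) != 0) {
    warning("Please enter an integer number above 1")
    return(NA)
  }
  # x is prime if none of 2, 3, ..., x-1 divides it exactly
  y <- 2:(x-1)
  ifelse(all(x%%y > 0), return(TRUE), return(FALSE))
}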
```
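The shorthand is probably most useful for short, one-off functions that are passed straight to another function. As a sketch (assuming R 4.1.0 or later, and using `sapply()`, which has not been introduced above, to apply a function to each element of a vector):

```{r}
# The same one-line function written in the long and the short form
squares_long  <- sapply(1:5, function(v) v^2)
squares_short <- sapply(1:5, \(v) v^2)
identical(squares_long, squares_short)
squares_short
```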
## Objects and Classes
Our function that checks for a prime number is stored in the object `is.prime`.
```{r}
class(is.prime)
```
There are other classes of object in R. Some of the most common are listed below.
### Logical
The output from the `is.prime()` function is an example of an object of class logical because the answer is `TRUE` or `FALSE` (or `NA`, not available).
```{r}
x <- is.prime(10)
print(x)
class(x)
```
Some other examples:
```{r}
y <- 10 > 5
print(y)
class(y)
z <- 2 == 5 # is 2 equal to 5?
print(z)
```
### Numeric
We have already seen that some objects are `numeric`.
```{r}
x <- mean(0:100)
print(x)
class(x)
```
This presently is of type `double`; i.e. it allows for decimal places even where they are not required.
```{r}
typeof(x)
```
but it could be converted to class `integer` (a whole number with no decimal places).
```{r}
x <- as.integer(x)
class(x)
```
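A couple of further illustrations of the difference between doubles and integers, as a brief sketch: numbers typed at the console are stored as doubles unless you add the suffix `L`, and `as.integer()` drops the decimal part rather than rounding. The object name `whole` is just for this example.

```{r}
typeof(5)                  # "double": the default for a typed number
typeof(5L)                 # "integer": the L suffix requests an integer
whole <- as.integer(3.9)
whole                      # 3: the decimal part is dropped, not rounded up
is.integer(whole)
```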
### Character
Other classes include `character`. Note the difference between the `length()` of a character vector and the number of characters, `nchar()`, that any element of that vector contains.
```{r}
x <- "Mapping and Modelling in R"
print(x)
length(x) # There is only one element in this vector
nchar(x) # And that element contains 26 characters (including the spaces)
class(x)
y <- paste(x, "with Richard Harris")
print(y)
length(y) # There is still only one element
nchar(y) # But now it contains more characters
class(y)
z <- unlist(strsplit(x, " "))
print(z)
length(z) # The initial vector has been split into 5 elements
nchar(z)
class(z)
```
![](hazard.gif){width=75}
<font size = 3>As the name suggests, `print` is a function that prints its contents to screen. Often it can be omitted in favour of referencing the object directly. For instance, in the example above, rather than typing `print(z)` it would be sufficient just to type `z`. Just occasionally though you will find that an object does not print as you intended when the function is omitted. If this happens, try putting `print` back in.</font>
### Matrix
An example of a `matrix` is
```{r}
x <- matrix(1:9, ncol = 3)
x
ncol(x) # Number of columns
nrow(x) # Number of rows
class(x)
```
Here the argument `byrow` is changed from its default value of `FALSE` to be `TRUE`:
```{r}
y <- matrix(1:9, ncol = 3, byrow = TRUE)
```
This result is equivalent to the transpose of the original matrix.
```{r}
y
t(x)
```
### Data frame
A `data.frame` is a table of data, such as,
```{r}
df <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:26,
                 Month = "June",
                 Year = 2022)
df
class(df)
ncol(df) # Number of columns
nrow(df) # Number of rows
length(df) # The length is also the number of columns
names(df) # The names of the variables in the data frame
```
Note that the length of each column should be equal in the specification of the data frame. The following will generate an error because the Date column is now too short. You might wonder why the Month and Year columns were fine previously when, in fact, they were given only one value, whereas there are 7 days and 7 dates. It is because R recycled them the requisite number of times (i.e. it gave all the rows the same value for Month and Year -- it recycled June and 2022 seven times). That option isn't available for the example below where there are 7 days but 6 dates.
````{verbatim}
# This will generate an error
df <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
Date = 20:25,
Month = "June",
Year = 2022)
````
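If you would like to see the recycling rule in action without stopping your session, the following sketch may help. It uses small made-up data frames, and `tryCatch()` (not covered above) to capture the error message as text instead of halting.

```{r}
# Recycling works when the shorter column fits a whole number of times:
# here "a", "b" is repeated three times to fill six rows
ok <- data.frame(id = 1:6, group = c("a", "b"))
nrow(ok)

# It fails when it does not fit: 3 days but only 2 dates
msg <- tryCatch(
  data.frame(Day = c("Mon", "Tues", "Wed"), Date = 20:21),
  error = function(e) conditionMessage(e)   # return the error text
)
msg
```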
### Factors
Earlier versions of R would, by default, convert character fields in a data frame into factors. The equivalent operation now is,
```{r}
df2 <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                  Date = 20:26,
                  Month = "June",
                  Year = 2022, stringsAsFactors = TRUE)
```
Treating character fields as factors was clever but frustrating if you didn't realise it was happening and wanted the characters to remain as characters. The difference is not immediately obvious,
```{r}
head(df, n = 2) # with stringsAsFactors = FALSE (the current default)
head(df2, n = 2) # with stringsAsFactors = TRUE (the historic default)
```
These appear to be the same but differences begin to be apparent in the following:
```{r}
df$Day
df2$Day
df$Month
df2$Month
```
Basically, a `factor` is a categorical variable: it encodes which groups or categories (which `levels`) are to be found in the variable. Knowing this, it is possible to count the number of each group, as in,
```{r}
summary(df2)
```
but not
```{r}
summary(df)
```
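Notice that the levels of `Day` were listed alphabetically rather than in calendar order. As a sketch of how that can be changed, the `levels` argument of `factor()` sets the order explicitly. The vector `days` and the names `f_alpha` and `f_cal` are made up for this example.

```{r}
days <- c("Mon", "Tues", "Wed", "Thurs", "Fri", "Mon")   # illustrative data
f_alpha <- factor(days)
levels(f_alpha)            # alphabetical by default, starting with "Fri"

# Supplying the levels fixes their order
f_cal <- factor(days, levels = c("Mon", "Tues", "Wed", "Thurs", "Fri"))
levels(f_cal)
table(f_cal)               # counts per level, now in calendar order
```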
</br>
Factors can be useful but do not always behave as you might anticipate. For example,
```{r}
x <- c("2021", "2022")
as.numeric(x)
```
is different from,
```{r}
x <- factor(c("2021", "2022"))
as.numeric(x)
```
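The reason for the difference is that a factor is stored as integer codes (1 for the first level, 2 for the second, and so on) with the text kept separately as labels, and `as.numeric()` returns the codes. A common way around this, sketched here with an object I have called `yr`, is to convert to character first.

```{r}
yr <- factor(c("2021", "2022"))
as.numeric(yr)                  # 1 2: the internal level codes
as.numeric(as.character(yr))    # 2021 2022: the values shown in the labels
```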
These days the default is `stringsAsFactors = FALSE`, which is better when using functions such as `read.csv()` to read a .csv file into a `data.frame` in R.
### Lists
A `list` is a more flexible class that can hold together other types of object. Without a list, the following only works because the `1:3` are coerced from numbers in `x` to characters in `y` -- note the `" "` that appear around them, which shows they are now text.
```{r}
x <- as.integer(1:3)
class(x)
y <- c("a", x)
y
class(y)
```
On the other hand,
```{r}
y <- list("a", x)
```
creates a ragged list of two parts:
```{r}
class(y)
y
```
The first part has the character `"a"` in it.
```{r}
y[[1]]
class(y[[1]])
```
The second has the numbers 1 to 3 in it.
```{r}
y[[2]]
class(y[[2]])
```
Note that the length of the list is the number of its parts. Presently it is 2 but the following example has a length of three.
```{r}
y <- list("a", x, df)
y
length(y)
```
</br>
This should not be confused with the length of any one part.
```{r}
length(y[[1]])
length(y[[2]])
length(y[[3]])
```
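To get the length of every part in one go, base R also has the function `lengths()` (note the final s). A minimal sketch, using a small made-up list called `parts`:

```{r}
parts <- list("a", 1:3, data.frame(u = 1:2, v = 3:4))
length(parts)     # 3: the number of parts in the list
lengths(parts)    # 1 3 2: the length of each part in turn
```

The final value is 2 because, as noted earlier, the length of a data frame is its number of columns.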
## Assignments
Throughout this document I have used the assignment operator `<-` to store the output of a function, as in `x <- as.integer(1:3)` and `y <- list("a", x, df)`, and so forth. The `<-` is used to assign the result of a function to an object. You can, if you prefer, use `=`. For example, all the following make the same assignment, which is to give `x` the value of 1.
```{r}
x <- 1
x = 1
1 -> x
```
Personally, I avoid using `=` as an assignment for the following reasons.
</br>
First, not to confuse assignments with arguments,
```{r}
x <- round(10.32, digits = 1) # I think this is a bit clearer
x = round(10.32, digits = 1) # and this a bit less so
```
Second, to not confuse assignments with logical statements,
```{r}
x <- 1
y <- 2
z <- x == y # Again, this is a bit clearer
z = x == y # and this not so much
```
Third -- but this is pedantic -- to avoid the following sort of situation which makes no sense mathematically...
```{r}
x = 1
y = 2
x = y
```
... but does in terms of what it really means:
```{r}
x <- 1
y <- 2
x <- y # Assign the value of y to x, overwriting its previous value
```
</br>
Which you use is a matter of personal preference and, of course, `=` has one less character than `<-` to worry about. However, this course is written with,
`<-` (or `->`) as an assignment, as in `x <- 1`;
`=` as the value of an argument, as in `round(x, digits = 1)`; and
`==` as a logical test for equality, as in `x == y`.
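The distinction is more than cosmetic when it appears inside a function call. In the sketch below, `sq()` is a hypothetical helper written only for this example: `=` inside the brackets names an argument and creates nothing in the workspace, whereas `<-` inside the brackets makes an assignment as a side effect.

```{r}
sq <- function(value) value^2    # a hypothetical helper for illustration

res <- sq(value = 4)             # '=' names the argument
res
# No object called 'value' has been created in the workspace
exists("value", envir = globalenv(), inherits = FALSE)

res2 <- sq(side <- 5)            # '<-' assigns 5 to 'side', then passes it on
res2
side                             # 'side' now exists and is equal to 5
```

Writing an assignment inside a function call like this is rarely a good idea, but it shows why the two symbols are not interchangeable everywhere.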
![](hazard.gif){width=75}
<font size = 3>It is important to remember that R is **case sensitive**. An object called `x` is different from one called `X`; `y` is not the same as `Y` and so forth.</font>
## Manipulating objects
In addition to passing objects to functions such as...
```{r}
x <- 0:100
mean(x)
sum(x)
summary(x)
median(x)
quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
head(sqrt(x)) # The square roots of the first few values of x
tail(x^2) # The squares of the last few values of x
sd(x) # The standard deviation of x
```
...there are other ways we may wish to interact with objects.
### Mathematical operations
Mathematical operations generally operate on a pairwise basis between corresponding elements of two vectors. For example,
```{r}
x <- 1
y <- 3
x + y
x <- 1:5
y <- 6:10
x + y
x * y # Multiplication
x / y # Division
```
If one vector is shorter than the other, its values will be recycled. In the following example the results are $1\times6$, $2\times7$, $3\times8$, $4\times9$ and *then* $5\times6$ as `y` is recycled.
```{r}
x <- 1:5 # This is a vector of length 5
y <- 6:9 # This is a vector of length 4
x * y # A vector of length 5 but some of y is recycled
```
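Recycling is silent when the shorter vector fits into the longer one a whole number of times, but R raises a warning when it does not. The sketch below uses `tryCatch()` (not covered above) to capture that warning as text; the object name `recycle_msg` is my own.

```{r}
# 1, 2 is recycled three times to match the length of 1:6: no warning
1:6 * 1:2

# 6:9 does not fit a whole number of times into 1:5, so R warns
recycle_msg <- tryCatch(
  1:5 * 6:9,
  warning = function(w) conditionMessage(w)   # return the warning text
)
recycle_msg
```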
### Subsets of objects
#### Vectors
If `x` is a `vector` then `x[n]` is the nth element in the vector (the nth position, the nth item). To illustrate,
```{r}
x <- c("a", "b", "c", "d", "e", "f")
x[1]
x[3]
x[c(1, 3, 5)]
x[length(x)]
```
The notation `-n` can be used to exclude elements.
```{r}
x[-3] # All of x except the 3rd element
x[c(-1, -3, -5)] # x without the 1st, 3rd and 5th elements
```
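Positions are not the only way to pick out elements. As a brief sketch that goes slightly beyond the examples above, a logical vector of the same length can be placed inside the square brackets, in which case the elements matching `TRUE` are kept. The vector `scores` is made up for the illustration.

```{r}
scores <- c(12, 7, 25, 3)    # illustrative values
scores > 10                  # TRUE FALSE TRUE FALSE
scores[scores > 10]          # keeps only the values above 10
```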
#### Matrices
If `x` is a `matrix` then `x[i, j]` is the value in the ith row and jth column:
```{r}
x <- matrix(1:10, ncol = 2)
x
x[1, 1] # row 1, column 1
x[2, 1] # row 2, column 1
x[c(3, 5), 2] # rows 3 and 5 of column 2
x[nrow(x), ncol(x)] # the final entry in the matrix
```
</br>
All of the values in the ith row can be selected using the form `x[i, ]`
```{r}
x[1, ] # row 1
x[3, ] # row 3
x[c(1, 5), ] # rows 1 and 5
x[c(-1, -3), ] # All except the 1st and 3rd rows
```
Similarly, all of the values in the jth column can be selected using the form `x[, j]`
```{r}
x[ ,1] # column 1
x[ ,2] # column 2
x[ , 1:2] # columns 1 and 2
x[-3 , 1:2] # columns 1 and 2 except row 3
```
#### Data frames
Data frames are not unlike a matrix.
```{r}
df <- data.frame(Day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"),
                 Date = 20:26,
                 Month = "June",
                 Year = 2022)
df[, 1] # The first column
df[1, 1] # The first row of the first column (Day)
df[2, 2] # The second row of the second column (Date)
```
However, you can also reference the variable name directly, through the `x$variable` style notation,
```{r}
df$Day
df$Day[1]
df$Date[2]
```
Alternatively, if you wish, with the square brackets, using the `[, "variable"]` format.
```{r}
df[, "Day"]
df[1, "Day"]
df[2, "Date"]
```
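A common use of these notations is to select the rows that meet a condition, by putting a logical test in the row position of the square brackets. The following is a sketch using a small made-up data frame called `week` so as not to disturb `df`.

```{r}
week <- data.frame(Day = c("Mon", "Tues", "Wed"), Date = 20:22)
week[week$Date > 20, ]         # the rows where Date is above 20, all columns
week[week$Date > 20, "Day"]    # the same rows but only the Day column
week[["Day"]]                  # double brackets also work, as for a list
```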
#### Lists
We have already seen the use of **double** square brackets, `[[...]]`, to refer to a part of a list:
```{r}
x <- 1:3
y <- list("a", x, df)
y[[1]]
y[[2]]
y[[3]]
```
The extension to this is to be able to refer to a specific element within a part of the list by combining it with the other notation. Some examples are:
```{r}
y[[1]][1]
y[[2]][3]
y[[3]]$Day
y[[3]]$Day[1]
y[[3]][2, "Date"]
```
![](hazard.gif){width=75}
<font size = 3>The way to remember the difference between `[[...]]` and `[...]` is that the double square brackets reference a specific part of a list, for example `[[3]]`, the third part; the single square brackets reference a position or element in a vector, such as `[4]`, the fourth. Combining them, `[[3]][4]` is the 4th element of a vector where that vector forms the 3rd part of a list.</font>
## Deleting objects and saving the workspace
My current working directory is,
```{r}
getwd()
```
and my workspace contains the following objects:
```{r}
ls()
```
Yours will be different. Remember, it can be useful to create a new project for each new collection of work that you are doing in R. Opening that project each time you start R will ensure that the working directory is that of the project.
To delete a specific object, use `rm()`,
```{r}
rm(z)
```
Or, more than one,
```{r}
rm(df, df2, is.prime)
```
To save the workspace and all the objects it now contains use the `save.image()` function.
```{r}
save.image("workspace1.RData")
```
To delete all the objects created in your workspace, use
```{r}
rm(list=ls())
```
![](hazard.gif){width=75}
<font size = 3>It is a good idea to save a workspace with a new filename before deleting too much from your workspace to allow you to recover it if necessary. **Be especially careful** if you use `rm(list=ls())` as there is **no undo function**. The best you can do is load the workspace as it was the last time that you saved it.</font>
To (re)load a workspace, use `load()`.
```{r}
load("workspace1.RData")
```
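You do not have to save everything. As a sketch, `save()` stores only the objects you name, and `load()` restores them. The file path comes from `tempfile()` purely so that the example does not leave anything in your working directory, and the object names are made up.

```{r}
tmp_file <- tempfile(fileext = ".RData")   # a temporary path for illustration
keep_me <- 1:3
save(keep_me, file = tmp_file)             # save just this one object
rm(keep_me)                                # delete it from the workspace
loaded <- load(tmp_file)                   # restore it from the file
loaded                                     # the names of the restored objects
keep_me
```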
## Further reading
This short introduction to base R has really only scratched the surface. There are many books about R that provide a lot more detail but, to remind you, the manual that comes with the software is worth reading and probably the best place to start -- [An Introduction to R](https://cran.r-project.org/manuals.html){target="_blank"}. It is thorough but also relatively short.
Don't worry if not everything makes sense at this stage. The best way to learn R is to put it into practice and that is what we shall be doing in later sessions.