2.1 Loading Data
R has many different functions for loading data, depending on the format of the file. Since data often come in comma-separated values (CSV) format, we’ll use that as an example.
We will load some data on the weight, length, and age of great white sharks (Carcharodon carcharias). It is in a file called pups.csv, which is a text file containing:
id,weight,length,age,clutch
1,20.3,97,14,1
2,28,104,16,2
3,31.5,106,16,3
4,31.5,108,17,1
5,32.5,109,18,2
etc.
You can download the file from the labs website on GitHub: pups.csv.
To load a CSV file like this, there are two basic options.
2.1.2 The read_csv() Function (in the readr package)
You can either type the code below directly into the R console (lower left pane of the RStudio screen), or save it in a script that can be “sourced” (executed) whenever you like. Scripts are written, saved, and sourced in the top left pane of the RStudio screen.
For this method you need to provide the location of the data and the name you want the object to have when it is loaded into R. For this example, one copy of the data is online at https://mdkarcher.github.io/StatLabs/pups.csv, and we wish to store the data in the variable pups. We would execute the code:
library(readr)
pups <- read_csv("https://mdkarcher.github.io/StatLabs/pups.csv")
Notice that this is very similar to what we see in the Code Preview pane in the Import Dataset method above.
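After loading, it is worth checking that the data arrived as expected. The following is a minimal sketch that avoids the network: it writes the five sample rows listed earlier to a temporary file and reads that file with read_csv(). The names sample_lines, sample_file, and pups_sample are made up for this illustration, and the real pups.csv contains more rows than these five.

```r
library(readr)

# Five sample rows copied from the listing above; the real pups.csv has more.
sample_lines <- c(
  "id,weight,length,age,clutch",
  "1,20.3,97,14,1",
  "2,28,104,16,2",
  "3,31.5,106,16,3",
  "4,31.5,108,17,1",
  "5,32.5,109,18,2"
)

# Write the rows to a temporary file (a hypothetical stand-in for pups.csv).
sample_file <- tempfile(fileext = ".csv")
writeLines(sample_lines, sample_file)

# Load the file the same way we would load the real data.
pups_sample <- read_csv(sample_file)

# Quick checks: dimensions, column names, and the first few rows.
dim(pups_sample)
names(pups_sample)
head(pups_sample)
```

The same three checks can be run on pups itself once the full file has been loaded.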
Alternatively, if we only want to use read_csv() once or twice, we do not have to load the readr package, but we do have to tell R that read_csv() is in readr by using the :: operator. Similarly, if we have a copy of pups.csv saved on our computer, in a directory R calls the working directory (see next section), we can use the name "pups.csv" instead of the full URL. For example:
pups <- readr::read_csv("pups.csv")
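If pups.csv is not in the working directory, the call above stops with an error. One possible way to make the problem easier to diagnose is to check for the file first. The helper below, load_if_present(), is a hypothetical function written for this sketch (it is not part of readr); it returns NULL and reports the current working directory when the file cannot be found.

```r
# Hypothetical helper: read a CSV only if the file can be found.
load_if_present <- function(path) {
  if (!file.exists(path)) {
    # Report where R was looking so the problem is easy to diagnose.
    message("Could not find '", path, "' from working directory ", getwd())
    return(NULL)
  }
  # readr:: lets us call read_csv() without running library(readr) first.
  readr::read_csv(path)
}

# pups is NULL if pups.csv is not in the working directory.
pups <- load_if_present("pups.csv")
```

Because the function uses readr::read_csv(), the readr package must be installed, but it does not need to be loaded with library().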
2.1.3 The Working Directory
The working directory is the location on your computer where R looks first for files and data; a file name given without a full path, such as "pups.csv", is looked up there. You can run the command getwd() to find out what the current working directory is. In RStudio, you can also see the current working directory directly under the console tab in the console pane.
To set the working directory, you can run the command setwd(), giving it the path to the desired directory as a quoted string. In RStudio, you can also set the working directory by navigating somewhere in the Files tab, clicking More, and selecting Set As Working Directory.
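As a small illustration of getwd() and setwd() working together, the sketch below switches to a temporary directory, confirms that a file written there can be found by its bare name, and then switches back. Here tempdir() stands in for the folder that holds your data, and example.csv is a throwaway file invented for the example.

```r
# Remember where we started so we can return afterwards.
original_dir <- getwd()

# tempdir() stands in for your data folder (an assumption for this sketch;
# substitute your own path, e.g. the folder where you saved pups.csv).
setwd(tempdir())

# A file written here can now be found by its bare name.
writeLines(c("id,weight", "1,20.3"), "example.csv")
found <- file.exists("example.csv")
found

# Clean up and restore the original working directory.
file.remove("example.csv")
setwd(original_dir)
```

Saving the result of getwd() before calling setwd() makes it easy to return to the starting directory when you are done.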