2.1 Loading Data

R has many different functions for loading data, depending on the format of the file. Since data often come in comma-separated values (CSV) format, we’ll use that as an example.

We will load some data on the weight, length, and age of great white sharks (Carcharodon carcharias). It is in a file called pups.csv, which is a text file containing:

id,weight,length,age,clutch
1,20.3,97,14,1
2,28,104,16,2
3,31.5,106,16,3
4,31.5,108,17,1
5,32.5,109,18,2
etc.

You can download the file from the labs website on GitHub: pups.csv (https://mdkarcher.github.io/StatLabs/pups.csv).

To load a CSV file like this, there are two basic options.

2.1.1 The Import Dataset RStudio Button

Click the Import Dataset button in RStudio’s Environment tab in the upper right pane. Copy and paste the link above into the window that appears and click Update, or download the CSV file and click the browse button and navigate to where you saved it. The contents of the file will appear in the preview pane. In the lower left, you can choose what to call the variable where the data will be stored, and in the lower right, you can see a preview of the code that R will execute. Click Import and the pups variable should appear under Data in the Environment pane.

This is easy, but if you want to save a series of commands so that you can run them again, which is essential for reproducible research, you should use the next option.

2.1.2 The read_csv() Function (in the readr package)

You can either type the read_csv() command directly into the R console (lower left pane of the RStudio screen), or save it in a script that can be “sourced” (executed) whenever you like. Scripts are written, saved, and sourced in the top left pane of the RStudio screen.

For this method you need to provide the location of the data and the name you want the object to have once it is loaded into R. For this example, one copy of the data is online at https://mdkarcher.github.io/StatLabs/pups.csv, and we wish to store the data in the variable pups. We would run the following code:

library(readr)
pups <- read_csv("https://mdkarcher.github.io/StatLabs/pups.csv")

Notice that this is very similar to what we see in the Code Preview pane in the Import Dataset method above.
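To see what read_csv() returns without depending on a network connection, the sketch below writes the five sample rows shown earlier to a temporary file and reads them back. The temporary file and the name pups_sample are illustrative assumptions and are not part of the lab materials; the real pups.csv contains more rows.

```r
library(readr)

# The header and the five sample rows shown at the start of this section.
sample_lines <- c(
  "id,weight,length,age,clutch",
  "1,20.3,97,14,1",
  "2,28,104,16,2",
  "3,31.5,106,16,3",
  "4,31.5,108,17,1",
  "5,32.5,109,18,2"
)

# Write them to a temporary file (a stand-in for pups.csv).
tmp <- tempfile(fileext = ".csv")
writeLines(sample_lines, tmp)

# Read the file; read_csv() guesses each column's type from its contents.
pups_sample <- read_csv(tmp)

print(pups_sample)        # the data, one row per line of the file
dim(pups_sample)          # number of rows and columns
names(pups_sample)        # column names taken from the header line
mean(pups_sample$weight)  # works because weight was read as a number
```

If a column you expected to be numeric comes back as text, that is usually a sign of a stray character somewhere in the file, so a quick look at the printed result is worthwhile after every import.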

Alternatively, if we only want to use read_csv() once or twice, we do not have to load the readr package, but we do have to tell R that read_csv() is in readr by using the :: operator. Separately, if we have a copy of pups.csv saved on our computer, in a directory R calls the working directory (see next section), we can use the name "pups.csv" instead of the full URL. Combining the two:

pups <- readr::read_csv("pups.csv")
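One way to combine the two locations in a script is to prefer a local copy when it exists and fall back to the online copy otherwise. The following is a minimal sketch only: the helper names pups_source() and load_pups() are hypothetical and are not part of readr or of the lab materials.

```r
# Hypothetical helper: return the local file name if that file exists
# (relative names are looked up in the working directory),
# otherwise return the URL of the online copy.
pups_source <- function(local = "pups.csv",
                        url = "https://mdkarcher.github.io/StatLabs/pups.csv") {
  if (file.exists(local)) local else url
}

# Hypothetical wrapper: read from whichever source was chosen.
# readr:: is used so that library(readr) is not required.
load_pups <- function(...) {
  readr::read_csv(pups_source(...))
}

# Show which source would be used from the current working directory.
print(pups_source())
```

Calling pups <- load_pups() would then read the local file if one is present and download the data otherwise.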

2.1.3 The Working Directory

The working directory is the location on your computer where R looks for files and data first. You can run the command getwd() to find out what the current working directory is. In RStudio, you can also see the current working directory directly under the console tab in the console pane.

To set the working directory, you can run the command setwd(), giving the path to the desired directory as a quoted string. In RStudio, you can also set the working directory by navigating somewhere in the Files tab, clicking More, and selecting Set As Working Directory.
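The short sketch below illustrates how getwd() and setwd() interact with bare file names. It uses a temporary directory as a stand-in for your own project folder, and the file name demo.csv is made up for the illustration.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Use a temporary directory as a stand-in for a project folder.
demo_dir <- tempdir()
setwd(demo_dir)

# A file written with a bare name is created in the working directory ...
writeLines(c("id,weight", "1,20.3"), "demo.csv")
found_in_wd <- file.exists(file.path(demo_dir, "demo.csv"))

# ... and can be found again using only that bare name.
found_by_name <- file.exists("demo.csv")

print(c(found_in_wd, found_by_name))

# Clean up and restore the original working directory.
unlink("demo.csv")
setwd(old_wd)
```

Restoring the original directory at the end keeps the rest of a session or script from being affected by the change.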