Loading External Data

Learn how to pull data from a CSV file into our R environment, which lays the foundation for more complex database pulls.

Without the ability to load external data sources, we would need to hardcode our data into R using statements like the ones below:

# Store some survey data in a data frame object
VAR_DataFrame <- data.frame(
  Q1_Ans = c(1, 4, 3, 5, 1, 2),
  Q2_Ans = c(5, 3, 2, 2, 5, 1),
  Q3_Ans = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE))
VAR_DataFrame # Print the resulting data frame
  • Lines 2–5: We create a hardcoded data frame by calling the data.frame function and typing each value in explicitly.

While snippets like these are helpful in particular circumstances, we’ll usually deal with much larger, possibly dynamic, datasets, and hardcoding those into our scripts would be inefficient. In data science, we typically pull these larger datasets from other sources: CSV files, databases, and websites. Fortunately, R is well-suited to the task of dynamically loading data.

Bringing external data into our scripts

The read.csv function

In base R, the primary function for reading data from a CSV file is read.csv(). Say we have a CSV file called MySurveyData.csv, which we can examine in the code window below.
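
The contents of MySurveyData.csv are not reproduced here, so the following is only a minimal sketch of how read.csv() behaves. It writes a small stand-in file to a temporary folder and reads it back. The column names are borrowed from the hardcoded example above, and the rows, the csv_path variable, and the VAR_SurveyData name are assumptions made for illustration, not the real survey data.

```r
# Hypothetical stand-in for MySurveyData.csv (the real file's contents are assumed)
csv_path <- file.path(tempdir(), "MySurveyData.csv")

# Write a tiny example file so this snippet runs on its own
writeLines(c(
  "Q1_Ans,Q2_Ans,Q3_Ans",
  "1,5,TRUE",
  "4,3,TRUE",
  "3,2,FALSE"
), csv_path)

# read.csv() parses the file into a data frame; the first row is used as the header
VAR_SurveyData <- read.csv(csv_path)

str(VAR_SurveyData) # Inspect the column names and types
VAR_SurveyData      # Print the resulting data frame
```

If the file changes, for example when the survey is re-run, the same read.csv() call picks up the new rows without any edits to the script.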

This data would be frustrating to hardcode into our scripts, and if we re-ran the survey or added new questions to the study, updating our script by hand would be just as tedious. Luckily, we can read the csv ...