Loading External Data
Learn how to pull data from a CSV file into our R environment, which lays the foundation for more complex database pulls.
Without the ability to load external data sources, we would need to hardcode our data into R using statements like the ones below:
#Store some survey data in a data frame object
VAR_DataFrame <- data.frame(
  Q1_Ans = c(1,4,3,5,1,2),
  Q2_Ans = c(5,3,2,2,5,1),
  Q3_Ans = c(TRUE,TRUE,FALSE,TRUE,FALSE,FALSE))
VAR_DataFrame #Print the resulting data frame
- Lines 2–5: We create a hardcoded data frame, using the data.frame function and explicitly coding values into that data frame.
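To see what data.frame actually builds from those hardcoded vectors, a quick check along these lines may help. It is only a sketch that repeats the snippet above and then inspects the result with base R functions (nrow, ncol, sapply, class); nothing here goes beyond the data already shown.

```r
# Same hardcoded survey data as in the snippet above
VAR_DataFrame <- data.frame(
  Q1_Ans = c(1, 4, 3, 5, 1, 2),
  Q2_Ans = c(5, 3, 2, 2, 5, 1),
  Q3_Ans = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE))

nrow(VAR_DataFrame)           # number of survey responses (rows)
ncol(VAR_DataFrame)           # number of questions (columns)
sapply(VAR_DataFrame, class)  # type of each column: numeric or logical
```

Each vector becomes one column, so every vector must have the same length. Adding one more response means editing all three vectors by hand.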
While snippets like these are helpful in particular circumstances, most of the time, we’ll deal with much more extensive, possibly dynamic, datasets, so hardcoding them into our scripts would ruin our efficiency. Most of the time in data science, we’ll pull these larger datasets from other sources: CSV files, databases, and websites. Fortunately, R is well-suited to the task of dynamically loading data.
The read.csv function
In base R, the primary function to pull in data from a CSV file is read.csv(). Say we have a CSV file called MySurveyData.csv, which we can examine in the code window below.
This data would be frustrating to hardcode into our scripts. And if we re-ran the survey or added new questions to the study, updating our script by hand would be tedious and error-prone. Luckily, we can read the csv
...
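As a rough illustration of the call itself, the sketch below reads a CSV file with read.csv(). The real contents of MySurveyData.csv are not reproduced here, so the sketch first writes a small, hypothetical stand-in file to a temporary directory; the column names, the values, and the VAR_SurveyData and csv_path names are assumptions made for the example only.

```r
# Hypothetical stand-in for MySurveyData.csv (the real file's contents are assumed)
csv_path <- file.path(tempdir(), "MySurveyData.csv")
writeLines(c(
  "Q1_Ans,Q2_Ans,Q3_Ans",
  "1,5,TRUE",
  "4,3,TRUE",
  "3,2,FALSE"
), csv_path)

# read.csv() treats the first line as column names by default (header = TRUE)
VAR_SurveyData <- read.csv(csv_path)

str(VAR_SurveyData)   # inspect column names and inferred types
head(VAR_SurveyData)  # preview the first rows
```

With a real file, only the path passed to read.csv() would change. If the survey is re-run or gains new questions, the same line picks up the new rows and columns without any edits to the script.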