Database Connections

Learn to pull data from a database into R with the tidyverse, using DBI, odbc, dbplyr, and related database packages.

We'll cover the following...

DBI
The odbc and RSQLite packages
The dbplyr package
Logging into an external database using dbConnect
Referencing a table using tbl
Querying a table without SQL
Pulling results using collect

Most data science tasks begin with pulling data from an external source. Sometimes it’s a CSV file, but often it’s something else, such as a live database, a Microsoft Excel sheet, or a website. Those aren’t the only options, but they’re the most common.

Among those options, the most common is to pull from a live database. Live databases have a few significant advantages that support the data science team:

Typically, data is up to date. There’s often an automatic pipeline that feeds new data into the database.
They can host a vast number of records, depending on the type of database.
The database engine can manipulate data without needing to load it into our computer’s memory first.

In particular, manipulating data outside of memory is critical for data scientists. When we work with CSV files, our first step is usually to read the file:

VAR_MyData <- read_csv("Mycsvfile.csv")

This loads the entire CSV file into memory. That’s not a big deal when the file is a few thousand rows long, but with the huge datasets we’ll often be dealing with, it becomes a real problem: we quickly hit our computer’s memory limits when manipulating the data.
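To make the memory cost concrete, here is a small sketch that writes a throwaway CSV file, reads it back with read_csv, and reports how much memory the resulting data frame occupies. The temporary file, the column names, and the row count are made up for illustration; only read_csv and the VAR_MyData name come from the lesson.

```r
library(readr)

# Write a throwaway CSV to a temporary file (a stand-in for Mycsvfile.csv).
csv_path <- tempfile(fileext = ".csv")
write_csv(data.frame(id = 1:100000, value = rnorm(100000)), csv_path)

# read_csv() parses the whole file; every row ends up in the R session.
VAR_MyData <- read_csv(csv_path)

# Number of rows now held locally, and the approximate memory they occupy.
n_rows <- nrow(VAR_MyData)
size_mb <- format(object.size(VAR_MyData), units = "MB")
print(n_rows)
print(size_mb)
```

The footprint grows with the number of rows and columns, so a file with hundreds of millions of records can exceed the memory of a typical laptop.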

Fortunately, the tidyverse provides a convenient mechanism for connecting to a database and performing data manipulations at the database end rather than in memory. That is, we can interact with the database directly and pare the data down to something manageable before transferring it back to our local machine. So, to interact with a database in R, we’ll need to install the following packages:
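Going by this lesson's outline, the packages in question are DBI, odbc, RSQLite, and dbplyr. As a rough sketch of where the lesson is heading, the snippet below installs any that are missing and then filters a table inside the database before bringing a small result back into R. The in-memory SQLite database and the flights_demo table are stand-ins invented for illustration; a real connection would pass driver and credential details to dbConnect instead.

```r
# Packages named in this lesson's outline, plus dplyr for the verbs.
# odbc is only needed for ODBC-backed servers, so this sketch leaves it out.
pkgs <- c("DBI", "RSQLite", "dplyr", "dbplyr")
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) install.packages(to_install)

library(DBI)
library(dplyr)

# Hypothetical stand-in for a live database: a temporary in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Illustrative table; in practice the data would already live in the database.
dbWriteTable(con, "flights_demo", data.frame(
  carrier = c("AA", "AA", "UA", "DL"),
  delay   = c(12, 45, 3, 60)
))

# tbl() creates a lazy reference; no rows are pulled into R yet.
flights_tbl <- tbl(con, "flights_demo")

# dbplyr translates these dplyr verbs into SQL that runs inside the database.
long_delays <- flights_tbl %>%
  filter(delay > 30) %>%
  select(carrier, delay)

# collect() executes the query and returns only the filtered rows as a tibble.
result <- collect(long_delays)

dbDisconnect(con)
```

Only the rows that survive the filter cross from the database into R, which is the point of doing the manipulation at the database end.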

`DBI`