The R language has a package named the sqldf
to work with the SQL databases. This package allows the SQL operations to be performed within R code, just as they could be performed in a SQL database. While the sqldf
package enables us to execute SQL queries, it’s important to note that these queries are solely conducted on the R data frame. We don’t need a database connection for these queries.
The sqldf
package can be installed using the following script:
install.packages("sqldf")
We need to import the sqldf
module into our code to perform the SQL functions using R. This is done using the following command:
library(sqldf)
sqldf
The sqldf
primarily focuses on performing SELECT
, JOIN
, GROUP BY
, and other SQL statements. These operations are used to retrieve, filter, transform, or summarize data. These queries work on the data stored in R data frames.
Let’s look at how to perform basic SQL queries in R:
# Import without showing any warning or messagesoptions(warn = -1)suppressMessages(library(sqldf))# Create a sample dataframedata <- data.frame(Name = c("Alice", "Bob", "Charlie"),Age = c(25, 30, 22),City = c("New York", "San Francisco", "Los Angeles"))# Example 1: Select all columns for people older than 25result1 <- sqldf("SELECT * FROM data WHERE Age > 25")# Example 2: Calculate the average ageresult2 <- sqldf("SELECT AVG(Age) AS AverageAge FROM data")# Example 3: Group data by city and calculate the average age for each cityresult3 <- sqldf("SELECT City, AVG(Age) AS AverageAge FROM data GROUP BY City")# Example 4: Sort data by age in descending orderresult4 <- sqldf("SELECT * FROM data ORDER BY Age DESC")# Example 5: Select names that start with 'A'result5 <- sqldf("SELECT * FROM data WHERE Name LIKE 'A%'")# Displaying resultscat("People older than 25 \n")print(result1)cat("\n\n Average age of the group \n" )print(result2)cat("\n\n Average age by city \n")print(result3)cat("\n\n Data sorted by age \n")print(result4)cat("\n\n Names starting with 'A' \n")print(result5)
Lines 2–3: We import the sqldf
package while suppressing any warnings or additional messages for a clean output.
Lines 6–10: We define a data frame in R that has attributes the Name
, Age
and City
.
Line 13: We write an SQL query to retrieve data instances where the Age
is greater than 25.
Line 16: We write an SQL query to find the average Age
from the data frame.
Line 19: We define an SQL query to group data on the City
, and find average the Age
in each city.
Line 22: We write an SQL query to sort data by Age
in descending order.
Line 25: We write an SQL query to retrieve the Name
starting from the alphabet “A.”
Lines 28–37: We display the results of all the above queries with a little formatting for better readability.
Free Resources