How to perform SQL queries in R

The R language has a package named the sqldf to work with the SQL databases. This package allows the SQL operations to be performed within R code, just as they could be performed in a SQL database. While the sqldf package enables us to execute SQL queries, it’s important to note that these queries are solely conducted on the R data frame. We don’t need a database connection for these queries.

Installing the package

The sqldf package can be installed using the following script:

install.packages("sqldf")

Importing the package

We need to import the sqldf module into our code to perform the SQL functions using R. This is done using the following command:

library(sqldf)

Scope of the sqldf

The sqldf primarily focuses on performing data manipulation queriesQueries that allow the manipulation of the data in the database. rather than data definition queriesQueries that allow the definition and manipulation of the database structure.. Data manipulation queries include operations like the SELECT, JOIN, GROUP BY, and other SQL statements. These operations are used to retrieve, filter, transform, or summarize data. These queries work on the data stored in R data frames.

Example

Let’s look at how to perform basic SQL queries in R:

# Import without showing any warning or messages
options(warn = -1)
suppressMessages(library(sqldf))
# Create a sample dataframe
data <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 22),
City = c("New York", "San Francisco", "Los Angeles")
)
# Example 1: Select all columns for people older than 25
result1 <- sqldf("SELECT * FROM data WHERE Age > 25")
# Example 2: Calculate the average age
result2 <- sqldf("SELECT AVG(Age) AS AverageAge FROM data")
# Example 3: Group data by city and calculate the average age for each city
result3 <- sqldf("SELECT City, AVG(Age) AS AverageAge FROM data GROUP BY City")
# Example 4: Sort data by age in descending order
result4 <- sqldf("SELECT * FROM data ORDER BY Age DESC")
# Example 5: Select names that start with 'A'
result5 <- sqldf("SELECT * FROM data WHERE Name LIKE 'A%'")
# Displaying results
cat("People older than 25 \n")
print(result1)
cat("\n\n Average age of the group \n" )
print(result2)
cat("\n\n Average age by city \n")
print(result3)
cat("\n\n Data sorted by age \n")
print(result4)
cat("\n\n Names starting with 'A' \n")
print(result5)

Code explanation

  • Lines 2–3: We import the sqldf package while suppressing any warnings or additional messages for a clean output.

  • Lines 6–10: We define a data frame in R that has attributes the Name, Age and City.

  • Line 13: We write an SQL query to retrieve data instances where the Age is greater than 25.

  • Line 16: We write an SQL query to find the average Age from the data frame.

  • Line 19: We define an SQL query to group data on the City, and find average the Age in each city.

  • Line 22: We write an SQL query to sort data by Age in descending order.

  • Line 25: We write an SQL query to retrieve the Name starting from the alphabet “A.”

  • Lines 28–37: We display the results of all the above queries with a little formatting for better readability.

Free Resources

Copyright ©2025 Educative, Inc. All rights reserved