How to import data from Excel to R using tidyxl

In R, there’s a powerful package called tidyxl. It’s useful for working with Excel files and provides a seamless approach to importing data into R for analysis. In this Answer, we’ll uncover the process of importing data from Excel using the tidyxl package.

Install the tidyxl package

First, we must install the required tidyxl package in our R environment. We can install it using the following command:

install.packages("tidyxl")

Load the tidyxl package

Once we’ve installed the required tidyxl package, we’ve to load it into our R session by running the following command:

library(tidyxl)

Import data from an Excel file

Now that we’ve loaded the tidyxl package in our R session, we can import data from the given Excel file—from an entire Excel workbook or a specific worksheet.

Import data from an entire Excel workbook

We can import an entire Excel workbook using the xlsx_cells() function. The basic syntax is as follows:

data <- xlsx_cells("path/to/excel_file.xlsx")

Note: Replace "path/to/excel_file.xlsx" with the actual file path of your Excel workbook.

Import data from a specific Excel worksheet

If our Excel workbook contains multiple worksheets and we want to import a specific one, we can specify the sheet name using the sheet argument. For example:

data <- xlsx_cells("path/to/excel_file.xlsx", sheet="Sheet1")

Note: Replace "Sheet1" with the worksheet name according to the needs of your import.

Once we’ve successfully imported the data from Excel, it’ll be stored in a data frame in R with information about each cell, including values, formulas, and cell formatting. We can perform various operations just like any other data frame. For instance, to take a quick look at the imported data, we can use the head() function to view the first few rows as follows:

head(data)

Let’s look at an example of how to import data from an Excel file using the tidyxl package.

Example

Here, we have an Excel file named duncan_data.xlsx containing data on 45 occupations in the USA. We want to load this data into an R data frame and extract values from a specific sheet in a particular column.

# Loading the tidyxl package
library(tidyxl)

# Specifying the path to the Excel file
excel_file <- "/educative/duncan_data.xlsx"

# Reading data from an Excel workbook into a data frame
duncan_data <- xlsx_cells(excel_file)

# Specifying the sheet in the Excel file
sheet_name <- "Sheet1"

# Extracting the values from first column
column_values <- duncan_data[duncan_data$sheet == sheet_name & duncan_data$col == 1, ]

# Viewing the extracted data
head(column_values)
Importing values from a specific Excel sheet in a particular column

Explanation

In the example:

  • Line 2: We’ll load the tidyxl package, making its functions and capabilities available in our R environment.

  • Line 5: We’ll specify the file path to the Excel file (i.e., "/educative/duncan_data.xlsx") we want to work with.

  • Line 8: Read the data from the Excel workbook specified in excel_file and store it in the duncan_data data frame. The xlsx_cells() function extracts cell-level information from the Excel file, such as cell values, formulas, formatting, etc.

  • Line 11: Specify the worksheet’s name within the Excel file we want to work with. In this case, we’re setting sheet_name to "Sheet1".

  • Line 14: Extract values from duncan_data based on two conditions:

    • duncan_data$sheet == sheet_name checks if the sheet name matches "Sheet1"

    • duncan_data$col == 1 checks if the column number is 1

  • Line 17: Quickly inspect the data.

Note: The Duncan's occupational prestige data used in the given example is borrowed from the R’s carData package.

Conclusion

The tidyxl package in R provides a straightforward and efficient way to import data from Excel workbooks and sheets. We can access and work with our data in R with simple commands, making it a valuable tool in any data science project. Hence, it simplifies integrating Excel data into our R workflows.

Free Resources

Copyright ©2024 Educative, Inc. All rights reserved