What is the group_by() function in R Programming?

Overview

In R, the group_by() function from the dplyr package is applied to data frames or tibbles. It splits the rows into groups based on one or more variables so that subsequent operations, such as summarise(), are performed once per group. It works similarly to a PivotTable in Excel or the GROUP BY clause in SQL.
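As a rough illustration of the SQL analogy, the sketch below computes one average per group, much like SELECT cyl, AVG(mpg) FROM mtcars GROUP BY cyl would. The choice of the cyl and mpg columns and the name avg_mpg are assumptions made for this example only.

```r
library(dplyr, warn.conflicts = FALSE)

# Group the rows of mtcars by number of cylinders,
# then compute one mean mpg value per group
avg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg))

# One row per distinct value of cyl
print(avg_by_cyl)
```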

How group_by() works

Syntax


group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))
Syntax of the group_by() function

Parameters

It takes the following arguments:

  • .data: A data frame, a data frame extension such as a tibble, or a lazy data frame.
  • ...: The variables or computations to group by.
  • .add: The default value is FALSE, in which case group_by() overrides any existing groups. Set it to TRUE to add the new variables to the existing groups instead.
  • .drop: Whether to drop groups formed by factor levels that do not appear in the data. The default, group_by_drop_default(.data), is TRUE unless .data was previously grouped with .drop = FALSE.
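The effect of .add and .drop is easier to see on a small example. The following is a minimal sketch: the data frame df and its columns f and x are made up for illustration, and group_vars() is the dplyr helper that lists the current grouping variables.

```r
library(dplyr, warn.conflicts = FALSE)

# Start with a single grouping variable
by_vs <- mtcars %>% group_by(vs)

# Default (.add = FALSE): the new grouping replaces the old one
replaced <- by_vs %>% group_by(am)
print(group_vars(replaced))   # "am"

# .add = TRUE: the new variable is appended to the existing groups
extended <- by_vs %>% group_by(am, .add = TRUE)
print(group_vars(extended))   # "vs" "am"

# Hypothetical data: factor level "c" never occurs in the rows
df <- data.frame(f = factor(c("a", "b"), levels = c("a", "b", "c")), x = 1:2)

# Default .drop: the empty level "c" produces no group
dropped <- df %>% group_by(f) %>% summarise(n = n())
print(nrow(dropped))          # 2

# .drop = FALSE: the empty level is kept as a group with zero rows
kept <- df %>% group_by(f, .drop = FALSE) %>% summarise(n = n())
print(nrow(kept))             # 3
```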

Return value

This function returns a grouped data frame (a tibble with the class grouped_df). The rows and columns are unchanged; only the grouping information is added.
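One way to check what comes back is to inspect the result directly. This sketch uses the dplyr helpers is_grouped_df(), group_vars(), and n_groups(); the variable name grouped is chosen for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

grouped <- mtcars %>% group_by(vs, am)

# The data keeps its shape; only grouping metadata is attached
print(dim(grouped))            # 32 11
print(is_grouped_df(grouped))  # TRUE
print(class(grouped))

# The grouping variables and the number of groups they form
print(group_vars(grouped))     # "vs" "am"
print(n_groups(grouped))       # 4
```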

Code example

In the code snippet below, we group the mtcars dataset by two of its columns to see how the group_by() function works:

# load the dplyr library
library(dplyr, warn.conflicts = FALSE)
# pipe mtcars into group_by() to group its rows by vs and am
by_vs_am <- mtcars %>% group_by(vs, am)
# summarise() counts the rows in each group and drops the last grouping level (am)
by_vs <- by_vs_am %>% summarise(total = n())
# print the summary, which is still grouped by vs
print(by_vs)

Code explanation

  • Line 2: We load the dplyr library. Setting warn.conflicts = FALSE suppresses the warnings about dplyr functions that mask functions of the same name in other attached packages.
  • Line 4: We use the forward pipe operator %>% to pass the mtcars dataset to group_by(vs, am), which groups its rows by vs (engine shape: 0 = V-shaped, 1 = straight) and am (transmission: 0 = automatic, 1 = manual).
  • Line 6: We use summarise(total = n()) to collapse each group into a single row. It returns a tibble with an additional column, total, that holds the number of rows for each combination of vs and am. By default, summarise() drops only the last grouping level, so the result is still grouped by vs.
  • Line 8: We print a 4x3 tibble with the vs, am, and total columns to the console.
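If a fully ungrouped result is needed, the grouping left over after summarise() can be removed explicitly. The sketch below assumes dplyr 1.0.0 or later, where summarise() accepts the .groups argument; the names counts and flat are chosen for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

by_vs_am <- mtcars %>% group_by(vs, am)

# Same as the default behavior: only the last grouping variable (am) is dropped
counts <- by_vs_am %>% summarise(total = n(), .groups = "drop_last")
print(group_vars(counts))           # "vs"

# .groups = "drop" returns a tibble with no grouping at all
flat <- by_vs_am %>% summarise(total = n(), .groups = "drop")
print(group_vars(flat))             # character(0)

# ungroup() removes the grouping from an already grouped result
print(group_vars(ungroup(counts)))  # character(0)

# Row counts for (vs, am) = (0,0), (0,1), (1,0), (1,1)
print(flat$total)                   # 12 6 7 7
```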
