Select and Filter Data Using R

Learn how to use the select() and filter() functions of the dplyr package together with the pipe operator in R.

Select and filter data

We need to understand and analyze data at different levels of detail and granularity to reveal its patterns and heterogeneous parts.

Although we typically deal with aggregations of whole datasets, there are inevitably times when we have to work with specific columns, rows, or other smaller components of a data frame separately. To do so, we need to be able to isolate the target parts from the data frame.

We can isolate the desired parts of our data using the select() and filter() functions and then work on them.

The pipe operator, %>%, which dplyr re-exports from the magrittr package, helps us achieve this in a practical way. It passes the value on its left as the first argument of the function on its right, so we can chain several functions in one pipeline by placing the operator before each function. Here is the structure:

<data> %>% <function>(<column>) %>% <function>(<column>) #Syntax example
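To make the structure above concrete, here is a minimal sketch of a two-step pipeline. It assumes the dplyr package is installed; the orders data frame and its values are invented purely for illustration and are not part of the lesson's data.

```r
library(dplyr)

# Hypothetical data frame, made up for this illustration only
orders <- data.frame(
  id    = 1:4,
  sales = c(120, 80, 200, 50),
  costs = c(70, 60, 150, 40)
)

# Step 1: filter() keeps the rows where sales is above 75
# Step 2: select() keeps only the id and sales columns
result <- orders %>%
  filter(sales > 75) %>%
  select(id, sales)

print(result)
```

Each function receives the output of the previous step as its first argument, so the data frame only has to be named once at the start of the pipeline.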

Select data pieces from data frames

The select() function allows us to isolate one or more columns of a data frame and make them ready for processing. As mentioned above, we use the pipe operator along with the select() function to do this. Conversely, prefixing a column name with the minus sign (-) deselects that column, which means every column except the named one is kept. For example, we can select and deselect the sales and costs columns of the invoice dataset like this:

invoice %>% select(sales) # Single column selection

invoice %>% select(-sales) # Single column deselection

invoice %>% select(c(sales, costs)) # Multiple column selection

invoice %>% select(-c(sales, costs)) # Multiple column deselection
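The calls above can be tried out with a small stand-in for the invoice dataset. The sketch below assumes dplyr is installed; the invoice_id column and all values are hypothetical, chosen only so that the effect of each call is easy to see.

```r
library(dplyr)

# Toy stand-in for the invoice dataset; the columns and values are made up
invoice <- data.frame(
  invoice_id = c("A1", "A2", "A3"),
  sales      = c(100, 250, 175),
  costs      = c(60, 190, 120)
)

only_sales    <- invoice %>% select(sales)             # keeps sales only
without_sales <- invoice %>% select(-sales)            # drops sales
both          <- invoice %>% select(c(sales, costs))   # keeps sales and costs
neither       <- invoice %>% select(-c(sales, costs))  # drops sales and costs

# Inspect which columns survive each call
print(names(only_sales))
print(names(without_sales))
print(names(both))
print(names(neither))
```

Note that select() never changes the number of rows; it only determines which columns are carried forward.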
...
