Select and Filter Data Using R
Learn how to use dplyr's select() and filter() functions together with the pipeline operator in R.
Select and filter data
We need to understand and analyze data at different levels of detail and granularity to reveal its patterns and heterogeneous parts.
Although we often work with aggregations of whole datasets, we inevitably need to handle specific columns, rows, or other smaller parts of a data frame separately. To do so, we must be able to isolate those parts from the data frame.
We can isolate the desired parts of our data using the select() function, which picks columns, and the filter() function, which picks the rows that meet a condition, and then work on them.
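To make the distinction between the two functions concrete, here is a minimal sketch on a small made-up data frame. The `orders` data and its `id`, `region`, and `amount` columns are hypothetical and exist only for illustration. select() keeps columns, while filter() keeps the rows that satisfy a condition.

```r
library(dplyr)

# Hypothetical example data: four orders with three columns
orders <- data.frame(
  id     = 1:4,
  region = c("north", "south", "north", "east"),
  amount = c(120, 80, 200, 50)
)

# select() isolates columns: keep only id and amount (all 4 rows remain)
orders_cols <- select(orders, id, amount)

# filter() isolates rows: keep only orders whose amount is above 100
orders_rows <- filter(orders, amount > 100)

print(orders_cols)
print(orders_rows)
```

Both calls return a new data frame and leave `orders` unchanged.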
The pipeline operator, %>%, which the dplyr package re-exports from the magrittr package, makes this practical. It passes the value on its left as the first argument of the function on its right, so we can chain several functions in a single pipeline by placing the operator before each function call. Here is the structure:
<data> %>% <function>(<column>) %>% <function>(<column>) # Syntax example
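A minimal sketch of a two-step pipeline, again on a hypothetical `orders` data frame made up for illustration. Because the pipe inserts the left-hand result as the first argument of the next function, the piped version and the nested call below compute the same result.

```r
library(dplyr)

# Hypothetical example data
orders <- data.frame(
  id     = 1:4,
  region = c("north", "south", "north", "east"),
  amount = c(120, 80, 200, 50)
)

# Pipeline: keep rows from the north region, then keep two columns
north_piped <- orders %>%
  filter(region == "north") %>%
  select(id, amount)

# Equivalent nested call without the pipe
north_nested <- select(filter(orders, region == "north"), id, amount)

print(north_piped)
```

The pipeline reads left to right in the order the steps happen, which is why it is usually easier to follow than the inside-out nested form.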
Select columns from data frames
The select() function isolates one or more columns of a data frame so they are ready for processing. As mentioned above, we use the pipeline operator together with select() to do this. Conversely, adding the minus sign (-) as a prefix to column names deselects them, so select() returns every column except those.
For example, we can select and deselect the sales column of the invoice dataset like this:
library(dplyr) # Provides select() and the %>% pipe
invoice %>% select(sales) # Single column selection
invoice %>% select(-sales) # Single column deselection
invoice %>% select(c(sales, costs)) # Multiple column selection
invoice %>% select(-c(sales, costs)) # Multiple column deselection
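The lines above assume an existing `invoice` data frame. As a self-contained sketch, the version below builds a small hypothetical `invoice` with made-up values; the `customer` column is an assumption added so that something remains after both `sales` and `costs` are deselected.

```r
library(dplyr)

# Hypothetical invoice data; the customer column and all values are made up
invoice <- data.frame(
  customer = c("A", "B", "C"),
  sales    = c(500, 300, 450),
  costs    = c(200, 150, 300)
)

only_sales    <- invoice %>% select(sales)            # keeps only sales
without_sales <- invoice %>% select(-sales)           # keeps every column except sales
sales_costs   <- invoice %>% select(c(sales, costs))  # keeps sales and costs
without_both  <- invoice %>% select(-c(sales, costs)) # drops sales and costs

print(names(without_sales))  # "customer" "costs"
```

Inside select(), `c(sales, costs)` and `sales, costs` pick the same columns; the c() form is handy when negating a group, as in `-c(sales, costs)`. Note that select() always returns a data frame, even when only one column is kept.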