
Data Frames

In this lesson, we will introduce you to data frames.

We'll cover the following:

- Difference between Matrices and Data Frames
- Creating Data Frames
- Accessing and Manipulating Data Frames
- Merging Two Data Frames
  - Syntax of merge()

Data frames are an important type of object in the R language. They are particularly useful in statistical modeling applications. In short, data frames are used to store tabular data in R.

Data frames store data as a sequence of columns. Each column can be of a different data type.
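As a minimal sketch (the column names and values are made up for illustration), the following builds a small data frame and checks that it really is a list of equal-length columns, each keeping its own type:

```r
# Hypothetical example data: three columns of different types
df <- data.frame(
  item  = c("pen", "book", "lamp"),  # character column
  price = c(1.5, 12.0, 30.25),       # numeric (double) column
  stock = c(100L, 20L, 5L),          # integer column
  stringsAsFactors = FALSE           # keep text as character (the default since R 4.0.0)
)

# Under the hood a data frame is a list with one element per column
print(is.list(df))   # TRUE
print(length(df))    # 3, the number of columns
print(nrow(df))      # 3, the number of rows

# Each column keeps its own class
print(sapply(df, class))
```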

Difference between Matrices and Data Frames

Data frames can store a different class of object in each column. In a matrix, all the elements must be of the same type, for example all integers or all numeric values.
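A quick way to see the difference is to put the same text and numbers into both structures. In this sketch (names and ages are invented), the matrix silently turns every value into character, while the data frame keeps the numeric column numeric:

```r
# A matrix holds one type: mixing text and numbers coerces everything to character
m <- cbind(name = c("Ann", "Bob"), age = c(34, 28))
print(class(m[, "age"]))   # "character"

# A data frame keeps a separate type for each column
d <- data.frame(name = c("Ann", "Bob"), age = c(34, 28),
                stringsAsFactors = FALSE)
print(class(d$age))        # "numeric"
print(d$age + 1)           # arithmetic still works on the age column
```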

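Creating Data Frames

Let's look at an example. Say you want to store data about employees. Each employee has a name (text), an address (text), a phone number (a number), and a gender (a single character such as "M" or "F"). In R, all text, whether a single letter or a full string, has the type character. We can represent the data as follows: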
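One possible way to build such a table is with data.frame(), passing one vector per column. This is a sketch: only the names Alex, Brian and Charles come from this lesson, while the addresses, phone numbers and genders are hypothetical placeholders. The phone number sits in the third column, matching the indexing example later in the lesson.

```r
# Hypothetical employee records; only the names come from the lesson,
# the addresses, phone numbers and genders are made up for illustration
employees <- data.frame(
  name    = c("Alex", "Brian", "Charles"),
  address = c("12 Oak St", "7 Elm Rd", "3 Pine Ave"),
  phone   = c(5551234, 5555678, 5559012),  # stored as numeric (double)
  gender  = c("M", "M", "M"),
  stringsAsFactors = FALSE                 # keep text columns as character
)

print(employees)
str(employees)   # one line per column, showing its type
```

Phone numbers are kept as numeric rather than integer here because R integers stop at 2,147,483,647, which is too small for a ten-digit phone number; storing them as character is another common choice.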


Accessing and Manipulating Data Frames

Let's learn how to fetch a single element from a data frame, given the element's row number and column number. Write square brackets [] after the name of the data frame, and inside them put the row index, a comma, and then the column index, as in df[row, column].
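The same bracket notation also selects whole rows or columns (leave one index empty) and, on the left of an assignment, changes values. A minimal sketch using a made-up table of student scores:

```r
# Hypothetical data for illustration
scores <- data.frame(
  student = c("Ann", "Bob", "Cy"),
  math    = c(90, 75, 82),
  art     = c(60, 88, 70),
  stringsAsFactors = FALSE
)

print(scores[2, 3])        # single element: row 2, column 3 (Bob's art score, 88)
print(scores[2, ])         # whole row 2, as a one-row data frame
print(scores[, "math"])    # whole column selected by name, as a vector
print(scores$math)         # the same column with the $ operator

# Manipulating: assign into the same positions
scores[2, "art"] <- 90                       # change one element
scores$total <- scores$math + scores$art     # add a new column
print(scores)
```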

For example, to access the phone number of the first employee, we use row number $1$, since this is the first employee, and column number $3$, where the phone numbers are stored:
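A sketch of that lookup, repeating the hypothetical employee data so the snippet runs on its own (the addresses, phone numbers and genders are invented). Selecting the column by name or through $ returns the same element:

```r
# Same hypothetical employee data (made-up addresses, phones, genders)
employees <- data.frame(
  name    = c("Alex", "Brian", "Charles"),
  address = c("12 Oak St", "7 Elm Rd", "3 Pine Ave"),
  phone   = c(5551234, 5555678, 5559012),
  gender  = c("M", "M", "M"),
  stringsAsFactors = FALSE
)

print(employees[1, 3])         # row 1, column 3: the first employee's phone number
print(employees[1, "phone"])   # same element, column selected by name
print(employees$phone[1])      # same element via the column vector
```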


Notice that when we fetch the names of all the employees (the whole name column), the line

Levels: Alex Brian Charles

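is printed. This line does not show row names: unless they are set explicitly, the row names of a data frame are simply "1", "2", "3". It appears because, in R versions before 4.0.0, data.frame() converted character vectors to factors by default (stringsAsFactors = TRUE), so the name column became a factor. We will study factors in more detail in an upcoming lesson; for now, note that a factor has an attribute called levels, a character vector of its unique values, sorted alphabetically by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, and the Levels line is printed only if you pass stringsAsFactors = TRUE.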
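The sketch below makes the distinction concrete. It passes stringsAsFactors explicitly so it behaves the same on any R version, and deliberately lists the names out of alphabetical order to show that the levels are sorted while the row names stay "1", "2", "3":

```r
# stringsAsFactors = TRUE reproduces the pre-R-4.0.0 default behaviour
old_style <- data.frame(name = c("Alex", "Charles", "Brian"),
                        stringsAsFactors = TRUE)
print(old_style$name)            # values followed by a "Levels:" line
print(levels(old_style$name))    # "Alex" "Brian" "Charles": unique and sorted
print(rownames(old_style))       # "1" "2" "3": row names are separate from levels

# With stringsAsFactors = FALSE (the default since R 4.0.0) no factor is created
new_style <- data.frame(name = c("Alex", "Charles", "Brian"),
                        stringsAsFactors = FALSE)
print(class(new_style$name))     # "character", so no Levels line is printed
```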

Merging Two Data Frames

Two data frames can be combined into one simply by using the merge() function.

Syntax of `merge()`

merge(x, y, by, by.x, by.y, sort = TRUE)
# x and y are the two data frames that we want to merge
# by names the columns, present in both x and y, on which merging takes place (by default, all column names they share)
# by.x and by.y name the merge columns of x and y separately, for when their names differ
# sort tells whether the result should be sorted on the merge columns
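As a minimal sketch of the sort parameter, assuming two small made-up data frames that share an id column whose values are deliberately out of order: with sort = TRUE the merged rows come back ordered by id.

```r
# Hypothetical data: keys deliberately out of order
x <- data.frame(id = c(3, 1, 2), city = c("Rome", "Oslo", "Lima"),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(2, 3, 1), score = c(20, 30, 10))

# Merge on the shared "id" column and sort the result by it
res <- merge(x, y, by = "id", sort = TRUE)
print(res)   # rows ordered by id: 1, 2, 3
```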

By default, the data frames are merged on all the columns whose names they share; alternatively, the merge columns can be specified separately for each data frame with by.x and by.y.
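In this sketch (staff and department data are invented), no by argument is given, so merge() joins on the one column name the two frames share, dept. The key column comes first in the result, followed by the remaining columns of x and then of y:

```r
# Hypothetical data: "dept" is the only column name the two frames share
staff <- data.frame(name = c("Ann", "Bob", "Cy"),
                    dept = c("HR", "IT", "IT"),
                    stringsAsFactors = FALSE)
depts <- data.frame(dept  = c("HR", "IT"),
                    floor = c(1, 3),
                    stringsAsFactors = FALSE)

# No `by` argument: merge() matches on the shared column name(s)
both <- merge(staff, depts)
print(both)   # each employee now carries the floor of their department
```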

We need by.x and by.y only when the key columns have different names in the two data frames, so we must say which column of each one the merge should use.
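A sketch with made-up data where the same key is called name in one data frame and emp in the other. by.x and by.y tell merge() to match these two columns; in the result the key column keeps the name used in x:

```r
# Hypothetical data: the key is "name" in one frame and "emp" in the other
people <- data.frame(name = c("Ann", "Bob"), age = c(34, 28),
                     stringsAsFactors = FALSE)
salary <- data.frame(emp = c("Bob", "Ann"), pay = c(5200, 4800),
                     stringsAsFactors = FALSE)

# Match people$name against salary$emp
joined <- merge(people, salary, by.x = "name", by.y = "emp")
print(joined)   # columns: name, age, pay
```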

Let's apply this to our employee example. We have one data frame containing the employees' data and another containing their employee IDs; the two were kept separate so that the IDs stay confidential. Our task is to merge the two data frames:
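One way this could look, as a sketch: the employee names come from this lesson, but the phone numbers, genders and the emp_id values are hypothetical. Both data frames share a name column, so merge() joins on it without any by argument:

```r
# Employee data (hypothetical values apart from the names used in the lesson)
employees <- data.frame(
  name   = c("Alex", "Brian", "Charles"),
  phone  = c(5551234, 5555678, 5559012),
  gender = c("M", "M", "M"),
  stringsAsFactors = FALSE
)

# Confidential IDs kept in a separate frame; the IDs are invented for this sketch
ids <- data.frame(
  name   = c("Charles", "Alex", "Brian"),
  emp_id = c("E-103", "E-101", "E-102"),
  stringsAsFactors = FALSE
)

# Both frames share the "name" column, so merge() joins on it by default
full <- merge(employees, ids)
print(full)
```

Joining on name works here only because the three names are unique; with real records, a dedicated unique key column is a safer choice.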
