Correlation
This lesson explains how to examine the relationships between variables in data.
Correlation describes the relationship between variables. Variables are not always independent of each other: a change in one is often accompanied by a change in another. In this lesson, we will learn different ways to figure out these relationships.
Contingency table
A contingency table shows the relationship between categorical variables by counting how many observations fall into each combination of categories.
Suppose we have data on the salaried employees of a company, with two variables: the employee's experience in years and their monthly salary. Here is the data:
Years of Experience | Monthly Salary |
---|---|
2 | $3,000 |
3 | $3,500 |
6 | $5,000 |
8 | $5,500 |
7 | $5,200 |
3 | $4,000 |
4 | $4,600 |
2 | $2,500 |
8 | $6,700 |
12 | $8,000 |
10 | $9,000 |
7 | $6,900 |
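Both variables here are numeric, so the raw rows cannot go into a contingency table directly. We first group the values into buckets, which act as categories. The table below shows such bucketed counts for a larger set of 606 employees.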
Experience / Salary | <$2,000 | $2,000 - $5,000 | $5,000 - $8,000 | $8,000+ | Total |
---|---|---|---|---|---|
<2 Years | 56 | 25 | 16 | 5 | 102 |
2 - 5 Years | 41 | 78 | 58 | 16 | 193 |
5 - 10 Years | 21 | 51 | 125 | 69 | 266 |
10+ Years | 3 | 8 | 15 | 19 | 45 |
Total | 121 | 162 | 214 | 109 | 606 |
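The bucketing step can be sketched with pandas on the twelve sample rows listed earlier. This is a minimal illustration, not the method used to produce the 606-employee table: the column names, bucket labels, and the choice to include the lower edge of each bucket (so that $5,000 lands in "$5,000 - $8,000") are all assumptions made for the example.

```python
import pandas as pd

# The twelve sample rows from the experience/salary listing.
df = pd.DataFrame({
    "experience": [2, 3, 6, 8, 7, 3, 4, 2, 8, 12, 10, 7],
    "salary": [3000, 3500, 5000, 5500, 5200, 4000, 4600, 2500,
               6700, 8000, 9000, 6900],
})

# Assumed bucket edges: each bucket includes its lower edge and
# excludes its upper edge (right=False).
exp_bins = [0, 2, 5, 10, float("inf")]
exp_labels = ["<2 Years", "2 - 5 Years", "5 - 10 Years", "10+ Years"]
sal_bins = [0, 2000, 5000, 8000, float("inf")]
sal_labels = ["<$2,000", "$2,000 - $5,000", "$5,000 - $8,000", "$8,000+"]

df["exp_bucket"] = pd.cut(df["experience"], bins=exp_bins,
                          labels=exp_labels, right=False)
df["sal_bucket"] = pd.cut(df["salary"], bins=sal_bins,
                          labels=sal_labels, right=False)

# Count employees in every (experience bucket, salary bucket) pair.
table = pd.crosstab(df["exp_bucket"].astype(str), df["sal_bucket"].astype(str))

# Restore the bucket order and show empty buckets as zero counts.
table = table.reindex(index=exp_labels, columns=sal_labels, fill_value=0)

# Add row and column totals, as in the table above.
table["Total"] = table.sum(axis=1)
table.loc["Total"] = table.sum(axis=0)

print(table)
```

With only twelve rows, most cells are zero, but the non-zero counts already fall where higher experience meets higher salary.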
Reading the table, 121 of the 606 employees earn less than $2,000, and 56 of those have less than 2 years of experience. The largest count in each row also lies on the diagonal, so more experience corresponds with higher salary. So, this table is a good way to understand the relationship between variables. ...
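The link between experience and salary is easier to read once each row of counts is converted to proportions, because the experience groups differ a lot in size. The sketch below re-enters the counts from the contingency table by hand and divides each row by its total; the variable names are illustrative.

```python
import pandas as pd

# Counts copied from the contingency table (without the Total row/column).
counts = pd.DataFrame(
    [[56, 25, 16, 5],
     [41, 78, 58, 16],
     [21, 51, 125, 69],
     [3, 8, 15, 19]],
    index=["<2 Years", "2 - 5 Years", "5 - 10 Years", "10+ Years"],
    columns=["<$2,000", "$2,000 - $5,000", "$5,000 - $8,000", "$8,000+"],
)

# Divide every row by its own total so that each row sums to 1.
row_share = counts.div(counts.sum(axis=1), axis=0)

print(row_share.round(2))
```

In these proportions, the share of employees earning $8,000 or more rises from about 5% in the under-2-years group to about 42% in the 10+ years group, which is the same trend the raw counts suggest.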