How to implement Cronbach's Alpha for reliability in Python

In psychology and the social sciences, Cronbach’s alpha is the most widely used indicator of scale reliability. Its value normally falls between 0 and 1, with higher values indicating greater reliability. Popular data science libraries such as scikit-learn, Pandas, and NumPy do not offer a built-in function to calculate Cronbach’s alpha.

Applications

Learning how our clients feel about our products can be beneficial in a business setting. Let’s say a company manager wants to assess overall customer satisfaction, so they send a survey to 10 customers and ask them to score the company on a scale of 1 to 3 in several areas. We’ll take the survey responses, load them into a data frame, and calculate Cronbach’s alpha to check how reliably the survey measures customers’ attitudes towards the product.

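Internal consistency refers to how closely the questions of a survey, poll, or test agree with one another, that is, whether they all measure the same underlying thing. The higher the internal consistency, the more confident we can be that our survey is reliable. Values of Cronbach’s alpha are commonly interpreted as follows: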

Cronbach's Alpha    Internal Consistency
0.9 ≤ α             Excellent
0.8 ≤ α < 0.9       Good
0.7 ≤ α < 0.8       Acceptable
0.6 ≤ α < 0.7       Questionable
0.5 ≤ α < 0.6       Poor
α < 0.5             Unacceptable
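The interpretation bands can be turned into a small lookup. The following is a minimal sketch; the function name interpret_alpha is a hypothetical helper chosen for illustration and is not part of any library.

```python
def interpret_alpha(alpha):
    """Return the internal-consistency label for a Cronbach's alpha value."""
    # Lower bound of each band, checked from the highest band down
    bands = [
        (0.9, "Excellent"),
        (0.8, "Good"),
        (0.7, "Acceptable"),
        (0.6, "Questionable"),
        (0.5, "Poor"),
    ]
    for lower_bound, label in bands:
        if alpha >= lower_bound:
            return label
    # Anything below 0.5 falls in the last band
    return "Unacceptable"


print(interpret_alpha(0.896))  # prints Good
```

Checking the bands from the top down means each value only needs to be compared against a lower bound.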

Formula

The formula to calculate Cronbach’s alpha is as follows:

$$\alpha = \frac{N \bar{r}}{1 + (N - 1)\,\bar{r}}$$

Here, $N$ is the number of questions and $\bar{r}$ is the mean correlation between all pairs of questions.
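To get a feel for how N and the mean correlation interact, here is a minimal sketch that evaluates the same expression used in the from-scratch implementation later in this article. The function name standardized_alpha and the example mean correlation of 0.5 are assumptions made for illustration only.

```python
def standardized_alpha(n_items, mean_r):
    """Cronbach's alpha from the number of questions and their mean pairwise correlation."""
    return (n_items * mean_r) / (1 + (n_items - 1) * mean_r)


# With the same mean correlation, adding questions raises alpha
for n in (3, 5, 10):
    print(n, round(standardized_alpha(n, 0.5), 3))
# prints:
# 3 0.75
# 5 0.833
# 10 0.909
```

This shows why alpha should be read together with the length of the survey: a long questionnaire can reach a high alpha even when its questions are only moderately correlated.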

Implementation

We can calculate Cronbach’s alpha either with the pingouin library or from scratch, by writing our own function.

The pingouin library

The pingouin library provides a cronbach_alpha function. We have to install the library first, which we can do with the following command:

pip install pingouin

Code example

Let's look at the code below:

# Import libraries
import pandas as pd
import pingouin as pg
# Enter the survey responses for a product as a DataFrame
data = pd.DataFrame({'P1': [1, 2, 2, 3, 1, 2, 3, 3, 2, 3],
                     'P2': [1, 1, 1, 2, 1, 3, 2, 3, 3, 3],
                     'P3': [1, 1, 2, 3, 1, 3, 3, 3, 2, 3]})
# View the DataFrame
print(data)
# Call cronbach_alpha to calculate reliability and print the result
print(pg.cronbach_alpha(data=data))

Code explanation

  • Lines 2-3: We import the necessary packages.
  • Lines 5-7: We make a data frame of the survey responses using the Pandas library.
  • Line 9: We print the data frame to view it.
  • Line 11: We calculate Cronbach’s alpha and show its value in the output.

Note: The function returns the value of Cronbach’s alpha together with an array that holds the lower and upper bounds of its confidence interval (95% by default). A confidence interval is the range formed by our estimate plus and minus its margin of error. If we repeat our test, we can expect the estimate to fall between these bounds with a reasonable level of certainty.

Without using a library

Code example

Let's look at the code below:

# Import libraries
import pandas as pd
import numpy as np
def cronbach_alpha(data):
    # Transform the data frame into a correlation matrix
    df_corr = data.corr()
    # Calculate N
    # The number of variables is equal to the number of columns in the data frame
    N = data.shape[1]
    # Calculate r
    # For this, we'll loop through all the columns and append every
    # correlation below the diagonal to an array called 'rs'. Then, we'll
    # calculate the mean of 'rs'.
    rs = np.array([])
    for i, col in enumerate(df_corr.columns):
        below_diagonal = df_corr[col].iloc[i + 1:].values
        rs = np.append(rs, below_diagonal)
    mean_r = np.mean(rs)
    # Use the formula to calculate Cronbach's alpha
    alpha = (N * mean_r) / (1 + (N - 1) * mean_r)
    return alpha
# Call the function on the survey DataFrame 'data' created in the previous example
print(cronbach_alpha(data))

Code explanation

  • Lines 2-3: We import Pandas to manipulate tabular data and NumPy to operate on arrays.
  • Lines 6-18: We build the correlation matrix, take the number of questions N from the number of columns, and calculate the mean correlation.
  • Line 20: We apply the formula above to calculate Cronbach’s alpha.

Result

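The from-scratch function returns a Cronbach’s alpha of 0.8960 for our survey, so we can say that the internal consistency of this survey is “Good.” The pingouin function computes alpha from the variances of the questions rather than from their mean correlation, so it reports a slightly different value (about 0.893) that falls in the same band.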
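As a cross-check, the following minimal sketch computes the variance-based form of Cronbach’s alpha, k / (k − 1) × (1 − sum of the question variances / variance of the total scores), using only the standard library. To the best of our understanding, this is the form that pingouin reports, but treat that as an assumption rather than a statement about its internals. The names raw_cronbach_alpha and survey are hypothetical and used only for illustration.

```python
from statistics import variance


def raw_cronbach_alpha(items):
    """Variance-based Cronbach's alpha for a dict of equally long score lists."""
    columns = list(items.values())
    k = len(columns)
    # Sample variance of each question
    item_variances = [variance(column) for column in columns]
    # Total score of each respondent across all questions
    totals = [sum(scores) for scores in zip(*columns)]
    return (k / (k - 1)) * (1 - sum(item_variances) / variance(totals))


# Same survey responses as in the examples above
survey = {
    "P1": [1, 2, 2, 3, 1, 2, 3, 3, 2, 3],
    "P2": [1, 1, 1, 2, 1, 3, 2, 3, 3, 3],
    "P3": [1, 1, 2, 3, 1, 3, 3, 3, 2, 3],
}
alpha = raw_cronbach_alpha(survey)
print(round(alpha, 4))  # prints 0.8931
```

The two forms agree exactly only when all questions have the same variance, which is why the value here differs slightly from the correlation-based 0.8960.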

Copyright ©2024 Educative, Inc. All rights reserved