The explained_variance_ratio_ attribute is a valid attribute of scikit-learn's (sklearn) Principal Component Analysis (PCA) class and has been available throughout sklearn's release history. It represents the ratio of variance explained by each of the selected principal components. Even so, using this attribute in our code can sometimes raise an error stating that it does not exist.
This error can occur for the following reasons:
Outdated sklearn version: The installed version of sklearn may be too old for our code.
PCA not instantiated or not fitted: A PCA object is not created at the start of the code, or it is not trained (fitted) before the explained_variance_ratio_ attribute is accessed.
sklearn version
When we implement PCA in code, we sometimes overlook which version of sklearn our project is using. Attribute names can vary between versions of a particular module.
To solve this, we must use an up-to-date sklearn version that supports this attribute. The command to ensure this is given below:
pip install --upgrade scikit-learn
After running this command in the terminal, we should see that the latest version of sklearn, along with its dependencies, has been installed successfully.
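If we want to confirm which version is active in our environment, we can also print it from Python. This is a minimal check; sklearn exposes its version string as sklearn.__version__:

import sklearn

# Print the installed scikit-learn version to confirm the upgrade took effect
print("Installed scikit-learn version:", sklearn.__version__)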
PCA not instantiated or not fitted
When working with PCA, we may create the object but forget to fit it to our data before accessing the explained_variance_ratio_ attribute, as shown below:
from sklearn.decomposition import PCA

# Load or define the data as X
X = [[1, 2, 3], [4, 5, 6]]

# Create a PCA object
pca = PCA(n_components=2)

# Access the explained variance ratios without fitting the model first
# This raises an AttributeError because the attribute only exists after fit() is called
explained_var_ratio = pca.explained_variance_ratio_

# Print or use the explained variance ratios
print("Explained Variance Ratios:", explained_var_ratio)
We must ensure that a PCA object is created at the start of our code so that the explained_variance_ratio_ attribute can be used properly. Alongside this, we must call the fit method to train the model on our 2D data before accessing the attribute.
Here’s an example to understand this further:
from sklearn.decomposition import PCA

# Load or define the data as X
X = [[1, 2, 3], [4, 5, 6]]

# Create a PCA object
pca = PCA(n_components=2)

# Fit the PCA model to the data
pca.fit(X)

# Access the explained variance ratios
explained_var_ratio = pca.explained_variance_ratio_

# Print or use the explained variance ratios
print("Explained Variance Ratios:", explained_var_ratio)
We can clearly see that a PCA object is initialized with n_components equal to 2, meaning that two principal components will be retained after dimensionality reduction, and that pca.fit(X) is called before the attribute is accessed.
By adhering to the solutions presented above, we'll be able to identify and resolve the error of the explained_variance_ratio_ attribute not existing. This attribute is a valuable tool for understanding the proportion of the total variance retained by each principal component when performing PCA. By examining these ratios, we can make informed decisions about the number of components to retain when reducing the dimensionality of our data.
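For instance, a common pattern is to fit PCA on a dataset and keep just enough components to cover a chosen share of the variance. The sketch below assumes the Iris dataset from sklearn and a hypothetical 95% threshold:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Hypothetical example: choose the number of components that explain at least 95% of the variance
X = load_iris().data
pca = PCA().fit(X)

# Cumulative share of variance explained by the first k components
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest k whose cumulative ratio reaches the 95% threshold
n_components = int(np.argmax(cumulative >= 0.95)) + 1
print("Components needed for 95% of the variance:", n_components)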