PCA projection and reconstruction in scikit-learn
You can do proj = pca.inverse_transform(X_train_pca). That way you do not have to worry about how to do the matrix multiplications yourself. What you obtain after pca.fit_transform or pca.transform is what is usually called the "loadings" for each sample, meaning how much of each component you need to describe it best using a linear combination of the …
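A minimal sketch of the point above; X_train here is just illustrative random data, and the variable names mirror the answer:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 6))  # stand-in for the real training data

pca = PCA(n_components=3)
X_train_pca = pca.fit_transform(X_train)   # per-sample scores
proj = pca.inverse_transform(X_train_pca)  # back to the original feature space

# inverse_transform is equivalent to: scores @ components + mean
manual = X_train_pca @ pca.components_ + pca.mean_
assert np.allclose(proj, manual)
assert proj.shape == X_train.shape
```

Because only 3 of 6 components are kept, proj is a lossy reconstruction of X_train, not an exact copy.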
Factor Loadings using sklearn
Multiply each component by the square root of its corresponding eigenvalue: pca.components_.T * np.sqrt(pca.explained_variance_) This should produce your loading matrix.
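Sketched out on toy data (the shapes are the thing to check: one row per original variable, one column per component):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # illustrative data: 50 samples, 4 variables

pca = PCA(n_components=2).fit(X)

# Loadings: eigenvectors scaled by the square roots of their eigenvalues
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
assert loadings.shape == (4, 2)
```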
PCA on sklearn – how to interpret pca.components_
Terminology: First of all, the results of a PCA are usually discussed in terms of component scores, sometimes called factor scores (the transformed variable values corresponding to a particular data point), and loadings (the weight by which each standardized original variable should be multiplied to get the component score). PART1: I explain how to check …
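In sklearn terms, the component scores are what transform returns, and they are the centered data projected onto the rows of pca.components_. A small check on illustrative data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))  # illustrative data

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)  # component scores

# Each score is the centered data projected onto the component weights
manual = (X - pca.mean_) @ pca.components_.T
assert np.allclose(scores, manual)
```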
Obtain eigen values and vectors from sklearn PCA
Your implementation: you are computing the eigenvectors of the correlation matrix, that is, the covariance matrix of the normalized variables. data /= np.std(data, axis=0) is not part of classic PCA; we only center the variables. So the sklearn PCA does not feature-scale the data beforehand. Apart from that you are on the right track, if …
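To see this concretely: centering only (no division by the standard deviation) and taking the eigenvalues of the covariance matrix reproduces sklearn's explained_variance_. The data here is illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# Classic PCA: center only, do not divide by the standard deviation
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)                  # covariance (ddof=1 by default)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order

pca = PCA().fit(X)
# sklearn's explained_variance_ matches the covariance eigenvalues
assert np.allclose(pca.explained_variance_, eigvals)
```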
raise LinAlgError(“SVD did not converge”) LinAlgError: SVD did not converge in matplotlib pca determination
This can happen when there are inf or nan values in the data. Use this to remove nan values: ori_data.dropna(inplace=True). Note that dropna only removes NaN; if inf values may be present, replace them first, e.g. with ori_data.replace([np.inf, -np.inf], np.nan, inplace=True), and then drop.
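A small sketch with a hypothetical frame containing NaNs (column names are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0],
                   "b": [0.5, np.nan, 1.5, 2.0]})

df.dropna(inplace=True)              # drop every row containing a NaN
assert not df.isna().any().any()     # nothing missing remains

pca = PCA(n_components=1).fit(df)    # SVD now has finite input
```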
Python scikit learn pca.explained_variance_ratio_ cutoff
Yes, you are nearly right. The pca.explained_variance_ratio_ attribute returns a vector of the variance explained by each dimension. Thus pca.explained_variance_ratio_[i] gives the variance explained solely by the (i+1)-th dimension. You probably want pca.explained_variance_ratio_.cumsum(), which returns a vector x such that x[i] is the cumulative variance explained by the first i+1 dimensions. import …
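A sketch of using the cumulative sum to pick a cutoff; the 90% threshold and the data are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))

pca = PCA().fit(X)
cum = pca.explained_variance_ratio_.cumsum()

# Smallest number of components whose cumulative ratio reaches 90%
k = int(np.searchsorted(cum, 0.90) + 1)
assert 1 <= k <= 8
assert np.isclose(cum[-1], 1.0)  # all components together explain everything
```

You could then refit with PCA(n_components=k), or pass a float such as PCA(n_components=0.90) and let sklearn pick the count for you.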
Principal components analysis using pandas dataframe
Most sklearn objects work with pandas dataframes just fine; would something like this work for you?

```python
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA

df = pd.DataFrame(data=np.random.normal(0, 1, (20, 10)))
pca = PCA(n_components=5)
pca.fit(df)
```

You can access the components themselves with pca.components_.
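One caveat worth sketching: transform returns a plain NumPy array, so if you want to stay in pandas you have to wrap the result yourself (the PC1..PC5 column names below are just a convention, not something sklearn produces):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame(np.random.normal(0, 1, (20, 10)))
pca = PCA(n_components=5).fit(df)

# transform returns an ndarray; rebuild a DataFrame to keep the index
scores = pd.DataFrame(pca.transform(df),
                      index=df.index,
                      columns=[f"PC{i + 1}" for i in range(5)])
assert scores.shape == (20, 5)
```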