Python scikit-learn (metrics): difference between r2_score and explained_variance_score?

Question

Most of the answers I found (including here) emphasize the difference between R² and the Explained Variance Score, namely the Mean Residual (i.e. the Mean of Error).

However, an important question is left unanswered: why on earth do I need to consider the Mean of Error?

Refresher:

R² is the Coefficient of Determination, which measures the proportion of variation explained by the (least-squares) Linear Regression.

For a least-squares fit with an intercept, you can look at it from a different angle, for the purpose of evaluating the predicted values of y:

Variance_{actual_y} × R² = Variance_{predicted_y}

So intuitively, the closer R² is to 1, the closer actual_y and predicted_y are to having the same variance (i.e. the same spread).
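Here is a quick numerical check of that identity, as a minimal sketch in plain Python. It assumes a one-feature least-squares fit with an intercept (the identity is not expected to hold for arbitrary predictions), and the toy data and variable names are made up for illustration.

```python
from statistics import fmean, pvariance

# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_actual = [2.1, 3.9, 6.2, 7.8, 10.1]

# Ordinary least-squares fit with an intercept: y = slope * x + intercept.
x_mean, y_mean = fmean(x), fmean(y_actual)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y_actual))
sxx = sum((xi - x_mean) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_mean - slope * x_mean
y_predicted = [slope * xi + intercept for xi in x]

# R² = 1 - (Sum of Squared Residuals / n) / Variance(y_actual)
n = len(y_actual)
ssr = sum((yp - ya) ** 2 for yp, ya in zip(y_predicted, y_actual))
r2 = 1 - (ssr / n) / pvariance(y_actual)

# For this kind of fit, Variance(y_actual) * R² matches Variance(y_predicted).
lhs = pvariance(y_actual) * r2
rhs = pvariance(y_predicted)
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point rounding.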

As previously mentioned, the main difference is the Mean of Error; and if we look at the formulas, we find that’s true:

R² = 1 - [(Sum of Squared Residuals / n) / Variance_{y_actual}]

Explained Variance Score = 1 - [Variance_{(Y_predicted - Y_actual)} / Variance_{y_actual}]

in which:

Variance_{(Y_predicted - Y_actual)} = (Sum of Squared Residuals / n) - (Mean Error)²

where Mean Error = Sum of (Y_predicted - Y_actual) / n.
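Both scores can be written out directly from their definitions. This is a minimal sketch in plain Python using the population variance (dividing by n). The function names `r2` and `explained_variance` are hypothetical, and the sample values are just for illustration. For a single, unweighted target, scikit-learn's `r2_score` and `explained_variance_score` should give the same numbers.

```python
from statistics import pvariance


def r2(y_actual, y_predicted):
    """1 - (Sum of Squared Residuals / n) / Variance(y_actual)."""
    n = len(y_actual)
    ssr = sum((yp - ya) ** 2 for yp, ya in zip(y_predicted, y_actual))
    return 1 - (ssr / n) / pvariance(y_actual)


def explained_variance(y_actual, y_predicted):
    """1 - Variance(y_predicted - y_actual) / Variance(y_actual)."""
    errors = [yp - ya for yp, ya in zip(y_predicted, y_actual)]
    return 1 - pvariance(errors) / pvariance(y_actual)


# Hypothetical values for illustration.
y_actual = [3.0, -0.5, 2.0, 7.0]
y_predicted = [2.5, 0.0, 2.0, 8.0]

r2_value = r2(y_actual, y_predicted)
evs_value = explained_variance(y_actual, y_predicted)
print(r2_value, evs_value)
```

Here the predictions overshoot by 0.25 on average, so the Explained Variance Score comes out slightly higher than R².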

So the only difference is that the Explained Variance Score subtracts the squared Mean Error from the numerator of the first formula. But why?

When we compare the R² Score with the Explained Variance Score, we are basically checking the Mean Error; so if R² = Explained Variance Score, that means: The Mean Error = Zero!
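To make that concrete, the gap between the two scores can be checked against the squared Mean Error. This is a minimal sketch with made-up numbers; it also shows that shifting the predictions by the Mean Error (a hypothetical "debiasing" step, used here only for the check) brings R² up to the Explained Variance Score.

```python
from statistics import fmean, pvariance

# Hypothetical values for illustration.
y_actual = [3.0, -0.5, 2.0, 7.0]
y_predicted = [2.5, 0.0, 2.0, 8.0]

errors = [yp - ya for yp, ya in zip(y_predicted, y_actual)]
mean_error = fmean(errors)
var_y = pvariance(y_actual)

r2 = 1 - fmean([e ** 2 for e in errors]) / var_y
evs = 1 - pvariance(errors) / var_y

# The gap between the scores is the squared Mean Error, scaled by Variance(y_actual).
gap = evs - r2
expected_gap = mean_error ** 2 / var_y

# Subtract the Mean Error from every prediction so the new Mean Error is zero.
y_debiased = [yp - mean_error for yp in y_predicted]
errors_debiased = [yp - ya for yp, ya in zip(y_debiased, y_actual)]
r2_debiased = 1 - fmean([e ** 2 for e in errors_debiased]) / var_y

print(gap, expected_gap, r2_debiased, evs)
```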

The Mean Error reflects the tendency of our estimator to systematically over-predict or under-predict, that is: biased vs. unbiased estimation.
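An extreme case shows the bias clearly. In this minimal sketch (made-up data), every prediction overshoots the actual value by the same constant, so the errors have no spread at all: the Explained Variance Score stays perfect while R² is penalized.

```python
from statistics import fmean, pvariance

# Hypothetical data: every prediction overshoots the actual value by exactly 2.
y_actual = [3.0, -0.5, 2.0, 7.0]
y_predicted = [ya + 2.0 for ya in y_actual]

errors = [yp - ya for yp, ya in zip(y_predicted, y_actual)]
# Positive Mean Error: overestimating. Negative Mean Error: underestimating.
mean_error = fmean(errors)

var_y = pvariance(y_actual)
r2 = 1 - fmean([e ** 2 for e in errors]) / var_y
evs = 1 - pvariance(errors) / var_y

print(f"mean error = {mean_error}, R2 = {r2:.3f}, EVS = {evs:.3f}")
```

Looking at the Explained Variance Score alone would hide this systematic overestimation; R² (or the Mean Error itself) exposes it.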

In Summary:

If you want an unbiased estimator, so that your model is not systematically underestimating or overestimating, consider taking the Mean of Error into account.
