Most of the answers I found (including here) emphasize on the difference between R2 and Explained Variance Score, that is: The Mean Residue (i.e. The Mean of Error).
However, there is an important question left behind, that is: Why on earth I need to consider The Mean of Error?
Refresher:
R2: is the Coefficient of Determination which measures the amount of variation explained by the (least-squares) Linear Regression.
You can look at it from a different angle for the purpose of evaluating the predicted values of y
like this:
Varianceactual_y × R2actual_y = Variancepredicted_y
So intuitively, the more R2 is closer to 1
, the more actual_y and predicted_y will have same variance (i.e. same spread)
As previously mentioned, the main difference is the Mean of Error; and if we look at the formulas, we find that’s true:
R2 = 1 - [(Sum of Squared Residuals / n) / Variancey_actual]
Explained Variance Score = 1 - [Variance(Ypredicted - Yactual) / Variancey_actual]
in which:
Variance(Ypredicted - Yactual) = (Sum of Squared Residuals - Mean Error) / n
So, obviously the only difference is that we are subtracting the Mean Error from the first formula! … But Why?
When we compare the R2 Score with the Explained Variance Score, we are basically checking the Mean Error; so if R2 = Explained Variance Score, that means: The Mean Error = Zero!
The Mean Error reflects the tendency of our estimator, that is: the Biased v.s Unbiased Estimation.
In Summary:
If you want to have unbiased estimator so our model is not underestimating or overestimating, you may consider taking Mean of Error into account.