…use both feature scaling (dividing by the
“max-min”, or range, of a feature) and mean normalization.
So for any individual feature f:
f_norm = (f - f_mean) / (f_max - f_min)
e.g. for x2,(midterm exam)^2 = {7921, 5184, 8836, 4761}
> x2 <- c(7921, 5184, 8836, 4761)
> mean(x2)
6676
> max(x2) - min(x2)
4075
> (x2 - mean(x2)) / (max(x2) - min(x2))
0.306 -0.366 0.530 -0.470
Hence norm(5184) = 0.366
(using R language, which is great at vectorizing expressions like this)
I agree it’s confusing they used the notation x2 (2) to mean x2 (norm) or x2′
EDIT: in practice everyone calls the builtin scale(...) function, which does the same thing.