If you have reason to suspect there is a worthwhile performance gain to be had, cut it out with the rules of thumb and measure. The point of the advice you quote is that you shouldn’t copy great amounts of data for no reason, but you shouldn’t jeopardize optimizations by making everything a reference either. If something sits on the edge between “clearly cheap to copy” and “clearly expensive to copy”, then you can afford either option. If you must have the decision taken away from you, flip a coin.
A type is cheap to copy if it has no funky copy constructor and its sizeof is small. There is no hard number for “small” that’s optimal, not even on a per-platform basis since it depends very much on the calling code and the function itself. Just go by your gut feeling. One, two, three words are small. Ten, who knows. A 4×4 matrix is not small.
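If you want to write that gut feeling down, here is a rough sketch (C++17). Everything named in it is made up for illustration: cheap_to_copy_v, kSmallWords, Vec3f and Mat4 are hypothetical, the three-word threshold is an arbitrary pick rather than a recommendation, and I am reading “no funky copy constructor” as “trivially copyable”, which is an assumption on my part.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical threshold: "one, two, three words". There is no optimal number,
// so treat this as a tunable guess.
inline constexpr std::size_t kSmallWords = 3;

// Hypothetical trait: no funky copy constructor (assumed here to mean
// trivially copyable) and a sizeof of at most kSmallWords machine words.
template <typename T>
inline constexpr bool cheap_to_copy_v =
    std::is_trivially_copyable_v<T> &&
    sizeof(T) <= kSmallWords * sizeof(void*);

struct Vec3f { float x, y, z; };   // three floats: small
struct Mat4  { float m[4][4]; };   // sixteen floats: not small

static_assert(cheap_to_copy_v<int>);
static_assert(cheap_to_copy_v<Vec3f>);
static_assert(!cheap_to_copy_v<Mat4>);
static_assert(!cheap_to_copy_v<std::string>);  // copy constructor is not trivial

// Small and trivially copyable: take it by value.
constexpr float length_squared(Vec3f v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// Large: take it by const reference to avoid copying 16 floats.
constexpr float trace(const Mat4& a) {
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i) {
        sum += a.m[i][i];
    }
    return sum;
}
```

The trait only encodes the rule of thumb. It says nothing about whether the call gets inlined or how the callee uses the argument, which is exactly why measuring beats it whenever the difference might matter.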