Whenever you have multiple values x1 … xN, you trivially have the tuple (x1, ..., xN), which is a single (albeit composite) value. If your language doesn’t have tuples, use any aggregate type (struct, class, array, other collection) at will. So from this perspective, returning “multiple” values and returning a single value are completely equivalent.
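To make that equivalence concrete, here is a minimal sketch in Python. The function names `div_mod` and `div_mod_record` and the `DivMod` type are made up for illustration; the point is only that "two results" and "one aggregate result" carry the same information.

```python
from typing import NamedTuple, Tuple


def div_mod(a: int, b: int) -> Tuple[int, int]:
    """Return quotient and remainder packed into one tuple value."""
    return a // b, a % b


class DivMod(NamedTuple):
    """Hypothetical aggregate type holding the same two results by name."""
    quotient: int
    remainder: int


def div_mod_record(a: int, b: int) -> DivMod:
    """Same computation, returned as a named aggregate instead of a bare tuple."""
    return DivMod(a // b, a % b)


result = div_mod(17, 5)   # one composite value
q, r = div_mod(17, 5)     # the same value, taken apart at the call site
```

Whether the caller sees this as "two values" or "one pair" is purely a matter of how the result is bound; the function itself returns exactly one thing.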
That just means that you only need one, but why omit the other?
First, you need aggregate types anyway, so the choice is between having both or having only aggregate types. Second, if a function can return “multiple values”, you face a semantic conundrum: suddenly an expression does not evaluate to a value (which we have previously defined very specifically); instead it evaluates to this new, different category of thing called “multiple values”. Now what is the static type of this result? What can the program do and not do with it? What does it mean in implementation terms?
You can of course come up with ad hoc answers to these questions, but any sensible approach will just amount to tuple types. Ignoring that makes you blind to a very useful perspective, and refusing to make multiple return values first-class is probably more complicated than saying “these are tuple types, they can be constructed like this and deconstructed like this”.
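As a rough illustration of what "first-class" buys you, here is a small Python sketch. The names `Bounds`, `min_max`, and `width` are invented for this example. Once the result is an ordinary tuple, the questions above answer themselves: it has a static type, it can be stored, passed around, and put in collections, and it is taken apart with plain destructuring.

```python
from typing import List, Tuple

# Hypothetical alias: this is the static type of the "multiple values".
Bounds = Tuple[int, int]


def min_max(xs: List[int]) -> Bounds:
    """Construct the tuple: smallest and largest element of xs."""
    return min(xs), max(xs)


def width(bounds: Bounds) -> int:
    """Deconstruct the tuple and use its parts."""
    lo, hi = bounds
    return hi - lo


bounds = min_max([4, 1, 9])           # stored in a variable like any value
history = [bounds, min_max([2, 2])]   # placed in a collection
w = width(min_max([4, 1, 9]))         # passed straight to another function
```

None of these uses needs a special rule for "multiple values"; they all follow from the rules the language already has for values.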