Note that the regex need not be recompiled each time. From the Javadoc:
An invocation of this method of the form
str.split(regex, n)
yields the same result as the expression
Pattern.compile(regex).split(str, n)
That is, if you are worried about performance, you may precompile the pattern and then reuse it:
Pattern p = Pattern.compile(regex);
...
String[] tokens1 = p.split(str1);
String[] tokens2 = p.split(str2);
...
instead of
String[] tokens1 = str1.split(regex);
String[] tokens2 = str2.split(regex);
...
I believe that the main reason for this API design is convenience. Since regular expressions include all “fixed” strings/chars too, it simplifies the API to have one method instead of several. And if someone is worried about performance, the regex can still be precompiled as shown above.
My feeling (which I can’t back with any statistical evidence) is that most of the cases String.split()
is used in a context where performance is not an issue. E.g. it is a one-off action, or the performance difference is negligible compared to other factors. IMO rare are the cases where you split strings using the same regex thousands of times in a tight loop, where performance optimization indeed makes sense.
It would be interesting to see a performance comparison of a regex matcher implementation with fixed strings/chars compared to that of a matcher specialized to these. The difference might not be big enough to justify the separate implementation.