Splitting strings by punctuation and whitespace with regular expressions in Java

You have one small mistake in your regex. Try this:

String[] Res = Text.split("[\\p{Punct}\\s]+");

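In [\\p{Punct}\\s]+ the + has moved from inside the character class to the outside. Inside the class, the + is only a literal plus sign (which \\p{Punct} already matches), so every delimiter character splits on its own and consecutive delimiters are not combined into a single split. Outside the class, it is a quantifier that treats a run of delimiters as one.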

So, for this code, I get

String Text = "But I know. For example, the word \"can\'t\" should";

String[] Res = Text.split("[\\p{Punct}\\s]+");
System.out.println(Res.length);
for (String s:Res){
    System.out.println(s);
}

this result

10
But
I
know
For
example
the
word
can
t
should

That should meet your requirement.
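To make the effect of the + placement visible, here is a minimal sketch that runs both variants on the same input. The class name, the helper method names, and the sample text are made up for illustration; only the two patterns come from the discussion above.

```java
import java.util.Arrays;

public class PlusPlacementDemo {
    // '+' inside the class is a literal plus sign, so each single
    // delimiter character produces its own split.
    static String[] splitPlusInside(String text) {
        return text.split("[\\p{Punct}\\s+]");
    }

    // '+' outside the class is a quantifier, so a run of delimiter
    // characters is treated as one split point.
    static String[] splitPlusOutside(String text) {
        return text.split("[\\p{Punct}\\s]+");
    }

    public static void main(String[] args) {
        String text = "know. For example, a+b";
        // Contains empty strings where two delimiters are adjacent (". " and ", ")
        System.out.println(Arrays.toString(splitPlusInside(text)));
        // No empty strings between the words
        System.out.println(Arrays.toString(splitPlusOutside(text)));
    }
}
```

With the + inside the class, the empty strings appear wherever a punctuation mark is directly followed by a space, which is usually not what you want when extracting words.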

As an alternative you can use

String[] Res = Text.split("\\P{L}+");

\\P{L} matches any Unicode code point that does not have the property “Letter”, so this pattern splits on every run of non-letter characters.
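The two patterns are not interchangeable, because digits are neither punctuation nor letters. The following sketch shows the difference on a hypothetical sample string; the class and method names are illustrative only.

```java
import java.util.Arrays;

public class LetterSplitDemo {
    // Splits on runs of ASCII punctuation and whitespace; digits are kept.
    static String[] splitOnPunctAndSpace(String text) {
        return text.split("[\\p{Punct}\\s]+");
    }

    // Splits on runs of anything that is not a Unicode letter; digits are dropped.
    static String[] splitOnNonLetters(String text) {
        return text.split("\\P{L}+");
    }

    public static void main(String[] args) {
        // \u00e9 is an accented e; it is a Unicode letter, so both patterns keep it
        String text = "Room 42: caf\u00e9";
        System.out.println(Arrays.toString(splitOnPunctAndSpace(text)));
        System.out.println(Arrays.toString(splitOnNonLetters(text)));
    }
}
```

Which one fits depends on whether numbers should count as tokens: the first variant keeps "42" as its own element, while the second treats it as part of the delimiter.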
