Separate title string with no spaces into words

Here is a regex which seems to work well, at least for your sample input:

(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)

This pattern says to split at any position where one of the following holds:

  • what precedes is a lowercase letter and what follows is an uppercase
    letter (but not the reverse)
  • what precedes is a digit and what follows is a letter (or
    vice-versa)
  • what precedes and what follows are both non-word characters
    (e.g. quote, parenthesis, etc.)

using System.Linq;
using System.Text.RegularExpressions;

string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";
// Split at every zero-width position matched by the pattern, then trim and re-join with spaces.
string[] split = Regex.Split(title, @"(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)");
split = split.Select(s => s.Trim()).ToArray();
string newtitle = string.Join(" ", split);

This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'
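
If you need this in more than one place, the snippet could be wrapped in a small helper along the following lines. This is only a sketch: the names TitleSplitter, Boundary and SplitTitle are made up for illustration, and dropping pieces that end up empty after trimming is an addition that is not in the code above (it avoids doubled spaces when the input already contains runs of whitespace).

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class TitleSplitter
{
    // Same pattern as in the answer, built once and reused across calls.
    private static readonly Regex Boundary = new Regex(
        @"(?<=[a-z])(?=[A-Z])" +      // lowercase followed by uppercase
        @"|(?<=[0-9])(?=[A-Za-z])" +  // digit followed by letter
        @"|(?<=[A-Za-z])(?=[0-9])" +  // letter followed by digit
        @"|(?<=\W)(?=\W)");           // two adjacent non-word characters

    // Hypothetical helper: splits at each boundary, trims the pieces,
    // drops pieces that are empty after trimming (an assumption, not part
    // of the original snippet), and joins the rest with single spaces.
    public static string SplitTitle(string title)
    {
        var pieces = Boundary.Split(title)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0);
        return string.Join(" ", pieces);
    }
}
```

For example, TitleSplitter.SplitTitle("MyTitle2020Draft") returns "My Title 2020 Draft", and the sample title from above produces the same output as before.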

Note: You might also want to add these two assertions to the regex alternation:

(?<=\W)(?=\w)|(?<=\w)(?=\W)

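We got away without them here because the sample input never needed a split at a boundary between a word character and a non-word character, but other inputs might. Be aware that these assertions also match inside HELLO-WORLD and T.E.S.T. and next to every bracket or quote, so adding them changes the output shown above.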
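
To see what the extra assertions do before committing to them, you could compare the two patterns side by side. The sketch below is illustrative only: the class BoundaryComparison, its constants and its Split method are hypothetical names, not part of any library.

```csharp
using System.Text.RegularExpressions;

public static class BoundaryComparison
{
    // Pattern from the answer.
    public const string BasePattern =
        @"(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)";

    // Base pattern plus the two word/non-word boundary assertions from the note.
    public const string ExtendedPattern =
        BasePattern + @"|(?<=\W)(?=\w)|(?<=\w)(?=\W)";

    // Hypothetical helper: split with the given pattern and join with spaces.
    public static string Split(string title, string pattern)
    {
        return string.Join(" ", Regex.Split(title, pattern));
    }
}
```

With BasePattern, the input "Report(Final)2019" comes back unchanged, because none of the four original conditions applies at a letter/bracket or bracket/digit boundary. With ExtendedPattern it becomes "Report ( Final ) 2019", and "HELLO-WORLD" becomes "HELLO - WORLD". Which of the two behaviours you want depends on your titles.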
