Is there a bug in Ruby lookbehind assertions (1.9/2.0)?
This has been officially classified as a bug and subsequently fixed, together with another problem concerning \Z anchors in multiline strings.
This sounds like a job for lookbehinds, though you should be aware that not all regex flavors support them. In your example: (?<=\bipsum\s)(\w+) This will match any sequence of word characters that follows “ipsum” as a whole word followed by a single whitespace character. It does not match “ipsum” itself, so you don’t need to worry about reinserting … Read more
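As a minimal sketch of the pattern above, here it is run through Python's re module. The sample sentence and the replacement string "X" are illustrative assumptions, not part of the original answer.

```python
import re

# Sample input, assumed for illustration only.
text = "lorem ipsum dolor sit amet"

# The lookbehind asserts that "ipsum" plus one whitespace character
# precedes the match, without making it part of the match.
pattern = re.compile(r"(?<=\bipsum\s)(\w+)")

match = pattern.search(text)
word = match.group(1) if match else None

# Because "ipsum" is not consumed, a substitution leaves it in place.
replaced = pattern.sub("X", text)

print(word)      # dolor
print(replaced)  # lorem ipsum X sit amet
```

Only the word after “ipsum” is replaced, which is why nothing has to be reinserted by hand.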
Many regular expression libraries allow only restricted expressions in lookbehind assertions, for example: only strings of the same fixed length: (?<=foo|bar|\s,\s) (three characters each); only alternatives that each have a fixed length: (?<=foobar|\r\n) (each branch with fixed length); only strings with an upper bound on their length: (?<=\s{,4}) (up to four repetitions). The … Read more
Lookahead and lookbehind aren’t nearly as similar as their names imply. The lookahead expression works exactly the same as it would if it were a standalone regex, except it’s anchored at the current match position and it doesn’t consume what it matches. Lookbehind is a whole different story. Starting at the current match position, it … Read more
GNU sed does not have support for lookaround assertions. You could use a more powerful language such as Perl, or possibly experiment with ssed, which supports Perl-style regular expressions. perl -pe 's/(?<=foo)bar/test/g' file.txt
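If Perl is not at hand, the same substitution can be sketched in Python, which also supports fixed-width lookbehind. The helper name replace_bar_after_foo and the sample line are assumptions made for this illustration.

```python
import re

# Same pattern as the Perl one-liner: "bar" only when preceded by "foo".
BAR_AFTER_FOO = re.compile(r"(?<=foo)bar")


def replace_bar_after_foo(line):
    """Replace every "bar" that directly follows "foo" with "test"."""
    return BAR_AFTER_FOO.sub("test", line)


# Sample line, assumed for illustration only.
result = replace_bar_after_foo("foobar bar foobar")
print(result)  # footest bar footest
```

The standalone “bar” is untouched because the lookbehind fails there, and “foo” survives because it is never part of the match.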
Python re lookbehinds really need to be fixed-width, and when you have alternations in a lookbehind pattern that are of different length, there are several ways to handle this situation: Rewrite the pattern so that you do not have to use alternation (e.g. Tim’s above answer using a word boundary, or you might also use … Read more
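The fixed-width restriction can be seen directly with a small sketch. The workaround shown, one lookbehind per alternative inside a non-capturing group, is a commonly used option and not necessarily one of the approaches listed in the truncated answer; the patterns and sample text are assumptions for illustration.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Alternatives of different lengths inside one lookbehind are rejected.
variable_width = r"(?<=USD|\$)\d+"

# One fixed-width lookbehind per alternative is accepted.
split_lookbehinds = r"(?:(?<=USD)|(?<=\$))\d+"

variable_ok = compiles(variable_width)
split_ok = compiles(split_lookbehinds)
amounts = re.findall(split_lookbehinds, "USD100 and $25 but EUR30")

print(variable_ok)  # False
print(split_ok)     # True
print(amounts)      # ['100', '25']
```

Each lookbehind in the second pattern has a single fixed width, so the engine can check them one at a time.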
The answer to the question you ask, which is whether a larger class of languages than the regular languages can be recognised with regular expressions augmented by lookaround, is no. A proof is relatively straightforward, but an algorithm to translate a regular expression containing lookarounds into one without is messy. First: note that you can … Read more