“Variable length lookbehind not implemented” but it isn’t variable length

Question

I have reduced your problem to this:

my $text="M Y H A P P Y T E X T";
my $regex = '(?<!st)A';
print ($text =~ m/$regex/i ? "true\n" : "false\n");

The cause is the /i (case-insensitive) modifier combined with certain character sequences, such as "ss" or "st", each of which can also be matched by a single typographic ligature character. That makes the lookbehind variable length: /August/i, for instance, matches both AUGUST (6 characters) and auguﬆ (5 characters, the last one being U+FB06).

However, if we remove the /i (case-insensitive) modifier, it works, because typographic ligatures are no longer matched.
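To see the length difference for yourself, here is a minimal sketch. The variable names are mine, and the ligature is written as "\x{FB06}" so the script does not depend on the source file's encoding:

```perl
use strict;
use warnings;

# "\x{FB06}" is LATIN SMALL LIGATURE ST: one character that case-folds to "st".
my $plain    = "august";          # 6 characters
my $ligature = "augu\x{FB06}";    # 5 characters

my %matched = (
    'plain, /i'       => ($plain    =~ /august/i ? 1 : 0),
    'ligature, /i'    => ($ligature =~ /august/i ? 1 : 0),
    'ligature, no /i' => ($ligature =~ /august/  ? 1 : 0),
);

printf "%-16s %s\n", $_, ($matched{$_} ? "match" : "no match")
    for sort keys %matched;
printf "lengths: %d and %d\n", length($plain), length($ligature);
```

The same pattern matches a 6-character and a 5-character string under /i, which is why Perl cannot treat "st" as fixed length inside a lookbehind.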

Solution: use the /aa modifier, i.e.:

/(?<!st)A/iaa

Or in your regex:

my $text="M Y H A P P Y T E X T";
my $regex = '(?<!(Mon|Fri|Sun)day |August )abcd';
print ($text =~ m/$regex/iaa ? "true\n" : "false\n");
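As a quick check of how the lookbehind behaves once /aa is added, here is a small sketch using the reduced pattern from above rather than your full regex. The sample strings other than your text are made up for illustration:

```perl
use strict;
use warnings;

# With /aa, "st" in the pattern can only match two ASCII characters,
# so the lookbehind has a fixed length of 2.
my $regex = qr/(?<!st)A/iaa;

my @samples = (
    "M Y H A P P Y T E X T",   # "A" preceded by "H ": lookbehind passes
    "stA",                     # "A" preceded by "st": lookbehind rejects it
    "STa",                     # same, in the other case
);

my %matched;
$matched{$_} = ($_ =~ $regex ? 1 : 0) for @samples;

print "$_: ", ($matched{$_} ? "true" : "false"), "\n" for @samples;
```

The /i part still works for plain ASCII letters, so "STa" is rejected just like "stA".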

From perlre:

To forbid ASCII/non-ASCII matches (like “k” with “\N{KELVIN SIGN}”), specify the “a” twice, for example /aai or /aia. (The first occurrence of “a” restricts the \d, etc., and the second occurrence adds the “/i” restrictions.) But, note that code points outside the ASCII range will use Unicode rules for /i matching, so the modifier doesn’t really restrict things to just ASCII; it just forbids the intermixing of ASCII and non-ASCII.
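The perlre passage can be illustrated with a short sketch. The Cyrillic pair is my own choice of example for "code points outside the ASCII range"; any non-ASCII upper/lower pair should behave the same way:

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";   # KELVIN SIGN, which case-folds to ASCII "k"
my $de_up  = "\x{0414}";   # CYRILLIC CAPITAL LETTER DE
my $de_low = "\x{0434}";   # CYRILLIC SMALL LETTER DE

# ASCII "k" against the non-ASCII Kelvin sign: allowed by /i, forbidden by /aa.
my $kelvin_i   = $kelvin =~ /k/i   ? 1 : 0;
my $kelvin_iaa = $kelvin =~ /k/iaa ? 1 : 0;

# Non-ASCII against non-ASCII: /aa still folds case using Unicode rules.
my $de_iaa = $de_up =~ /$de_low/iaa ? 1 : 0;

print "kelvin /i:     $kelvin_i\n";
print "kelvin /iaa:   $kelvin_iaa\n";
print "cyrillic /iaa: $de_iaa\n";
```

So /aa does not turn off Unicode case folding; it only stops an ASCII character in the pattern from matching a non-ASCII character in the string, and vice versa.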

See a closely related discussion here
