how to test for a regex match

Question

re.match matches only at the beginning of the string.

def url_match(line, url):
    match = re.match(r'<a href="https://stackoverflow.com/questions/4742662/(?P<url>[^"]*?)"', line)
    return match and match.groupdict()['url'] == url:

example usage:

>>> url_match('<a href="https://stackoverflow.com/questions/4742662/test">', "https://stackoverflow.com/questions/4742662/test")
True
>>> url_match('<a href="https://stackoverflow.com/questions/4742662/test">', 'te')
False
>>> url_match('this is a <a href="https://stackoverflow.com/questions/4742662/test">', "https://stackoverflow.com/questions/4742662/test")
False

If the pattern could occur anywhere in the line, use re.search.

def url_search(line, url):
    match = re.search(r'<a href="https://stackoverflow.com/questions/4742662/(?P<url>[^"]*?)"', line)
    return match and match.groupdict()['url'] == url:

example usage:

>>> url_search('<a href="https://stackoverflow.com/questions/4742662/test">', "https://stackoverflow.com/questions/4742662/test")
True
>>> url_search('<a href="https://stackoverflow.com/questions/4742662/test">', 'te')
False
>>> url_search('this is a <a href="https://stackoverflow.com/questions/4742662/test">', "https://stackoverflow.com/questions/4742662/test")
True

N.B : If you are trying to parsing HTML using a regex, read RegEx match open tags except XHTML self-contained tags before going any further.

Leave a Comment Cancel reply