Parsing a tweet to extract hashtags into an array

Question

A simple regex should do the job:

>>> import re
>>> s = "I love #stackoverflow because #people are very #helpful!"
>>> re.findall(r"#(\w+)", s)
['stackoverflow', 'people', 'helpful']

Note, though, that as other answers point out, this pattern also matches things that are not hashtags, such as the fragment identifier in a URL:

>>> re.findall(r"#(\w+)", "http://example.org/#comments")
['comments']
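
One way to keep the regex while skipping URL fragments is to require that the `#` sits at the very start of the text or directly after whitespace. The following is only a minimal sketch under that assumption; the names `HASHTAG_RE` and `find_hashtags` are made up for illustration and do not come from the original answer:

```python
import re

# Assumption: a hashtag begins at the start of the text or right after whitespace.
# (?<!\S) is a negative lookbehind: the character before '#' must not be
# a non-whitespace character, which rules out "http://example.org/#comments".
HASHTAG_RE = re.compile(r"(?<!\S)#(\w+)")

def find_hashtags(text):
    """Return hashtag names (without the '#') in order of appearance."""
    return HASHTAG_RE.findall(text)

print(find_hashtags("I love #stackoverflow because #people are very #helpful!"))
print(find_hashtags("http://example.org/#comments #test"))
```

Under this rule a tag glued to other punctuation, such as `(#tag)`, would be skipped as well, so whether the lookbehind is the right choice depends on the tweets being parsed.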

So another simple solution is to split the text on whitespace and keep only the words that start with '#' (as a bonus, using a set removes duplicates):

>>> def extract_hash_tags(s):
...    return set(part[1:] for part in s.split() if part.startswith('#'))
...
>>> extract_hash_tags("#test http://example.org/#comments #test")
{'test'}
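
Two caveats apply to the split-based version: a tag followed by punctuation keeps it (`#helpful!` yields `helpful!`), and a set does not remember the order in which tags appeared. A possible variation, assuming Python 3.7+ (where dicts preserve insertion order) and assuming trailing punctuation is never part of a tag, might look like this; `extract_hash_tags_ordered` is a hypothetical name:

```python
import string

def extract_hash_tags_ordered(s):
    """Return unique hashtags (without '#') in order of first appearance."""
    # Keep only whitespace-separated words that start with '#',
    # then drop the '#' and any trailing punctuation such as '!' or ','.
    tags = (
        part[1:].rstrip(string.punctuation)
        for part in s.split()
        if part.startswith('#')
    )
    # dict.fromkeys keeps the first occurrence of each key in insertion order;
    # empty strings (e.g. from a lone '#') are filtered out.
    return list(dict.fromkeys(tag for tag in tags if tag))

print(extract_hash_tags_ordered("#test http://example.org/#comments #helpful! #test"))
```

Because `string.punctuation` includes the underscore, a tag that ends in `_` would lose it; adjust the stripped characters if that matters.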
