What about "//comment-like strings inside quotes"?
The OP is asking how to do it using regular expressions, so:
    import re

    def remove_comments(string):
        pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)"
        # first group captures quoted strings (double or single)
        # second group captures comments (//single-line or /* multi-line */)
        regex = re.compile(pattern, re.MULTILINE|re.DOTALL)
        def _replacer(match):
            # if the 2nd group (capturing comments) is not None,
            # it means we have captured a non-quoted (real) comment string.
            if match.group(2) is not None:
                return "" # so we will return empty to remove the comment
            else: # otherwise, we will return the 1st group
                return match.group(1) # captured quoted-string
        return regex.sub(_replacer, string)
This WILL remove:
    /* multi-line comments */
    // single-line comments
Will NOT remove:
    String var1 = "this is /* not a comment. */";
    char *var2 = "this is // not a comment, either.";
    url="http://not.comment.com";
Note: This will also work for JavaScript source.
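As a quick check, here is a minimal sketch that runs the same pattern over the examples listed above. The `source` sample and the `cleaned` variable are illustrative names only, and the function is repeated so the snippet runs on its own:

```python
import re

def remove_comments(string):
    # Same pattern as in the answer: group 1 = quoted strings, group 2 = comments.
    pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)"
    regex = re.compile(pattern, re.MULTILINE | re.DOTALL)

    def _replacer(match):
        # A match in group 2 is a real comment, so drop it;
        # otherwise put the quoted string back unchanged.
        if match.group(2) is not None:
            return ""
        return match.group(1)

    return regex.sub(_replacer, string)

# Illustrative input (assumed sample, LF line endings).
source = '''int x = 1; // single-line comment
/* multi-line
   comment */
String var1 = "this is /* not a comment. */";
char *var2 = "this is // not a comment, either.";
url = "http://not.comment.com";
'''

cleaned = remove_comments(source)
print(cleaned)
```

Both real comments disappear, while the three quoted strings come through untouched. One caveat worth knowing: the quoted-string part of the pattern does not account for escaped quotes, so a literal such as `"a \" // b"` would likely be cut short and its tail treated as a comment.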