Note:
- If you're looking for prepackaged functionality based on the techniques discussed in this answer:
  - bash functions that enable robust escaping even in multi-line substitutions can be found at the bottom of this post (plus a perl solution that uses perl's built-in support for such escaping).
  - @EdMorton's answer contains a tool (bash script) that robustly performs single-line substitutions.
    - Ed's answer now has an improved version of the sed command used below, corrected in calestyo's answer, which is needed if you want to escape string literals for potential use with other regex-processing tools, such as awk and perl. In short: for cross-tool use, \ must be escaped as \\ rather than as [\], which means: instead of the
      sed 's/[^^]/[&]/g; s/\^/\\^/g'
      command used below, you must use
      sed 's/[^^\]/[&]/g; s/[\^]/\\&/g;'
- All snippets below assume bash as the shell (POSIX-compliant reformulations are possible).
SINGLE-line Solutions
Escaping a string literal for use as a regex in sed:
To give credit where credit is due: I found the regex used below in this answer.
Assuming that the search string is a single-line string:
search="abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3" # sample input containing metachars.
searchEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search") # escape it.
sed -n "s/$searchEscaped/foo/p" <<<"$search" # Echoes 'foo'
- Every character except ^ is placed in its own character set [...] expression to treat it as a literal.
  - Note that ^ is the one char. you cannot represent as [^], because it has special meaning in that location (negation).
- Then, ^ chars. are escaped as \^.
  - Note that you cannot just escape every char by putting a \ in front of it because that can turn a literal char into a metachar, e.g. \< and \b are word boundaries in some tools, \n is a newline, \{ is the start of a RE interval like \{1,3\}, etc.
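To make the two steps above concrete, here is a small sketch that runs them separately and prints the intermediate result. The sample string and the variable names step1 and step2 are assumptions for illustration only; the answer's own command does both steps in a single sed call.

```bash
sample='1.5^2*[x]'    # hypothetical sample literal

# Step 1: wrap every character except ^ in its own [...] set.
step1=$(printf '%s\n' "$sample" | sed 's/[^^]/[&]/g')
printf '%s\n' "$step1"    # [1][.][5]^[2][*][[][x][]]

# Step 2: escape the remaining ^ characters as \^.
step2=$(printf '%s\n' "$step1" | sed 's/\^/\\^/g')
printf '%s\n' "$step2"    # [1][.][5]\^[2][*][[][x][]]
```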
The approach is robust, but not efficient.
The robustness comes from not trying to anticipate all special regex characters (which vary across regex dialects) and instead relying on only 2 features shared by all regex dialects:
- the ability to specify literal characters inside a character set.
- the ability to escape a literal ^ as \^
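The dialect independence can be checked with a quick sketch that feeds the same escaped string to grep in both basic (BRE) and extended (ERE) mode. The sample literal and input lines are hypothetical; grep merely stands in for any tool using those dialects.

```bash
literal='a+b|c(d)'    # hypothetical sample: +, | and () behave differently in BRE and ERE
escaped=$(printf '%s\n' "$literal" | sed 's/[^^]/[&]/g; s/\^/\\^/g')

input='x a+b|c(d) y
x aab y'

# The same escaped string matches only the literal text in both dialects.
bre=$(printf '%s\n' "$input" | grep -- "$escaped")       # basic regex (BRE)
ere=$(printf '%s\n' "$input" | grep -E -- "$escaped")    # extended regex (ERE)
printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
```

Unescaped, the ERE a+b|c(d) would also match the second line, because a+b matches aab.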
Escaping a string literal for use as the replacement string in sed's s/// command:
The replacement string in a sed s/// command is not a regex, but it recognizes placeholders that refer to either the entire string matched by the regex (&) or specific capture-group results by index (\1, \2, …), so these must be escaped, along with the (customary) regex delimiter, /.
Assuming that the replacement string is a single-line string:
replace="Laurel & Hardy; PS\2" # sample input containing metachars.
replaceEscaped=$(sed 's/[&/\]/\\&/g' <<<"$replace") # escape it
sed -n "s/.*/$replaceEscaped/p" <<<"foo" # Echoes $replace as-is
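For a closer look at what the escaping step produces, the following sketch prints the intermediate value for a different sample. The sample string and the variable name result are assumptions made for illustration.

```bash
replace='a/b & c\1'    # hypothetical sample with /, & and a \1 look-alike

# Prefix each &, / and \ with a backslash.
replaceEscaped=$(printf '%s\n' "$replace" | sed 's/[&/\]/\\&/g')
printf '%s\n' "$replaceEscaped"    # a\/b \& c\\1

# Round trip: the replacement comes out exactly as the original string.
result=$(printf 'foo\n' | sed "s/.*/$replaceEscaped/")
printf '%s\n' "$result"            # a/b & c\1
```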
MULTI-line Solutions
Escaping a MULTI-LINE string literal for use as a regex in sed:
Note: This only makes sense if multiple input lines (possibly ALL) have been read before attempting to match.
Since tools such as sed and awk operate on a single line at a time by default, extra steps are needed to make them read more than one line at a time.
# Define sample multi-line literal.
search="/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3
/def\n\t[A-Z]\+\([^ ]\)\{3,4\}\4"
# Escape it.
searchEscaped=$(sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search" | tr -d '\n') #'
# Use in a Sed command that reads ALL input lines up front.
# If ok, echoes 'foo'
sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$searchEscaped/foo/p" <<<"$search"
- The newlines in multi-line input strings must be translated to '\n' strings, which is how newlines are encoded in a regex.
- $!a\'$'\n''\\n' appends string '\n' to every output line but the last (the last newline is ignored, because it was added by <<<).
- tr -d '\n' then removes all actual newlines from the string (sed adds one whenever it prints its pattern space), effectively replacing all newlines in the input with '\n' strings.
- -e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop, therefore leaving subsequent commands to operate on all input lines at once.
  - If you're using GNU sed (only), you can use its -z option to simplify reading all input lines at once:
    sed -z "s/$searchEscaped/foo/" <<<"$search"
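As a sketch of what the escaped value looks like, the following applies the same pipeline to a short two-line sample. The sample and the variable name result are hypothetical. A literal newline inside the sed script replaces the $'\n' construct, and printf piping replaces <<<, purely so that the sketch also runs in a plain POSIX shell.

```bash
search='a.b
c*d'    # hypothetical two-line literal

# Escape each line, append a literal \n after every line but the last,
# then delete the real newlines.
searchEscaped=$(printf '%s\n' "$search" |
  sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\
\\n' | tr -d '\n')
printf '%s\n' "$searchEscaped"    # [a][.][b]\n[c][*][d]

# Read all input lines into the pattern space, then match across the newline.
result=$(printf '%s\n' "$search" |
  sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$searchEscaped/foo/p")
printf '%s\n' "$result"           # foo
```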
Escaping a MULTI-LINE string literal for use as the replacement string in sed's s/// command:
# Define sample multi-line literal.
replace="Laurel & Hardy; PS\2
Masters\1 & Johnson\2"
# Escape it for use as a Sed replacement string.
IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$replace")
replaceEscaped=${REPLY%$'\n'}
# If ok, outputs $replace as is.
sed -n "s/\(.*\) \(.*\)/$replaceEscaped/p" <<<"foo bar"
- Newlines in the input string must be retained as actual newlines, but \-escaped.
- -e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop.
- 's/[&/\]/\\&/g escapes all &, \ and / instances, as in the single-line solution.
- s/\n/\\&/g' then \-prefixes all actual newlines.
- IFS= read -d '' -r is used to read the sed command's output as is (to avoid the automatic removal of trailing newlines that a command substitution ($(...)) would perform).
- ${REPLY%$'\n'} then removes a single trailing newline, which the <<< has implicitly appended to the input.
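The sketch below shows the escaped form for a short two-line sample and checks the round trip. It deliberately uses a plain command substitution rather than read -d '', which is only safe because this hypothetical sample has no trailing newlines; the sample and variable names are assumptions.

```bash
replace='A & B
C/D\1'    # hypothetical two-line replacement text

# Slurp all lines, escape &, / and \, then put a \ before each real newline.
replaceEscaped=$(printf '%s\n' "$replace" |
  sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g')
printf '%s\n' "$replaceEscaped"

# Round trip: the output should equal $replace, line break included.
result=$(printf 'foo bar\n' | sed "s/.*/$replaceEscaped/")
printf '%s\n' "$result"
```

The first printf prints the escaped text on two lines, with the first line ending in a backslash.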
bash functions based on the above (for sed):
- quoteRe() quotes (escapes) for use in a regex
- quoteSubst() quotes for use in the substitution string of a s/// call.
- both handle multi-line input correctly
  - Note that because sed reads a single line at a time by default, use of quoteRe() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
  - Also, using command substitutions ($(...)) to call the functions won't work for strings that have trailing newlines; in that event, use something like IFS= read -d '' -r escapedValue < <(quoteSubst "$value")
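The trailing-newline caveat is easy to see in isolation. This small sketch, with a made-up value, shows that a command substitution silently drops every trailing newline:

```bash
value='ends with two newlines

'    # hypothetical value: text followed by two trailing newlines

captured=$(printf '%s' "$value")    # command substitution drops both newlines

printf 'original length: %s\n' "${#value}"
printf 'captured length: %s\n' "${#captured}"
```

The captured copy is two characters shorter, which is why a trailing-newline-preserving read is suggested for such strings.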
# SYNOPSIS
# quoteRe <text>
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
# SYNOPSIS
# quoteSubst <text>
quoteSubst() {
IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
printf %s "${REPLY%$'\n'}"
}
Example:
from=$'Cost\(*):\n$3.' # sample input containing metachars.
to='You & I'$'\n''eating A\1 sauce.' # sample replacement string with metachars.
# Should print the unmodified value of $to
sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quoteRe "$from")/$(quoteSubst "$to")/" <<<"$from"
Note the use of -e ':a' -e '$!{N;ba' -e '}' to read all input at once, so that the multi-line substitution works.
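Since POSIX-compliant reformulations are mentioned as possible, here is one hedged sketch of what they might look like. The names quoteRePosix and quoteSubstPosix are hypothetical and not part of the original functions. Unlike quoteSubst(), the second function leaves sed's final newline in place, so it is only suitable inside $(...) for text without trailing newlines.

```bash
# Hypothetical POSIX-style variants: printf piping instead of <<<,
# and a literal newline in the sed script instead of $'\n'.
quoteRePosix() {
  printf '%s\n' "$1" | sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\
\\n' | tr -d '\n'
}

quoteSubstPosix() {
  printf '%s\n' "$1" |
    sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g'
}

from='Cost\(*):
$3.'
to='You & I
eating A\1 sauce.'

# Should print the unmodified value of $to.
result=$(printf '%s\n' "$from" |
  sed -e ':a' -e '$!{N;ba' -e '}' \
      -e "s/$(quoteRePosix "$from")/$(quoteSubstPosix "$to")/")
printf '%s\n' "$result"
```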
perl solution:
Perl has built-in support for escaping arbitrary strings for literal use in a regex: the quotemeta() function or its equivalent \Q...\E quoting.
The approach is the same for both single- and multi-line strings; for example:
from=$'Cost\(*):\n$3.' # sample input containing metachars.
to='You owe me $1/$& for'$'\n''eating A\1 sauce.' # sample replacement string w/ metachars.
# Should print the unmodified value of $to.
# Note that the replacement value needs NO escaping.
perl -s -0777 -pe 's/\Q$from\E/$to/' -- -from="$from" -to="$to" <<<"$from"
- Note the use of -0777 to read all input at once, so that the multi-line substitution works.
- The -s option allows placing -<var>=<val>-style Perl variable definitions following -- after the script, before any filename operands.