escape() is defined in section B.2.1.2 (escape), and the introduction of Annex B says:
… All of the language features and behaviours specified in this annex have one or more undesirable characteristics and in the absence of legacy usage would be removed from this specification. …
For characters whose code unit value is 0xFF or less, escape() produces a two-digit escape sequence: %xx. In other words, escape() converts a string containing only characters from U+0000 to U+00FF to a percent-encoded string using the Latin-1 encoding.
For characters with a greater code unit value, the four-digit format %uxxxx is used. This is not allowed within the hfields section (where subject and body are stored) of a mailto: URI (as defined in RFC 6068):
mailtoURI = "mailto:" [ to ] [ hfields ]
to = addr-spec *("," addr-spec )
hfields = "?" hfield *( "&" hfield )
hfield = hfname "=" hfvalue
hfname = *qchar
hfvalue = *qchar
...
qchar = unreserved / pct-encoded / some-delims
some-delims = "!" / "$" / "'" / "(" / ")" / "*"
/ "+" / "," / ";" / ":" / "@"
unreserved and pct-encoded are defined in STD 66:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
A percent sign is only allowed if it is directly followed by two hex digits; a percent sign followed by u is not allowed.
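The grammar above can be turned into a quick check. This is only a sketch: isValidHfvalue is a hypothetical helper name, and the regular expression is my own transcription of the qchar rule quoted above, not something taken from a library.

```javascript
// Hypothetical helper (not part of any library): checks a string against
// hfvalue = *qchar, transcribed from the RFC 6068 / STD 66 rules quoted above.
const QCHAR_STRING = /^(?:[A-Za-z0-9\-._~!$'()*+,;:@]|%[0-9A-Fa-f]{2})*$/;

function isValidHfvalue(value) {
  return QCHAR_STRING.test(value);
}

const latin1Escaped = escape("Swedish åäö"); // only %xx sequences
const emojiEscaped = escape("Emoji 😎");     // contains %uxxxx sequences

console.log(latin1Escaped, isValidHfvalue(latin1Escaped));
console.log(emojiEscaped, isValidHfvalue(emojiEscaped));
```

The first value passes the check, while the second fails because of the %u sequences.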
Using a self-implemented version that behaves exactly like escape doesn't solve anything. Instead, just continue to use escape; it won't be removed anytime soon.
To summarise: your previous usage of escape() generated Latin-1 percent-encoded mailto: URIs if all characters were in the range U+0000 to U+00FF; otherwise an invalid URI was generated (which might still be interpreted correctly by some applications, if they were written with JavaScript escape/unescape compatibility in mind).
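To illustrate why the result depends on the receiving application, here is a small sketch that decodes the same escape() output in two ways. The variable names are made up for the example, and real mail clients may of course use other decoders.

```javascript
// escape() emits one Latin-1 byte per character for U+0000..U+00FF.
const latin1Encoded = escape("åäö");

// unescape() reverses escape(), so a decoder written with escape() in mind
// recovers the original text.
const viaUnescape = unescape(latin1Encoded);

// A strict UTF-8 decoder rejects the same input, because the bytes
// E5 E4 F6 do not form a valid UTF-8 sequence.
let viaUtf8;
try {
  viaUtf8 = decodeURIComponent(latin1Encoded);
} catch (e) {
  viaUtf8 = e.name; // name of the thrown error
}

console.log(latin1Encoded, viaUnescape, viaUtf8);
```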
It is more correct (no risk of creating invalid URIs) and more future-proof to generate UTF-8 percent-encoded mailto: URIs using encodeURIComponent() (don't use encodeURI(); it does not escape ?, /, …). RFC 6068 requires UTF-8 in many places (but allows other encodings for “MIME encoded words and for bodies in composed email messages”).
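A minimal sketch of how this could look with several header fields. buildMailto is a hypothetical helper name, and the recipient is assumed to be an already valid addr-spec list (or empty), so it is not encoded here.

```javascript
// Hypothetical helper: builds a mailto: URI whose hfields are
// UTF-8 percent-encoded with encodeURIComponent().
function buildMailto(to, fields) {
  const query = Object.entries(fields)
    .map(([name, value]) => encodeURIComponent(name) + "=" + encodeURIComponent(value))
    .join("&");
  // `to` is assumed to be valid already; only the hfields are encoded.
  return "mailto:" + to + (query ? "?" + query : "");
}

const uri = buildMailto("", { subject: "Q&A: 50% off?", body: "Swedish åäö" });
console.log(uri);

// For comparison: encodeURI() leaves "&", ":" and "?" untouched, so the
// "&" in the subject would start a new header field.
const withEncodeURI = encodeURI("Q&A: 50% off?");
console.log(withEncodeURI);
```

Encoding each name and value separately keeps the & and = separators unambiguous.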
Example:
const text_latin1 = "Swedish åäö";
const text_other = "Emoji 😎";

// escape(): Latin-1 percent-encoding, or invalid %uxxxx for other characters
document.getElementById('escape-latin-1-link').href = "mailto:?subject=" + escape(text_latin1);
document.getElementById('escape-other-chars-link').href = "mailto:?subject=" + escape(text_other);

// encodeURIComponent(): UTF-8 percent-encoding
document.getElementById('utf8-link').href = "mailto:?subject=" + encodeURIComponent(text_latin1);
document.getElementById('utf8-other-chars-link').href = "mailto:?subject=" + encodeURIComponent(text_other);

function mime_word(text) {
  const q_encoded = encodeURIComponent(text) // to UTF-8 percent-encoded
    .replace(/[_!'()*]/g, function (c) { return '%' + c.charCodeAt(0).toString(16).toUpperCase(); }) // also encode the characters encodeURIComponent leaves alone
    .replace(/%20/g, '_') // MIME Q-encoding uses underscore for space
    .replace(/%/g, '='); // MIME Q-encoding uses equals instead of percent
  return encodeURIComponent('=?utf-8?Q?' + q_encoded + '?='); // add the MIME word wrapper and escape for the URI
}

// don't use mime_word for the body!
document.getElementById('mime-word-link').href = "mailto:?subject=" + mime_word(text_latin1);
document.getElementById('mime-word-other-chars-link').href = "mailto:?subject=" + mime_word(text_other);
<a id="escape-latin-1-link">escape()-latin1</a><br/>
<a id="escape-other-chars-link">escape()-emoji</a><br/>
<a id="utf8-link">utf8</a><br/>
<a id="utf8-other-chars-link">utf8-emoji</a><br/>
<a id="mime-word-link">mime-word</a><br/>
<a id="mime-word-other-chars-link">mime-word-emoji</a><br/>
For me, the UTF-8 links and the MIME-word links work in Thunderbird. Only the plain UTF-8 links work in the Windows 10 built-in Mail app and my up-to-date version of Outlook.