Mailto URI Composer and Validator
Friday, December 5, 2008 2:09:41 PM
I also made a little mailto URI validator to help check mailto URIs.
The validator is strict in that:
- No duplicate hnames are allowed.
- Hnames must be lowercase.
- Only characters that need to be percent-encoded can be. Percent-encoding unreserved characters is an error.
- Each %HH must use the hex digits 0-9 and A-F. Malformed %HH sequences are not allowed, and neither are lowercase hex digits (a-f).
- Unsafe %HH like %00 etc. are not allowed.
- %HH sequences that make up a UTF-8 sequence must be decodable by decodeURIComponent and re-encodable by encodeURIComponent.
- Newlines must be represented by %0D%0A. You can't just have %0D or %0A.
- %0D%0A is not allowed in to, cc, bcc and subject hvalues. It's O.K. in other hvalues.
- %40 (@) is required at least once in to, cc and bcc hvalues. (For now, the validator doesn't decode these hvalues, parse out the addresses, or validate them.)
- Cannot have invalid hname=hvalue pairs or unnecessary '&' or '?'.
- Cannot have empty hnames or hvalues.
- The URI prefix must be all lowercase.
- All reserved characters must be percent-encoded if you want them in an hname or hvalue. The unreserved character list follows ECMAScript's encodeURIComponent rules as they're more compatible with the web and http than RFC2368's relaxed rules.
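As a rough illustration of how some of the rules above could be checked, here is a minimal sketch. It is not the validator's actual code. The names validateHeaders and decodeStrict are made up for this example, it only looks at the hname=hvalue part after the '?', and treating "unsafe" as "control characters other than CR and LF" is my assumption.

```javascript
// Sketch only: all names here are hypothetical, not taken from the real validator.
const ADDRESS_HNAMES = new Set(["to", "cc", "bcc"]);
const SINGLE_LINE_HNAMES = new Set(["to", "cc", "bcc", "subject"]);
// Characters encodeURIComponent leaves alone, plus uppercase %HH triplets.
const COMPONENT = /^(?:[A-Za-z0-9\-_.!~*'()]|%[0-9A-F]{2})+$/;
// Assumption: "unsafe" means control characters other than CR and LF.
const UNSAFE = /[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/;

// Returns the decoded text, or null after recording one error.
function decodeStrict(text, label, errors) {
  if (!COMPONENT.test(text)) {
    errors.push(`${label}: empty, unencoded reserved character, or bad %HH`);
    return null;
  }
  let decoded;
  try {
    decoded = decodeURIComponent(text); // throws URIError on invalid UTF-8
  } catch (e) {
    errors.push(`${label}: %HH sequence is not valid UTF-8`);
    return null;
  }
  // Round trip fails when a character was encoded that did not need it (e.g. %41).
  if (encodeURIComponent(decoded) !== text) {
    errors.push(`${label}: percent-encodes a character that does not need it`);
    return null;
  }
  if (UNSAFE.test(decoded)) {
    errors.push(`${label}: unsafe control character`);
    return null;
  }
  // After removing every CRLF, no bare CR or LF may remain.
  if (/[\r\n]/.test(decoded.replace(/\r\n/g, ""))) {
    errors.push(`${label}: newline must be %0D%0A`);
    return null;
  }
  return decoded;
}

// Checks the part of a mailto URI after '?'. Returns a list of error strings.
function validateHeaders(query) {
  const errors = [];
  const seen = new Set();
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    if (eq < 1 || eq === pair.length - 1) {
      errors.push(`"${pair}": not a complete hname=hvalue pair`);
      continue;
    }
    const hname = pair.slice(0, eq);
    const hvalue = pair.slice(eq + 1);
    if (hname !== hname.toLowerCase()) errors.push(`${hname}: hname must be lowercase`);
    if (seen.has(hname)) errors.push(`${hname}: duplicate hname`);
    seen.add(hname);
    decodeStrict(hname, `hname ${hname}`, errors);
    const value = decodeStrict(hvalue, `hvalue of ${hname}`, errors);
    if (value === null) continue;
    if (SINGLE_LINE_HNAMES.has(hname) && value.includes("\r\n")) {
      errors.push(`${hname}: %0D%0A is not allowed here`);
    }
    if (ADDRESS_HNAMES.has(hname) && !value.includes("@")) {
      errors.push(`${hname}: needs at least one %40`);
    }
  }
  return errors;
}

console.log(validateHeaders("subject=Hi%20there&body=line1%0D%0Aline2")); // []
console.log(validateHeaders("Subject=x&subject=%41"));
```

The round trip through decodeURIComponent and encodeURIComponent does double duty in this sketch: it rejects byte sequences that are not valid UTF-8, and it rejects characters that were percent-encoded without needing to be.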
In doing research on mailto URIs, I've found that authors like to use duplicate hnames. For example, you might see where the author expects it to come out like the correct . You might see where the author expects it to come out like the correct . I'm not sure who exactly started this, but this is invalid even in RFC2368, so it didn't come from there. My guess would be Thunderbird developers, but I'm not sure. This also might have come about by thinking that header folding (which is mentioned in RFC2822) applies to mailto URIs.
Basically, authors like to use duplicate hnames for no good reason. Duplicate hnames are bad because clients can't agree on how to handle them, and the right way to handle them can depend on the type of data in the hvalue. This means that even if a client has one join method that it applies to arbitrary types, it might not be a good join method for a particular type.
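To make the disagreement concrete, here is a small sketch of a few ways a client might resolve duplicates. The strategy names, the resolve helper and the example query are all made up for illustration. I'm not claiming any particular client behaves like any of them.

```javascript
// Hypothetical sketch: different ways a client might resolve duplicate hnames.
function collectHvalues(query) {
  const byName = new Map(); // hname -> decoded hvalues, in order of appearance
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    const hname = pair.slice(0, eq);
    const hvalue = decodeURIComponent(pair.slice(eq + 1));
    if (!byName.has(hname)) byName.set(hname, []);
    byName.get(hname).push(hvalue);
  }
  return byName;
}

const strategies = {
  firstWins: (values) => values[0],
  lastWins: (values) => values[values.length - 1],
  commaJoin: (values) => values.join(", "),
  lineJoin: (values) => values.join("\r\n"),
};

function resolve(query, hname, strategy) {
  const values = collectHvalues(query).get(hname) || [];
  return strategies[strategy](values);
}

// Made-up example with duplicate to and body hnames.
const duplicated = "to=a%40example.com&to=b%40example.com&body=Hello&body=World";
for (const name of Object.keys(strategies)) {
  console.log(
    name,
    JSON.stringify(resolve(duplicated, "to", name)),
    JSON.stringify(resolve(duplicated, "body", name))
  );
}
```

A comma join gives a sensible address list for to but an odd-looking body, while a line join does the opposite. That is the sense in which one generic join method can't be right for every type.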
Also, authors like to mix uppercase and lowercase in hnames and even the URI prefix. This is bad because URI hnames are technically case-sensitive. But, because of what authors do, clients usually have to treat hnames and prefixes as case-insensitive.
Also, authors like to leave '+' unencoded in mailto URI hvalues. RFC2368 allows this. These days, however, there are many HTTP-based webmail systems that support compose URIs, where mailto URI hvalues end up in an HTTP query string. Since most HTTP webmail clients treat '+' as a space, mailto URI authors need to percent-encode '+' as %2B so that it is actually treated as a '+' and not a space. This can be a problem for Gmail's compose URIs, for example. In local mail clients it is not a problem, as they treat + and %2B the same.
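Here's a quick sketch of the difference. The two reader functions are hypothetical stand-ins: one for a webmail system that parses the hvalue as part of an HTTP query string (using the standard URLSearchParams), and one for a local client that only percent-decodes.

```javascript
// Hypothetical stand-in for a webmail compose page reading an HTTP query string.
function readBodyAsWebmail(query) {
  return new URLSearchParams(query).get("body"); // form decoding: '+' becomes a space
}

// Hypothetical stand-in for a local mail client that only percent-decodes.
function readBodyAsLocalClient(hvalue) {
  return decodeURIComponent(hvalue); // leaves a literal '+' alone
}

console.log(JSON.stringify(readBodyAsWebmail("body=C++")));       // "C  "
console.log(JSON.stringify(readBodyAsWebmail("body=C%2B%2B")));   // "C++"
console.log(JSON.stringify(readBodyAsLocalClient("C++")));        // "C++"
console.log(JSON.stringify(readBodyAsLocalClient("C%2B%2B")));    // "C++"

// encodeURIComponent already produces the safe form.
console.log(encodeURIComponent("C++")); // C%2B%2B
```

So an hvalue built with encodeURIComponent comes out the same either way, while a raw '+' only survives in the local client.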
So, with that said, my little mailto URI validator tries to take care of all those author issues and only allows compatible mailto URIs.
There is one thing that the validator allows that isn't 100% compatible though. In mailto URIs, and are equivalent. However, clients don't handle the latter as well. For example, right-clicking in Firefox and choosing to copy the email address might fail to pick up the address. I consider that a plain bug in Firefox though.