RFR: 2637: Decoding emails from quoted-printable is broken

Fri Dec 12 23:13:59 UTC 2025

On Fri, 12 Dec 2025 23:07:48 GMT, Zhao Song <zsong at openjdk.org> wrote:

>> During my initial implementation of Mailman 3 support, I made an attempt at decoding quoted-printable encoded email bodies. That implementation isn't working that well. I only took 2 byte encoded UTF-8 characters into account, but we of course need to also handle 3 and 4 byte characters.
>> 
>> Instead of trying to do this with regular expressions, I bit the bullet and started working on a byte array, byte by byte. That actually makes it a lot simpler as we just need to translate each encoded triplet (`=XX`) at a time and then just convert the resulting byte array using Java's built in character set decoder.
>
> email/src/main/java/org/openjdk/skara/email/Email.java line 148:
> 
>> 146:                     }
>> 147:                     default : {
>> 148:                         out[j++] = (byte) Integer.parseInt("" + (char) in[i++] + (char) in[i], 16);
> 
> There is no boundary check here, so it always assumes there are two digits following the "=". I don't know if it's possible for the Mailman server to return malformed data, but if it happens, the bot will endlessly process the malformed input.

Oh, I was wrong: the exception will be caught in Mbox#splitMbox(), so the bot won't process the malformed data endlessly.
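
For illustration only, here is a minimal sketch of how the byte-by-byte decoding could check the boundary explicitly instead of relying on an exception from the array access or from `Integer.parseInt`. This is not the code in Email.java: the class name `QuotedPrintableSketch`, the method `decode`, and the choice of `IllegalArgumentException` are all assumptions made for the example. It also makes no assumption about which exception types Mbox#splitMbox() handles.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the Skara implementation.
final class QuotedPrintableSketch {
    static String decode(String encoded) {
        // Quoted-printable input is ASCII; any other character becomes '?'.
        byte[] in = encoded.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        int i = 0;
        while (i < in.length) {
            byte b = in[i];
            if (b != '=') {
                // Literal byte, copied through unchanged.
                out.write(b);
                i++;
                continue;
            }
            // Soft line break: "=" directly followed by LF or CRLF is dropped.
            if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2;
                continue;
            }
            if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;
                continue;
            }
            // Explicit boundary check: "=" must be followed by two more bytes.
            if (i + 2 >= in.length) {
                throw new IllegalArgumentException("Truncated escape at index " + i);
            }
            int hi = Character.digit((char) in[i + 1], 16);
            int lo = Character.digit((char) in[i + 2], 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid hex digits at index " + i);
            }
            // One decoded byte per "=XX" triplet.
            out.write((hi << 4) | lo);
            i += 3;
        }
        // Decode the collected bytes in one step, so that 2, 3 and 4 byte
        // UTF-8 sequences are all handled by the charset decoder.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With this sketch, an input such as `=E2=82=AC` decodes to a single three-byte character, while a truncated input such as `abc=4` fails immediately with a descriptive exception. The sketch assumes the body is UTF-8; a real implementation would presumably take the charset from the message headers.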

-------------

PR Review Comment: https://git.openjdk.org/skara/pull/1747#discussion_r2615815364