RFR: 8311216: DataURI can lose information in some charset environments [v4]
John Hendrikx
jhendrikx at openjdk.org
Sat Oct 28 23:14:43 UTC 2023
On Sat, 28 Oct 2023 20:21:55 GMT, Michael Strauß <mstrauss at openjdk.org> wrote:
>> DataURI uses the following implementation to decode the percent-encoded payload of a "data" URI:
>>
>>     ...
>>     String data = uri.substring(dataSeparator + 1);
>>     Charset charset = Charset.defaultCharset();
>>     ...
>>     URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(charset)
>>
>> This approach only works if the charset passed to `URLDecoder.decode` and `String.getBytes` can convert between `String` and `byte[]` representations without losing information. That is not the case in a US-ASCII environment, for example: every byte outside the ASCII range decodes to U+FFFD, which `getBytes` then encodes as `?`.
>>
>> This PR solves the problem by not using `URLDecoder` and instead decoding percent-encoded escape sequences directly, as specified by RFC 3986, page 11.
>>
>> **Note to reviewers**: the test failure can only be observed when the JVM's default charset can't represent the payload; this can be forced with the `-Dfile.encoding=US-ASCII` VM option.
>
> Michael Strauß has updated the pull request incrementally with one additional commit since the last revision:
>
> review changes
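A minimal Java sketch contrasting the two approaches described above, for illustration only. `decodeViaString` reproduces the quoted `URLDecoder` pattern, with the charset passed in explicitly instead of `Charset.defaultCharset()` so the effect shows up in any JVM. `percentDecode` is a hypothetical helper that decodes `%XX` escapes straight into bytes. The class name, helper names and sample payload are assumptions and not taken from the PR, and this is not its actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PercentDecodeSketch {

    // The quoted pattern, with the charset passed in explicitly instead of
    // Charset.defaultCharset() so the effect can be reproduced in any JVM.
    // "+" is pre-escaped so that URLDecoder keeps it as a literal plus.
    static byte[] decodeViaString(String data, Charset charset) {
        return URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(charset);
    }

    // Hypothetical helper: decodes RFC 3986 "%XX" escapes straight into bytes.
    // Other characters are copied through as single bytes; this sketch assumes
    // they are ASCII, as in a valid URI, and rejects anything else.
    static byte[] percentDecode(String data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length());
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '%') {
                if (i + 2 >= data.length()) {
                    throw new IllegalArgumentException("Incomplete escape at index " + i);
                }
                int hi = hexValue(data.charAt(i + 1));
                int lo = hexValue(data.charAt(i + 2));
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else if (c < 0x80) {
                out.write(c);
            } else {
                throw new IllegalArgumentException("Non-ASCII character at index " + i);
            }
        }
        return out.toByteArray();
    }

    // Value of an ASCII hex digit (RFC 3986 HEXDIG, case-insensitive), or -1.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        // UTF-8 bytes of 'é', a literal '+', and a byte that is not valid UTF-8.
        String payload = "caf%C3%A9+%FF";
        byte[] expected = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9, '+', (byte) 0xFF};

        byte[] viaAscii = decodeViaString(payload, StandardCharsets.US_ASCII);
        byte[] direct = percentDecode(payload);

        System.out.println("US-ASCII round trip intact: " + Arrays.equals(viaAscii, expected)); // false
        System.out.println("Direct decode intact: " + Arrays.equals(direct, expected));         // true
        System.out.println("US-ASCII result: " + new String(viaAscii, StandardCharsets.US_ASCII)); // caf??+?
    }
}
```

Under US-ASCII, every non-ASCII byte comes back as `?` (0x3F). ISO-8859-1 would happen to round-trip this payload, because it maps every byte value to a distinct char, while even UTF-8 would not preserve the `0xFF` byte. Decoding the escapes directly removes the dependency on any charset.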
LGTM
-------------
Marked as reviewed by jhendrikx (Committer).
PR Review: https://git.openjdk.org/jfx/pull/1165#pullrequestreview-1702860946
More information about the openjfx-dev mailing list