RFR: 8311216: DataURI can lose information in some charset environments

Andy Goryachev angorya at openjdk.org
Fri Jul 7 19:21:01 UTC 2023


On Sat, 1 Jul 2023 22:24:09 GMT, Michael Strauß <mstrauss at openjdk.org> wrote:

> DataURI uses the following implementation to decode the percent-encoded payload of a "data" URI:
> 
> 
> ...
> String data = uri.substring(dataSeparator + 1);
> Charset charset = Charset.defaultCharset();
> ...
> URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(charset)
> 
> 
> This approach only works if the charset that is passed into `URLDecoder.decode` and `String.getBytes` doesn't lose information when converting between `String` and `byte[]` representations, as might happen in a US-ASCII environment.
> 
> This PR solves the problem by not using `URLDecoder`, but instead simply decoding percent-encoded escape sequences as specified by RFC 3986, page 11.
> 
> **Note to reviewers**: the failing test can only be observed when the JVM uses a default charset that can't represent the payload, which can be enforced by specifying the `-Dfile.encoding=US-ASCII` VM option.
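For reference, decoding percent-escapes straight into a byte[], as the PR description outlines, could look roughly like the Java sketch below. It is not the code from the PR. The class name `PercentDecoding`, the `hexValue` helper and the error handling are assumptions; only the method name `decodePercentEncoding` is borrowed from the diff. Each `%XX` triplet becomes exactly one byte and unescaped characters are copied as their ASCII values, so no `Charset` is involved and the result should not depend on `-Dfile.encoding`.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class PercentDecoding {
    // Hypothetical sketch: decodes RFC 3986 percent-encoding directly into bytes.
    // No Charset is used, so the JVM's default charset cannot affect the result.
    static byte[] decodePercentEncoding(String data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length());
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '%') {
                // A '%' must be followed by exactly two hex digits.
                if (i + 2 >= data.length()) {
                    throw new IllegalArgumentException("Incomplete escape at index " + i);
                }
                int hi = hexValue(data.charAt(i + 1));
                int lo = hexValue(data.charAt(i + 2));
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits
            } else if (c > 0x7F) {
                // A valid URI contains only ASCII characters.
                throw new IllegalArgumentException("Non-ASCII character at index " + i);
            } else {
                // Unescaped characters (including '+') are copied as their ASCII byte value.
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Accepts only ASCII hex digits; returns -1 for anything else.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        // Prints [97, 43, 98, -61, -87]
        System.out.println(Arrays.toString(decodePercentEncoding("a+b%C3%A9")));
    }
}
```

Keeping '+' as a literal byte matches the intent of the original `data.replace("+", "%2B")` workaround, which exists only to stop `URLDecoder` from turning '+' into a space.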

modules/javafx.graphics/src/main/java/com/sun/javafx/util/DataURI.java line 115:

> 113:             nameValuePairs,
> 114:             base64,
> 115:             base64 ? Base64.getDecoder().decode(data) : decodePercentEncoding(data));

I wonder if all this is necessary. The data is supposed to be URL-encoded, so it's essentially ASCII, no?

Passing the default charset to getBytes() is not right; it probably should be:

URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(StandardCharsets.US_ASCII);

or am I missing something?
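One way to check this question concretely is to run the same decode-then-`getBytes` round trip under a few charset combinations and compare the result with the payload's actual bytes. The sketch below assumes a payload of `%C3%A9%FF` (the UTF-8 encoding of "é" followed by a byte that is not valid UTF-8, as can occur in binary data such as an image); the class `RoundTripCheck` and method `viaString` are hypothetical names. It uses `URLDecoder.decode(String, Charset)`, available since Java 10.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripCheck {
    // Same shape as the DataURI code: percent-decode into a String using
    // decodeCharset, then turn the String back into bytes using encodeCharset.
    static byte[] viaString(String data, Charset decodeCharset, Charset encodeCharset) {
        return URLDecoder.decode(data.replace("+", "%2B"), decodeCharset).getBytes(encodeCharset);
    }

    public static void main(String[] args) {
        // Assumed payload: bytes C3 A9 FF.
        String data = "%C3%A9%FF";

        // Suggested variant with a UTF-8 default charset: prints [63, 63]
        System.out.println(Arrays.toString(
                viaString(data, StandardCharsets.UTF_8, StandardCharsets.US_ASCII)));
        // Original code with a US-ASCII default charset: prints [63, 63, 63]
        System.out.println(Arrays.toString(
                viaString(data, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII)));
        // Original code with a UTF-8 default charset: prints [-61, -87, -17, -65, -67]
        System.out.println(Arrays.toString(
                viaString(data, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));
        // ISO-8859-1 maps every byte to one char and back: prints [-61, -87, -1]
        System.out.println(Arrays.toString(
                viaString(data, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1)));
    }
}
```

Only the ISO-8859-1 round trip reproduces the original three bytes here. In the other cases, `URLDecoder` (via `new String(bytes, charset)`) and `String.getBytes(charset)` substitute replacement characters for byte sequences the charset cannot represent.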

-------------

PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1256342382


More information about the openjfx-dev mailing list