RFR: 8311216: DataURI can lose information in some charset environments [v2]

John Hendrikx jhendrikx at openjdk.org
Sat Oct 28 06:29:47 UTC 2023


On Sat, 8 Jul 2023 23:38:05 GMT, Michael Strauß <mstrauss at openjdk.org> wrote:

>> DataURI uses the following implementation to decode the percent-encoded payload of a "data" URI:
>> 
>> 
>> ...
>> String data = uri.substring(dataSeparator + 1);
>> Charset charset = Charset.defaultCharset();
>> ...
>> URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(charset)
>> 
>> 
>> This approach only works if the charset that is passed into `URLDecoder.decode` and `String.getBytes` doesn't lose information when converting between `String` and `byte[]` representations, as might happen in a US-ASCII environment.
>> 
>> This PR solves the problem by not using `URLDecoder`, but instead simply decoding percent-encoded escape sequences as specified by RFC 3986, page 11.
>> 
>> **Note to reviewers**: the failing test can only be observed when the JVM uses a default charset that can't represent the payload, which can be enforced by specifying the `-Dfile.encoding=US-ASCII` VM option.
>
> Michael Strauß has updated the pull request incrementally with one additional commit since the last revision:
> 
>   added more tests

This change is probably a good move; not only is URLDecoder not doing the right thing, it's not that efficient when decoding a large amount of data (as it was intended for short URL's).

modules/javafx.graphics/src/main/java/com/sun/javafx/util/DataURI.java line 210:

> 208: 
> 209:     private static byte[] decodePercentEncoding(String input) {
> 210:         try (var output = new ByteArrayOutputStream(input.length())) {

If deemed important to optimize this a bit:

This could allocate an array that's a factor 3 too large, which depending on how these URI's are used can become quite big (although for large amounts of data, base64 is recommended).

I noticed that in the URLDecoder they limit the size of the buffer `new StringBuilder(numChars > 500 ? numChars / 2 : numChars);` -- perhaps you could do something similar (`new ByteArrayOutputStream(input.length() > 500 ? input.length() / 2 : input.length())`)

Another alternative would be to calculate the exact size first (by counting `%`s) avoiding the need to reallocate the `byte[]` -- performance is (IMHO) likely to be better:

- ByteArrayOutputStream method
  - allocate output array: zeroes n*3 bytes
  - decode: reads whole string once (n to n*3 reads)
  - decode: writes n bytes
  - reallocate: reads whole output once (n)
  - reallocate: writes n bytes
  - garbage created: n*3 bytes
  - max memory use: n*3 + n bytes

Total read/writes: 4n + 5n

- Count method
  - count: reads whole string once (n to n*3 reads)
  - allocate output array: zeroes n bytes
  - decode: reads whole string again (n to n*3 reads)
  - decode: writes n bytes
  - garbage created: none
  - max memory use: n bytes

Total read/writes: 6n + 2n

modules/javafx.graphics/src/main/java/com/sun/javafx/util/DataURI.java line 215:

> 213:         } catch (IOException ignored) {
> 214:             // can never happen for ByteArrayOutputStream
> 215:             return null;

If this can never happen, I'd prefer to just wrap and throw the original exception, using either `IllegalStateException` or `AssertionError` (the latter being the one that is thrown from a failed `assert`).

modules/javafx.graphics/src/main/java/com/sun/javafx/util/DataURI.java line 220:

> 218: 
> 219:     /**
> 220:      * Decodes percent-encoded text as specified by RFC 3986, page 11.

minor:
Suggestion:

     * Decodes percent-encoded text as specified by RFC 3986, section 2.1

modules/javafx.graphics/src/main/java/com/sun/javafx/util/DataURI.java line 224:

> 222:      */
> 223:     private static void decodePercentEncodingToStream(String input, OutputStream output) throws IOException {
> 224:         enum ParseState {

minor (just clarification):
Suggestion:

        enum ExpectedCharacter {

-------------

Marked as reviewed by jhendrikx (Committer).

PR Review: https://git.openjdk.org/jfx/pull/1165#pullrequestreview-1702702475
PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1375181167
PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1375169802
PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1375170235
PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1375171041


More information about the openjfx-dev mailing list