RFR: 8299807: newStringNoRepl should avoid copying arrays for ASCII compatible charsets

Sat Jan 28 04:06:51 UTC 2023

On Fri, 20 Jan 2023 16:47:27 GMT, Glavo <duke at openjdk.org> wrote:

> This is the javadoc of `JavaLangAccess::newStringNoRepl`:
> 
> 
>     /**
>      * Constructs a new {@code String} by decoding the specified subarray of
>      * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
>      *
>      * The caller of this method shall relinquish and transfer the ownership of
>      * the byte array to the callee since the latter will not make a copy.
>      *
>      * @param bytes the byte array source
>      * @param cs the Charset
>      * @return the newly created string
>      * @throws CharacterCodingException for malformed or unmappable bytes
>      */
> 
> 
> The javadoc states that the method should be able to construct the string directly from the byte array passed in, avoiding an extra array allocation.
> 
> However, at present, `newStringNoRepl` always copies the array for UTF-8 and other ASCII-compatible charsets.
> 
> This PR fixes this problem.
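
For context, the condition under which a copy could be skipped can be sketched as below. This is only an illustration, not the code in the PR: the class and method names are hypothetical, and it assumes compact strings are enabled so that an all-ASCII input is byte-for-byte identical to the Latin-1 form a `String` stores internally. Public API cannot adopt the array, so the sketch shows only the check.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the JDK or of the PR.
public class AsciiFastPathSketch {

    // Returns true when every byte is in the 7-bit ASCII range (0x00..0x7F).
    // Bytes of 0x80 and above are negative when viewed as signed Java bytes.
    static boolean isAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] ascii = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] nonAscii = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        // For an ASCII-compatible charset, only the first case is a
        // candidate for reusing the array without decoding into a copy.
        System.out.println("ascii input is all ASCII:     " + isAscii(ascii));
        System.out.println("non-ascii input is all ASCII: " + isAscii(nonAscii));
    }
}
```

When the check fails, the bytes still have to be decoded normally, so the fast path only helps inputs that are entirely ASCII.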

It seems odd that the benchmark is slower for smaller files; can you suggest why that might be?
I'd expect the size distribution for `Files.readString` to be biased toward smaller files.
Can you repeat the benchmark using the default file system? OS file caching should eliminate the disk speed effects.
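
Something along these lines would do as a rough check on the default file system. It is a hand-rolled timing loop rather than a proper JMH benchmark, and the class name, file sizes, and iteration counts are arbitrary assumptions, so treat the numbers as indicative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical timing sketch; sizes and iteration counts are arbitrary.
public class ReadStringTimingSketch {

    // Writes an ASCII-only temp file of the given size on the default file system.
    static Path createAsciiFile(int size) throws IOException {
        byte[] data = new byte[size];
        Arrays.fill(data, (byte) 'a');
        Path file = Files.createTempFile("readString", ".txt");
        Files.write(file, data);
        return file;
    }

    // Returns the average time in nanoseconds of one Files.readString call.
    static long averageNanos(Path file, int iterations) throws IOException {
        long chars = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            chars += Files.readString(file, StandardCharsets.UTF_8).length();
        }
        long elapsed = System.nanoTime() - start;
        if (chars < 0) {
            throw new AssertionError("unreachable; keeps the result live");
        }
        return elapsed / iterations;
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = {64, 1024, 16 * 1024, 1024 * 1024};
        for (int size : sizes) {
            Path file = createAsciiFile(size);
            try {
                averageNanos(file, 1_000); // warm-up; also populates the OS file cache
                long nanos = averageNanos(file, 1_000);
                System.out.printf("%8d bytes: %d ns/op%n", size, nanos);
            } finally {
                Files.deleteIfExists(file);
            }
        }
    }
}
```

Running it on builds with and without the change and comparing the small-file rows would show whether the slowdown persists once disk speed is out of the picture.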

-------------

PR: https://git.openjdk.org/jdk/pull/12119