RFR: 8299807: String.newStringUTF8NoRepl and getBytesUTF8NoRepl always copy arrays

Wed Jan 11 13:12:14 UTC 2023

On Wed, 11 Jan 2023 09:56:58 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> `JavaLangAccess::newStringUTF8NoRepl` and `JavaLangAccess::getBytesUTF8NoRepl` are not implemented correctly. They always copy arrays, rather than avoiding copying as much as possible as javadoc says.
>> 
>> I ran the tier1 test without any new errors.
>
> Would it be possible to provide some context on which public API you are testing with and the micro benchmark that you are using?

@AlanBateman 

This PR mainly affects `Files.readString` and `java.util.zip.ZipCoder`.
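
For context, here is a minimal sketch of the public entry point involved. It assumes JDK 11 or later and uses a temporary file on the default file system; the class and method names (`ReadStringSketch`, `readStrict`, `isRejected`) are hypothetical and are not taken from the PR or the benchmark. `Files.readString` decodes strictly: malformed input raises a `CharacterCodingException` rather than being replaced, which is the "NoRepl" behaviour.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of the PR or the linked benchmark.
final class ReadStringSketch {

    // Writes the bytes to a temporary file and reads them back as UTF-8.
    static String readStrict(byte[] content) throws IOException {
        Path file = Files.createTempFile("norepl", ".txt");
        try {
            Files.write(file, content);
            // Strict decoding: malformed bytes cause an exception instead of
            // being replaced with U+FFFD.
            return Files.readString(file, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    // Returns true if decoding the bytes fails with a coding error.
    static boolean isRejected(byte[] content) {
        try {
            readStrict(content);
            return false;
        } catch (CharacterCodingException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] ascii = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] malformed = {(byte) 0xC3, (byte) 0x28}; // lead byte without continuation

        System.out.println(readStrict(ascii));      // pure ASCII content
        System.out.println(readStrict(utf8));       // valid non-ASCII UTF-8
        System.out.println(isRejected(malformed));  // strict decoding rejects this
    }
}
```

The ASCII case is the one where the decoded bytes could in principle be reused as the string's backing array, which is why the benchmark separates ASCII from non-ASCII input.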

I designed a micro benchmark for `Files.readString`: [NoRepl.java](https://gist.github.com/Glavo/0aa47d47f329ceabf7dd4c3b9d2848e2).

To keep disk I/O from interfering with the measurements, the benchmark reads from an in-memory file system.

This is the baseline:

Benchmark                                (length)  Mode  Cnt         Score        Error  Units
NoRepl.testReadAscii                            0  avgt    5       192.584 ±      1.670  ns/op
NoRepl.testReadAscii                         1024  avgt    5       296.760 ±      2.599  ns/op
NoRepl.testReadAscii                         8192  avgt    5       427.220 ±      0.809  ns/op
NoRepl.testReadAscii                      1048576  avgt    5     29082.579 ±     34.780  ns/op
NoRepl.testReadAscii                     33554432  avgt    5   1168901.308 ± 240228.024  ns/op
NoRepl.testReadUTF8                             0  avgt    5       206.196 ±      2.296  ns/op
NoRepl.testReadUTF8                          1024  avgt    5      1290.403 ±      3.920  ns/op
NoRepl.testReadUTF8                          8192  avgt    5      9371.318 ±     55.165  ns/op
NoRepl.testReadUTF8                       1048576  avgt    5   1203194.297 ±   5787.171  ns/op
NoRepl.testReadUTF8                      33554432  avgt    5  44567374.591 ± 170568.947  ns/op

These are the results with this PR applied:

Benchmark                                (length)  Mode  Cnt         Score        Error  Units
NoRepl.testReadAscii                            0  avgt    5       210.050 ±     22.174  ns/op
NoRepl.testReadAscii                         1024  avgt    5       285.811 ±      4.448  ns/op
NoRepl.testReadAscii                         8192  avgt    5       350.318 ±      0.504  ns/op
NoRepl.testReadAscii                      1048576  avgt    5     19565.571 ±     33.153  ns/op
NoRepl.testReadAscii                     33554432  avgt    5    857566.083 ±  18352.548  ns/op
NoRepl.testReadUTF8                             0  avgt    5       196.632 ±      0.633  ns/op
NoRepl.testReadUTF8                          1024  avgt    5      1295.354 ±      4.450  ns/op
NoRepl.testReadUTF8                          8192  avgt    5      9381.675 ±    127.045  ns/op
NoRepl.testReadUTF8                       1048576  avgt    5   1200648.741 ±   4259.763  ns/op
NoRepl.testReadUTF8                      33554432  avgt    5  44481499.656 ± 284353.880  ns/op

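For files containing non-ASCII characters, the results are essentially unchanged: the differences stay within about 0.4% in either direction, which is negligible.

For large ASCII files (1 MiB and 32 MiB), the average time per read drops by about 27% to 33%, which corresponds to roughly 36% to 49% higher throughput.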
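
The percentages can be re-derived from the two tables above. The following is only a small arithmetic sketch; the class and method names (`BenchDelta`, `percentChange`, `speedup`) are hypothetical, and the numbers are the ns/op scores copied from the tables.

```java
import java.util.Locale;

// Hypothetical helper for re-deriving the relative changes; not part of the PR.
final class BenchDelta {

    // Relative change in average time, in percent (negative means faster).
    static double percentChange(double baseline, double patched) {
        return (patched - baseline) / baseline * 100.0;
    }

    // Throughput ratio: how many times faster the patched build is.
    static double speedup(double baseline, double patched) {
        return baseline / patched;
    }

    public static void main(String[] args) {
        String[] labels = {
            "ascii 1048576", "ascii 33554432",
            "utf8 1024", "utf8 8192", "utf8 1048576", "utf8 33554432"
        };
        // {baseline ns/op, patched ns/op}
        double[][] rows = {
            {29082.579, 19565.571},
            {1168901.308, 857566.083},
            {1290.403, 1295.354},
            {9371.318, 9381.675},
            {1203194.297, 1200648.741},
            {44567374.591, 44481499.656}
        };
        for (int i = 0; i < rows.length; i++) {
            double base = rows[i][0];
            double patched = rows[i][1];
            System.out.println(String.format(Locale.ROOT, "%-16s %+7.2f%%  x%.3f",
                    labels[i], percentChange(base, patched), speedup(base, patched)));
        }
    }
}
```

With these inputs, the 1 MiB ASCII row comes out at about -32.7% in time (a speedup of about 1.49x), the 32 MiB ASCII row at about -26.6% (about 1.36x), and the UTF-8 rows all stay within roughly 0.4% of the baseline.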

A disk-backed file system will probably not show an improvement this large, since disk I/O is likely to dominate there, but this PR still eliminates one array copy and the associated temporary allocation for ASCII files.

-------------

PR: https://git.openjdk.org/jdk/pull/11897