RFR: 8259842: Remove Result cache from StringCoding [v11]
    Claes Redestad 
    redestad at openjdk.java.net
       
    Thu Jan 21 20:48:33 UTC 2021
    
    
  
> The `StringCoding.resultCached` mechanism is used to remove the allocation of a `StringCoding.Result` object on potentially hot paths used in some `String` constructors. Using a `ThreadLocal` has overheads though, and the overhead got a bit worse after JDK-8258596 which addresses a leak by adding a `SoftReference`.
> 
> This patch refactors much of the decode logic back into `String` and gets rid of not only the `Result` cache, but the `Result` class itself along with the `StringDecoder` class and cache.
> 
> Microbenchmark results:
> Baseline
> 
> Benchmark                                           (charsetName)  Mode  Cnt    Score    Error   Units
> decodeCharset                                            US-ASCII  avgt   15  193.043 ±  8.207   ns/op
> decodeCharset:·gc.alloc.rate.norm                        US-ASCII  avgt   15  112.009 ±  0.001    B/op
> decodeCharset                                          ISO-8859-1  avgt   15  164.580 ±  6.514   ns/op
> decodeCharset:·gc.alloc.rate.norm                      ISO-8859-1  avgt   15  112.009 ±  0.001    B/op
> decodeCharset                                               UTF-8  avgt   15  328.370 ± 18.420   ns/op
> decodeCharset:·gc.alloc.rate.norm                           UTF-8  avgt   15  224.019 ±  0.002    B/op
> decodeCharset                                               MS932  avgt   15  328.870 ± 20.056   ns/op
> decodeCharset:·gc.alloc.rate.norm                           MS932  avgt   15  232.020 ±  0.002    B/op
> decodeCharset                                          ISO-8859-6  avgt   15  193.603 ± 12.305   ns/op
> decodeCharset:·gc.alloc.rate.norm                      ISO-8859-6  avgt   15  112.010 ±  0.001    B/op
> decodeCharsetName                                        US-ASCII  avgt   15  209.454 ±  9.040   ns/op
> decodeCharsetName:·gc.alloc.rate.norm                    US-ASCII  avgt   15  112.009 ±  0.001    B/op
> decodeCharsetName                                      ISO-8859-1  avgt   15  188.234 ±  7.533   ns/op
> decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-1  avgt   15  112.009 ±  0.001    B/op
> decodeCharsetName                                           UTF-8  avgt   15  399.463 ± 12.437   ns/op
> decodeCharsetName:·gc.alloc.rate.norm                       UTF-8  avgt   15  224.019 ±  0.003    B/op
> decodeCharsetName                                           MS932  avgt   15  358.839 ± 15.385   ns/op
> decodeCharsetName:·gc.alloc.rate.norm                       MS932  avgt   15  208.017 ±  0.003    B/op
> decodeCharsetName                                      ISO-8859-6  avgt   15  162.570 ±  7.090   ns/op
> decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-6  avgt   15  112.009 ±  0.001    B/op
> decodeDefault                                                 N/A  avgt   15  316.081 ± 11.101   ns/op
> decodeDefault:·gc.alloc.rate.norm                             N/A  avgt   15  224.019 ±  0.002    B/op
> 
> Patched:
> Benchmark                                           (charsetName)  Mode  Cnt    Score    Error   Units
> decodeCharset                                            US-ASCII  avgt   15  159.153 ±  6.082   ns/op
> decodeCharset:·gc.alloc.rate.norm                        US-ASCII  avgt   15  112.009 ±  0.001    B/op
> decodeCharset                                          ISO-8859-1  avgt   15  134.433 ±  6.203   ns/op
> decodeCharset:·gc.alloc.rate.norm                      ISO-8859-1  avgt   15  112.009 ±  0.001    B/op
> decodeCharset                                               UTF-8  avgt   15  297.234 ± 21.654   ns/op
> decodeCharset:·gc.alloc.rate.norm                           UTF-8  avgt   15  224.019 ±  0.002    B/op
> decodeCharset                                               MS932  avgt   15  311.583 ± 16.445   ns/op
> decodeCharset:·gc.alloc.rate.norm                           MS932  avgt   15  208.018 ±  0.002    B/op
> decodeCharset                                          ISO-8859-6  avgt   15  156.216 ±  6.522   ns/op
> decodeCharset:·gc.alloc.rate.norm                      ISO-8859-6  avgt   15  112.010 ±  0.001    B/op
> decodeCharsetName                                        US-ASCII  avgt   15  180.752 ±  9.411   ns/op
> decodeCharsetName:·gc.alloc.rate.norm                    US-ASCII  avgt   15  112.010 ±  0.001    B/op
> decodeCharsetName                                      ISO-8859-1  avgt   15  156.170 ±  8.003   ns/op
> decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-1  avgt   15  112.010 ±  0.001    B/op
> decodeCharsetName                                           UTF-8  avgt   15  370.337 ± 22.199   ns/op
> decodeCharsetName:·gc.alloc.rate.norm                       UTF-8  avgt   15  224.019 ±  0.001    B/op
> decodeCharsetName                                           MS932  avgt   15  312.589 ± 15.067   ns/op
> decodeCharsetName:·gc.alloc.rate.norm                       MS932  avgt   15  208.018 ±  0.002    B/op
> decodeCharsetName                                      ISO-8859-6  avgt   15  173.205 ±  9.647   ns/op
> decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-6  avgt   15  112.009 ±  0.001    B/op
> decodeDefault                                                 N/A  avgt   15  303.459 ± 16.452   ns/op
> decodeDefault:·gc.alloc.rate.norm                             N/A  avgt   15  224.019 ±  0.001    B/op
> 
> Most variants improve. There's a small added overhead in `String charsetName` variants for some charsets such as `ISO-8859-6` that benefited slightly from the `StringDecoder` cache due avoiding a lookup, but most variants are not helped by this cache and instead see a significant gain from skipping that step. `Charset` variants don't need a lookup and improve across the board.
> 
> Another drawback is that we need to cram more logic into `String` to bypass the `StringCoding.Result` indirection - but getting rid of two commonly used `ThreadLocal` caches and most cases getting a bit better raw throughput in the process I think more than enough makes up for that.
> 
> Testing: tier1-4
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
  Reduce code duplication in getBytes/getBytesNoRepl
-------------
Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2102/files
  - new: https://git.openjdk.java.net/jdk/pull/2102/files/6808d4db..2143cb3e
Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2102&range=10
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2102&range=09-10
  Stats: 167 lines in 1 file changed: 66 ins; 93 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2102.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2102/head:pull/2102
PR: https://git.openjdk.java.net/jdk/pull/2102
    
    
More information about the core-libs-dev
mailing list