<i18n dev> RFR: 8302877: Speed up latin1 case conversions

Eirik Bjorsnos duke at openjdk.org
Mon Feb 20 14:47:37 UTC 2023


On Fri, 17 Feb 2023 17:31:09 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:

> This PR suggests we speed up Character.toUpperCase and Character.toLowerCase for latin1 code points by applying the 'oldest ASCII trick in the book'.
> 
> This takes advantage of the fact that each latin1 uppercase code point is exactly 0x20 lower than its lowercase counterpart. The only exceptions are two lowercase code points whose uppercase forms lie outside latin1: U+00B5 (micro sign) and U+00FF (y with diaeresis).
> 
> To verify the correctness of the new implementation, the test `Latin1CaseConversion` is added with an exhaustive verification of toUpperCase/toLowerCase for all latin1 code points.
> 
> The implementation needs to balance the performance of the various ranges in latin1. An effort has been made to favour operations on ASCII code points, without causing excessive regression for higher code points.
> 
> Performance is benchmarked for seven sample code points, each representing a range or a special case. Results are in the first comment.
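
For readers unfamiliar with the trick described above, here is a minimal Java sketch of it. This is not the code from the PR: the class name Latin1CaseSketch and both helper methods are hypothetical, and the branch layout makes no attempt to reproduce the PR's performance tuning. It relies only on the fact that cased latin1 letters differ from their counterparts by the 0x20 bit, with U+00B5 and U+00FF special-cased because their uppercase forms (U+039C and U+0178) lie outside latin1. The main method follows the idea of the exhaustive test by comparing against Character.toUpperCase/toLowerCase for all 256 latin1 code points.

```java
// Illustrative sketch only; names are hypothetical and not taken from the PR.
class Latin1CaseSketch {

    // Lowercase a latin1 code point (0x00..0xFF) by setting the 0x20 bit.
    static int toLowerCase(int cp) {
        int lower = cp | 0x20;
        // ASCII letters: 'A'..'Z' map to 'a'..'z', 'a'..'z' map to themselves.
        boolean ascii = lower >= 'a' && lower <= 'z';
        // Latin-1 Supplement letters 0xE0..0xFE, skipping 0xF7 (division sign),
        // whose 0x20 partner 0xD7 (multiplication sign) is not a letter.
        boolean supplement = lower >= 0xE0 && lower <= 0xFE && lower != 0xF7;
        return (ascii || supplement) ? lower : cp;
    }

    // Uppercase a latin1 code point (0x00..0xFF) by clearing the 0x20 bit.
    static int toUpperCase(int cp) {
        boolean ascii = cp >= 'a' && cp <= 'z';
        boolean supplement = cp >= 0xE0 && cp <= 0xFE && cp != 0xF7;
        if (ascii || supplement) {
            return cp ^ 0x20; // the bit is known to be set, so XOR clears it
        }
        // The two code points whose uppercase forms leave latin1.
        if (cp == 0xB5) return 0x39C; // micro sign -> Greek capital mu
        if (cp == 0xFF) return 0x178; // y with diaeresis -> capital Y with diaeresis
        return cp;
    }

    // Count disagreements with java.lang.Character over all latin1 code points.
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = 0; cp <= 0xFF; cp++) {
            if (toLowerCase(cp) != Character.toLowerCase(cp)) mismatches++;
            if (toUpperCase(cp) != Character.toUpperCase(cp)) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Note that U+00DF (sharp s) is left unchanged by both methods, which matches Character.toUpperCase; the expansion to "SS" only happens in String.toUpperCase.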

Benchmark results:

Baseline:


Benchmark                                 (codePoint)  Mode  Cnt  Score   Error  Units
Characters.Latin1CaseConversion.toLowerCase          low  avgt   15  1.267 ± 0.013  ns/op
Characters.Latin1CaseConversion.toLowerCase            A  avgt   15  1.657 ± 0.011  ns/op
Characters.Latin1CaseConversion.toLowerCase            a  avgt   15  1.258 ± 0.005  ns/op
Characters.Latin1CaseConversion.toLowerCase      A-grave  avgt   15  1.656 ± 0.011  ns/op
Characters.Latin1CaseConversion.toLowerCase      a-grave  avgt   15  1.270 ± 0.023  ns/op
Characters.Latin1CaseConversion.toLowerCase           mu  avgt   15  1.261 ± 0.006  ns/op
Characters.Latin1CaseConversion.toLowerCase           yD  avgt   15  1.260 ± 0.005  ns/op
Characters.Latin1CaseConversion.toUpperCase          low  avgt   15  1.284 ± 0.043  ns/op
Characters.Latin1CaseConversion.toUpperCase            A  avgt   15  1.264 ± 0.008  ns/op
Characters.Latin1CaseConversion.toUpperCase            a  avgt   15  1.818 ± 0.016  ns/op
Characters.Latin1CaseConversion.toUpperCase      A-grave  avgt   15  1.261 ± 0.015  ns/op
Characters.Latin1CaseConversion.toUpperCase      a-grave  avgt   15  1.822 ± 0.013  ns/op
Characters.Latin1CaseConversion.toUpperCase           mu  avgt   15  1.823 ± 0.006  ns/op
Characters.Latin1CaseConversion.toUpperCase           yD  avgt   15  1.822 ± 0.008  ns/op


PR:


Benchmark                                 (codePoint)  Mode  Cnt  Score   Error  Units
Characters.Latin1CaseConversion.toLowerCase          low  avgt   15  0.878 ± 0.005  ns/op
Characters.Latin1CaseConversion.toLowerCase            A  avgt   15  1.038 ± 0.009  ns/op
Characters.Latin1CaseConversion.toLowerCase            a  avgt   15  1.036 ± 0.007  ns/op
Characters.Latin1CaseConversion.toLowerCase      A-grave  avgt   15  1.357 ± 0.015  ns/op
Characters.Latin1CaseConversion.toLowerCase      a-grave  avgt   15  1.352 ± 0.003  ns/op
Characters.Latin1CaseConversion.toLowerCase           mu  avgt   15  1.273 ± 0.002  ns/op
Characters.Latin1CaseConversion.toLowerCase           yD  avgt   15  1.352 ± 0.004  ns/op
Characters.Latin1CaseConversion.toUpperCase          low  avgt   15  0.880 ± 0.013  ns/op
Characters.Latin1CaseConversion.toUpperCase            A  avgt   15  0.920 ± 0.071  ns/op
Characters.Latin1CaseConversion.toUpperCase            a  avgt   15  1.055 ± 0.013  ns/op
Characters.Latin1CaseConversion.toUpperCase      A-grave  avgt   15  1.394 ± 0.010  ns/op
Characters.Latin1CaseConversion.toUpperCase      a-grave  avgt   15  1.391 ± 0.009  ns/op
Characters.Latin1CaseConversion.toUpperCase           mu  avgt   15  1.597 ± 0.021  ns/op
Characters.Latin1CaseConversion.toUpperCase           yD  avgt   15  1.354 ± 0.003  ns/op

-------------

PR: https://git.openjdk.org/jdk/pull/12623

