<i18n dev> RFR: 8365675: Add String Unicode Case-Folding Support [v7]
Xueming Shen
sherman at openjdk.org
Thu Oct 30 03:03:08 UTC 2025
On Wed, 29 Oct 2025 21:07:03 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
>>> Experimenting with Arrays.mismatch at the beginning of the array iteration as
>>> ...
>>> The benchmark results suggest that it does help 'dramatically' when the compared strings share the same prefix, for example the "UpperLower" test cases (which share the same upper-case text prefix). However, it is also relatively expensive, with a 20%-ish overhead when the strings do not share the same text but are case-insensitively equal. I would suggest we leave it out for now?
>>
>> Ok to leave it out for now. In similar contexts where System.arraycopy or Arrays.mismatch has some overhead I've suggested doing a simple check (like `size < 8`) to avoid the overhead when the strings/byte arrays are short.
>> Thanks for checking.
>
>> The performance is slightly better, but not as good as I would have expected. The access to the code point from the long looks a little clumsy, but the logic looks smooth. It needs more work. Opinions?
> It does look cleaner without the array indexing in the loops.
> Can the counting of characters (fcnt1, fcnt2) be eliminated by encoding 3 20-bit characters into the long and then checking `f1 != 0` to indicate there are more characters? It's a bit of an odd mix of 16-bit characters vs. a single 20-bit char. Are there any 20-bit chars from or to folded replacements in the folding mappings?
Good idea. After removing the fcnt counters the implementation looks much cleaner and more straightforward. The 1:m folding implementation is also faster. Maybe this is good enough to go :-)
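For readers following along, here is a minimal sketch of the packing idea discussed above. It is not the code in the PR; the class and method names (PackedFold, pack, unpack) are hypothetical, and the slot layout is an assumption. The review comment speaks of 20-bit characters; this sketch uses 21 bits per slot because U+10FFFF needs 21 bits, and three such slots still fit in the low 63 bits of a long. It also assumes a folded sequence never contains U+0000, so that `f != 0` can serve as the "more characters" test without a separate counter.

```java
// Hypothetical illustration only; not the layout used in the actual PR.
final class PackedFold {
    private static final int BITS = 21;                // enough for U+10FFFF
    private static final long MASK = (1L << BITS) - 1;

    // Packs 1 to 3 non-zero code points; the first one lands in the low bits.
    static long pack(int... cps) {
        if (cps.length == 0 || cps.length > 3) {
            throw new IllegalArgumentException("need 1 to 3 code points");
        }
        long packed = 0;
        for (int i = cps.length - 1; i >= 0; i--) {
            if (cps[i] <= 0 || cps[i] > Character.MAX_CODE_POINT) {
                throw new IllegalArgumentException("bad code point: " + cps[i]);
            }
            packed = (packed << BITS) | cps[i];
        }
        return packed;
    }

    // Walks the packed value with no counter: zero means nothing is left.
    static String unpack(long f) {
        StringBuilder sb = new StringBuilder();
        while (f != 0) {
            sb.appendCodePoint((int) (f & MASK));
            f >>>= BITS;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00DF full-case-folds to U+0073 U+0073 in CaseFolding.txt.
        long f = pack(0x73, 0x73);
        System.out.println(unpack(f));
    }
}
```

The point of the layout is that a comparison loop can consume one code point per iteration with a mask and a shift, and the loop condition replaces the explicit count.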
The latest numbers
Benchmark                                    Mode  Cnt   Score   Error  Units
StringCompareToFoldCase.asciiLower           avgt   15  15.874 ± 1.276  ns/op
StringCompareToFoldCase.asciiLowerEQ         avgt   15   9.915 ± 0.242  ns/op
StringCompareToFoldCase.asciiLowerEQFC       avgt   15  10.751 ± 0.219  ns/op
StringCompareToFoldCase.asciiLowerFC         avgt   15  10.277 ± 0.126  ns/op
StringCompareToFoldCase.asciiUpperLower      avgt   15  12.121 ± 0.699  ns/op
StringCompareToFoldCase.asciiUpperLowerEQ    avgt   15  10.836 ± 0.746  ns/op
StringCompareToFoldCase.asciiUpperLowerEQFC  avgt   15   9.091 ± 0.273  ns/op
StringCompareToFoldCase.asciiUpperLowerFC    avgt   15   9.207 ± 0.255  ns/op
StringCompareToFoldCase.asciiWithDFFC        avgt   15  38.322 ± 0.975  ns/op
StringCompareToFoldCase.greekLower           avgt   15  39.746 ± 0.127  ns/op
StringCompareToFoldCase.greekLowerEQ         avgt   15  39.303 ± 0.063  ns/op
StringCompareToFoldCase.greekLowerEQFC       avgt   15  20.470 ± 0.329  ns/op
StringCompareToFoldCase.greekLowerFC         avgt   15  19.734 ± 0.295  ns/op
StringCompareToFoldCase.greekUpperLower      avgt   15   7.084 ± 0.085  ns/op
StringCompareToFoldCase.greekUpperLowerEQ    avgt   15   7.472 ± 0.115  ns/op
StringCompareToFoldCase.greekUpperLowerEQFC  avgt   15   6.608 ± 0.248  ns/op
StringCompareToFoldCase.greekUpperLowerFC    avgt   15   6.573 ± 0.189  ns/op
StringCompareToFoldCase.latin1UTF16          avgt   15  24.407 ± 2.157  ns/op
StringCompareToFoldCase.latin1UTF16EQ        avgt   15  22.632 ± 0.131  ns/op
StringCompareToFoldCase.latin1UTF16EQFC      avgt   15  29.564 ± 0.655  ns/op
StringCompareToFoldCase.latin1UTF16FC        avgt   15  29.273 ± 0.324  ns/op
StringCompareToFoldCase.supLower             avgt   15  54.145 ± 0.075  ns/op
StringCompareToFoldCase.supLowerEQ           avgt   15  55.545 ± 0.042  ns/op
StringCompareToFoldCase.supLowerEQFC         avgt   15  24.788 ± 0.180  ns/op
StringCompareToFoldCase.supLowerFC           avgt   15  24.515 ± 0.025  ns/op
StringCompareToFoldCase.supUpperLower        avgt   15  14.437 ± 0.127  ns/op
StringCompareToFoldCase.supUpperLowerEQ      avgt   15  15.253 ± 0.728  ns/op
StringCompareToFoldCase.supUpperLowerEQFC    avgt   15   9.820 ± 0.104  ns/op
StringCompareToFoldCase.supUpperLowerFC      avgt   15   9.776 ± 0.127  ns/op
Finished running test 'micro:org.openjdk.bench.java.lang.StringCompareToFoldCase'
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2476267966