RFR: 8321514: UTF16 string gets constructed incorrectly from codepoints if CompactStrings is not enabled

Mon Dec 11 19:30:23 UTC 2023

On Mon, 11 Dec 2023 13:48:18 GMT, Aleksei Voitylov <avoitylov at openjdk.org> wrote:

> Since JDK-8311906, if CompactStrings is not enabled, index is not taken into account when extractCodepoints is called from StringUTF16.toBytes(). As a result, the last elements of the source codepoints array are dropped from the resulting UTF16 string, which surfaces as failures in other places (e.g. during RegEx processing).
>  
> The fix replaces the len parameter of extractCodepoints with end, which is index + len.
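
To illustrate the length-versus-end confusion described above, here is a minimal sketch. It is not the JDK source: the class CodepointRangeSketch and the methods buggy and fixed are hypothetical stand-ins that only mimic the shape of the bug, assuming a loop that runs from index up to an exclusive bound.

```java
public class CodepointRangeSketch {

    // Hypothetical stand-in for the broken behaviour (not JDK code):
    // the count `len` is mistakenly used as the exclusive end index.
    static String buggy(int[] val, int index, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = index; i < len; i++) {
            sb.appendCodePoint(val[i]);
        }
        return sb.toString();
    }

    // Hypothetical stand-in for the fixed behaviour (not JDK code):
    // the exclusive end index is computed as index + len.
    static String fixed(int[] val, int index, int len) {
        int end = index + len;
        StringBuilder sb = new StringBuilder();
        for (int i = index; i < end; i++) {
            sb.appendCodePoint(val[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] cps = {'a', 'b', 'c', 'd', 'e'};

        // With a non-zero index the trailing code points are lost.
        System.out.println(buggy(cps, 2, 3)); // prints "c"
        System.out.println(fixed(cps, 2, 3)); // prints "cde"

        // With index == 0 both variants agree, so such a test
        // would not detect the problem.
        System.out.println(buggy(cps, 0, 3).equals(fixed(cps, 0, 3))); // prints "true"

        // The public constructor that takes code points; run with
        // -XX:-CompactStrings to exercise the UTF16-only path.
        System.out.println(new String(cps, 2, 3));
    }
}
```

Because the two variants agree whenever index is 0, a test needs a non-zero offset into the codepoints array to catch this class of bug.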

Thanks for tracking down this issue; it exposes a gap in the testing.
The fix looks fine.
The test is very specific to a particular use case.
I suggest adding a new test in PR#17066.
If it is suitable, pull it into this PR.

-------------

PR Review: https://git.openjdk.org/jdk/pull/17057#pullrequestreview-1776011326