RFR: 8356165: System.in in jshell replace supplementary characters with ?? [v3]
Tatsunori Uchino
duke at openjdk.org
Mon May 19 12:16:54 UTC 2025
On Mon, 12 May 2025 16:37:10 GMT, Jan Lahoda <jlahoda at openjdk.org> wrote:
>> When reading from `System.in` in a JShell snippet, JShell first reads the whole line (getting a `String`), and then converts the characters from this `String` to bytes on demand. However, it does not convert code points encoded as surrogate pairs correctly: it tries to convert each surrogate separately, which cannot work.
>>
>> The proposal herein is to, when the current character is a high surrogate, peek at the next character, and if it is a low surrogate, convert both the high and low surrogates to bytes together.
>
> Jan Lahoda has updated the pull request incrementally with one additional commit since the last revision:
>
> (Attempting to) fix the test on Windows.
src/jdk.jshell/share/classes/jdk/internal/jshell/tool/ConsoleIOContext.java line 980:
> 978: if (pendingBytes == null || pendingBytes.length <= pendingBytesPointer) {
> 979: char userChar = readUserInputChar();
> 980: StringBuilder dataToConvert = new StringBuilder();
FWIW I think we can avoid using StringBuilder (and make the code more RAM-friendly):
char[] dataToConvert = { userChar, '\0' };
// if (...) {
// ...
// if (...) {
// ...
dataToConvert[1] = lowSurrogate;
// }
// ...
// }
// a low-surrogate code unit can never be the null char, so '\0' marks "no second unit"
// (sketch only: the resulting String still has to be encoded to bytes, as the existing code does)
pendingBytes = dataToConvert[1] != '\0' ? String.valueOf(dataToConvert) : String.valueOf(dataToConvert[0]);
The next version of .NET is said to be able to allocate such a tiny array on the stack instead of the heap, but I don't know whether the JVM can do the same optimization.
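To make the suggestion concrete, here is a minimal, self-contained sketch of the same idea outside of `ConsoleIOContext`. The class `SurrogatePairBytes`, the method `encodeAt`, and the use of UTF-8 are my assumptions for illustration only; the real code reads characters via `readUserInputChar()` and uses whatever charset the existing conversion uses. The sketch also tracks a length instead of testing for `'\0'`, which is a small variation on the snippet above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone illustration, not the actual JShell code.
class SurrogatePairBytes {

    // Encodes the character at index i of line; if it is a high surrogate
    // followed by a low surrogate, both code units are encoded together.
    static byte[] encodeAt(String line, int i) {
        char first = line.charAt(i);
        char[] dataToConvert = { first, '\0' };
        int length = 1; // number of valid code units in dataToConvert

        if (Character.isHighSurrogate(first) && i + 1 < line.length()) {
            char next = line.charAt(i + 1); // peek at the following code unit
            if (Character.isLowSurrogate(next)) {
                dataToConvert[1] = next;
                length = 2;
            }
        }

        // UTF-8 is an assumption here; the real code may use another charset.
        return new String(dataToConvert, 0, length).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String line = "a\uD83D\uDE00"; // 'a' followed by U+1F600
        System.out.println(encodeAt(line, 0).length); // 'a' alone
        System.out.println(encodeAt(line, 1).length); // the surrogate pair together
    }
}
```

With a length counter the final `String` can be built by `new String(char[], int, int)` in a single expression, so the sentinel check on `dataToConvert[1]` is not needed.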
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/25079#discussion_r2095569619