RFR: 8356165: System.in in jshell replace supplementary characters with ?? [v3]

Tatsunori Uchino duke at openjdk.org
Mon May 19 12:16:54 UTC 2025


On Mon, 12 May 2025 16:37:10 GMT, Jan Lahoda <jlahoda at openjdk.org> wrote:

>> When reading from `System.in` in a JShell snippet, JShell first reads the whole line (getting a `String`), and then converts the characters from this `String` to bytes on demand. But it does not convert code points represented by surrogate pairs correctly: it tries to convert each surrogate separately, which cannot work.
>> 
>> The proposal herein is: when the current character is a high surrogate, peek at the next character, and if it is a low surrogate, convert both surrogates to bytes together.
>
> Jan Lahoda has updated the pull request incrementally with one additional commit since the last revision:
> 
>   (Attempting to) fix the test on Windows.
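
To illustrate the problem and the proposed fix outside JShell, here is a minimal sketch. It assumes UTF-8 as the target charset (the actual console charset may differ), and the class and method names (`SurrogatePairDemo`, `perCharBytes`, `pairedBytes`) are hypothetical. Encoding each UTF-16 code unit on its own turns each half of the pair into `?`, consistent with the `??` in the issue title; encoding the pair together yields the correct four-byte UTF-8 sequence.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SurrogatePairDemo {

    // Encodes each UTF-16 code unit on its own. A lone surrogate cannot be
    // encoded, so String.getBytes substitutes the replacement byte '?' (0x3F).
    public static byte[] perCharBytes(String line) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < line.length(); i++) {
            byte[] b = String.valueOf(line.charAt(i)).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    // Encodes a high surrogate together with the following low surrogate
    // (found by peeking at the next char); every other char is encoded alone.
    public static byte[] pairedBytes(String line) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < line.length()) {
            int end = i + 1;
            if (Character.isHighSurrogate(line.charAt(i)) && end < line.length()
                    && Character.isLowSurrogate(line.charAt(end))) {
                end++; // take the low surrogate too
            }
            byte[] b = line.substring(i, end).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
            i = end;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String line = "a\uD83D\uDE00"; // "a" followed by U+1F600, a supplementary character
        System.out.println(Arrays.toString(perCharBytes(line))); // [97, 63, 63]
        System.out.println(Arrays.toString(pairedBytes(line)));  // [97, -16, -97, -104, -128]
    }
}
```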

src/jdk.jshell/share/classes/jdk/internal/jshell/tool/ConsoleIOContext.java line 980:

> 978:         if (pendingBytes == null || pendingBytes.length <= pendingBytesPointer) {
> 979:             char userChar = readUserInputChar();
> 980:             StringBuilder dataToConvert = new StringBuilder();

FWIW, I think we can avoid using `StringBuilder` (and make the code more RAM-friendly) with a two-element `char[]`:


char[] dataToConvert = { userChar, '\0' };
// if (...) {
// ...
// if (...) {
// ...
dataToConvert[1] = lowSurrogate;
// }
// ...
// }
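// a low-surrogate code unit is never the null char, so '\0' marks an unused second slot
String toConvert = dataToConvert[1] != '\0' ? String.valueOf(dataToConvert) : String.valueOf(dataToConvert[0]);
// pendingBytes is a byte[], so the String still has to be encoded (default charset assumed here)
pendingBytes = toConvert.getBytes();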


The next version of .NET is said to be able to allocate such a tiny array on the stack instead of the heap, but I don't know whether the JVM can do the same optimization.
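
A self-contained sketch of the suggested `char[]` approach, shaped like the reviewed method, could look like this. `PendingBytesReader` and its `line`/`linePointer` fields are hypothetical stand-ins for JShell's `readUserInputChar()` machinery, and the charset is passed in explicitly as an assumption; only the surrogate pairing and the `'\0'` sentinel mirror the snippet above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class PendingBytesReader {
    private final String line;     // hypothetical stand-in for the line already read by the console
    private final Charset charset; // assumed target charset for this sketch
    private int linePointer;
    private byte[] pendingBytes;
    private int pendingBytesPointer;

    public PendingBytesReader(String line, Charset charset) {
        this.line = line;
        this.charset = charset;
    }

    // Returns the next byte as 0-255, or -1 once the line is exhausted.
    public int read() {
        if (pendingBytes == null || pendingBytes.length <= pendingBytesPointer) {
            if (linePointer >= line.length()) {
                return -1;
            }
            char userChar = line.charAt(linePointer++);
            // Two-slot array instead of a StringBuilder; '\0' marks an unused second slot.
            char[] dataToConvert = { userChar, '\0' };
            if (Character.isHighSurrogate(userChar) && linePointer < line.length()) {
                char lowSurrogate = line.charAt(linePointer); // peek
                if (Character.isLowSurrogate(lowSurrogate)) {
                    dataToConvert[1] = lowSurrogate;
                    linePointer++; // consume it only when it completes the pair
                }
            }
            // A low-surrogate code unit is never '\0', so the sentinel is unambiguous.
            String text = dataToConvert[1] != '\0'
                    ? String.valueOf(dataToConvert)
                    : String.valueOf(dataToConvert[0]);
            pendingBytes = text.getBytes(charset);
            pendingBytesPointer = 0;
        }
        return pendingBytes[pendingBytesPointer++] & 0xFF;
    }

    public static void main(String[] args) {
        PendingBytesReader reader = new PendingBytesReader("\uD83D\uDE00!", StandardCharsets.UTF_8);
        StringJoiner hex = new StringJoiner(" ");
        for (int b = reader.read(); b != -1; b = reader.read()) {
            hex.add(String.format("%02X", b));
        }
        System.out.println(hex); // F0 9F 98 80 21
    }
}
```

The character after a high surrogate is consumed only when it really is a low surrogate; otherwise it stays in the line for the next `read()` call, which plays the role of the peek described in the PR.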

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25079#discussion_r2095569619


More information about the kulla-dev mailing list