RFR: 8369564: Provide a MemorySegment API to read strings with known lengths [v7]
Liam Miller-Cushon
cushon at openjdk.org
Thu Nov 20 08:59:19 UTC 2025
On Wed, 19 Nov 2025 14:52:04 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add a dstOffset parameter, stop using StringCharBuffer/CharsetEncoder::encode
>
> src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 2649:
>
>> 2647: *
>> 2648: * @param src the Java string to be written into this segment
>> 2649: * @param dstEncoding the charset used to {@linkplain Charset#newEncoder() encode}
>
> I'm not sure we have a dependency on the charset being standard?
We do not; thanks, fixed.
Although I think the existing `allocateFrom(String, Charset)` method does have an undocumented dependency, because it uses `CharsetKind` to get the terminator char length, and `CharsetKind` only supports standard Charsets. If we add a fast path for UTF-16, that may need a dependency on a standard Charset (or a standard way to get the code unit size of a charset, if it has one). A sketch of the kind of per-charset mapping I mean is below.
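For illustration only (hypothetical helper, not the actual `CharsetKind` code): the terminator is one code unit, its byte size depends on the charset, and only the standard charsets have a known size, which is the undocumented dependency:

```java
import java.nio.charset.Charset;

class TerminatorSize {
    // Hypothetical sketch: maps a charset to the byte size of its NUL
    // terminator (one code unit). Non-standard charsets have no entry.
    static int terminatorByteSize(Charset charset) {
        return switch (charset.name()) {
            case "UTF-8", "ISO-8859-1", "US-ASCII" -> 1;
            case "UTF-16", "UTF-16LE", "UTF-16BE"  -> 2;
            case "UTF-32", "UTF-32LE", "UTF-32BE"  -> 4;
            default -> throw new UnsupportedOperationException(
                    "not a standard charset: " + charset.name());
        };
    }
}
```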
> src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 194:
>
>> 192: MemorySegment segment;
>> 193: if (StringSupport.bytesCompatible(str, charset, srcIndex, numChars)) {
>> 194: segment = allocateNoInit(numChars);
>
> This also seems to rely on the fact that we end up here only for latin1 strings. Again, I don't think this is correct, but if it's deliberate, we should add an assertion check.
Good point. I think we make a similar assumption in the existing `allocateFrom(String, Charset)`: it does `length + termCharSize`, and that should perhaps be `(length + 1) * codeUnitSize`. The arithmetic is spelled out below.
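To spell out the difference (made-up variable names, assuming a hypothetical UTF-16 fast path where `length` is the char count rather than the encoded byte count):

```java
class SizingSketch {
    public static void main(String[] args) {
        int length = 5;                        // char count, e.g. "hello"

        // Latin-1-compatible fast path: 1 byte per char, 1-byte terminator,
        // so the two formulas happen to agree.
        System.out.println(length + 1);        // 6: length + termCharSize
        System.out.println((length + 1) * 1);  // 6: (length + 1) * codeUnitSize

        // Hypothetical UTF-16 fast path: 2 bytes per char, 2-byte terminator.
        System.out.println(length + 2);        // 7: length + termCharSize, under-allocates
        System.out.println((length + 1) * 2);  // 12: (length + 1) * codeUnitSize
    }
}
```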
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2544835957
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2544862879