MemorySegment APIs for reading and writing strings with known lengths
Liam Miller-Cushon
cushon at google.com
Tue Nov 4 13:13:37 UTC 2025
Hi Jorn,
Thanks for the discussion and input!
On Mon, Nov 3, 2025 at 7:47 PM Jorn Vernee <jorn.vernee at oracle.com> wrote:
> About the copy elision, this is something that we have seen being
> visible in benchmarks [1]. Were there any benchmarks in which you've
> seen this difference show up as well, or is this more of a theoretical
> benefit? It would be good to understand how important performance is in
> all of this, or if it's more about API usability. Also, I should note
> that in the case of setString, we only avoid the extra byte[] when the
> source and target encodings are compatible.
>
We did some earlier benchmarking; I think that was part of the
discussion that led to JDK-8362893:
https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/149189.html. I
also made some draft changes to https://github.com/openjdk/jdk/pull/28043
to add a prototype of setStringWithoutNullTerminator and did some more
microbenchmarking. I updated the PR description with some results.
For use-cases like the protobuf one, the interest is more in getting the
best possible performance, rather than API usability.
>
> It's possible to call `getBytes` on a string and copy the resulting byte[]
> into the memory segment as well, but I suppose you want to avoid that
> because of the extra copy of the byte[] (although I think perhaps C2 can
> elide the extra object in that case)? Have you tried looping over the
> string and manually copying each character using charAt?
>
I'm not seeing competitive performance with the explicit call to getBytes
in the microbenchmarks, so I wonder whether C2 is actually eliding the copy,
although I haven't verified that in the assembly.
I wouldn't have expected looping with charAt to be competitive with the
fast paths in StringSupport where the bytes are compatible, or is that not
right?
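
For concreteness, here is a minimal sketch of the two alternatives mentioned
above, assuming Java 22 or later (where java.lang.foreign is final). The class
and method names are hypothetical and are not part of the PR; the charAt
variant handles only ASCII input, which is an assumption made to keep the loop
simple.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class StringCopySketch {

    // Alternative 1: encode into a temporary byte[] and bulk-copy it into the
    // segment. No null terminator is written. Returns the number of bytes written.
    static long copyViaGetBytes(String s, MemorySegment dst, long offset) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        MemorySegment.copy(bytes, 0, dst, ValueLayout.JAVA_BYTE, offset, bytes.length);
        return bytes.length;
    }

    // Alternative 2: loop over the string with charAt and store one byte per
    // char. This sketch rejects anything outside ASCII, where one char maps
    // to exactly one UTF-8 byte.
    static long copyAsciiViaCharAt(String s, MemorySegment dst, long offset) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x80) {
                throw new IllegalArgumentException("non-ASCII char at index " + i);
            }
            dst.set(ValueLayout.JAVA_BYTE, offset + i, (byte) c);
        }
        return s.length();
    }
}
```

Both methods leave the same bytes in the segment for ASCII input; the
difference under discussion is only how many intermediate copies and bounds
checks the JIT can remove.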
> I think I'm slowly coming to the conclusion that we should just treat
> Strings as another source and destination format for data, with the
> caveat that we can not modify a String in place, so any read operations
> will have to create a new String instance instead. This creates some
> asymmetry with the existing MemorySegment::copy methods. I think because
> of that restriction, we might have to accept some asymmetry between the
> read and write APIs for strings as well.
Do you have a feeling for how that approach might best be exposed in the
API? Do you think it might look like more variations of getString/setString
in MemorySegment? Or that there might be a missing API primitive that could
encapsulate those String sources and destinations? Or something else?
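
As a point of reference for the read side, this is a rough sketch of how a
string of known length can be read with the existing API on Java 22 or later:
the bytes are copied into a temporary byte[] and a new String is created from
it. The class and method names are hypothetical; only MemorySegment.copy,
MemorySegment.getString and the String constructor are existing API.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class KnownLengthReadSketch {

    // Reads exactly `length` bytes starting at `offset` and decodes them as
    // UTF-8. This goes through an intermediate byte[] and then allocates the
    // String, since a String cannot be filled in place.
    static String readKnownLength(MemorySegment src, long offset, int length) {
        byte[] bytes = new byte[length];
        MemorySegment.copy(src, ValueLayout.JAVA_BYTE, offset, bytes, 0, length);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // For contrast: the existing getString scans for a null terminator
    // instead of taking a length.
    static String readNullTerminated(MemorySegment src, long offset) {
        return src.getString(offset);
    }
}
```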