[External] : Re: MemorySegment APIs for reading and writing strings with known lengths
Liam Miller-Cushon
cushon at google.com
Wed Nov 12 10:02:16 UTC 2025
Thanks, yes, I think string concat is a good analogy.
Thinking about this more, isn't this use case an example where the proposed
MemorySegment::ofString approach wouldn't always offer the best possible
performance? When the internal string buffer isn't compatible with the
requested charset, it has to make an intermediate copy. In theory, an
alternative setString or copy method that took a String and wrote it
directly to the output could avoid that intermediate copy.
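A sketch of the difference, against the existing FFM API (the helper names here are mine, not from the thread; the "direct" path uses CharsetEncoder to encode straight into a ByteBuffer view of the segment, which is one way the intermediate byte[] could be avoided):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class DirectEncode {

    // Copy-based path: String.getBytes allocates an intermediate byte[],
    // which is then copied into the target segment.
    static void writeViaIntermediate(MemorySegment dst, String s, Charset cs) {
        byte[] encoded = s.getBytes(cs); // intermediate copy
        MemorySegment.copy(encoded, 0, dst, ValueLayout.JAVA_BYTE, 0, encoded.length);
    }

    // Direct path: encode straight into a ByteBuffer view of the segment,
    // skipping the intermediate byte[].
    static void writeDirect(MemorySegment dst, String s, Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        ByteBuffer out = dst.asByteBuffer();
        CoderResult r = enc.encode(CharBuffer.wrap(s), out, true);
        if (r.isError() || enc.flush(out).isError()) {
            throw new IllegalStateException("encoding failed");
        }
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            String s = "hello, panama";
            byte[] expected = s.getBytes(StandardCharsets.UTF_8);
            MemorySegment seg = arena.allocate(expected.length);
            writeDirect(seg, s, StandardCharsets.UTF_8);
            System.out.println(new String(
                seg.toArray(ValueLayout.JAVA_BYTE), StandardCharsets.UTF_8)); // prints "hello, panama"
        }
    }
}
```

Both paths produce the same bytes; the difference is only how many times the data is copied on the way into the segment.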
On Tue, Nov 11, 2025 at 6:18 PM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:
> Thanks for the detailed reply.
>
> For resizing, I tend to agree with you -- the problem is that if you don't
> size correctly upfront, then you will have to pay the cost (potentially
> multiple times) to allocate a bigger buffer and move all the contents over
> there.
>
> A bit like how string concat has evolved, where we now have ways to
> "guess" the size of each of the concatenation arguments so we can correctly
> size the byte[] buffer we create to hold the result of the concatenation.
>
> In those cases, I agree, paying a small-ish cost to be able to estimate
> the size of a sub-element of an allocation goes a long way in making
> everything less dynamic and more deterministic.
>
> Maurizio
> On 11/11/2025 17:04, Liam Miller-Cushon wrote:
>
>> It seems to me that in this case encoding and length travel together?
>> E.g. you need to encode anyway, at which point you also know the byte size?
>>
>> (I'm a bit unsure that here there's anything to be gained by the method
>> you proposed?)
>>
>> Do you have use cases where you don't want to decode, you just want to
>> know the byte length?
>>
> The main use cases I've seen do want both the encoding and the length.
>
> I think there is still a benefit to a fast way of getting the length first.
> The alternative is to accumulate into a temporary buffer and potentially
> resize it; if there are gigabytes of data, it's expensive to have to make
> another copy. Knowing the encoded length up front allows exactly sizing the
> output buffer and avoids the temporary buffer entirely.
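For the "get the length first" step, a minimal sketch of what such a query could compute; this helper is hypothetical, not an existing JDK method, and it assumes well-formed strings (no unpaired surrogates):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {

    // Hypothetical length-only query: walks the code points of a String and
    // sums their UTF-8 encoded sizes, without allocating an intermediate
    // byte[]. Assumes the string contains no unpaired surrogates.
    static long utf8Length(String s) {
        long n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80)          n += 1; // ASCII
            else if (cp < 0x800)    n += 2; // two-byte sequence
            else if (cp < 0x10000)  n += 3; // three-byte sequence
            else                    n += 4; // supplementary plane
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "héllo \uD83D\uDE00"; // mixed 1-, 2- and 4-byte code points
        // The computed length matches what a real encode would produce,
        // so an output buffer of exactly this size can be allocated up front.
        System.out.println(utf8Length(s) == s.getBytes(StandardCharsets.UTF_8).length); // prints "true"
    }
}
```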
>
> Some slightly more concrete examples:
>
> Building a byte[] that holds a large amount of content: sizing the byte[]
> requires first knowing the sum of all the lengths you want to put into it,
> and then encoding the strings into it.
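The two-pass pattern that example describes might look like this (the utf8Length helper is hypothetical, not an existing JDK method):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class TwoPassEncode {

    // Hypothetical length-only helper (not part of the thread's API): UTF-8
    // byte length of a String, computed without encoding it. Assumes no
    // unpaired surrogates.
    static int utf8Length(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            n += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        }
        return n;
    }

    // First pass sums the encoded lengths; second pass encodes into the
    // exactly-sized byte[]. No resizing, no over-allocation.
    static byte[] encodeAll(List<String> strings) {
        int total = 0;
        for (String s : strings) total += utf8Length(s);
        byte[] out = new byte[total];
        int pos = 0;
        for (String s : strings) {
            // A per-string intermediate copy remains here; a direct-write
            // setString-style method could encode straight into `out`.
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            System.arraycopy(b, 0, out, pos, b.length);
            pos += b.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] all = encodeAll(List.of("foo", "bär"));
        System.out.println(all.length); // prints 7 ("foo" is 3 bytes, "bär" is 4)
    }
}
```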
>
> Streaming serialization to the network: the top level has to know the
> length of the transitive contents that it's going to write out in the
> nested structures. The actual output is streamed; a byte[] of the complete
> data is never constructed in this scenario.
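A sketch of that streaming shape, using a made-up wire format (4-byte big-endian length prefix, then the UTF-8 payload); the writer needs the total payload length before any payload bytes go out, but it never materializes the whole message:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class LengthPrefixed {

    // Hypothetical wire format: 4-byte length prefix, then the concatenated
    // UTF-8 payload, streamed field by field.
    static void writeMessage(DataOutputStream out, List<String> fields) throws IOException {
        int total = 0;
        for (String f : fields) {
            // This sizing pass encodes each string just to learn its length;
            // a cheap length-only query would avoid encoding everything twice.
            total += f.getBytes(StandardCharsets.UTF_8).length;
        }
        out.writeInt(total); // length prefix, written before any payload
        for (String f : fields) {
            out.write(f.getBytes(StandardCharsets.UTF_8)); // payload, field by field
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeMessage(new DataOutputStream(bytes), List.of("ab", "cd"));
        System.out.println(bytes.size()); // prints 8 (4-byte prefix + 4 payload bytes)
    }
}
```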
>
> (There are also some public protobuf APIs that just return an encoded byte
> length for the data, but that is a less performance-sensitive use case.)