Re: MemorySegment APIs for reading and writing strings with known lengths
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Nov 11 17:18:22 UTC 2025
Thanks for the detailed reply.
On resizing, I tend to agree with you: if you don't size the buffer
correctly up front, you pay the cost (potentially several times) of
allocating a bigger buffer and copying all the contents over to it.
This is a bit like how string concatenation has evolved: we now have
ways to "guess" the size of each concatenation argument, so that we can
correctly size the byte[] buffer that holds the result of the
concatenation.
In those cases, I agree, paying a small-ish cost to be able to estimate
the size of a sub-element of an allocation goes a long way in making
everything less dynamic and more deterministic.
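To make the size-up-front idea concrete, here is a minimal sketch. It is
not anything from the JDK: `utf8Length` and `encodeAll` are hypothetical
helper names, and the hand-written counting loop merely stands in for
whatever fast path a real API might offer. It computes the UTF-8 length of
each string without encoding it, allocates a single exactly sized byte[],
and encodes straight into that array, so nothing is ever grown or copied.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class ExactSizing {
    // Hypothetical helper: UTF-8 byte length of s, computed without encoding.
    static long utf8Length(CharSequence s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4; // a valid surrogate pair is one 4-byte code point
                i++;
            } else if (Character.isSurrogate(c)) {
                bytes += 1; // unpaired surrogate: the encoder substitutes '?'
            } else {
                bytes += 3;
            }
        }
        return bytes;
    }

    // Hypothetical helper: first pass sums the lengths, second pass encodes
    // directly into the exactly sized destination array.
    static byte[] encodeAll(List<String> parts) {
        long total = 0;
        for (String p : parts) {
            total += utf8Length(p);
        }
        byte[] out = new byte[Math.toIntExact(total)];
        ByteBuffer dst = ByteBuffer.wrap(out);
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        for (String p : parts) {
            enc.reset();
            CoderResult r = enc.encode(CharBuffer.wrap(p), dst, true);
            if (!r.isUnderflow()) {
                throw new IllegalStateException("unexpected coder result: " + r);
            }
            enc.flush(dst);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] out = encodeAll(List.of("hello", "caf\u00e9"));
        System.out.println(out.length + " bytes written, no resize needed");
    }
}
```

The price is that every string is scanned twice, which is the "small-ish
cost" in question; the sketch says nothing about how that compares with
resizing in practice.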
Maurizio
On 11/11/2025 17:04, Liam Miller-Cushon wrote:
>
> > It seems to me that in this case encoding and length travel
> > together? E.g. you need to encode anyway, at which point you also
> > know the byte size?
> >
> > (I'm a bit unsure that here there's anything to be gained by the
> > method you proposed?)
> >
> > Do you have use cases where you don't want to decode, you just
> > want to know the byte length?
>
> The main use-cases I've seen do want both the encoding and the length.
>
> I think there is still a benefit to a fast way to get the length
> first. An alternative is to accumulate into a temporary buffer, and
> potentially have to resize it. If there are gigabytes of data it's
> expensive to have to make another copy. Knowing the encoded length
> up-front allows exactly sizing the output buffer and avoids the
> temporary buffer.
>
> Some slightly more concrete examples:
>
> Building a byte[] that holds all the content of a large amount of
> data: sizing the byte[] requires first knowing the sum of the lengths
> of everything you want to put into it, and then encoding the strings
> into it.
>
> Streaming serialization to the network: the top level has to know the
> length of the transitive contents that it is going to write out in
> the nested structures. The actual output is streamed; a byte[] of the
> complete data is never constructed in this scenario.
>
> (There are also some public protobuf APIs that just return an encoded
> byte length for the data, but that is a less performance-sensitive
> use case.)
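
The streaming case described above can be sketched as follows. This
assumes a made-up wire format (a 4-byte big-endian payload length, then
each string as a 4-byte length followed by its UTF-8 bytes); `payloadSize`
and `writeMessage` are hypothetical names, not protobuf or JDK APIs. The
point is only that the outer length prefix has to be written before any
nested content, so the encoded lengths must be known first.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class LengthPrefixedStream {
    // Hypothetical helper: UTF-8 byte length of s, computed without encoding.
    static long utf8Length(String s) {
        return s.codePoints().mapToLong(cp -> {
            if (cp < 0x80) return 1;
            if (cp < 0x800) return 2;
            if (cp >= 0x10000) return 4;
            // An unpaired surrogate cannot be encoded; getBytes substitutes '?'.
            return Character.isSurrogate((char) cp) ? 1 : 3;
        }).sum();
    }

    // Size of the nested payload: a 4-byte length prefix plus the bytes of each field.
    static long payloadSize(List<String> fields) {
        long size = 0;
        for (String f : fields) {
            size += 4 + utf8Length(f);
        }
        return size;
    }

    // Writes the outer length first, then streams the fields one at a time.
    // Only one field is held in encoded form at any moment; the complete
    // message is never materialized as a single byte[].
    static void writeMessage(List<String> fields, OutputStream sink) throws IOException {
        DataOutputStream out = new DataOutputStream(sink);
        out.writeInt(Math.toIntExact(payloadSize(fields)));
        for (String f : fields) {
            byte[] encoded = f.getBytes(StandardCharsets.UTF_8);
            out.writeInt(encoded.length);
            out.write(encoded);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayOutputStream stands in for a network stream here.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeMessage(List.of("alpha", "beta"), sink);
        System.out.println("wrote " + sink.size() + " bytes");
    }
}
```

Without the counting pass, the writer would have to encode all fields into
a temporary buffer just to learn the value of the first four bytes.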