Re: MemorySegment APIs for reading and writing strings with known lengths

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Tue Nov 11 17:18:22 UTC 2025


Thanks for the detailed reply.

For resizing, I tend to agree with you -- the problem is that if you 
don't size correctly upfront, then you will have to pay the cost 
(potentially multiple times) to allocate a bigger buffer and move all 
the contents over there.

This is a bit like how string concat has evolved: we now have ways to 
"guess" the size of each of the concatenation arguments, so we can 
correctly size the byte[] buffer we create to hold the result of the 
concatenation.

In those cases, I agree, paying a small-ish cost to be able to estimate 
the size of a sub-element of an allocation goes a long way in making 
everything less dynamic and more deterministic.

Maurizio

On 11/11/2025 17:04, Liam Miller-Cushon wrote:
>
>     It seems to me that in this case encoding and length travel
>     together? E.g. you need to encode anyway, at which point you also
>     know the byte size?
>
>     (I'm a bit unsure that here there's anything to be gained by the
>     method you proposed?)
>
>     Do you have use cases where you don't want to decode, you just
>     want to know the byte length?
>
> The main use-cases I've seen do want both the encoding and the length.
>
> I think there is still a benefit to a fast way to get the length 
> first. An alternative is to accumulate into a temporary buffer, and 
> potentially have to resize it. If there are gigabytes of data it's 
> expensive to have to make another copy. Knowing the encoded length 
> up-front allows exactly sizing the output buffer and avoids the 
> temporary buffer.
>
> Some slightly more concrete examples:
>
> Building a byte[] that holds the content of many strings: sizing the 
> byte[] requires first knowing the sum of all the encoded lengths you 
> want to put into it, and only then encoding the strings into it.
>
> Streaming serialization to the network: the top level has to know the 
> length of the transitive contents that it's going to be writing out in 
> the nested structures. The actual output is streamed; a byte[] of the 
> complete data is never constructed in this scenario.
>
> (There are also some public protobuf APIs that just return an encoded 
> byte length for the data, but that is a less performance sensitive 
> use-case.)
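
The first example above (sum the encoded lengths, then encode into an exactly sized byte[]) can be sketched roughly as follows. This is only an illustration under stated assumptions: utf8Length is a hypothetical helper written out by hand, not an existing or proposed JDK method, and the sketch assumes UTF-8 with malformed input (unpaired surrogates) replaced by '?'.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;

class ExactSizeEncoding {
    // Hypothetical helper: number of bytes the string occupies in UTF-8,
    // computed without allocating. Unpaired surrogates count as one byte
    // because they are replaced by '?' during encoding.
    static int utf8Length(String s) {
        int len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                len += 1;
            } else if (c < 0x800) {
                len += 2;
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                len += 4; // valid surrogate pair: one 4-byte code point
                i++;
            } else if (Character.isSurrogate(c)) {
                len += 1; // unpaired surrogate, replaced by '?'
            } else {
                len += 3;
            }
        }
        return len;
    }

    // Pass 1 sums the encoded lengths; pass 2 encodes every string straight
    // into one exactly sized buffer, so the output is never resized.
    static byte[] encodeAll(List<String> parts) {
        int total = 0;
        for (String p : parts) {
            total = Math.addExact(total, utf8Length(p));
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        for (String p : parts) {
            enc.reset();
            CoderResult r = enc.encode(CharBuffer.wrap(p), out, true);
            if (r.isError() || r.isOverflow()) {
                throw new IllegalStateException("length estimate was wrong: " + r);
            }
            enc.flush(out);
        }
        return out.array(); // heap buffer; backing array has exactly 'total' bytes
    }
}
```

The length pass here still walks every char, so it is not free; the point of the sketch is only that the output buffer is allocated once and no intermediate copy of the whole payload is made.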


More information about the panama-dev mailing list