Re: MemorySegment APIs for reading and writing strings with known lengths
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed Nov 12 10:56:03 UTC 2025
On 12/11/2025 10:02, Liam Miller-Cushon wrote:
> Thanks, yes, I think string concat is a good analogy.
>
> Thinking about this more, isn't this use-case an example where the
> proposed MemorySegment::ofString approach wouldn't always offer the
> best possible performance? In the case where the internal string
> buffer isn't compatible with the requested charset it has to make an
> intermediate copy. In theory with the alternative of a setString or
> copy method that took a String and directly wrote it to the output,
> the intermediate copy could be avoided.
Let's leave MS::ofString aside for this discussion (I agree that it
wouldn't be optimal for this use case).
I believe what you mean here is that if I have a string and I want to
copy it to a destination segment, I could either:
* if the string buffer is compatible, just bulk-copy that buffer into
the target segment
* if the string buffer is not compatible, encode the string _directly_
into the target segment
Correct? If so, I tend to agree this would be slightly preferable, as
we'd be touching the data only once. And I believe this could also be
done in the existing setString method?
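
To make the second bullet concrete, here is a rough sketch of encoding
a string directly into a target segment using only public API. The
names DirectEncode and encodeInto are hypothetical (they are not part
of MemorySegment), and the bulk-copy path from the first bullet is not
shown, since the string's internal buffer is not reachable from user
code:

```java
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class DirectEncode {

    /**
     * Hypothetical helper: encodes s into dst starting at offset, without
     * first materializing an intermediate byte[]. Returns the number of
     * bytes written. No terminator is appended.
     */
    static int encodeInto(String s, MemorySegment dst, long offset, Charset cs) {
        // View the tail of the segment as a ByteBuffer so the encoder
        // writes straight into the segment's memory.
        ByteBuffer out = dst.asSlice(offset).asByteBuffer();
        CharsetEncoder enc = cs.newEncoder();
        CoderResult r = enc.encode(CharBuffer.wrap(s), out, true);
        if (r.isUnderflow()) {
            r = enc.flush(out);
        }
        if (!r.isUnderflow()) {
            // Overflow (segment too small) or malformed/unmappable input.
            throw new IllegalArgumentException("could not encode string: " + r);
        }
        return out.position();
    }

    public static void main(String[] args) {
        MemorySegment seg = MemorySegment.ofArray(new byte[32]);
        int written = encodeInto("hello", seg, 0, StandardCharsets.UTF_8);
        System.out.println("bytes written: " + written);
    }
}
```

Whether setString could pick between the two paths internally is a
separate question; the sketch only shows that the single-pass encode
is expressible.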
Cheers
Maurizio
>
> On Tue, Nov 11, 2025 at 6:18 PM Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
>
> Thanks for the detailed reply.
>
> For resizing, I tend to agree with you -- the problem is that if
> you don't size correctly upfront, then you will have to pay the
> cost (potentially multiple times) to allocate a bigger buffer and
> move all the contents over there.
>
> A bit like how string concat has evolved, where we now have ways
> to "guess" the size of each of the concatenation arguments so we
> can correctly size the byte[] buffer we create to hold the result
> of the concatenation.
>
> In those cases, I agree, paying a small-ish cost to be able to
> estimate the size of a sub-element of an allocation goes a long
> way in making everything less dynamic and more deterministic.
>
> Maurizio
>
> On 11/11/2025 17:04, Liam Miller-Cushon wrote:
>>
>> It seems to me that in this case encoding and length travel
>> together? E.g. you need to encode anyway, at which point you
>> also know the byte size?
>>
>> (I'm a bit unsure that here there's anything to be gained by
>> the method you proposed?)
>>
>> Do you have use cases where you don't want to decode, you
>> just want to know the byte length?
>>
>> The main use-cases I've seen do want both the encoding and the
>> length.
>>
>> I think there is still a benefit to a fast way to get the length
>> first. An alternative is to accumulate into a temporary buffer,
>> and potentially have to resize it. If there are gigabytes of data
>> it's expensive to have to make another copy. Knowing the encoded
>> length up-front allows exactly sizing the output buffer and
>> avoids the temporary buffer.
>>
>> Some slightly more concrete examples:
>>
>> Building a byte[] that holds all of the content of a lot of data:
>> sizing the byte[] requires first knowing the sum of all the lengths
>> you want to put into it, and then encoding the strings into it.
>>
>> Streaming serialization to the network: the top level has to know
>> the length of the transitive contents that it's going to be
>> writing out in the nested structures. The actual output is
>> streamed, it never constructs a byte[] of the complete data in
>> this scenario.
>>
>> (There are also some public protobuf APIs that just return an
>> encoded byte length for the data, but that is a less performance
>> sensitive use-case.)
>
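
The exact-sizing argument in the quoted message can be sketched as a
two-pass approach: compute the encoded length of every string first,
allocate the output once, then encode into it. This is only an
illustration. The names ExactSizing, utf8Length and encodeAll are
hypothetical, and utf8Length assumes well-formed UTF-16 input (it is a
plain scan, not the fast length query discussed in this thread):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class ExactSizing {

    /** Hypothetical helper: UTF-8 byte length of s, assuming well-formed UTF-16. */
    static int utf8Length(CharSequence s) {
        int len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                len += 1;
            } else if (c < 0x800) {
                len += 2;
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                len += 4; // a surrogate pair encodes to four bytes
                i++;
            } else {
                len += 3;
            }
        }
        return len;
    }

    /** Pass 1 sums the lengths; pass 2 encodes into an exactly sized buffer. */
    static byte[] encodeAll(List<String> parts) {
        int total = 0;
        for (String p : parts) {
            total = Math.addExact(total, utf8Length(p));
        }
        ByteBuffer out = ByteBuffer.allocate(total); // no resize, no temporary copy
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        for (String p : parts) {
            enc.reset();
            CoderResult r = enc.encode(CharBuffer.wrap(p), out, true);
            if (r.isUnderflow()) {
                r = enc.flush(out);
            }
            if (!r.isUnderflow()) {
                throw new IllegalArgumentException("could not encode: " + r);
            }
        }
        return out.array();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeAll(List.of("abc", "\u00e9", "\u20ac"));
        System.out.println("encoded bytes: " + bytes.length);
    }
}
```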