Modeling C-Strings with MemoryLayout?

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Wed Dec 28 22:16:36 UTC 2022


On 28/12/2022 15:57, Gavin Ray wrote:
> Ah, I was afraid the answer might be something like that
>
> My interest was in trying to model the types of Postgres Wire Protocol 
> as MemoryLayout types
> Unfortunately, several of the types have variable-length string 
> fields in them
>
> If you ctrl+f here, there are some ~30-ish fields in total which are 
> "String" null-terminated C-string types =/
> PostgreSQL: Documentation: 15: 55.7. Message Formats 
> <https://www.postgresql.org/docs/current/protocol-message-formats.html>

Yeah - this problem came up during some early explorations.

If you have a message like this:

source : String
destination : String

And both strings are variable-length - how big is the entire message?

You'll quickly discover that layouts are not that helpful when you start 
working with things like these.

Let's say we add another field:

source : String
destination : String
timestamp : Long

What is the offset of the "timestamp" field? Well, it depends on the 
length of the first string *and* on the length of the second string.

Both lengths are buried _somewhere_ inside the string. If you are lucky, 
as in ProtoBuf, the length is encoded as part of the string. If you are 
unlucky, as seems to be the case for Postgres, you have to scan both 
strings to figure out what the offset is!
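
To make the scanning concrete, here is a minimal sketch of what "finding the 
offset of timestamp" could look like for the hypothetical message above. It 
deliberately uses a plain java.nio.ByteBuffer rather than a MemorySegment, so 
it does not depend on any particular revision of the FFM API. The class and 
method names (ScanForOffset, skipCString, timestampOffset, encode) are made up 
for illustration, and the big-endian encoding of the long is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any API: locates a fixed-size field that
// follows two NUL-terminated strings by scanning for the terminators.
final class ScanForOffset {

    // Returns the index just past the NUL terminator of the C string that
    // starts at 'start'. Throws IndexOutOfBoundsException if none is found.
    static int skipCString(ByteBuffer buf, int start) {
        int i = start;
        while (buf.get(i) != 0) {   // absolute get, does not move the position
            i++;
        }
        return i + 1;
    }

    // Decodes the C string starting at 'start' (terminator excluded).
    static String readCString(ByteBuffer buf, int start) {
        int end = skipCString(buf, start) - 1;
        byte[] bytes = new byte[end - start];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = buf.get(start + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // The offset of 'timestamp' depends on the contents of BOTH strings.
    static int timestampOffset(ByteBuffer buf) {
        int afterSource = skipCString(buf, 0);
        return skipCString(buf, afterSource);
    }

    // Builds a sample message: source\0 destination\0 timestamp (8 bytes).
    static ByteBuffer encode(String source, String destination, long timestamp) {
        byte[] s = source.getBytes(StandardCharsets.UTF_8);
        byte[] d = destination.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer
                .allocate(s.length + 1 + d.length + 1 + Long.BYTES)
                .order(ByteOrder.BIG_ENDIAN);   // assumed byte order
        buf.put(s).put((byte) 0).put(d).put((byte) 0).putLong(timestamp);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer msg = encode("alice", "bob", 1672265796L);
        int offset = timestampOffset(msg);
        System.out.println(offset + " " + msg.getLong(offset)); // 10 1672265796
    }
}
```

Note that the cost of timestampOffset is linear in the length of the two 
strings, which is exactly the kind of data-dependent work a statically 
computed layout offset never has to do.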

Needless to say, this kind of operation is already very far from what a 
VarHandle can do. Even assuming we implemented such a monstrosity, how 
would we go about providing atomic access, etc.?

The only solution that's kind of sensible here is to admit that the 
string layout has a size hole. So, to access "timestamp" you'd have to 
fill (dynamically) two holes (for the sizes of the two strings).
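
Purely as an illustration of the "size hole" idea (this is not an existing or 
proposed API, and HoledLayout, HOLE and offsetOf are hypothetical names), a 
layout with holes can be thought of as a list of field sizes where some sizes 
are unknown until the caller supplies them. The sketch below ignores 
alignment and padding entirely.

```java
// Hypothetical sketch: a flat sequence of fields where some sizes are
// "holes" that must be filled in at access time.
final class HoledLayout {

    // Marker for a field whose size is only known at runtime.
    static final long HOLE = -1;

    private final long[] sizes;

    HoledLayout(long... sizes) {
        this.sizes = sizes.clone();
    }

    // Byte offset of the field at 'fieldIndex'. 'holeSizes' supplies, in
    // order, the size of every hole that precedes that field.
    long offsetOf(int fieldIndex, long... holeSizes) {
        long offset = 0;
        int nextHole = 0;
        for (int i = 0; i < fieldIndex; i++) {
            long size = sizes[i];
            if (size == HOLE) {
                if (nextHole >= holeSizes.length) {
                    throw new IllegalArgumentException(
                            "missing size for hole " + nextHole);
                }
                size = holeSizes[nextHole++];
            }
            offset = Math.addExact(offset, size);
        }
        return offset;
    }

    public static void main(String[] args) {
        // source : String, destination : String, timestamp : Long
        HoledLayout message = new HoledLayout(HOLE, HOLE, Long.BYTES);
        // "alice\0" is 6 bytes, "bob\0" is 4 bytes
        System.out.println(message.offsetOf(2, 6, 4)); // 10
    }
}
```

The caller still has to obtain the two hole sizes from somewhere, typically 
by scanning the data, so the layout itself ends up contributing very little.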

While this is more doable, it adds complexity to the layout API. 
Complexity is not bad per se, but using a dummy layout with a size hole 
seems a rather roundabout way to do things. That is, the crux of this 
issue is that you don't know the layout, or rather, that the layout 
depends on the data you want to pass (dependent layouts). My feeling, 
when looking at this kind of thing, is that it always seems to fall 
outside the sweet spot of the problems the layout API wants to solve.

Maurizio
> Of course for these specific fields/types exceptions can be made but 
> it triggers my OCD, ha
>
> On Wed, Dec 28, 2022 at 8:40 AM Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>
>     Hi Gavin,
>     while something like that is possible, that would not be the best
>     way to
>     use memory layouts. I believe the sweet spot for memory layout is to
>     capture data types whose size can be determined statically.
>     Variable-length data structures don't fall in this case (another
>     relatively common case is C structs that end with a variable-length
>     array). While it would be possible, with some heroics, to enhance var
>     handles to take extra dynamic access coordinates at runtime, the
>     conditions under which this would be possible are very limited.
>
>     Perhaps it would be better to understand what you are trying to
>     achieve
>     by modelling a C string with a memory layout? Often, passing
>     strings as
>     opaque pointer just works. Note also that the FFM API provides some
>     helper functions to allocate and dereference NULL terminated
>     strings -
>     see MemorySegment::get/setUtf8String, as well as
>     SegmentAllocator::allocateUtf8String.
>
>     Cheers
>     Maurizio
>
>
>
>     On 25/12/2022 19:24, Gavin Ray wrote:
>     > Is it possible to model C-Strings with MemoryLayouts?
>     >
>     > I was thinking of using "MemoryLayout.sequenceLayout(0, C_CHAR)"
>     and
>     > computing the
>     > length at runtime, plus manually adding the null-terminator.
>     >
>     > Would this work alright, or is there some better way to do this?
>