FFM API: questions about reinterpret and MemorySegment

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Oct 3 08:45:05 UTC 2024


On 02/10/2024 23:58, Anastasiya Lisitskaya wrote:
> Hi,
>
> It is very helpful!
>
> So, if I want to use data from the heap without extra copying to 
> off-heap (native MemorySegment), should using String be avoided? It 
> seems there is no way to use a String without copying, as we can't 
> guarantee a trailing null terminator.
I'm afraid that's the case. The Java String API does not concern itself
with string terminators because, in Java, all strings have a size. In C
that's not the case - so in general you need to append a terminator, and
that will involve some degree of copying.
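
As a minimal sketch of what that copy looks like (the class and method names here are only illustrative, and the size check reflects what current JDKs do for UTF-8 rather than a documented guarantee), Arena::allocateFrom(String) copies the encoded bytes off-heap and appends the terminator:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

public class TerminatedCopy {
    // Illustrative helper (not part of any API): returns
    // { UTF-8 byte count, native segment size, last byte of the segment }.
    static long[] describe(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        try (Arena arena = Arena.ofConfined()) {
            // Copies the string off-heap as UTF-8 and appends a '\0' terminator.
            MemorySegment cString = arena.allocateFrom(text);
            long size = cString.byteSize();
            byte last = cString.get(ValueLayout.JAVA_BYTE, size - 1);
            return new long[] { raw.length, size, last };
        }
    }

    public static void main(String[] args) {
        long[] r = describe("hello");
        System.out.println("raw bytes = " + r[0]
                + ", segment bytes = " + r[1]
                + ", last byte = " + r[2]);
    }
}
```

The one extra byte at the end is the terminator that text.getBytes() alone never provides.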
>
> One thing still concerns me: is processing an unterminated string 
> unpredictable? Only one test from my suite fails (returning this extra 
> symbol or crashing).

Processing an unterminated string leads to undefined behavior.
Effectively, your program is scanning _past_ the contents of your
string, in search of a zero. Because of the way some allocators work
(e.g. malloc) it is likely that a zero will be found more or less where
expected. But that behavior is OS/platform dependent and absolutely
cannot be relied upon.
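
To make the missing terminator visible from the Java side, here is a small sketch. The helper indexOfTerminator is hypothetical (it is not an FFM API method); it scans a segment within its bounds, which is exactly the limit that native code scanning a char* does not have:

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

public class TerminatorScan {
    // Hypothetical helper: offset of the first zero byte inside the segment,
    // or -1 if the segment contains no terminator at all.
    static long indexOfTerminator(MemorySegment segment) {
        for (long i = 0; i < segment.byteSize(); i++) {
            if (segment.get(ValueLayout.JAVA_BYTE, i) == 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] raw = "abc".getBytes(StandardCharsets.UTF_8);
        // Heap segment over the raw bytes: no terminator inside its bounds.
        System.out.println(indexOfTerminator(MemorySegment.ofArray(raw)));
        // Same bytes with an explicit trailing zero: terminator at offset 3.
        byte[] terminated = java.util.Arrays.copyOf(raw, raw.length + 1);
        System.out.println(indexOfTerminator(MemorySegment.ofArray(terminated)));
    }
}
```

When the helper reports -1, a C function that expects a null-terminated string has nothing to stop on and will keep reading whatever memory follows the array.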

Maurizio

>
> Many thanks!
>
> On Wed, 2 Oct 2024 at 13:11, Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
>
>     Hi, some replies below:
>
>     On 01/10/2024 20:40, Anastasiya Lisitskaya wrote:
>>     Hi,
>>
>>     I'm trying to use the FFM API (JDK 22) to call my C++ method and
>>     I need to pass text (a Java String) and receive a text response.
>>     While implementing this, I encountered several issues:
>>
>>     1.
>>
>>         What are the best practices for defining |newSize| for use in
>>         the |reinterpret(long newSize)| method? Can I use constants
>>         like |Long.MAX_VALUE| or |Integer.MAX_VALUE| as |newSize|, or
>>         could that cause some problems?
>>
>     If the size of the returned string (I assume it's a char*) is
>     known, then use that size. Otherwise, use Long.MAX_VALUE.
>     MemorySegment::getString will read the string bytes up to the null
>     terminator.
>
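
A minimal sketch of the unknown-size case described above. No real native call is made here: a zero-length segment created with MemorySegment.ofAddress stands in for the char* a downcall would return, and the class and method names are only illustrative. Note that reinterpret is a restricted method, so the JDK may print a native-access warning unless --enable-native-access is set:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

public class ReinterpretRead {
    // Illustrative helper: simulates receiving a char* of unknown length
    // and reading it back as a Java String.
    static String roundTrip(String text) {
        try (Arena arena = Arena.ofConfined()) {
            // Null-terminated copy that plays the role of native-owned memory.
            MemorySegment owned = arena.allocateFrom(text);
            // Stand-in for a returned pointer: a zero-length native segment.
            MemorySegment returned = MemorySegment.ofAddress(owned.address());
            // Size unknown, so widen the bounds as far as possible...
            MemorySegment readable = returned.reinterpret(Long.MAX_VALUE);
            // ...and let getString stop at the null terminator.
            return readable.getString(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello 123"));
    }
}
```

This is only safe because the memory really does contain a terminator; the widened segment gives no protection if it does not.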
>
>>     2.
>>
>>         When I tried to use an in-heap |MemorySegment| with
>>         |Linker.Option.critical(true)| and passed
>>         |MemorySegment.ofArray(text.getBytes())|, I started getting
>>         an extra symbol like SOH in the response. What am I doing wrong?
>>         (Sample snippets listed below). Changing the newSize value in
>>         reinterpret(long newSize) doesn't help.
>>
>>     1.
>>         If I inline MemorySegment.ofArray(text.getBytes()) into
>>         invokeExact, I expected : "мое все 123 аи92", but got:
>>
>>             uncaught exception:
>>                 address -> 0x60000120d710
>>                 what() -> "util/charset/wide.h:366: failed to decode
>>             UTF-8 string at pos 25 in string
>>             "\xD0\x9C\xD0\xBE\xD1\x91 \xD0\xB2\xD1\x81\xD1\x91 123
>>             \xD0\x90\xD0\23092\1\xCF\xFD\xBD_""
>>                 type -> yexception
>>
>>     I'm definitely doing something wrong. Please help me figure it
>>     out and understand. Thanks!
>
>     I think your problem is that the segment you are creating has no
>     NULL terminator at the end?
>
>     E.g. you take a Java string, get its byte array, and turn the byte
>     array into a segment.
>
>     To work with strings safely, I suggest you use String-accepting
>     allocation/accessor methods. Either Arena::allocateFrom(String),
>     or MemorySegment::setString. Those will add the required terminator.
>
>     I think even your first example looks incorrect (where you use
>     `allocateFrom(JAVA_BYTE, text.getBytes())`), but you are probably
>     saved there by the fact that malloc allocated a bigger chunk of
>     memory and a zero just happens to be at the end of the string bytes?
>
>     You can't pass the byte array of a Java string to a C/C++ function
>     expecting a null-terminated string w/o performing some sort of
>     copy and adding the required trailing terminator. Some C/C++ APIs
>     might work with unterminated strings, in which case they will
>     probably accept a size - e.g. how many characters are expected in
>     the char*. But this doesn't seem to be the case here.
>
>     Hope this helps
>     Maurizio
>
>
>
>
>
> -- 
> Best regards, Nastya Lisitskaya


More information about the panama-dev mailing list