FFM API: questions about reinterpret and MemorySegment

Remi Forax forax at univ-mlv.fr
Mon Oct 7 10:04:25 UTC 2024


> From: "Anastasiya Lisitskaya" <lisnas at gmail.com>
> To: "Maurizio Cimadamore" <maurizio.cimadamore at oracle.com>
> Cc: "panama-dev" <panama-dev at openjdk.org>
> Sent: Thursday, October 3, 2024 12:58:23 AM
> Subject: Re: FFM API: questions about reinterpret and MemorySegment

> Hi,
> It is very helpful!

> So, if I want to use data from the heap without extra copying to off-heap
> (native MemorySegment), should using String be avoided? It seems there is no
> way to use a String without copying, as we can't guarantee a trailing null
> terminator.

> One thing still concerns me: is processing an unterminated string unpredictable?
> Only one test from my suite fails (returning this extra symbol or crashing).
> Many thanks!

At that point, the classic trick is to use the CharSequence interface instead of String: at worst, you delay the copy until toString() is called; at best, the CharSequence is used directly, so there is no copy at all.

For example, if you want to convert a string to an int, you can use
https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/lang/Integer.html#parseInt(java.lang.CharSequence,int,int,int)
which avoids creating a String object.
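
A minimal sketch of that idea, assuming JDK 9 or later (where this parseInt overload exists). The class name, the helper parseField and the sample text are made up for illustration; any CharSequence implementation would do in place of the CharBuffer:

```java
import java.nio.CharBuffer;

public class ParseIntSketch {
    // Parses the decimal digits in [begin, end) of any CharSequence
    // without first materializing a String for that range.
    static int parseField(CharSequence text, int begin, int end) {
        return Integer.parseInt(text, begin, end, 10);
    }

    public static void main(String[] args) {
        // Hypothetical input: a char[]-backed view, not a String
        CharSequence text = CharBuffer.wrap("id=12345;".toCharArray());
        System.out.println(parseField(text, 3, 8));
    }
}
```

The digits are read in place from the CharSequence, so no intermediate substring is allocated.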

regards, 
Rémi 

> Wed, Oct 2, 2024 at 13:11, Maurizio Cimadamore <maurizio.cimadamore at oracle.com>:

>> Hi, some replies below:
>> On 01/10/2024 20:40, Anastasiya Lisitskaya wrote:

>>> Hi,

>>> I'm trying to use the FFM API (JDK 22) to call my C++ method, and I need to pass
>>> a text (Java String) and receive a text response. While implementing this, I
>>> encountered several issues:

>>>     1.

>>> What are the best practices for defining newSize for use in the reinterpret(long
>>> newSize) method? Can I use constants like Long.MAX_VALUE or Integer.MAX_VALUE
>>> as newSize, or could that cause some problems?

>> If the size of the returned string (I assume it's a char*) is known, then use
>> that size. Otherwise, use Long.MAX_VALUE. MemorySegment::getString will read
>> the string bytes up to the null terminator.

>>>     2.

>>> When I tried to use an in-heap MemorySegment with Linker.Option.critical(true)
>>> and passed MemorySegment.ofArray(text.getBytes()), I started getting an extra
>>> symbol like SOH in the response. What am I doing wrong? (Sample snippets listed
>>> below). Changing the newSize value in reinterpret(long newSize) doesn't help.

>>>     1.
>>> If I inline MemorySegment.ofArray(text.getBytes()) into invokeExact, I
>>> expected: "мое все 123 аи92", but got:

>>>> uncaught exception: address -> 0x60000120d710 what() ->
>>>> "util/charset/wide.h:366: failed to decode UTF-8 string at pos 25 in string
>>>> "\xD0\x9C\xD0\xBE\xD1\x91 \xD0\xB2\xD1\x81\xD1\x91 123
>>>> \xD0\x90\xD0\23092\1\xCF\xFD\xBD_"" type -> yexception
>>> I'm definitely doing something wrong. Please help me figure it out and
>>> understand. Thanks!

>> I think your problem is that the segment you are creating has no NULL terminator
>> at the end?

>> E.g. you take a Java string, get its byte array, and turn the byte array into a
>> segment.

>> To work with strings safely, I suggest you use String-accepting
>> allocation/accessor methods. Either Arena::allocateFrom(String), or
>> MemorySegment::setString. Those will add the required terminator.

>> I think even your first example looks incorrect (where you use
>> `allocateFrom(JAVA_BYTE, text.getBytes())`), but you are probably saved there by
>> the fact that malloc allocated a bigger chunk of memory and a zero just happens
>> to be at the end of the string bytes?

>> You can't pass the byte array of a Java string to a C/C++ function expecting a
>> null-terminated string w/o performing some sort of copy and adding the required
>> trailing terminator. Some C/C++ APIs might work with unterminated strings, in
>> which case they will probably accept a size - e.g. how many characters are
>> expected in the char*. But this doesn't seem to be the case here.
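
The difference described above can be observed from plain Java, without any native call. This is only a sketch, assuming JDK 22 or later (where the FFM API is final); the class and method names are made up for illustration:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

public class TerminatorSketch {
    // Heap segment wrapping the raw UTF-8 bytes: exactly the string bytes, no terminator.
    static long heapSegmentSize(String text) {
        return MemorySegment.ofArray(text.getBytes(StandardCharsets.UTF_8)).byteSize();
    }

    // Off-heap copy made by allocateFrom(String): UTF-8 bytes plus one terminator byte.
    static long nativeSegmentSize(String text) {
        try (Arena arena = Arena.ofConfined()) {
            return arena.allocateFrom(text).byteSize();
        }
    }

    // Last byte of the allocateFrom(String) segment, i.e. the terminator.
    static byte lastNativeByte(String text) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocateFrom(text);
            return segment.get(ValueLayout.JAVA_BYTE, segment.byteSize() - 1);
        }
    }

    public static void main(String[] args) {
        String text = "abc";
        System.out.println("ofArray size:      " + heapSegmentSize(text));
        System.out.println("allocateFrom size: " + nativeSegmentSize(text));
        System.out.println("last byte:         " + lastNativeByte(text));
    }
}
```

The heap segment is one byte shorter than the allocateFrom(String) copy, which is why a C/C++ function scanning for a terminator can run past the end of the array-backed segment.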

>> Hope this helps
>> Maurizio

> --
> Best regards, Nastya Lisitskaya