<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 02/10/2024 23:58, Anastasiya
Lisitskaya wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAD4WG-tpRrN037wY+A2agiqSZTjkR3vjkNkw9ejDrQ+iK80s6g@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">Hi,
<div><br>
</div>
<div>It is very helpful! </div>
<div><br>
</div>
So, if I want to use data from the heap without extra copying
to off-heap (native MemorySegment), should using String be
avoided? It seems there is no way to use a String without
copying, as we can't guarantee a trailing null terminator.<br>
</div>
</div>
</blockquote>
I'm afraid that's the case. The Java String API does not concern
itself with string terminators because, in Java, all strings have a size.
In C that's not the case, so in general you need to append a
terminator, and that will involve some degree of copying.<br>
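<p>A minimal sketch of that copy, assuming the goal is to stay on the
heap: the UTF-8 bytes are copied once into an array that is one byte
longer, and the extra zero byte acts as the terminator. The class and
method names are hypothetical, and whether the resulting heap segment
suits a downcall linked with Linker.Option.critical(true) is an
assumption that is not verified here.</p>
<pre>
```java
import java.lang.foreign.MemorySegment;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the FFM API.
final class HeapTerminatedString {
    // Copies the UTF-8 bytes of text into a new array that is one byte longer.
    // Arrays.copyOf pads with zeros, so the extra byte is the terminator.
    static MemorySegment toTerminatedHeapSegment(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] terminated = Arrays.copyOf(utf8, utf8.length + 1);
        return MemorySegment.ofArray(terminated);
    }

    public static void main(String[] args) {
        MemorySegment segment = toTerminatedHeapSegment("мое все 123 аи92");
        // getString stops at the terminator, so the text round-trips unchanged.
        System.out.println(segment.getString(0));
    }
}
```
</pre>
<p>This is still one copy, but it is a heap-to-heap copy rather than a
copy into native memory.</p>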
<blockquote type="cite" cite="mid:CAD4WG-tpRrN037wY+A2agiqSZTjkR3vjkNkw9ejDrQ+iK80s6g@mail.gmail.com">
<div dir="ltr">
<div dir="ltr"><br>
One thing still concerns me: is processing an unterminated
string unpredictable? Only one test from my suite fails
(returning this extra symbol or crashing).</div>
</div>
</blockquote>
<p>Processing an unterminated string leads to undefined behavior.
Effectively, your program is scanning _past_ the contents of your
string, in search of a zero. Because of the way some allocation
functions work (e.g. malloc), it is likely that a zero will be found
more or less where expected. But that behavior is OS/platform
dependent and absolutely cannot be relied upon.</p>
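<p>Real undefined behavior cannot be demonstrated safely, so the
following is only a bounded simulation: a 16-byte buffer is pre-filled
with '#' as a stand-in for leftover memory, a zero is placed in the last
byte, and the string bytes are copied in without a terminator. The class
name, buffer size and filler byte are all made up for illustration.</p>
<pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the 16-byte buffer and '#' filler are arbitrary choices.
final class OverReadDemo {
    // Shows what a zero-scanning reader returns when the string bytes were
    // copied without a terminator. The text must fit in 15 bytes of UTF-8.
    static String readUnterminated(String text) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment buffer = arena.allocate(16);
            buffer.fill((byte) '#');                          // stand-in for leftover memory
            buffer.set(ValueLayout.JAVA_BYTE, 15, (byte) 0);  // the first zero the scan will meet
            byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
            buffer.copyFrom(MemorySegment.ofArray(utf8));     // string bytes only, no terminator
            return buffer.getString(0);                       // reads until the zero at index 15
        }
    }

    public static void main(String[] args) {
        System.out.println(readUnterminated("abc"));
    }
}
```
</pre>
<p>In this simulation the reader returns the text followed by the filler
bytes. In a real process the bytes after the string are whatever happens
to be in memory, which is why one test can fail while the others pass.</p>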
<p>Maurizio<br>
</p>
<blockquote type="cite" cite="mid:CAD4WG-tpRrN037wY+A2agiqSZTjkR3vjkNkw9ejDrQ+iK80s6g@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div><br>
</div>
<div>Many thanks!<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Wed, 2 Oct 2024 at 13:11,
Maurizio Cimadamore &lt;<a href="mailto:maurizio.cimadamore@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>&gt;:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi, some replies below:<br>
</p>
<div>On 01/10/2024 20:40, Anastasiya Lisitskaya wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div><span style="font-family:arial,sans-serif;color:rgb(33,37,41)">Hi,</span><br>
</div>
<div><span style="color:rgb(33,37,41)"><font face="arial, sans-serif"><br>
</font></span></div>
<div><font face="arial, sans-serif"><span style="color:rgb(33,37,41)">I'm trying to use
the FFM API </span></font>(JDK 22)<font face="arial, sans-serif"><span style="color:rgb(33,37,41)"> to call my C++
method and I need to pass a text</span></font><span style="color:rgb(33,37,41);font-family:arial,sans-serif"> </span><span style="color:rgb(33,37,41);font-family:arial,sans-serif">(java String)</span><font face="arial, sans-serif"><span style="color:rgb(33,37,41)"> and receive a text
response</span></font><span style="color:rgb(33,37,41);font-family:arial,sans-serif">. While
implementing this, I encountered several issues:</span>
<ol style="box-sizing:border-box;padding-left:2rem;margin-top:0px;margin-bottom:1rem;color:rgb(33,37,41)">
<li style="box-sizing:border-box">
<p style="box-sizing:border-box;margin-top:0px;margin-bottom:1rem"><font face="arial, sans-serif">What are the best
practices for defining <code style="box-sizing:border-box">newSize</code> for
use in the <code style="box-sizing:border-box">reinterpret(long
newSize)</code> method? Can I use
constants like <code style="box-sizing:border-box">Long.MAX_VALUE</code> or <code style="box-sizing:border-box">Integer.MAX_VALUE</code> as <code style="box-sizing:border-box">newSize</code>,
or could that cause some problems?</font></p>
</li>
</ol>
</div>
</div>
</blockquote>
<p><font face="arial, sans-serif">If the size of the
returned string (I assume it's a char*) is known, then
use that size. Otherwise, use Long.MAX_VALUE.
MemorySegment::getString will read the string bytes up
to the null terminator.</font></p>
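<p><font face="arial, sans-serif">A minimal sketch of the unknown-size
case. No native function is called here: a zero-length segment built
from the address of an arena-allocated string stands in for the returned
char*. The class and method names are hypothetical. Note that
reinterpret is a restricted method, so the JVM may print a warning
unless native access is enabled.</font></p>
<pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// Hypothetical demo class; a real program would get the pointer from a downcall.
final class ReinterpretDemo {
    // Reads a C string from a zero-length pointer segment of unknown size.
    static String readCString(MemorySegment pointer) {
        // The size is unknown, so widen the segment and let getString
        // stop at the first zero byte.
        return pointer.reinterpret(Long.MAX_VALUE).getString(0);
    }

    static String roundTrip(String text) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment terminated = arena.allocateFrom(text);   // UTF-8 bytes plus terminator
            // Stand-in for a char* result: only the address, zero length.
            MemorySegment pointer = MemorySegment.ofAddress(terminated.address());
            return readCString(pointer);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```
</pre>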
<p><font face="arial, sans-serif"><br>
</font></p>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<ol style="box-sizing:border-box;padding-left:2rem;margin-top:0px;margin-bottom:1rem;color:rgb(33,37,41)">
<li style="box-sizing:border-box">
<p style="box-sizing:border-box;margin-top:0px;margin-bottom:1rem"><font face="arial, sans-serif">When I tried to
use in-heap <code style="box-sizing:border-box">MemorySegment</code> with
the <code style="box-sizing:border-box">Linker.Option.critical(true)</code>
and passed <code style="box-sizing:border-box">MemorySegment.ofArray(text.getBytes())</code>,
I started getting an extra symbol such as SOH in
the response. What am I doing wrong?
(Sample snippets listed below). Changing </font><span style="font-family:monospace">newSize</span> value
in <span style="font-family:monospace">reinterpret(long
newSize)</span> doesn't help</p>
</li>
</ol>
</div>
</div>
</div>
</blockquote>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<ol style="box-sizing:border-box;padding-left:2rem;margin-top:0px;margin-bottom:1rem;color:rgb(33,37,41)">
<li style="box-sizing:border-box">
<div>If I inline <span style="color:rgb(0,0,0);font-family:'JetBrains Mono',monospace">MemorySegment.</span><span style="color:rgb(0,0,0);font-family:'JetBrains Mono',monospace;font-style:italic">ofArray</span><span style="color:rgb(0,0,0);font-family:'JetBrains Mono',monospace">(text.getBytes())</span> into <font color="#000000"><font face="JetBrains Mono, monospace">invokeExact, </font><font face="arial, sans-serif">I </font></font><span style="color:rgb(34,34,34)">expected: </span><span style="color:rgb(34,34,34);font-family:'JetBrains Mono',monospace"><font color="#000000">"мое все 123 аи92", but</font></span> got:</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">uncaught
exception:<br>
address -> 0x60000120d710<br>
what() -> "util/charset/wide.h:366:
failed to decode UTF-8 string at pos 25 in
string "\xD0\x9C\xD0\xBE\xD1\x91
\xD0\xB2\xD1\x81\xD1\x91 123
\xD0\x90\xD0\23092\1\xCF\xFD\xBD_""<br>
type -> yexception</blockquote>
</li>
</ol>
</div>
<div>I'm definitely doing something wrong. Please
help me figure it out and understand. Thanks! <br>
</div>
</div>
</div>
</blockquote>
<p>I think your problem is that the segment you are
creating has no NULL terminator at the end?</p>
<p>E.g. you take a Java string, get its byte array, and
turn the byte array into a segment.</p>
<p>To work with strings safely, I suggest you use
String-accepting allocation/accessor methods: either
Arena::allocateFrom(String) or
MemorySegment::setString. Those will add the required
terminator.</p>
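<p>A short sketch of both methods, checking that each one leaves a zero
byte right after the UTF-8 bytes. The class and method names are
hypothetical; only allocateFrom, setString and the accessors come from
the FFM API.</p>
<pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
final class TerminatedAllocation {
    // Size of the segment produced by allocateFrom(String): UTF-8 bytes plus terminator.
    static long sizeViaAllocateFrom(String text) {
        try (Arena arena = Arena.ofConfined()) {
            return arena.allocateFrom(text).byteSize();
        }
    }

    // The byte that setString leaves right after the UTF-8 bytes of text.
    static byte byteAfterSetString(String text) {
        int utf8Length = text.getBytes(StandardCharsets.UTF_8).length;
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment buffer = arena.allocate(utf8Length + 1);
            buffer.fill((byte) 0x7F);      // so that a zero can only come from setString
            buffer.setString(0, text);     // writes the bytes and the terminator
            return buffer.get(ValueLayout.JAVA_BYTE, utf8Length);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeViaAllocateFrom("abc"));
        System.out.println(byteAfterSetString("abc"));
    }
}
```
</pre>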
<p>I think even your first example looks incorrect (where
you use `allocateFrom(JAVA_BYTE, text.getBytes())`), but
you are probably saved there by the fact that malloc
allocated a bigger chunk of memory and a zero just
happens to be at the end of the string bytes?</p>
<p>You can't pass the byte array of a Java string to a
C/C++ function expecting a null-terminated string without
performing some sort of copy and adding the required
trailing terminator. Some C/C++ APIs might work with
unterminated strings, in which case they will probably
accept a size - e.g. how many characters are expected in
the char*. But this doesn't seem to be the case here.</p>
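<p>As an illustration of a size-accepting API, the sketch below calls
the C library function strnlen, which never reads more than the given
number of bytes, on a deliberately unterminated native segment. It is
only a stand-in for the C++ function discussed in this thread. It
assumes that the default lookup exposes strnlen on the platform and that
size_t is 64 bits wide; the class and method names are hypothetical.</p>
<pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; strnlen stands in for any API that takes (char*, size).
final class SizedCall {
    static long boundedLength(String text) {
        Linker linker = Linker.nativeLinker();
        // Assumption: strnlen is visible through the default lookup.
        MemorySegment symbol = linker.defaultLookup().find("strnlen").orElseThrow();
        // Assumption: size_t maps to JAVA_LONG (64-bit platform).
        MethodHandle strnlen = linker.downcallHandle(symbol,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG));
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        try (Arena arena = Arena.ofConfined()) {
            // Raw bytes only: no terminator is appended here.
            MemorySegment unterminated = arena.allocateFrom(ValueLayout.JAVA_BYTE, utf8);
            // Passing the length keeps the callee inside the segment.
            return (long) strnlen.invokeExact(unterminated, (long) utf8.length);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(boundedLength("abc"));
    }
}
```
</pre>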
<p>Hope this helps<br>
Maurizio<br>
<br>
</p>
<br>
<p><br>
</p>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
<span class="gmail_signature_prefix">-- </span><br>
<div dir="ltr" class="gmail_signature">Best regards,
Nastya Lisitskaya</div>
</div>
</blockquote>
</body>
</html>