[foreign-memaccess+abi] RFR: 8308858: FFM API and strings [v2]

Wed Jun 7 16:56:18 UTC 2023

On Wed, 7 Jun 2023 15:34:30 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> > that both accept a generic char-set and let the user choose a suitable one, as the _name_ of the charset does not really matter much (as long as both sides use the same byte -> char).
> 
> Not sure about this. Yes, we could have a getSingleByteString and getWideString. But if we also support charsets, then we have to worry about the compatibility of the provided charset with the given string method. E.g. what if I pass UTF-8 charset to getWideString? Or UTF-16 charset to getSingleByteString? The approach described here has a single parameter, which controls everything else.

Well, one can always do "bad" things (especially on the C side of the chain), **but** one usually tries to do "good" things. So if the native code uses a `wchar_t*` and expects it to contain UTF-8 chars with two zero bytes at the end, that is actually possible (in C).

So this would be more syntactic sugar (and probably a way to optimize access to the internal array if no transcoding is required) for (Java) `String` > `char*` / `wchar_t*`. This came to mind because in JNI I often declare String types as `byte[]` on the JNI interface and use a helper method like this (sorry in advance for this hack):

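// CHARSET is a constant holding the charset agreed on with the native side.
private static byte[] cstring(String str) {
  // null or empty string: return just the terminating zero byte
  if (str == null || str.isEmpty()) {
    return new byte[]{0};
  }
  byte[] bytes = str.getBytes(CHARSET);
  // copyOf pads with zeros, so the extra byte is the C terminator
  return Arrays.copyOf(bytes, bytes.length + 1);
}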

and on the C(++) side:

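jbyte* j_str = env->GetByteArrayElements(jniStrBytes, nullptr);
char* c_str = reinterpret_cast<char*>(j_str);
// ... use c_str, then release it (JNI_ABORT: nothing to copy back)
env->ReleaseByteArrayElements(jniStrBytes, j_str, JNI_ABORT);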

so for FFM something like

setCString(long offset, String str, Charset c)
setWString(long offset, String str, Charset c)

should be able to handle **any** charset (as long as both sides agree on it, of course). This was just an idea that came to mind, since in C the "standard string" is just a bunch of bytes with one (or two) zero bytes at the end. And since the string length (or the byte length after encoding) and the number of terminating bytes (C = 1, W = 2) are known, the user does not need to pass anything more than the above.
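To make the idea concrete, here is a minimal sketch of the two setters. It is only an illustration, not the FFM API: a `java.nio.ByteBuffer` stands in for the memory segment, the offset is an `int` rather than a `long`, and the class name `TerminatedStrings` and the helper `setTerminated` are hypothetical. The terminator widths follow the assumption above (C = 1, W = 2); the real width of `wchar_t` depends on the platform.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TerminatedStrings {

    // Hypothetical helper: encode str with the given charset, write the bytes
    // at the given offset, then append terminatorBytes zero bytes.
    // Returns the total number of bytes written (payload plus terminator).
    static int setTerminated(ByteBuffer buf, int offset, String str,
                             Charset cs, int terminatorBytes) {
        byte[] encoded = (str == null) ? new byte[0] : str.getBytes(cs);
        // Work on a duplicate so the caller's position and limit stay untouched.
        ByteBuffer view = buf.duplicate();
        view.position(offset);
        view.put(encoded);
        for (int i = 0; i < terminatorBytes; i++) {
            view.put((byte) 0);
        }
        return encoded.length + terminatorBytes;
    }

    // "C string": any charset, one zero byte at the end.
    static int setCString(ByteBuffer buf, int offset, String str, Charset cs) {
        return setTerminated(buf, offset, str, cs, 1);
    }

    // "Wide string": any charset, two zero bytes at the end (assumption: W = 2).
    static int setWString(ByteBuffer buf, int offset, String str, Charset cs) {
        return setTerminated(buf, offset, str, cs, 2);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(32);
        int c = setCString(buf, 0, "Hi", StandardCharsets.UTF_8);
        int w = setWString(buf, 8, "Hi", StandardCharsets.UTF_16LE);
        System.out.println("C string bytes: " + c + ", wide string bytes: " + w);
    }
}
```

Note that the sketch does not check whether the charset fits the terminator width. Passing UTF-16 to `setCString` would produce embedded zero bytes, which is exactly the compatibility concern raised in the quoted reply.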

-------------

PR Comment: https://git.openjdk.org/panama-foreign/pull/836#issuecomment-1581195120