RFR: 8264774: Implementation of Foreign Function and Memory API (Incubator) [v2]

Wed Apr 28 16:17:53 UTC 2021

On Wed, 28 Apr 2021 13:47:43 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java line 270:
>> 
>>> 268: 
>>> 269:     /**
>>> 270:      * Converts a Java string into a null-terminated C string, using the platform's default charset,
>> 
>> Sorry if this has come up before, but, is the platform's default charset the right choice here? For other areas, we choose UTF-8 as the default. In fact, there is a draft JEP to move the default charset to UTF-8. So if there is an implicit need to match the underlying platform's charset then this may need to be revisited.  For now, I just want to check that this is not an accidental reliance on the platform's default charset, but a deliberate one.
>
> I believe here the goal is to be consistent with `String::getBytes`:
> 
> 
>     /**
>      * Encodes this {@code String} into a sequence of bytes using the
>      * platform's default charset, storing the result into a new byte array.
>      *
>      * <p> The behavior of this method when this string cannot be encoded in
>      * the default charset is unspecified.  The {@link
>      * java.nio.charset.CharsetEncoder} class should be used when more control
>      * over the encoding process is required.
>      *
>      * @return  The resultant byte array
>      *
>      * @since      1.1
>      */
>     public byte[] getBytes() {
>         return encode(Charset.defaultCharset(), coder(), value);
>     }
> 
> 
> So, you are right that there is an element of platform dependency here, but I think it is a dependency that users have mostly learned to "ignore". Developers who want to be precise and platform-independent can always use the Charset-accepting method. Of course this could be revisited, but I think there is some consistency in what the API is trying to do. If, in the future, `Charset.defaultCharset()` simply returns UTF-8, that is fine too: as long as the method is specified to depend on the default charset, what lies behind the default charset is an implementation detail that users will have to be aware of one way or another.

Naoto is working on a couple of changes in advance of JEP 400. One of these exposes a system property with the host charset, and I suspect that the CLinker method will want to use that instead of `Charset.defaultCharset()`.
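
To make the distinction concrete, here is a rough sketch in plain Java (no CLinker involved) of encoding a string as a null-terminated byte sequence with an explicit charset, and of looking up a host charset separately from the default charset. The names `CStringEncoding`, `toCString` and `hostCharset` are hypothetical and only for illustration. The sketch assumes the property is called `native.encoding`, as it is in JDK 17 and later, and falls back to `Charset.defaultCharset()` when the property is absent or names an unsupported charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of jdk.incubator.foreign.
final class CStringEncoding {

    // Encodes s with the given charset and appends a single 0 byte.
    // A one-byte terminator is only appropriate for charsets such as
    // UTF-8 or ISO-8859-1; wide encodings like UTF-16 would need more.
    static byte[] toCString(String s, Charset cs) {
        byte[] encoded = s.getBytes(cs);
        // copyOf pads the extra slot with 0, which acts as the terminator
        return Arrays.copyOf(encoded, encoded.length + 1);
    }

    // Assumption: the host charset is published as "native.encoding".
    // Falls back to the default charset if the property is missing
    // or does not name a charset this JVM supports.
    static Charset hostCharset() {
        String name = System.getProperty("native.encoding");
        if (name != null) {
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // illegal or unsupported charset name: fall through
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo";
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("host charset:    " + hostCharset());
        System.out.println("UTF-8 bytes:     "
                + Arrays.toString(toCString(s, StandardCharsets.UTF_8)));
        System.out.println("host bytes:      "
                + Arrays.toString(toCString(s, hostCharset())));
    }
}
```

If the default charset becomes UTF-8 while the host charset stays something else, the two byte sequences printed by `main` can differ for non-ASCII input, which is why the choice made by the CLinker method matters for native callers.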

-------------

PR: https://git.openjdk.java.net/jdk/pull/3699