[foreign] road to posix
Samuel Audet
samuel.audet at gmail.com
Wed May 30 01:42:14 UTC 2018
I think that about covers what I wanted to say, thanks John! Although
I'm still concerned about the lack of specifics about how to achieve
these goals (LLVM? something else?).
BTW, about enums, I've recently come up with an optional hack for that
(which is useful for C++ enums because they are scoped just like Java
enums). It looks like this:
https://github.com/bytedeco/javacpp-presets/blob/master/tensorrt/src/main/java/org/bytedeco/javacpp/nvinfer.java#L150
In C/C++, an enum is basically an integer, so we need a way to map
arbitrary integers to Java enums as well. Luckily, JNI lets us create
new enum objects with arbitrary values for the fields, but those are not
very useful because toString() returns null and they don't work with
switch statements. So, I've added an optional "intern()" method that
gets us the right object, if any, whose overhead is only incurred when
desired by the user, as with severity = severity.intern() here:
https://github.com/bytedeco/javacpp-presets/blob/master/tensorrt/README.md#sample-usage
Otherwise we just get an integer, just like in C/C++, without overhead,
minus JNI.
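
A minimal pure-Java sketch of the lookup half of that idea, not the actual JavaCPP code: the Severity enum, its values, and the fromNative() and describe() helpers are hypothetical names chosen for illustration. Pure Java cannot create the extra enum instances that JNI can, so the sketch maps a raw native integer to the canonical constant, if any, and leaves unknown values as plain integers.

```java
import java.util.Optional;

public class EnumInternSketch {
    // Hypothetical stand-in for a C++ scoped enum; names and values are illustrative only.
    enum Severity {
        INTERNAL_ERROR(0), ERROR(1), WARNING(2), INFO(3);

        final int value;

        Severity(int value) {
            this.value = value;
        }

        // Find the canonical constant for a raw native integer, if one exists.
        static Optional<Severity> fromNative(int raw) {
            for (Severity s : values()) {
                if (s.value == raw) {
                    return Optional.of(s);
                }
            }
            return Optional.empty();
        }
    }

    // Once we hold a canonical constant, switch statements work as usual.
    static String describe(int raw) {
        Optional<Severity> s = Severity.fromNative(raw);
        if (!s.isPresent()) {
            return "unknown(" + raw + ")";
        }
        switch (s.get()) {
            case WARNING:
                return "warning";
            case ERROR:
            case INTERNAL_ERROR:
                return "error";
            default:
                return "info";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(2));   // a value that has a constant
        System.out.println(describe(42));  // an arbitrary native value
    }
}
```

The linear scan only runs when the caller asks for the canonical object, which mirrors the "overhead only when desired" point above.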
Samuel
On 05/30/2018 06:51 AM, John Rose wrote:
> TL;DR: We need a variety of binding strategies to cover the variety
> of macro types and complexities. Complexities range from simple
> constants to wacky throwaways; strategies range from pure metadata
> to machine code snippets. All of this is in reach, and gives us a big
> step towards C++. Also, beware of false friends.
>
> On May 29, 2018, at 7:51 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>>
>> So, to close the loop, it seems like the problem I had with RTLD_DEFAULT was _not_ caused by jextract not supporting constant macros - but, rather, by jextract not recognizing that specific kind of macro constant.
>
> So, macro constants are a special case of what C/++ calls "object-like
> macros", as opposed to "function-like macros".
>
> The general case is intractable, of course, because it allows wacky stuff like:
>
> #define BEGIN {
> #define END }
>
> C programs sometimes pull stunts like this to hack the language, including basic
> notions of block structure. Programmer cleverness is unbounded and unrestrained,
> isn't it?
>
> A real and inconvenient example of this programmer cleverness is the traditional
> treatment of fields of struct stat. They are often defined with object-like macros
> as follows:
>
> #define st_atime st_atimespec.tv_sec
> /* source: /usr/include/sys/stat.h on MacOS */
>
> We can't capture this in a systematic way. Following Maurizio's write-up,
> the plan of record for dealing with such things is to ask the jextract user to define
> function-like macros and/or true functions which are canonical uses of such
> oddly-shaped macros, and fold those into the jextract run, as auxiliary functions
> (or macros) explicitly requested by the jextract operator, in a side-file.
>
> (Sometimes the existing header files feature such macros as well; sys/stat.h
> has S_ISBLK(m) for example. In that case, if the user can supply just the
> function type, and link it to the macro, jextract could make the wrapper. This
> is the way I handled S_ISBLK once upon a time, but I think for Panama it
> is an unnecessary refinement. See discussion below of level 2 vs. level 3.)
>
> Capturing such irregular standard macros, as well as auxiliary functions
> added by the jextract user, requires jextract to wrap such expressions in
> properly-typed code snippets, compile them, and deliver them as machine
> code resources to the binder. This is something we're just beginning to get
> into with Panama, although I and others like Samuel have pulled this trick
> on other projects. This trick works for object-like macros as well as function-like
> macros; simply (as we are discussing) wrap the object-like expression in a
> no-argument code snippet, and invoke it appropriately from the binder
> (either once statically or once per use).
>
> But most object-like macros are much simpler, and we can be correspondingly
> more automatic in handling them. A canonical case is EOF from stdio.h,
> which is usually just (-1).
>
> (I think stdio.h is rich with canonical examples, including varargs functions
> like printf and macro versions of fileno. That, stat.h, and qsort in stdlib.h
> are IMO the trifecta of tricky initial demos for header extraction.)
>
> The special case of an integral constant expression is tractable (although
> there are limitations and pitfalls). The C preprocessor actually defines a
> sub-language for C expressions, which is evaluated by the #if directive.
> It would be reasonable for us to handle that sub-language specially and
> precisely, although the sub-language includes recursive calls to object-like
> and function-like macros (because the cpp macro-expands #if arguments).
> For example:
>
> #define MASK(n) ((1<<(n))-1)
> #define THREE 3
> #define MASK3 MASK(THREE)
> #if (MASK3 == 7) /* yes, it's true! */
>
> The #if argument evaluates at cpp-time (and jextract time!) to 1.
>
> When we get to non-integral types like NULL and RTLD_DEFAULT we have
> a choice: Try to decide if we can extend the constant-folding logic to deduce
> a pointer bit-pattern plus a type, which we can then capture.
>
> There are three levels of possible support for borderline cases like
> RTLD_DEFAULT:
>
> 1. Inferred metadata: Infer the type and the bitwise value in jextract, and
> store it all in metadata (annotations).
>
> 2. Inferred snippet: Infer the type, don't infer the value, but wrap the
> expression in a snippet for the binder to invoke.
>
> 3. Explicit snippet: Take advice from the user, in the form of an auxiliary
> function, which is wrapped in a snippet for the binder, and accompanied
> by metadata derived from the type of the explicit auxiliary function.
>
> (The fourth possibility of explicit metadata might be useful if there were
> a hard requirement to specify that an expression like (-1) needed a surprising
> type that couldn't be inferred. But I think that corner case can be handled well
> enough as an explicit snippet; just ask the user to specify the whole thing
> and over-package it as machine code, even if we could transmit it as
> metadata.)
>
> Inferring a type is tricky; sometimes there is ambiguity and sometimes
> you just have to guess. Guessing is OK for very simple macros like max.
> Sometimes you want to be allowed to make several guesses and transmit
> them all (as overloads). Here's max:
>
> #define max(a, b) ((a) > (b) ? (a) : (b)) // int32? int64? void*? double?
>
> Currently we do the inferred metadata (level 1), in a limited way, for simple
> expressions. The next thing we should do, I would say, is the other extreme,
> the explicit snippet (level 3). If that is usable, then we don't need to fiddle
> with the hybrid inferred snippet, for macros.
>
> The middle level (inferred snippet) will really come into its own with C++,
> where almost every API point, for classes rich in inline access functions,
> will have to be an inferred snippet.
>
> Basically, each level is a distinct binding strategy, covering more or less
> general cases, requiring more or less help from the jextract operator,
> and passing information to the binder using a mix of strategies.
>
> A final topic: There needs to be some continuity between levels, in terms
> of how jextract allocates them to carrier interfaces in Java, and how the
> binder attaches them to those interfaces. Small platform configuration
> changes like -U__GNUC__ will shift C API points from one level to
> another, and it would be best if jextract and the binder could hide those
> effects.
>
> This has a very practical implication: We sometimes need to favor
> continuity over clever ad hoc use of Java features that are peculiar
> to one level (one binding strategy).
>
> I think the most common case where this trade-off is felt is with a simple
> level 1 object-like macro constant. We might want to use a binding
> strategy of a static interface constant (via ConstantValue attribute in
> the classfile). But this will disrupt Java APIs if there is any chance
> that the macro might be re-extracted as a level 2 or level 3 value
> (a snippet), since there is no way to bind the result of executing the
> snippet to a ConstantValue attribute. Being clever with an ad hoc
> level-specific binding strategy risks making the Java API irregular.
>
> In the case of ConstantValue attributes, we've repeatedly examined
> them and found them inappropriate for Panama. The biggest problem
> is that Java client code might be compiled against one version of an
> extracted interface, and then be used at runtime against a different
> version (on a different platform). In that case, the ConstantValue from
> the first version would be used with code from the second version,
> which would risk very subtle bugs. The cure is to avoid the temptation
> to be too clever in binding C constants to Java constants.
>
> This trade-off tends to push *all* extracted constant values to be
> function-like at the JVM level instead of value-like. I think this is OK;
> we don't have to use every last Java language feature to carry C APIs.
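
The value-like versus function-like distinction above can be sketched in plain Java. This is only an illustration under stated assumptions: the interfaces StdioValueLike and StdioFunctionLike and the bind() helper are hypothetical and are not what jextract or the Panama binder generates. An IntSupplier stands in for whatever the binder would use to supply the value (metadata or a compiled snippet).

```java
import java.util.function.IntSupplier;

public class ConstantBindingSketch {
    // Value-like carrier: EOF is a compile-time constant, so javac copies -1
    // into every client class that reads it.
    interface StdioValueLike {
        int EOF = -1;
    }

    // Function-like carrier: clients call a method, so the value is resolved
    // when the code runs, by whatever implementation was bound.
    interface StdioFunctionLike {
        int EOF();
    }

    // Hypothetical binder step: attach a value source to the carrier interface.
    static StdioFunctionLike bind(IntSupplier source) {
        return source::getAsInt;
    }

    public static void main(String[] args) {
        // Level 1 style: the value was inferred ahead of time and stored as metadata.
        StdioFunctionLike fromMetadata = bind(() -> -1);

        // Level 2/3 style: the value comes from executing something at run time.
        // Integer.parseInt is only a stand-in for invoking a native snippet.
        StdioFunctionLike fromSnippet = bind(() -> Integer.parseInt("-1"));

        // Client code looks the same in both cases.
        System.out.println(fromMetadata.EOF());
        System.out.println(fromSnippet.EOF());
    }
}
```

With the function-like shape, moving a macro from one level to another changes only what is passed to bind(); call sites written against the interface stay as they are.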
>
> Some Java language features are "false friends"; they look good when
> you first meet them but if you start to work with them, they cause problems.
>
> Examples of false friends spring easily to mind: C enums are so different
> from Java enums that we cannot translate the former into the latter;
> the most we could do is have a very special, rarely used switch for
> the jextract operator to explicitly opt into Java enums. The same
> point holds for C arrays vs. Java arrays. (Maurizio's write-up deals
> with these cases nicely.) In C++ there are other false friends: C++
> constructors are very different from Java constructors, etc.
>
> I think that the very special Java feature of static constants (backed
> by the ConstantValue attribute, a very special JVM feature) is that
> kind of false friend. The observation that switch statements in both
> languages tie nicely to the feature adds to the attraction, but doesn't
> cure the root problem, which is that ConstantValue attributes are an
> out-of-band channel for class APIs that is totally outside of the normal
> runtime linkage paradigm we are using in Panama.
>
> — John
>
> P.S. Maurizio's write-up is here:
> http://cr.openjdk.java.net/~mcimadamore/panama/panama-binder-v3.html
>
> Basic info on object-like vs. function-like macros can be found here:
> https://gcc.gnu.org/onlinedocs/cpp/Object-like-Macros.html
>