[foreign] road to posix

John Rose john.r.rose at oracle.com
Tue May 29 21:51:31 UTC 2018


TL;DR: We need a variety of binding strategies to cover the variety
of macro types and complexities.  Complexities range from simple
constants to wacky throwaways; strategies range from pure metadata
to machine code snippets.  All of this is in reach, and gives us a big
step towards C++.  Also, beware of false friends.

On May 29, 2018, at 7:51 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> 
> So, to close the loop, it seems like the problem I had with RTLD_DEFAULT was _not_ caused by jextract not supporting constant macros - but, rather, by jextract not recognizing that specific kind of macro constant.

So, macro constants are a special case of what C/++ calls "object-like
macros", as opposed to "function-like macros".

The general case is intractable, of course, because it allows wacky stuff like:

  #define BEGIN {
  #define END }

C programs sometimes pull stunts like this to hack the language, including basic
notions of block structure.  Programmer cleverness is unbounded and unrestrained,
isn't it?

A real and inconvenient example of this programmer cleverness is the traditional
treatment of fields of struct stat.  They are often defined with object-like macros
as follows:

  #define st_atime st_atimespec.tv_sec
  /* source: /usr/include/sys/stat.h on MacOS */

We can't capture this in a systematic way.  Following Maurizio's write-up,
the plan of record for dealing with such things is to ask the jextract user to define
function-like macros and/or true functions which are canonical uses of such
oddly-shaped macros, and fold those into the jextract run, as auxiliary functions
(or macros) explicitly requested by the jextract operator, in a side-file.

(Sometimes the existing header files feature such macros as well; sys/stat.h
has S_ISBLK(m) for example.  In that case, if the user can supply just the
function type, and link it to the macro, jextract could make the wrapper.  This
is the way I handled S_ISBLK once upon a time, but I think for Panama it
is an unnecessary refinement.  See discussion below of level 2 vs. level 3.)
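
As a rough illustration of the side-file idea, a jextract user might supply
auxiliary functions along these lines.  The names aux_stat_atime and
aux_s_isblk are hypothetical, made up for this sketch; nothing in jextract
defines them.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical auxiliary function: a canonical use of the oddly-shaped
 * macro st_atime.  Whatever the macro expands to on this platform, the
 * function itself has an ordinary C type that a tool can bind. */
time_t aux_stat_atime(const struct stat *st) {
    return st->st_atime;
}

/* Hypothetical wrapper that gives the function-like macro S_ISBLK a
 * real function type (mode_t -> int). */
int aux_s_isblk(mode_t m) {
    return S_ISBLK(m) ? 1 : 0;
}
```

The point is only that each wrapper pins down a type signature; the macro
body stays whatever the platform header says it is.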

Capturing such irregular standard macros, as well as auxiliary functions
added by the jextract user, requires jextract to wrap such expressions in
properly-typed code snippets, compile them, and deliver them as machine
code resources to the binder.  This is something we're just beginning to get
into with Panama, although I and others like Samuel have pulled this trick
on other projects.  This trick works for object-like macros as well as function-like
macros; simply (as we are discussing) wrap the object-like expression in a
no-argument code snippet, and invoke it appropriately from the binder
(either once statically or once per use).
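
To make the no-argument wrapper concrete, here is a minimal sketch in C,
using EOF as the wrapped macro.  The function names are hypothetical, and the
two callers are only a C stand-in for what the binder might do on the Java
side; they are not the binder's actual mechanism.

```c
#include <stdio.h>

/* Hypothetical snippet: the object-like macro EOF wrapped in a
 * no-argument function with an explicit return type. */
int snippet_EOF(void) {
    return EOF;
}

/* "Once per use": every access calls the snippet again. */
int eof_per_use(void) {
    return snippet_EOF();
}

/* "Once statically": call the snippet on first access and cache it. */
int eof_cached(void) {
    static int initialized = 0;
    static int value;
    if (!initialized) {
        value = snippet_EOF();
        initialized = 1;
    }
    return value;
}
```

Caching is only safe when the macro really is a constant; a macro that reads
mutable state would need the per-use form.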

But most object-like macros are much simpler, and we can be correspondingly
more automatic in handling them.  A canonical case is EOF from stdio.h,
which is usually just (-1).

(I think stdio.h is rich with canonical examples, including varargs functions
like printf and macro versions of fileno.  That, stat.h, and qsort in stdlib.h
are IMO the trifecta of tricky initial demos for header extraction.)

The special case of an integral constant expression is tractable (although
there are limitations and pitfalls).  The C preprocessor actually defines a
sub-language for C expressions, which is evaluated by the #if directive.
It would be reasonable for us to handle that sub-language specially and
precisely, although the sub-language includes recursive calls to object-like
and function-like macros (because the cpp macro-expands #if arguments).
For example:

  #define MASK(n) ((1<<(n))-1)
  #define THREE 3
  #define MASK3 MASK(THREE)
  #if (MASK3 == 7)  /* yes, it's true! */

The #if argument evaluates at cpp-time (and jextract time!) to 1.
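
Here is a small sketch of both the happy case and one of the pitfalls.  The
enum and function names are invented for the illustration.  It relies on the
fact that #if arithmetic is carried out in the widest integer types
(intmax_t/uintmax_t), and it assumes uintmax_t is wider than unsigned int,
which holds on the usual platforms.

```c
#include <limits.h>

#define MASK(n) ((1 << (n)) - 1)
#define THREE 3
#define MASK3 MASK(THREE)

/* The preprocessor macro-expands and evaluates the #if argument itself. */
#if (MASK3 == 7)
enum { CPP_SAYS_MASK3_IS_7 = 1 };
#else
enum { CPP_SAYS_MASK3_IS_7 = 0 };
#endif

/* Pitfall: in #if, unsigned arithmetic uses the widest unsigned type,
 * so this sum does not wrap around inside the preprocessor... */
#if (UINT_MAX + 1u) != 0
enum { CPP_SAYS_SUM_NONZERO = 1 };
#else
enum { CPP_SAYS_SUM_NONZERO = 0 };
#endif

/* ...but the same expression compiled as C wraps to zero. */
unsigned int c_sum(void) {
    return UINT_MAX + 1u;
}

/* The compiler agrees with the preprocessor on the simple case. */
int c_mask3(void) {
    return MASK3;
}
```

So a constant folder that reuses #if semantics is precise for #if itself, but
it can disagree with the value the C compiler would compute for the same
tokens.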

When we get to non-integral values like NULL and RTLD_DEFAULT we have
a choice to make:  either extend the constant-folding logic to deduce
a pointer bit-pattern plus a type, which we can then capture, or else
leave the value to be computed by a code snippet.

There are three levels of possible support for borderline cases like
RTLD_DEFAULT:

1. Inferred metadata:  Infer the type and the bitwise value in jextract, and
store it all in metadata (annotations).

2. Inferred snippet:  Infer the type, don't infer the value, but wrap the
expression in a snippet for the binder to invoke.

3. Explicit snippet:  Take advice from the user, in the form of an auxiliary
function, which is wrapped in a snippet for the binder, and accompanied
by metadata derived from the type of the explicit auxiliary function.

(The fourth possibility of explicit metadata might be useful if there were
a hard requirement to specify that an expression like (-1) needed a surprising
type that couldn't be inferred.  But I think that corner case can be handled well
enough as an explicit snippet; just ask the user to specify the whole thing
and over-package it as machine code, even if we could transmit it as
metadata.)
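
For a pointer-valued macro, the difference between metadata and a snippet
might look roughly like this.  DEMO_DEFAULT_HANDLE is a hypothetical macro
shaped like RTLD_DEFAULT, with a value picked for illustration; the struct
and function names are likewise assumptions, not anything jextract emits.

```c
#include <stdint.h>

/* Hypothetical macro, shaped like a platform's RTLD_DEFAULT. */
#define DEMO_DEFAULT_HANDLE ((void *) (intptr_t) -2)

/* Level 1 (inferred metadata): the tool folds the expression and records
 * a type plus the pointer's bit pattern; no code is shipped. */
struct demo_metadata {
    const char *c_type;
    uintptr_t   bits;
};

struct demo_metadata demo_level1(void) {
    struct demo_metadata m = { "void *", (uintptr_t) DEMO_DEFAULT_HANDLE };
    return m;
}

/* Levels 2 and 3 (snippets): only the type is known up front; the value
 * is whatever the compiled expression yields when it is invoked. */
void *demo_snippet(void) {
    return DEMO_DEFAULT_HANDLE;
}
```

Both routes should agree on the bits; they differ in who computes them
(jextract versus the C compiler) and in what gets shipped to the binder.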

Inferring a type is tricky; sometimes there is ambiguity and sometimes
you just have to guess.  Guessing is OK for very simple macros like max.
Sometimes you want to be allowed to make several guesses and transmit
them all (as overloads).  Here's max:

  #define max(a, b) ((a) > (b) ? (a) : (b))  // int32? int64? void*? double?
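
Transmitting several guesses could be sketched as one typed snippet per
guess.  The wrapper names below are hypothetical.  The void* guess is left
out of this sketch because relational comparison of unrelated pointers is not
well defined in C.

```c
#include <stdint.h>

#define max(a, b) ((a) > (b) ? (a) : (b))

/* One hypothetical snippet per guessed type; a binder could expose
 * these as overloads of a single Java method. */
int32_t max_i32(int32_t a, int32_t b) { return max(a, b); }
int64_t max_i64(int64_t a, int64_t b) { return max(a, b); }
double  max_f64(double a, double b)   { return max(a, b); }
```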

Currently we do the inferred metadata (level 1), in a limited way, for simple
expressions.  The next thing we should do, I would say, is the other extreme,
the explicit snippet (level 3).  If that is usable, then we don't need to fiddle
with the hybrid inferred snippet, for macros.

The middle level (inferred snippet) will really come into its own with C++,
where almost every API point, for classes rich in inline access functions,
will have to be an inferred snippet.

Basically, each level is a distinct binding strategy, covering more or less
general cases, requiring more or less help from the jextract operator,
and passing information to the binder using a mix of strategies.

A final topic:  There needs to be some continuity between levels, in terms
of how jextract allocates them to carrier interfaces in Java, and how the
binder attaches them to those interfaces.  Small platform configuration
changes like -U__GNUC__ will shift C API points from one level to
another, and it would be best if jextract and the binder could hide those
effects.

This has a very practical implication:  We sometimes need to favor
continuity over clever ad hoc use of Java features that are peculiar
to one level (one binding strategy).

I think the most common case where this trade-off is felt is with a simple
level 1 object-like macro constant.  We might want to use a binding
strategy of a static interface constant (via ConstantValue attribute in
the classfile).  But this will disrupt Java APIs if there is any chance
that the macro might be re-extracted as a level 2 or level 3 value
(a snippet), since there is no way to bind the result of executing the
snippet to a ConstantValue attribute.  Being clever with an ad hoc
level-specific binding strategy risks making the Java API irregular.

In the case of ConstantValue attributes, we've repeatedly examined
them and found them inappropriate for Panama.  The biggest problem
is that Java client code might be compiled against one version of an
extracted interface, and then be used at runtime against a different
version (on a different platform).  In that case, the ConstantValue from
the first version would be used with code from the second version,
which would risk very subtle bugs.  The cure is to avoid the temptation
to be too clever in binding C constants to Java constants.

This trade-off tends to push *all* extracted constant values to be
function-like at the JVM level instead of value-like.  I think this is OK;
we don't have to use every last Java language feature to carry C APIs.

Some Java language features are "false friends"; they look good when
you first meet them but if you start to work with them, they cause problems.

Examples of false friends spring easily to mind:  C enums are so different
from Java enums that we cannot translate the former into the latter;
the most we could do is have a very special, rarely used switch for
the jextract operator to explicitly opt into Java enums.  The same
point holds for C arrays vs. Java arrays.  (Maurizio's write-up deals
with these cases nicely.) In C++ there are other false friends: C++
constructors are very different from Java constructors, etc.

I think that the very special Java feature of static constants (backed
by the ConstantValue attribute, a very special JVM feature) is that
kind of false friend.  The observation that switch statements in both
languages tie nicely to the feature adds to the attraction, but doesn't
cure the root problem, which is that ConstantValue attributes are an
out-of-band channel for class APIs that is totally outside of the normal
runtime linkage paradigm we are using in Panama.

— John

P.S. Maurizio's write-up is here:
  http://cr.openjdk.java.net/~mcimadamore/panama/panama-binder-v3.html

Basic info on object-like vs. function-like macros can be found here:
  https://gcc.gnu.org/onlinedocs/cpp/Object-like-Macros.html
