[foreign] RFR 8219470: Use clang API to parse macros

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Wed Feb 20 18:56:22 UTC 2019


Hi,
macro support in jextract was added some time ago, using javac's 
constant folding support to evaluate expressions. While clever, that 
approach has some limitations - namely, it cannot understand types that 
belong to the C language. This will eventually make it impossible to 
support constants such as this:

#define PTR (void*)0

Clang offers an 'evaluation' API [1], but unfortunately this API 
inexplicably doesn't work on macros. But it does work on regular 
variable declarations. So here's an idea - given a macro of the kind:

#define NAME VALUE

let's generate a snippet like this:

__auto_type jextract$NAME = NAME;

and see what comes out of clang. The __auto_type extension is a GNU 
extension which is also supported by clang [2]; this is rather handy 
because it allows us to rely on clang to do type inference too!
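
As a rough illustration of the snippet generation step, here is a minimal 
Java sketch. The class and method names (MacroSnippet, snippetFor) are 
hypothetical and not taken from the patch; only the shape of the 
generated declaration comes from the description above.

```java
public class MacroSnippet {
    // Prefix for the synthetic variable, as in the example above.
    static final String PREFIX = "jextract$";

    // Builds the C declaration that lets clang infer both the type
    // and the value of the macro (hypothetical helper, for illustration).
    static String snippetFor(String macroName) {
        return "__auto_type " + PREFIX + macroName + " = " + macroName + ";";
    }

    public static void main(String[] args) {
        // One declaration per macro; clang then evaluates the initializer.
        System.out.println(snippetFor("PTR"));
    }
}
```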

The problem with this approach is, of course, making snippet 
recompilation fast enough - to this end, three measures were taken:

* instead of generating a snippet with an #include, we use the clang 
API [3] to save the jextract translation unit into a precompiled header - 
these headers won't change anyway

* we then parse the snippet with -include-pch <precompiled header>; this 
allows us to skip all symbols that are defined outside the snippet (to do 
this you have to create a 'local' Index - that's why I exposed that part 
of the clang API)

* instead of writing onto a file over and over, we make use of clang's 
in-memory file support [4]. This allows us to create an empty file once, 
and to keep passing snippets as strings in memory.

The result is quite pleasing - not only do we now parse macros 'the right 
way', but performance also got a significant bump; on my machine 
(before/after):

OpenGL 5s/3s
Python 6s/3.7s
Ncurses 3s/1.5s

Almost 2x boost - not bad. On top of that, by diffing the --log FINE 
output it seems like the new implementation is able to pick up a 
handful of constants that were left out in the previous implementation.

Note that this patch retains the previous optimization that special-cases 
simple numeric #defines - where we just try to parse the number in 
Java. For APIs such as OpenGL, with loads of constants, this is an 
essential optimization.
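
A minimal sketch of what such a fast path could look like, assuming the 
macro body is available as a string. The names (SimpleNumericMacro, 
tryParse) are hypothetical, and the actual patch may recognize a 
different set of literals (for instance, suffixes such as 1L or 2U are 
not handled here and would simply fall through to clang).

```java
import java.util.Optional;

public class SimpleNumericMacro {
    // Tries to read the macro body as a plain integer literal.
    // Long.decode accepts decimal, 0x/0X hex and leading-0 octal forms,
    // which covers unsuffixed C integer literals (it also accepts a
    // '#'-prefixed hex form, which does not occur in C literals).
    // Returns empty when the body is anything else, so the caller can
    // fall back to the slower clang-based evaluation.
    static Optional<Long> tryParse(String body) {
        try {
            return Optional.of(Long.decode(body.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("0x1F"));     // parsed in Java
        System.out.println(tryParse("(void*)0")); // left for clang
    }
}
```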

Webrev:
http://cr.openjdk.java.net/~mcimadamore/panama/8219470/

Cheers
Maurizio

[1] - 
https://clang.llvm.org/doxygen/group__CINDEX__MISC.html#ga6be809ca82538f4a610d9a5b18a10ccb
[2] - https://reviews.llvm.org/D12686
[3] - 
https://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#ga3abe9df81f9fef269d737d82720c1d33
[4] - https://clang.llvm.org/doxygen/structCXUnsavedFile.html




