[foreign] RFR 8219470: Use clang API to parse macros

Wed Feb 20 21:38:28 UTC 2019

On 20/02/2019 20:28, John Rose wrote:
> Very cool use of appropriate technologies.
>
> I have one suggestion.
>
>> On Feb 20, 2019, at 10:56 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>>
>> Note that, this patch retains the previous optimization for special casing simple numeric #define - where we just try to parse the number in Java. For API such as OpenGL with loads of constants, this is an essential optimization.
> Instead of being given to a special case interpreter, those could be handled more uniformly by being handed to the same PCH technique you have created. Because they are unlikely to fail they can be batched, which will probably lead to similar perf as the present technique. Occasional failures can be removed and the remaining batch rerun if needed. The batches can be split at failure points to shake out two-point failures at the cost of a log n multiplier for divide and conquer in the worst case.
>
> Basic idea: handle safe macros batch style and doubtful ones one at a time. Demote safe to doubtful when they fail.
>
> This is complicated but leads to a uniform translation. It probably gives batch performance in most cases. And it allows the ad hoc interpreter to be retired. I think this probably works out to a wash in complexity and small improvements in maintainability and bug resistance.
>
> Sound plausible?  Or is it just the hack of the day?

Separating safe from risky macros is something that did not occur to
me - I was thinking about batch processing them all, but I think that is
just too risky.

For now I think I'd prefer to just go ahead with what I have: this patch 
essentially replaces the javac-based logic with the clang API - we can 
have followup work to get rid of the remaining asymmetries. One reason 
why I went down the current path is that the ad-hoc step is actually 3-4 
lines of code, so it's very cheap, and, while theoretically there could 
be differences w.r.t. clang, it is also very hard to come up with 
concrete examples. When you start batching, as you described, the
complexity required to handle failures with retry logic will probably
end up being considerably more than that of the fast path we have now.
I.e. I kind of disagree that failure handling of the kind you described
would be a wash complexity-wise.
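
For reference, the split-on-failure retry discussed above could be
sketched roughly as follows. This is only an illustration in Java, not
code from the patch: the class name BatchSplit, the method findFailures
and the batchOk predicate (standing in for "evaluate this batch of
macros in one clang run and report whether it succeeded") are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: isolate failing macros by bisecting failed batches.
final class BatchSplit {

    // Returns the macros that fail even when evaluated on their own.
    // batchOk is an assumed stand-in for a single batched evaluation.
    static List<String> findFailures(List<String> macros,
                                     Predicate<List<String>> batchOk) {
        List<String> failed = new ArrayList<>();
        split(macros, batchOk, failed);
        return failed;
    }

    private static void split(List<String> batch,
                              Predicate<List<String>> batchOk,
                              List<String> failed) {
        // An empty batch, or one that evaluates cleanly, needs no more work.
        if (batch.isEmpty() || batchOk.test(batch)) {
            return;
        }
        // A failing batch of one pinpoints a bad macro.
        if (batch.size() == 1) {
            failed.add(batch.get(0));
            return;
        }
        // Otherwise split in two and retry each half separately.
        int mid = batch.size() / 2;
        split(batch.subList(0, mid), batchOk, failed);
        split(batch.subList(mid, batch.size()), batchOk, failed);
    }
}
```

Even in this stripped-down form, the recovery path is noticeably longer
than a 3-4 line numeric fast path, and a real version would also have to
map each batch result back to the individual macros.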

That said, batching is an obvious way to get even better performance
out of this machinery - the problem is to come up with a quick and
mistake-free classification. For instance, if we had a very simple
regex-based parser we could easily recognize almost all kinds of numeric
expressions (even with parentheses and binary operators), and classify
them as safe for batching. If the classification logic is mistake-free,
you give up some performance gain (as you might miss something that was
in fact batch-ready), but then you don't pay any cost for recovery. And
such an approach would be extensible - we could start simple, and batch
just numeric tokens and string literals. Then we can add support for
binary operators, references to other macros...
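
As a rough illustration of the "start simple" end of that spectrum, a
conservative classifier might look like the sketch below. It is not
part of the patch: MacroClassifier and isBatchSafe are hypothetical
names, and the patterns deliberately accept only C integer literals
(with optional u/l suffixes) and plain string literals. Anything else,
including floating-point literals, is reported as not safe, which costs
some batching opportunities but never requires recovery.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: conservative regex-based "safe to batch" check.
final class MacroClassifier {

    // Hex, octal or decimal integer with an optional unsigned/long suffix.
    private static final Pattern INT_LITERAL = Pattern.compile(
        "(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)"
        + "([uU](ll?|LL?)?|(ll?|LL?)[uU]?)?");

    // Double-quoted string: ordinary characters or backslash escapes.
    private static final Pattern STRING_LITERAL = Pattern.compile(
        "\"([^\"\\\\\\n]|\\\\.)*\"");

    // True only when the macro body is certainly a single literal token.
    static boolean isBatchSafe(String macroBody) {
        String s = macroBody.trim();
        return INT_LITERAL.matcher(s).matches()
            || STRING_LITERAL.matcher(s).matches();
    }
}
```

Support for parentheses, binary operators or references to other macros
would then be a matter of adding patterns, with the same rule that a
body is only classified as safe when the match is unambiguous.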

One implementation detail I wanted to mention in the email but forgot:
this patch assumes that two header files won't declare the same #define
constant with different values. If you do that, the semantics of the
program is unspecified, as per C spec (and all compilers warn against
that). So, while I could take a more 'correct' approach and evaluate a
snippet only in the context of the header in which its corresponding
macro is defined, I decided not to do that, because that seems excessive
(e.g. in the OpenGL case that amounts to generating 20-30 new
compilation units and precompiled headers), for very little gain, since
the behavior is not spec'd anyway.
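
If it ever became useful to check that assumption rather than rely on
it, a small pass over the collected macros could flag names defined
with different bodies in different headers. The sketch below is purely
illustrative: MacroConflicts, conflictingNames and the header-to-macros
map are hypothetical and do not correspond to data structures in the
patch, and the comparison is a plain string comparison of trimmed
bodies rather than the token-level comparison a C compiler performs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: detect macros redefined with a different body.
final class MacroConflicts {

    // Input: header name -> (macro name -> macro body).
    // Output: sorted names of macros whose bodies differ across headers.
    static Set<String> conflictingNames(
            Map<String, Map<String, String>> definesByHeader) {
        Map<String, String> firstSeen = new HashMap<>();
        Set<String> conflicts = new TreeSet<>();
        for (Map<String, String> defines : definesByHeader.values()) {
            for (Map.Entry<String, String> e : defines.entrySet()) {
                String body = e.getValue().trim();
                // Remember the first body seen for each macro name.
                String previous = firstSeen.putIfAbsent(e.getKey(), body);
                if (previous != null && !previous.equals(body)) {
                    conflicts.add(e.getKey());
                }
            }
        }
        return conflicts;
    }
}
```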

Maurizio
> — John