[foreign] RFR 8219470: Use clang API to parse macros
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed Feb 20 21:38:28 UTC 2019
On 20/02/2019 20:28, John Rose wrote:
> Very cool use of appropriate technologies.
>
> I have one suggestion.
>
>> On Feb 20, 2019, at 10:56 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>>
>> Note that, this patch retains the previous optimization for special casing simple numeric #define - where we just try to parse the number in Java. For API such as OpenGL with loads of constants, this is an essential optimization.
> Instead of being given to a special case interpreter, those could be handled more uniformly by being handed to the same PCH technique you have created. Because they are unlikely to fail they can be batched, which will probably lead to similar perf as the present technique. Occasional failures can be removed and the remaining batch rerun if needed. The batches can be split at failure points to shake out two-point failures at the cost of a log n multiplier for divide and conquer in the worst case.
>
> Basic idea: handle safe macros batch style and doubtful ones one at a time. Demote safe to doubtful when they fail.
>
> This is complicated but leads to a uniform translation. It probably gives batch performance in most cases. And it allows the ad hoc interpreter to be retired. I think this probably works out to a wash in complexity and small improvements in maintainability and bug resistance.
>
> Sound plausible? Or is it just the hack of the day?
Separating safe from risky macros is something that did not occur to
me - I was thinking about batch processing them all, but I think that is
just too risky.
For now I think I'd prefer to just go ahead with what I have: this patch
essentially replaces the javac-based logic with the clang API - we can
have followup work to get rid of the remaining asymmetries. One reason
why I went down the current path is that the ad-hoc step is actually 3-4
lines of code, so it's very cheap, and, while theoretically there could
be differences w.r.t. clang, it is also very hard to come up with
concrete examples. When you start batching, as you described, the
complexity required to handle failures with retry logic will probably
end up being considerably more than that of the fast path we have now.
In other words, I kind of disagree that failure handling of the kind you
described would be a wash complexity-wise.
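To make the comparison concrete, here is a rough sketch of the batch-and-split scheme discussed above. It is not code from the patch: `BatchBisect`, `evaluate` and `decodeAll` are hypothetical names, and `decodeAll` only stands in for "evaluate one snippet covering the whole batch" by decoding each body with `Long.decode` and failing the batch if any body fails.

```java
import java.util.*;
import java.util.function.Function;

class BatchBisect {
    /**
     * Evaluates a batch in one go; if the batch fails as a whole, splits it
     * in half and retries each half. A failing batch of size one is recorded
     * as a failed ("doubtful") macro.
     */
    static <M, V> void evaluate(List<M> macros,
                                Function<List<M>, Optional<Map<M, V>>> evalBatch,
                                Map<M, V> results,
                                List<M> failed) {
        if (macros.isEmpty()) {
            return;
        }
        Optional<Map<M, V>> outcome = evalBatch.apply(macros);
        if (outcome.isPresent()) {
            results.putAll(outcome.get());
            return;
        }
        if (macros.size() == 1) {
            failed.add(macros.get(0));
            return;
        }
        int mid = macros.size() / 2;
        evaluate(macros.subList(0, mid), evalBatch, results, failed);
        evaluate(macros.subList(mid, macros.size()), evalBatch, results, failed);
    }

    /**
     * Hypothetical stand-in for a real batch evaluation: the batch succeeds
     * only if every body is a plain integer literal accepted by Long.decode.
     */
    static Optional<Map<String, Long>> decodeAll(List<String> bodies) {
        Map<String, Long> values = new LinkedHashMap<>();
        try {
            for (String body : bodies) {
                values.put(body, Long.decode(body));
            }
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
        return Optional.of(values);
    }

    public static void main(String[] args) {
        Map<String, Long> results = new LinkedHashMap<>();
        List<String> failed = new ArrayList<>();
        evaluate(List.of("1", "0x10", "bad", "42"), BatchBisect::decodeAll, results, failed);
        System.out.println("evaluated: " + results);
        System.out.println("failed: " + failed);
    }
}
```

Even in this stripped-down form the recovery logic is noticeably larger than a few lines of number parsing, and a real version would also have to map compiler diagnostics back to the individual macros in a batch.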
That said, batching is an obvious way to get even better performance
out of this machinery - the problem is to come up with a quick and
mistake-free classification. For instance, if we had a very simple
regex-based parser we could easily recognize almost all kinds of numeric
expressions (even with parentheses and binary operators), and classify
them as safe for batching. If the classification logic is mistake-free,
you give up some performance gain (as you might miss something that was
in fact batch-ready), but then you don't pay any cost for recovery. And
such an approach would be extensible - we could start simple, and batch
just numeric tokens and string literals. Then we can add support for
binary operators, references to other macros...
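As an illustration of the "start simple" end of that spectrum, a conservative classifier could look something like the sketch below. The class name `MacroClassifier`, the method `isSafe` and the exact patterns are assumptions made for this example, not part of the patch; anything the patterns do not recognize is simply treated as doubtful.

```java
import java.util.regex.Pattern;

class MacroClassifier {
    // Integer literals: hex, octal or decimal, with optional u/l suffixes.
    private static final Pattern INT = Pattern.compile(
            "[-+]?(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)"
            + "([uU](l|L|ll|LL)?|(l|L|ll|LL)[uU]?)?");

    // Simple decimal floating-point literals such as 1.5, .5f or 2e10.
    private static final Pattern FLOAT = Pattern.compile(
            "[-+]?(([0-9]+\\.[0-9]*|\\.[0-9]+)([eE][-+]?[0-9]+)?"
            + "|[0-9]+[eE][-+]?[0-9]+)[fFlL]?");

    // A single string literal; backslash escapes are allowed inside it.
    private static final Pattern STRING = Pattern.compile(
            "\"([^\"\\\\\\n]|\\\\.)*\"");

    /**
     * Returns true only when the macro body is a single numeric or string
     * literal. Everything else (operators, parentheses, references to other
     * macros) is reported as not safe, so it would be handled one at a time.
     */
    static boolean isSafe(String body) {
        String text = body.trim();
        return INT.matcher(text).matches()
                || FLOAT.matcher(text).matches()
                || STRING.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String body : new String[] {"0x1F", "42UL", "1.5f", "\"GL\"", "(1 << 3)", "FOO + 1"}) {
            System.out.println(body + " -> " + (isSafe(body) ? "batch" : "one at a time"));
        }
    }
}
```

Because the classifier only ever errs towards "not safe", a misclassification costs performance rather than correctness, which is what makes it possible to skip recovery logic entirely.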
One implementation detail I wanted to mention in the email but forgot:
this patch assumes that two header files won't declare the same #define
constant with different values. If you do that, the semantics of the
program are unspecified, as per the C spec (and all compilers warn against
that). So, while I could take a more 'correct' approach and evaluate a
snippet only in the context of the header in which its corresponding
macro is defined, I decided not to do that, because that seems excessive
(e.g. in the OpenGL case that amounts to generating 20-30 new
compilation units and precompiled headers), for very little gain, since
the behavior is not spec'd anyway.
Maurizio
> — John
>
>
>