RFR(XL): 8185640: Thread-local handshakes

Thu Oct 26 17:00:11 UTC 2017

Hi Andrew,

> On 26 Oct 2017, at 18:05, Andrew Haley <aph at redhat.com> wrote:
> 
>> On 26/10/17 15:39, Erik Österlund wrote:
>> 
>> The reason we do not poll the page in the interpreter is that we
>> need to generate appropriate relocation entries in the code blob for
>> the PCs that we poll on, so that we in the signal handler can look
>> up the code blob, walk the relocation entries, and find precisely
>> why we got the trap, i.e. due to the poll, and precisely what kind
>> of poll, so we know what trampoline needs to be taken into the
>> runtime.
> 
> Not really, no.  If we know that we're in the interpreter and the
> faulting address is the safepoint poll, then we can read all of the
> context we need from the interpreter registers and the frame.

That sounds like what I said. It is definitely possible, given the execution context in the interpreter (and appropriate metadata generated for the trapping PC), to work out that the trap was caused by an interpreter safepoint poll, and to send the trapping thread to a trampoline that saves state appropriately and calls into the runtime to yield to the safepoint synchronizer, as we do for JIT-compiled code.
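
For the JIT-compiled case, the trap-to-poll lookup described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not HotSpot's code: `CodeRegion`, `PollReloc`, `PollKind` and `classify_trap` are hypothetical names, and a real signal handler would additionally save state and redirect the thread to the trampoline matching the poll kind.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of the per-PC metadata ("relocation entries") that a
// signal handler consults after a poll-page trap. Names are illustrative.
enum class PollKind { Loop, Return };

struct PollReloc {
  std::uintptr_t pc;  // address of a polling instruction
  PollKind kind;      // decides which trampoline the trap is redirected to
};

// Hypothetical stand-in for a region of generated code and its poll entries.
struct CodeRegion {
  std::uintptr_t begin;
  std::uintptr_t end;            // one past the last instruction
  std::vector<PollReloc> polls;  // must be sorted by pc
  bool contains(std::uintptr_t pc) const { return pc >= begin && pc < end; }
};

// Returns the kind of poll at the faulting PC, or std::nullopt if the PC is
// not a known poll (a genuine crash that the handler must not swallow).
std::optional<PollKind> classify_trap(const std::vector<CodeRegion>& regions,
                                      std::uintptr_t pc) {
  auto by_pc = [](const PollReloc& a, const PollReloc& b) { return a.pc < b.pc; };
  for (const CodeRegion& region : regions) {
    if (!region.contains(pc)) continue;
    // lower_bound requires the entries to be sorted by pc.
    assert(std::is_sorted(region.polls.begin(), region.polls.end(), by_pc));
    auto it = std::lower_bound(
        region.polls.begin(), region.polls.end(), pc,
        [](const PollReloc& r, std::uintptr_t p) { return r.pc < p; });
    if (it != region.polls.end() && it->pc == pc) return it->kind;
    return std::nullopt;  // inside generated code, but not at a poll
  }
  return std::nullopt;  // not in any known code region
}
```

The point of the sketch is the extra machinery a load-style poll drags in: every polling PC needs an entry, and the handler has to search for it before it can even decide whether the fault was a poll.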

Empirically, though (this was tried and measured a while ago), the cost of the conditional branch is roughly the same as that of the indirect load under normal circumstances; the indirect load was only marginally faster. The added complexity of the signal-handler dance was therefore not warranted for the interpreter. It was only warranted for polls in the most performance-critical code, i.e. JIT-compiled code.
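
To make the trade-off concrete, here is a minimal sketch of the two polling styles. It is an illustration under stated assumptions, not HotSpot code: `ThreadPollState`, `readable_page`, `poll_slow_path`, `branch_poll` and `load_poll` are hypothetical names, and the armed case of the load-style poll (a protected page plus a signal handler) is only described in comments, not simulated.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the readable ("disarmed") polling page.
alignas(4096) char readable_page[4096];

// Hypothetical per-thread polling state; not HotSpot's actual fields.
struct ThreadPollState {
  // Branch-style poll: a flag tested with a conditional branch.
  std::atomic<bool> armed{false};
  // Load-style poll: a pointer the poll dereferences. Disarmed, it points at
  // a readable page; armed (not simulated here), it would point at a
  // protected page so that the load traps into a signal handler.
  const volatile char* polling_page = readable_page;
  int slow_path_calls = 0;
};

// Hypothetical slow path standing in for the call into the runtime that
// processes the pending safepoint or handshake.
void poll_slow_path(ThreadPollState& t) {
  assert(t.armed.load(std::memory_order_relaxed));
  ++t.slow_path_calls;
  t.armed.store(false, std::memory_order_release);  // operation "processed"
}

// Branch-style poll: one load plus a branch that is almost never taken and
// therefore trivially predicted. No signal handler or per-PC metadata needed.
inline void branch_poll(ThreadPollState& t) {
  if (t.armed.load(std::memory_order_acquire)) {
    poll_slow_path(t);
  }
}

// Load-style poll: no branch on the fast path, but when armed the load
// faults, and a signal handler must recognize the faulting PC as a poll
// before it can send the thread into the runtime.
inline void load_poll(const ThreadPollState& t) {
  (void)*t.polling_page;
}
```

Both fast paths are a single memory access; the difference lies in where the slow path is found. The branch version finds it inline, while the load version needs the trap-handling machinery, which is the complexity that was judged not worth it for the interpreter.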

> 
>> While constructing something that does that is indeed possible, it
>> simply did not seem worth the trouble compared to using a branch in
>> these paths. The same reasoning applies for the poll performed in
>> the native wrapper when waking up from native and transitioning into
>> Java. It performs a conditional branch instead of indirect load to
>> avoid signal handler logic for polls that are not performance
>> critical.
> 
> If we're talking about performance, the existing bytecode interpreter
> is exquisitely carefully coded, even going to the extent of having
> multiple dispatch tables for safepoint- and non-safepoint cases.
> Clearly the original authors weren't thinking that code was not
> performance critical or they wouldn't have done what they did.  I
> suppose, though, that the design we have is from the early days when
> people diligently strove to make the interpreter as fast as possible.

On the other hand, branches have become a lot cheaper in "recent" years, and this one is particularly easy to predict, since it is almost never taken. That is why I prefer to base design decisions like this on empirical measurements. Introducing that complexity for an interpreter poll that is at best marginally faster does not seem worthwhile to me. Do you agree?

Thanks,
/Erik

> -- 
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671