RFR(XL): 8185640: Thread-local handshakes

Thu Oct 26 17:19:06 UTC 2017

On 26/10/17 18:00, Erik Österlund wrote:
> Hi Andrew,
> 
>> On 26 Oct 2017, at 18:05, Andrew Haley <aph at redhat.com> wrote:
>>
>>> On 26/10/17 15:39, Erik Österlund wrote:
>>>
>>> The reason we do not poll the page in the interpreter is that we
>>> need to generate appropriate relocation entries in the code blob for
>>> the PCs that we poll on, so that we in the signal handler can look
>>> up the code blob, walk the relocation entries, and find precisely
>>> why we got the trap, i.e. due to the poll, and precisely what kind
>>> of poll, so we know what trampoline needs to be taken into the
>>> runtime.
>>
>> Not really, no.  If we know that we're in the interpreter and the
>> faulting address is the safepoint poll, then we can read all of the
>> context we need from the interpreter registers and the frame.
> 
> That sounds like what I said.

Not exactly.  We do not need to generate any more relocation entries.
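
To make that concrete, here is a rough sketch of the classification
a signal handler could do using only the faulting PC and the faulting
address. All of the names (addr_range, fault_kind, classify_fault)
are hypothetical and chosen for illustration; this is not HotSpot's
actual handler code, and it assumes the interpreter's code and the
polling page each occupy one contiguous address range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} addr_range;

typedef enum {
    FAULT_NOT_A_POLL,        /* unrelated SEGV: use the normal fault path */
    FAULT_INTERPRETER_POLL,  /* poll hit in the interpreter: context comes
                                from interpreter registers and the frame */
    FAULT_COMPILED_POLL      /* poll hit elsewhere: would need relocation
                                info to find out what kind of poll it was */
} fault_kind;

static bool in_range(const addr_range *r, uintptr_t a) {
    assert(r->start <= r->end);
    return a >= r->start && a < r->end;
}

/* Classify a fault from the faulting PC and the faulting data address. */
static fault_kind classify_fault(const addr_range *interpreter_code,
                                 const addr_range *polling_page,
                                 uintptr_t pc, uintptr_t fault_addr) {
    if (!in_range(polling_page, fault_addr))
        return FAULT_NOT_A_POLL;
    if (in_range(interpreter_code, pc))
        return FAULT_INTERPRETER_POLL;
    return FAULT_COMPILED_POLL;
}
```

The point of the sketch is that the interpreter case is decided by two
range checks, with no per-PC metadata lookup.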

> But the cost of the conditional branch is empirically (this was
> attempted and measured a while ago) approximately the same as the
> indirect load during "normal circumstances". The indirect load was
> only marginally better.

That's interesting.  The cost of the SEGV trap going through the
kernel is fairly high, and I'm now wondering if, for very fast
safepoint responses, we'd be better off not doing it.  The cost of the
write protect, given that it probably involves an IPI on all
processors, isn't cheap either.
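
For reference, the two poll styles being compared look roughly like
this when written in C. This is only a sketch: the java_thread
struct and its fields are hypothetical, and the real polls are
emitted as machine code rather than as C functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread poll state, for illustration only. */
typedef struct {
    /* Branch-style poll: nonzero means the thread has been asked to stop. */
    volatile uintptr_t poll_word;
    /* Load-style poll: points at a readable page when disarmed and at a
       protected page when armed, so the load itself raises the trap. */
    volatile uint8_t *volatile polling_page;
} java_thread;

/* Conditional-branch poll: load a thread-local word and test it.
   The caller branches to the slow path when this returns true. */
static bool poll_with_branch(const java_thread *t) {
    return t->poll_word != 0;
}

/* Indirect-load poll: a load through the thread-local pointer and no
   branch. If the page is protected, this faults and the signal handler
   takes over, which is where the kernel round trip comes in. */
static void poll_with_load(const java_thread *t) {
    (void)*t->polling_page;
}
```

In the disarmed case both are one thread-local load plus either a
well-predicted branch or a second load, which fits the measurements
being so close. The difference shows up only in the armed case, where
the load version pays for the trap.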

>>> While constructing something that does that is indeed possible, it
>>> simply did not seem worth the trouble compared to using a branch in
>>> these paths. The same reasoning applies for the poll performed in
>>> the native wrapper when waking up from native and transitioning into
>>> Java. It performs a conditional branch instead of indirect load to
>>> avoid signal handler logic for polls that are not performance
>>> critical.
>>
>> If we're talking about performance, the existing bytecode interpreter
>> is exquisitely carefully coded, even going to the extent of having
>> multiple dispatch tables for safepoint- and non-safepoint cases.
>> Clearly the original authors weren't thinking that code was not
>> performance critical or they wouldn't have done what they did.  I
>> suppose, though, that the design we have is from the early days when
>> people diligently strove to make the interpreter as fast as possible.
> 
> On the other hand, branches have become a lot faster in "recent"
> years, and this one is particularly trivial to predict. Therefore I
> prefer to base design decisions on empirical measurements. And
> introducing that complexity for a close to insignificantly faster
> interpreter poll does not seem encouraging to me. Do you agree?

Perhaps.  It's interesting that the result falls one way in compiled
code and the other in interpreted code.  If the choice is so very
finely balanced, though, it sort-of makes sense.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671