RFR(XL): 8185640: Thread-local handshakes
Robbin Ehn
robbin.ehn at oracle.com
Wed Oct 25 13:35:55 UTC 2017
Thanks Coleen, Robbin
On 2017-10-25 15:19, coleen.phillimore at oracle.com wrote:
>
> Hi Robbin,
> This change (with the addition of the poll at wide_ret) looks good. It came out
> nicely in the code.
> thanks,
> Coleen
>
> On 10/24/17 10:54 AM, Robbin Ehn wrote:
>> Hi,
>>
>> I fixed the interpreter performance regression. The fix is plain and simple:
>> I kept the polling code inside dispatch_base but made it optional, like the
>> verify oop check.
>>
>> Incremental:
>> http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html
>>
>> Manually tested with jstack, and it passes hotspot_tier1 and hotspot_handshake.
>>
>> It reduces the polling cost by 80%. Sensitive benchmarks show a -0.44%
>> regression vs. TLH off; less sensitive benchmarks show no regression.
>>
>> Thanks, Robbin
>>
>> On 2017-10-23 17:58, Karen Kinnear wrote:
>>> Works for me
>>>
>>> Thanks,
>>> Karen
>>>
>>>> On Oct 23, 2017, at 8:40 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>>
>>>> Hi Coleen and Robbin,
>>>>
>>>> I'm ok with putting it into a separate RFE. I understand that there are more
>>>> fun activities than rebasing this XL change for a long time :-)
>>>> So you don't need to delay it. It's acceptable for me.
>>>>
>>>> Thanks, Coleen, for sharing your proposal. I appreciate it.
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com]
>>>> Sent: Montag, 23. Oktober 2017 17:26
>>>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev developers
>>>> <hotspot-dev at openjdk.java.net>
>>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes
>>>>
>>>> Hi Martin,
>>>>
>>>>> On 2017-10-18 16:05, Doerr, Martin wrote:
>>>>> Hi Robbin,
>>>>>
>>>>> thanks for the quick reply and for doing additional benchmarks.
>>>>> Please note that t->does_dispatch() was just a first idea, but it doesn't
>>>>> really fit the purpose because it's false for conditional branch
>>>>> bytecodes, for example. I just didn't find an appropriate quick check in the
>>>>> existing code.
>>>>> I guess you will notice a performance impact when benchmarking with -Xint.
>>>>> (I don't know if Oracle usually runs startup performance benchmarks.)
>>>>
>>>> Yes, we are seeing a performance regression of 2.5%-6%, depending on the
>>>> benchmark. We are committed to fixing this, but the fix might come as a
>>>> separate RFE/bug, depending on the JEP's timeline.
>>>>
>>>> (In the very unlikely case that the fix is not done before the next release,
>>>> we would change the default to off.)
>>>>
>>>> I hope this is an acceptable path?
>>>>
>>>> Thanks, Robbin
>>>>
>>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com]
>>>>> Sent: Mittwoch, 18. Oktober 2017 15:58
>>>>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev developers
>>>>> <hotspot-dev at openjdk.java.net>
>>>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>>> On 2017-10-18 12:11, Doerr, Martin wrote:
>>>>>> Hi Robbin,
>>>>>>
>>>>>> so you would like to push your version first (as it does not break other
>>>>>> platforms) and then help us to push non-Oracle platform implementations
>>>>>> which change shared code again?
>>>>>> I'd be fine with that, too.
>>>>>
>>>>> Yes, great!
>>>>>
>>>>>>
>>>>>> While thinking a little longer about the interpreter implementation, a new
>>>>>> idea came to mind.
>>>>>> I think we could significantly reduce the impact on interpreter code size
>>>>>> and performance by using safepoint polls only in a subset of bytecodes.
>>>>>> E.g., we could poll only in bytecodes which perform some kind of jump, by
>>>>>> implementing something like
>>>>>> if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch())
>>>>>> generate_safepoint_poll();
>>>>>> in TemplateInterpreterGenerator::generate_and_dispatch.
>>>>>
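To make the trade-off in this proposal concrete, here is a minimal sketch that counts how many polls a short bytecode trace would execute under "poll at every bytecode" versus "poll only at jump-like bytecodes". Everything in it is a hypothetical toy: the `Bytecode` enum, `is_jump_like`, and `count_polls` are not HotSpot code, and `is_jump_like` is only a stand-in for whatever quick check the interpreter generator would really use (the thread notes that `t->does_dispatch()` is not a good fit).

```cpp
#include <cassert>
#include <vector>

// Toy bytecode set; hypothetical, not the HotSpot Bytecodes enum.
enum class Bytecode { Iload, Iadd, Istore, Goto, IfIcmpLt, Return };

// Hypothetical predicate: true for bytecodes that transfer control.
// It stands in for the "performs any kind of jump" check in the proposal.
constexpr bool is_jump_like(Bytecode bc) {
    return bc == Bytecode::Goto || bc == Bytecode::IfIcmpLt ||
           bc == Bytecode::Return;
}

struct PollCounts {
    int every_bytecode;  // polls executed if each template contains a poll
    int jumps_only;      // polls executed if only jump-like templates do
};

// Count the polls an interpreter would execute for the given trace
// under both placement strategies.
PollCounts count_polls(const std::vector<Bytecode>& trace) {
    PollCounts counts{0, 0};
    for (Bytecode bc : trace) {
        ++counts.every_bytecode;
        if (is_jump_like(bc)) {
            ++counts.jumps_only;
        }
    }
    return counts;
}
```

For a loop body such as two loads, an add, a store, a conditional branch, and a goto, the second strategy executes two polls instead of six, while any loop still reaches a poll on every iteration because it must pass through a jump.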
>>>>> We have not seen any performance regression in simple benchmarks with this.
>>>>> I will run a better benchmark and compare what difference it makes.
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com]
>>>>>> Sent: Mittwoch, 18. Oktober 2017 11:07
>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev developers
>>>>>> <hotspot-dev at openjdk.java.net>
>>>>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes
>>>>>>
>>>>>> Thanks for looking at this.
>>>>>>
>>>>>>> On 2017-10-17 19:58, Doerr, Martin wrote:
>>>>>>> Hi Robbin,
>>>>>>>
>>>>>>> my first impression is very good. Thanks for providing the webrev.
>>>>>>
>>>>>> Great!
>>>>>>
>>>>>>>
>>>>>>> The only thing I don't like is that "poll_page_val | poll_bit()" is used
>>>>>>> in shared code. I'd prefer to use either one or the other mechanism.
>>>>>>> Would it be OK to move the decision about which to use to platform code?
>>>>>>> (Some platforms could still use both if this is beneficial.)
>>>>>>>
>>>>>>> E.g. on PPC64, we'd like to use conditional trap instructions with
>>>>>>> special bit patterns if UseSIGTRAP is on. Would be excellent if we could
>>>>>>> implement set functions for _poll_armed_value and _poll_disarmed_value in
>>>>>>> platform code. poll_bit() also fits better into platform code in my opinion.
>>>>>>
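As a rough illustration of the "poll_page_val | poll_bit()" expression discussed above, the sketch below shows how one armed value could serve two kinds of check at once: a bit test on the value itself, and a memory access through the page address it carries. This is an assumption about the intent, not a copy of the webrev; `kPollBit`, `make_armed`, `make_disarmed`, `bit_test_armed`, and `page_of` are hypothetical names, and the bit position is chosen arbitrarily.

```cpp
#include <cassert>
#include <cstdint>

// Assumed poll bit. Page addresses are page-aligned, so a low bit is free.
constexpr std::uintptr_t kPollBit = 1;

// Armed value: address of the guarded page with the poll bit set.
constexpr std::uintptr_t make_armed(std::uintptr_t guarded_page) {
    return guarded_page | kPollBit;
}

// Disarmed value: address of the readable page, poll bit clear.
constexpr std::uintptr_t make_disarmed(std::uintptr_t readable_page) {
    return readable_page;
}

// Check style 1: test the bit in the poll word (no memory access to the page).
constexpr bool bit_test_armed(std::uintptr_t poll_word) {
    return (poll_word & kPollBit) != 0;
}

// Check style 2 would load through the pointer; here we only recover
// the page address that such a load would touch.
constexpr std::uintptr_t page_of(std::uintptr_t poll_word) {
    return poll_word & ~kPollBit;
}
```

A platform that only ever uses one style of check could drop the other half, which is the kind of decision the mail suggests moving into platform code.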
>>>>>> I see no issue with this.
>>>>>> Maybe SafepointMechanism::local_poll_armed should also be
>>>>>> platform-specific.
>>>>>> Can we do this incrementally when adding the platform support for PPC64?
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf
>>>>>>> Of Robbin Ehn
>>>>>>> Sent: Mittwoch, 11. Oktober 2017 15:38
>>>>>>> To: hotspot-dev developers <hotspot-dev at openjdk.java.net>
>>>>>>> Subject: RFR(XL): 8185640: Thread-local handshakes
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Starting the review of the code while JEP work is still not completed.
>>>>>>>
>>>>>>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640
>>>>>>>
>>>>>>> This JEP introduces a way to execute a callback on threads without
>>>>>>> performing a global VM safepoint. It makes it both possible and cheap to
>>>>>>> stop individual threads, and not just all threads or none.
>>>>>>>
>>>>>>> Entire changeset:
>>>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/
>>>>>>>
>>>>>>> Divided into three parts:
>>>>>>> SafepointMechanism abstraction:
>>>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/
>>>>>>> Consolidating polling page allocation:
>>>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/
>>>>>>> Handshakes:
>>>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/
>>>>>>>
>>>>>>> A handshake operation is a callback that is executed for each JavaThread
>>>>>>> while that thread is in a safepoint-safe state. The callback is executed
>>>>>>> either by the thread itself or by the VM thread while keeping the thread
>>>>>>> in a blocked state. The big difference between safepointing and
>>>>>>> handshaking is that the per-thread operation will be performed on all
>>>>>>> threads as soon as possible, and each thread will continue to execute as
>>>>>>> soon as its own operation is completed. If a JavaThread is known to be
>>>>>>> running, then a handshake can be performed with that single JavaThread
>>>>>>> as well.
>>>>>>>
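The handshake description above can be sketched as a small single-threaded model. It is not the HotSpot implementation: `ToyThread`, `ThreadState`, `HandshakeOp`, `request_handshake`, and `poll` are hypothetical names, and real handshakes involve concurrency and thread-state transitions that this toy ignores. It only shows the two execution paths: the requester runs the callback on behalf of a blocked thread, while a running thread runs its own callback at its next poll and then continues.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model only; these are not HotSpot types.
enum class ThreadState { Running, Blocked };

struct ToyThread {
    int id;
    ThreadState state;
    bool handshake_pending;  // set by the requester, cleared by the thread
    int ops_executed;        // how many handshake callbacks ran for this thread
};

using HandshakeOp = std::function<void(ToyThread&)>;

// Requester side (the "VM thread" in the description): blocked threads are
// already in a safepoint-safe state, so the callback runs on their behalf.
// Running threads are only marked; they execute the callback themselves.
// Returns the number of callbacks executed by the requester.
int request_handshake(std::vector<ToyThread>& threads, const HandshakeOp& op) {
    int done_by_requester = 0;
    for (ToyThread& t : threads) {
        if (t.state == ThreadState::Blocked) {
            op(t);
            ++done_by_requester;
        } else {
            t.handshake_pending = true;
        }
    }
    return done_by_requester;
}

// Thread side: at its next poll a running thread executes its own pending
// callback and then carries on, without waiting for any other thread.
// Returns true if a callback was executed.
bool poll(ToyThread& t, const HandshakeOp& op) {
    if (!t.handshake_pending) {
        return false;
    }
    op(t);
    t.handshake_pending = false;
    return true;
}
```

Unlike a global safepoint, nothing in this model makes a thread wait for the others: each thread is done as soon as its own callback has run.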
>>>>>>> The current safepointing scheme is modified to perform an indirection
>>>>>>> through a per-thread pointer, which allows a single thread's execution
>>>>>>> to be forced to trap on the guard page. To force a thread to yield, the
>>>>>>> VM updates the per-thread pointer for the corresponding thread to point
>>>>>>> to the guarded page.
>>>>>>>
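The per-thread indirection described above can be modelled in a few lines. This is a toy under stated assumptions: a real VM makes the guarded page inaccessible so that the poll's load traps, whereas here a "page" is just a struct with a flag and the trap is simulated by checking it. `ToyPage`, `ToyThread`, `arm`, and `disarm` are hypothetical names, not HotSpot APIs.

```cpp
#include <atomic>
#include <cassert>

// Toy stand-in for a memory page; 'guarded' simulates "access would trap".
struct ToyPage {
    bool guarded;
};

static const ToyPage g_readable_page{false};
static const ToyPage g_guarded_page{true};

struct ToyThread {
    // Per-thread poll pointer: normally points at the readable page.
    std::atomic<const ToyPage*> poll_page{&g_readable_page};
    int yields = 0;

    // The poll is two loads: first the thread's own pointer, then the page.
    // Returns true if the (simulated) trap fired and the thread yielded.
    bool safepoint_poll() {
        const ToyPage* page = poll_page.load(std::memory_order_acquire);
        if (page->guarded) {
            ++yields;  // in a real VM, the faulting load enters the handler
            return true;
        }
        return false;
    }
};

// VM side: redirect one thread's pointer to the guarded page.
void arm(ToyThread& t) {
    t.poll_page.store(&g_guarded_page, std::memory_order_release);
}

// VM side: point the thread back at the readable page.
void disarm(ToyThread& t) {
    t.poll_page.store(&g_readable_page, std::memory_order_release);
}
```

Arming one thread leaves every other thread's poll untouched, which is what lets the VM stop an individual thread; arming all of them in a loop gives the effect of a conventional global safepoint.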
>>>>>>> Examples of potential use cases:
>>>>>>> - Biased lock revocation
>>>>>>> - External requests for stack traces
>>>>>>> - Deoptimization
>>>>>>> - Async exception delivery
>>>>>>> - External suspension
>>>>>>> - Eliding memory barriers
>>>>>>>
>>>>>>> All of these will help the VM become more low-latency friendly by
>>>>>>> reducing the number of global safepoints.
>>>>>>> For platforms that do not yet implement the per-JavaThread poll, a
>>>>>>> fallback to a normal safepoint is in place: HandshakeOneThread will then
>>>>>>> be a normal safepoint. The supported platforms are Linux x64 and Solaris
>>>>>>> SPARC.
>>>>>>>
>>>>>>> Tested heavily with various test suites; the change comes with a few new tests.
>>>>>>>
>>>>>>> Performance testing using standardized benchmarks shows no significant
>>>>>>> changes; the latest numbers were -0.7% on Linux x64 and +1.5% on Solaris
>>>>>>> SPARC (not statistically verified). A minor regression for the load vs
>>>>>>> load load on x64 is expected, as is a slight increase on SPARC due to the
>>>>>>> cost of ‘materializing’ the page vs load load.
>>>>>>> The time to trigger a safepoint was measured on a large machine and found
>>>>>>> not to be an issue. The looping over threads and arming the polling page
>>>>>>> will benefit from the work on JavaThread life-cycle (8167108 - SMR and
>>>>>>> JavaThread Lifecycle:
>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html)
>>>>>>> which puts all JavaThreads in an array instead of a linked list.
>>>>>>>
>>>>>>> Thanks, Robbin
>>>>>>>
>>>
>