RFR(XL): 8185640: Thread-local handshakes

Karen Kinnear karen.kinnear at oracle.com
Mon Oct 23 15:58:55 UTC 2017


Works for me

Thanks,
Karen

> On Oct 23, 2017, at 8:40 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
> 
> Hi Coleen and Robbin,
> 
> I'm ok with putting it into a separate RFE. I understand that there are more fun activities than rebasing this XL change for a long time :-)
> So you don't need to delay it. It's acceptable for me.
> 
> Thanks, Coleen, for sharing your proposal. I appreciate it.
> 
> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] 
> Sent: Monday, October 23, 2017 17:26
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev developers <hotspot-dev at openjdk.java.net>
> Subject: Re: RFR(XL): 8185640: Thread-local handshakes
> 
> Hi Martin,
> 
>> On 2017-10-18 16:05, Doerr, Martin wrote:
>> Hi Robbin,
>> 
>> thanks for the quick reply and for doing additional benchmarks.
>> Please note that t->does_dispatch() was just a first idea, but it doesn't really fit the purpose because it's false for conditional branch bytecodes, for example. I just didn't find an appropriate quick check in the existing code.
>> I guess you will notice a performance impact when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.)
> 
> Yes, we are seeing a performance regression of 2.5%-6%, depending on the benchmark.
> We are committed to fixing this, but the fix might come as a separate RFE/bug, depending on
> the JEP's timeline.
> 
> (If, very unlikely, the fix is not done before the next release, we would
> change the default to off.)
> 
> I hope this is an acceptable path?
> 
> Thanks, Robbin
> 
>> 
>> Best regards,
>> Martin
>> 
>> 
>> -----Original Message-----
>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com]
>> Sent: Wednesday, October 18, 2017 15:58
>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev developers <hotspot-dev at openjdk.java.net>
>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes
>> 
>> Hi Martin,
>> 
>>> On 2017-10-18 12:11, Doerr, Martin wrote:
>>> Hi Robbin,
>>> 
>>> so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again?
>>> I'd be fine with that, too.
>> 
>> Yes, great!
>> 
>>> 
>>> While thinking a little longer about the interpreter implementation, a new idea came to mind.
>>> I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. E.g., we could use only bytecodes which perform any kind of jump by implementing something like
>>> if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll();
>>> in TemplateInterpreterGenerator::generate_and_dispatch.
>> 
>> We have not seen any performance regression in simple benchmarks with this.
>> I will do a better benchmark and compare what difference it makes.
>> 
>> Thanks, Robbin
>> 
>>> 
>>> Best regards,
>>> Martin
>>> 
>>> 
>>> -----Original Message-----
>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com]
>>> Sent: Wednesday, October 18, 2017 11:07
>>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev developers <hotspot-dev at openjdk.java.net>
>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes
>>> 
>>> Thanks for looking at this.
>>> 
>>>> On 2017-10-17 19:58, Doerr, Martin wrote:
>>>> Hi Robbin,
>>>> 
>>>> my first impression is very good. Thanks for providing the webrev.
>>> 
>>> Great!
>>> 
>>>> 
>>>> The only thing I don't like is that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one mechanism or the other.
>>>> Would it be ok to move the decision between what to use to platform code?
>>>> (Some platforms could still use both if this is beneficial.)
>>>> 
>>>> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. It would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code, in my opinion.
>>> 
>>> I see no issue with this.
>>> Maybe SafepointMechanism::local_poll_armed should also be platform-specific.
>>> Can we do this incrementally when adding the platform support for PPC64?
>>> 
>>> Thanks, Robbin
>>> 
>>>> 
>>>> Best regards,
>>>> Martin
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn
>>>> Sent: Wednesday, October 11, 2017 15:38
>>>> To: hotspot-dev developers <hotspot-dev at openjdk.java.net>
>>>> Subject: RFR(XL): 8185640: Thread-local handshakes
>>>> 
>>>> Hi all,
>>>> 
>>>> Starting the review of the code while JEP work is still not completed.
>>>> 
>>>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640
>>>> 
>>>> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not
>>>> just all threads or none.
>>>> 
>>>> Entire changeset:
>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/
>>>> 
>>>> Divided into 3 parts:
>>>> SafepointMechanism abstraction:
>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/
>>>> Consolidating polling page allocation:
>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/
>>>> Handshakes:
>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/
>>>> 
>>>> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint-safe state. The callback is executed either by the thread
>>>> itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per-thread operation will be
>>>> performed on all threads as soon as possible, and each thread will continue to execute as soon as its own operation is completed. If a JavaThread is known to be running, then a
>>>> handshake can be performed with that single JavaThread as well.
>>>> 
>>>> The current safepointing scheme is modified to perform an indirection through a per-thread pointer, which will allow a single thread's execution to be forced to trap on the
>>>> guard page. In order to force a thread to yield, the VM updates the per-thread pointer for the corresponding thread to point to the guarded page.
>>>> 
>>>> Examples of potential use cases:
>>>> -Biased lock revocation
>>>> -External requests for stack traces
>>>> -Deoptimization
>>>> -Async exception delivery
>>>> -External suspension
>>>> -Eliding memory barriers
>>>> 
>>>> All of these will help the VM move towards becoming more low-latency friendly by reducing the number of global safepoints.
>>>> For platforms that do not yet implement the per-JavaThread poll, a fallback to a normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported
>>>> platforms are Linux x64 and Solaris SPARC.
>>>> 
>>>> Tested heavily with various test suites and comes with a few new tests.
>>>> 
>>>> Performance testing using standardized benchmarks shows no significant changes; the latest numbers were -0.7% on Linux x64 and +1.5% on Solaris SPARC (not statistically
>>>> verified). A minor regression is expected on x64, where one load becomes two loads (load vs. load-load), and a slight increase on SPARC due to the cost of ‘materializing’ the page vs. load-load.
>>>> The time to trigger a safepoint was measured on a large machine and found not to be an issue. The looping over threads and arming the polling page will benefit from the work on
>>>> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all
>>>> JavaThreads in an array instead of a linked list.
>>>> 
>>>> Thanks, Robbin
>>>> 
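The per-thread poll indirection and handshake callback described in the original announcement can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: the names SketchThread, PollPage, request_handshake and poll are hypothetical and do not come from the webrev, and the hardware trap on the guarded page is replaced by an explicit flag check.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical model of a polling page; "guarded" stands in for a page
// that would trap when read. Not HotSpot code.
struct PollPage { bool guarded; };
static PollPage readable_page{false};
static PollPage guarded_page{true};

struct SketchThread {
    // Per-thread pointer: the poll goes through this indirection, so a
    // single thread can be redirected to the guarded page.
    std::atomic<PollPage*> poll_page{&readable_page};
    std::function<void(SketchThread&)> pending_op;  // handshake callback

    // Called by the thread itself at a poll site.
    // Returns true if a handshake operation was executed.
    bool poll() {
        if (!poll_page.load(std::memory_order_acquire)->guarded) {
            return false;  // disarmed: keep running
        }
        assert(pending_op && "armed thread must have an operation");
        pending_op(*this);     // run the per-thread operation
        pending_op = nullptr;
        // Disarm: this thread continues as soon as its own operation is done.
        poll_page.store(&readable_page, std::memory_order_release);
        return true;
    }
};

// Arm exactly one thread by pointing its poll pointer at the guarded page.
// Other threads are left untouched.
inline void request_handshake(SketchThread& t,
                              std::function<void(SketchThread&)> op) {
    t.pending_op = std::move(op);
    t.poll_page.store(&guarded_page, std::memory_order_release);
}
```

In this model, arming thread b leaves thread a's poll untouched, which is the property that distinguishes a handshake from a global safepoint. The case where the VM thread executes the callback on behalf of a blocked thread is not modeled.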

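Martin's suggestion of emitting interpreter polls only for bytecodes that perform a jump can be sketched as a predicate applied during template generation. This is a hedged illustration, not the interpreter's actual logic: the Bytecode enum is a tiny hand-picked subset, and performs_jump and count_polls are hypothetical helpers. As noted in the thread, the existing t->does_dispatch() does not serve as this predicate because it is false for conditional branches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A few real JVM bytecode names, chosen only for illustration.
enum class Bytecode { iload, iadd, istore, ifeq, if_icmplt, goto_, tableswitch };

// Hypothetical predicate: true for bytecodes that perform any kind of jump,
// including conditional branches.
inline bool performs_jump(Bytecode bc) {
    switch (bc) {
        case Bytecode::ifeq:
        case Bytecode::if_icmplt:
        case Bytecode::goto_:
        case Bytecode::tableswitch:
            return true;
        default:
            return false;
    }
}

// Counts how many poll sequences would be generated for a bytecode stream
// when polls are restricted to jumping bytecodes.
inline std::size_t count_polls(const std::vector<Bytecode>& code,
                               bool thread_local_poll_enabled) {
    std::size_t polls = 0;
    for (Bytecode bc : code) {
        if (thread_local_poll_enabled && performs_jump(bc)) {
            ++polls;  // stands in for generate_safepoint_poll()
        }
    }
    return polls;
}
```

For a simple loop body of six bytecodes with one conditional branch and one goto, only two polls would be generated instead of six, which is the code-size and performance saving the proposal is after.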

