RFR(m): 8220351: Cross-modifying code
Doerr, Martin
martin.doerr at sap.com
Tue Mar 19 14:04:05 UTC 2019
Hi Robbin,
I just tried a small stress test [1] with (+UseNewCode) and without added CPUID instruction in native wrapper [2].
jdk/bin/java -XX:+UnlockDiagnosticVMOptions StressJVMEntries
result: 2113929216, duration: 40.1181962
jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode StressJVMEntries
result: 2113929216, duration: 76.0759127
Executed on 40 core 2-socket Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz.
So I guess nobody wants CPUID on the regular path if it can be avoided.
Best regards,
Martin
[1] StressJVMEntries.java:
public class StressJVMEntries{
static long sum = 0;
static final int loop_cnt = 10000000;
public static void main(String args[]){
Runtime rt = Runtime.getRuntime();
long duration = System.nanoTime();
for (int x=0; x<loop_cnt; x++) sum += rt.totalMemory();
duration = System.nanoTime() - duration;
System.out.println("result: " + sum/loop_cnt + ", duration: " + (double)duration/loop_cnt);
}
}
[2] patch:
diff -r 3827cd66e788 src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
--- a/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp Mon Mar 18 16:04:23 2019 +0100
+++ b/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp Tue Mar 19 15:02:31 2019 +0100
@@ -2607,6 +2607,18 @@
}
__ bind(Continue);
+ if (UseNewCode) {
+ __ push(rax);
+ __ push(rbx);
+ __ push(rcx);
+ __ push(rdx);
+ __ xorl(rax, rax);
+ __ cpuid();
+ __ pop(rdx);
+ __ pop(rcx);
+ __ pop(rbx);
+ __ pop(rax);
+ }
}
// change thread state
-----Original Message-----
From: Robbin Ehn <robbin.ehn at oracle.com>
Sent: Freitag, 15. März 2019 13:32
To: David Holmes <david.holmes at oracle.com>; Reingruber, Richard <richard.reingruber at sap.com>; Doerr, Martin <martin.doerr at sap.com>; Andrew Haley <aph at redhat.com>; hotspot-dev at openjdk.java.net
Subject: Re: RFR(m): 8220351: Cross-modifying code
>>
>> The last point was rather exploring the space of legal solutions than a
>> suggestion to solve the issue. I'd favor your approach if you manage to do the
>> serialization conditionally (BTW: a mechanism similar to
>> JavaThread::*deopt_suspend*
>> would be yet another way to do that).
FYI: In the long run I'm removing the suspend-flags and turning suspend/deopt/..
into handshakes instead. So we only have the poll to check.
I'm testing a change where I removed the changes to the native wrappers,
leaving native threads armed (as Martin suggested) and in the poll path notice
that we are still armed (when returning from a safepoint/handshake we should be
disarmed). Disarm yourself, re-check safepoint/handshake, execute instruction
barrier.
So we don't need any changes in any native wrapper, we handle the barrier in the
polling code instead.
I have just one issue to sort-out with that and benchmarking.
The v2 patch (not yet published) didn't show to bad of a result when we always
emit cpuid, but there is no super JNI heavy test in that suite.
/Robbin
>>
>> Cheers, Richard.
>>
>> [1] 8.1.3 Handling Self- and Cross-Modifying Code
>> in Intel(R) 64 and IA-32 ArchitecturesSoftware Developer's Manual. Volume 3
>> (3A, 3B, 3C & 3D): System Programming Guide
>>
>> https://software.intel.com/sites/default/files/managed/a4/60/325384-sdm-vol-3abcd.pdf
>>
>> on https://software.intel.com/en-us/articles/intel-sdm
>>
>> -----Original Message-----
>> From: Robbin Ehn <robbin.ehn at oracle.com>
>> Sent: Donnerstag, 14. März 2019 13:24
>> To: Reingruber, Richard <richard.reingruber at sap.com>; Doerr, Martin
>> <martin.doerr at sap.com>; David Holmes <david.holmes at oracle.com>;
>> hotspot-dev at openjdk.java.net
>> Subject: Re: RFR(m): 8220351: Cross-modifying code
>>
>> Hi Richard,
>>
>> On 3/14/19 11:12 AM, Reingruber, Richard wrote:
>>> Hi Robbin,
>>>
>>> > the reason I heard for not doing invalidate_range/word in nmethod::oops_do
>>> > is that it was deemed that a safepoint is 'long enough'.
>>>
>>> This doesn't seem to be a very good reason. I did a little modification to your
>>> initial figure:
>>
>> That's why we are adding the instruction barrier!
>>
>>>
>>> JavaThread: | VMThread
>>> StoreLoad |
>>> | update <immediate oop>
>>> | ? Global ICache Invalidate ?
>>> | disarm
>>> load thread_poll |
>>>
>>> Question would be: does an ICache flush happen after the immediate oop update?
>>> I would consider it problematic if not. Even if the safepoint was very long.
>>
>> The icache flush assumes it can stop speculation and flush instruction cache in
>> patchee thread.
>> Correct me if anything is wrong:
>> - sparc needs flush <adr> in all patchee threads.
>> - arm/aarch64 needs isb + ___clear_cache(adr,..) in all patchee threads.
>> - ppc needs isync in all patchee threads?
>> - s390 needs zarch_sync all in patchee threads?
>> - x86 needs cpuid in all in patchee threads.
>>
>> So a "Global ICache Invalidate" done by patcher thread (VMThread in this case)
>> have no effect in patchee threads!?
>>
>> Adding a few words on x86 ICache::Invalidate/word.
>> We have TSO, so if the patchee thread can see poll is disarmed all other stores
>> are visible. I have seen no case ICache::Invalidate/word would make a difference
>> from a StoreStore/compiler barrier (still talking about platform code for x86) ?
>> The problem is if we prefect an instruction over the load of the poll.
>> If poll is disarmed this perfected instruction can be fully executed.
>> If that happens to contain an immediate oop, we don't know if it's the patched
>> oop or not.
>>
>> /Robbin
>>
>>>
>>> Cheers Richard.
>>>
>>> -----Original Message-----
>>> From: Robbin Ehn <robbin.ehn at oracle.com>
>>> Sent: Donnerstag, 14. März 2019 10:34
>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>> <david.holmes at oracle.com>; hotspot-dev at openjdk.java.net
>>> Cc: Reingruber, Richard <richard.reingruber at sap.com>
>>> Subject: Re: RFR(m): 8220351: Cross-modifying code
>>>
>>> Hi Martin,
>>>
>>> On 3/14/19 10:25 AM, Doerr, Martin wrote:
>>>> Hi Robbin,
>>>>
>>>>> But that is whole new subject and let's keep that separate from this :)
>>>> Sure, we can keep the modification side discussion separate. This change is
>>>> already difficult enough
>>>>
>>>>
>>>> Btw. I haven't sent the ppc/s390 instruction cache barriers. Here they are:
>>>>
>>>> orderAccess_aix/linux_ppc.hpp:
>>>> inline void OrderAccess::instruction_pipeline() { inlasm_isync(); }
>>>>
>>>> orderAccess_linux_s390.hpp:
>>>> inline void OrderAccess::instruction_pipeline() { inlasm_zarch_sync(); }
>>>>
>>>
>>> Thanks!
>>>
>>>>> nmethod::oops_do calls the closure f->do_oop(r->oop_addr()) for oops
>>>>> encoded into the instruction stream.
>>>>> I couldn't find a call to AbstractICache::invalidate_range.
>>>
>>> I think I miss-understood you here, the reason I heard for not doing
>>> invalidate_range/word in nmethod::oops_do is that it was deemed that a
>>> safepoint is 'long enough'.
>>>
>>> /Robbin
>>>
>>>>
>>>> Look for ICache::invalidate_range, I see 11 calls on x64.
>>>> And one call to AbstractICache/ICache::invalidate_word.
>>>> Btw I have spent some time looking at x86/icache_x86.cpp...
>>>> But that is whole new subject and let's keep that separate from this :)
>>>>
>>>> I'll create issues for some of these topics, which if nothing else should be
>>>> discussed.
>>>>
>>>> /Robbin
>>>>
>>>>>
>>>>> Does anybody know where it is or if it's missing?
>>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
More information about the hotspot-dev
mailing list