RFR(m): 8220351: Cross-modifying code

Tue Mar 19 14:04:05 UTC 2019

Hi Robbin,

I just tried a small stress test [1] with (+UseNewCode) and without added CPUID instruction in native wrapper [2].

jdk/bin/java -XX:+UnlockDiagnosticVMOptions StressJVMEntries
result: 2113929216, duration: 40.1181962
jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode StressJVMEntries
result: 2113929216, duration: 76.0759127

Executed on 40 core 2-socket Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz.
So I guess nobody wants CPUID on the regular path if it can be avoided.

Best regards,
Martin



[1] StressJVMEntries.java:
public class StressJVMEntries{

    static long sum = 0;
    static final int loop_cnt = 10000000;

    public static void main(String args[]){
        Runtime rt = Runtime.getRuntime();
        long duration = System.nanoTime();
        for (int x=0; x<loop_cnt; x++) sum += rt.totalMemory();
        duration = System.nanoTime() - duration;
        System.out.println("result: " + sum/loop_cnt + ", duration: " + (double)duration/loop_cnt);
    }
}


[2] patch:
diff -r 3827cd66e788 src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp

--- a/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp      Mon Mar 18 16:04:23 2019 +0100
+++ b/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp      Tue Mar 19 15:02:31 2019 +0100
@@ -2607,6 +2607,18 @@
     }

     __ bind(Continue);
+    if (UseNewCode) {
+      __ push(rax);
+      __ push(rbx);
+      __ push(rcx);
+      __ push(rdx);
+      __ xorl(rax, rax);
+      __ cpuid();
+      __ pop(rdx);
+      __ pop(rcx);
+      __ pop(rbx);
+      __ pop(rax);
+    }
   }

   // change thread state



-----Original Message-----
From: Robbin Ehn <robbin.ehn at oracle.com> 
Sent: Freitag, 15. März 2019 13:32
To: David Holmes <david.holmes at oracle.com>; Reingruber, Richard <richard.reingruber at sap.com>; Doerr, Martin <martin.doerr at sap.com>; Andrew Haley <aph at redhat.com>; hotspot-dev at openjdk.java.net
Subject: Re: RFR(m): 8220351: Cross-modifying code

>>
>> The last point was rather exploring the space of legal solutions than a
>> suggestion to solve the issue. I'd favor your approach if you manage to do the
>> serialization conditionally (BTW: a mechanism similar to 
>> JavaThread::*deopt_suspend*
>> would be yet another way to do that).

FYI: In the long run I'm removing the suspend-flags and turning suspend/deopt/.. 
into handshakes instead. So we only have the poll to check.

I'm testing a change where I removed the changes to the native wrappers,
leaving native threads armed (as Martin suggested) and in the poll path notice 
that we are still armed (when returning from a safepoint/handshake we should be 
disarmed). Disarm yourself, re-check safepoint/handshake, execute instruction 
barrier.
So we don't need any changes in any native wrapper, we handle the barrier in the 
polling code instead.

I have just one issue to sort-out with that and benchmarking.
The v2 patch (not yet published) didn't show to bad of a result when we always 
emit cpuid, but there is no super JNI heavy test in that suite.

/Robbin

>>
>> Cheers, Richard.
>>
>> [1] 8.1.3 Handling Self- and Cross-Modifying Code
>>      in Intel(R) 64 and IA-32 ArchitecturesSoftware Developer's Manual. Volume 3
>>      (3A, 3B, 3C & 3D): System Programming Guide
>>      
>> https://software.intel.com/sites/default/files/managed/a4/60/325384-sdm-vol-3abcd.pdf 
>>
>>      on https://software.intel.com/en-us/articles/intel-sdm
>>
>> -----Original Message-----
>> From: Robbin Ehn <robbin.ehn at oracle.com>
>> Sent: Donnerstag, 14. März 2019 13:24
>> To: Reingruber, Richard <richard.reingruber at sap.com>; Doerr, Martin 
>> <martin.doerr at sap.com>; David Holmes <david.holmes at oracle.com>; 
>> hotspot-dev at openjdk.java.net
>> Subject: Re: RFR(m): 8220351: Cross-modifying code
>>
>> Hi Richard,
>>
>> On 3/14/19 11:12 AM, Reingruber, Richard wrote:
>>> Hi Robbin,
>>>
>>>     > the reason I heard for not doing invalidate_range/word in nmethod::oops_do
>>>     > is that it was deemed that a safepoint is 'long enough'.
>>>
>>> This doesn't seem to be a very good reason. I did a little modification to your
>>> initial figure:
>>
>> That's why we are adding the instruction barrier!
>>
>>>
>>> JavaThread:             | VMThread
>>> StoreLoad               |
>>>                           | update <immediate oop>
>>>                           | ? Global ICache Invalidate ?
>>>                           | disarm
>>> load thread_poll        |
>>>
>>> Question would be: does an ICache flush happen after the immediate oop update?
>>> I would consider it problematic if not. Even if the safepoint was very long.
>>
>> The icache flush assumes it can stop speculation and flush instruction cache in
>> patchee thread.
>> Correct me if anything is wrong:
>> - sparc needs flush <adr> in all patchee threads.
>> - arm/aarch64 needs isb + ___clear_cache(adr,..) in all patchee threads.
>> - ppc needs isync in all patchee threads?
>> - s390 needs zarch_sync all in patchee threads?
>> - x86 needs cpuid in all in patchee threads.
>>
>> So a "Global ICache Invalidate" done by patcher thread (VMThread in this case)
>> have no effect in patchee threads!?
>>
>> Adding a few words on x86 ICache::Invalidate/word.
>> We have TSO, so if the patchee thread can see poll is disarmed all other stores
>> are visible. I have seen no case ICache::Invalidate/word would make a difference
>> from a StoreStore/compiler barrier (still talking about platform code for x86) ?
>> The problem is if we prefect an instruction over the load of the poll.
>> If poll is disarmed this perfected instruction can be fully executed.
>> If that happens to contain an immediate oop, we don't know if it's the patched
>> oop or not.
>>
>> /Robbin
>>
>>>
>>> Cheers Richard.
>>>
>>> -----Original Message-----
>>> From: Robbin Ehn <robbin.ehn at oracle.com>
>>> Sent: Donnerstag, 14. März 2019 10:34
>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes 
>>> <david.holmes at oracle.com>; hotspot-dev at openjdk.java.net
>>> Cc: Reingruber, Richard <richard.reingruber at sap.com>
>>> Subject: Re: RFR(m): 8220351: Cross-modifying code
>>>
>>> Hi Martin,
>>>
>>> On 3/14/19 10:25 AM, Doerr, Martin wrote:
>>>> Hi Robbin,
>>>>
>>>>> But that is whole new subject and let's keep that separate from this :)
>>>> Sure, we can keep the modification side discussion separate. This change is 
>>>> already difficult enough ��
>>>>
>>>>
>>>> Btw. I haven't sent the ppc/s390 instruction cache barriers. Here they are:
>>>>
>>>> orderAccess_aix/linux_ppc.hpp:
>>>> inline void OrderAccess::instruction_pipeline() { inlasm_isync(); }
>>>>
>>>> orderAccess_linux_s390.hpp:
>>>> inline void OrderAccess::instruction_pipeline() { inlasm_zarch_sync(); }
>>>>
>>>
>>> Thanks!
>>>
>>>>> nmethod::oops_do calls the closure f->do_oop(r->oop_addr()) for oops 
>>>>> encoded into the instruction stream.
>>>>> I couldn't find a call to AbstractICache::invalidate_range.
>>>
>>> I think I miss-understood you here, the reason I heard for not doing
>>> invalidate_range/word in nmethod::oops_do is that it was deemed that a
>>> safepoint is 'long enough'.
>>>
>>> /Robbin
>>>
>>>>
>>>> Look for ICache::invalidate_range, I see 11 calls on x64.
>>>> And one call to AbstractICache/ICache::invalidate_word.
>>>> Btw I have spent some time looking at x86/icache_x86.cpp...
>>>> But that is whole new subject and let's keep that separate from this :)
>>>>
>>>> I'll create issues for some of these topics, which if nothing else should be
>>>> discussed.
>>>>
>>>> /Robbin
>>>>
>>>>>
>>>>> Does anybody know where it is or if it's missing?
>>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>