RFR(m): 8220351: Cross-modifying code

Mon Mar 18 12:33:57 UTC 2019

For x86, this part in Intel/(AMD) the manual I'm following:

"
To write cross-modifying code and ensure that it is compliant with current and
future versions of the IA-32 architecture, the following processor
synchronization algorithm must be implemented:

(* Action of Modifying Processor *)
Memory_Flag ← 0; (* Set Memory_Flag to value other than 1 *)
Store modified code (as data) into code segment;
Memory_Flag ← 1;

(* Action of Executing Processor *)
WHILE (Memory_Flag ≠ 1)
Wait for code to update;
ELIHW;
Execute serializing instruction; (* For example, CPUID instruction *)
Begin executing modified code;
"

For this patch the poll is the Memory_Flag we are changing state on.

There is no mentioning of another compliant way. Even if the pointer publishing
would work on a current CPUs it is not future proof.

On OS level, membarrier(...) do only guarantee smp_mb() semantics.
If it would have stronger guarantees doing OS membar from the patcher would be
a second method.

/Robbin

On 3/18/19 12:36 PM, Reingruber, Richard wrote:
> Totally accepted.
> 
> I was only trying to understand the rules for cross-modifying code on x86, which
> is a prerequisite for the issue addressed by Robbins patch.
> 
> Thanks, Richard.
> 
> -----Original Message-----
> From: Erik Österlund <erik.osterlund at oracle.com>
> Sent: Montag, 18. März 2019 12:31
> To: Robbin Ehn <robbin.ehn at oracle.com>; Reingruber, Richard <richard.reingruber at sap.com>; David Holmes <david.holmes at oracle.com>; Doerr, Martin <martin.doerr at sap.com>; Andrew Haley <aph at redhat.com>; hotspot-dev at openjdk.java.net
> Subject: Re: RFR(m): 8220351: Cross-modifying code
> 
> Hi,
> 
> Safely publishing JIT-compiled code is outside the scope of this patch,
> and a completely different can of worms. This patch only intends to make
> the code patching we do in HotSpot safer. To safely publish JIT-compiled
> code, some more effort will be required, and I think that discussion can
> wait. Just pretend it makes sense for now.
> 
> Thanks,
> /Erik
> 
> On 2019-03-18 12:00, Robbin Ehn wrote:
>>> Let's assume J2's modification makes it into J1's instruction stream.
>>> How is it
>>> guaranteed that the new compiled code for methodB from the JIT makes
>>> it into
>>> J1's instruction stream, too, if no serializing instruction is
>>> executed by J1?
>>
>> In short, no, there are no such guarantee AFAIK.
>> If methodB is generate in memory with a valid instruction stream, we
>> may execute the old one.
>> And J1 may see the unresolved call and also try to resolve it, seem
>> much more likely.
>>
>> ErikÖ have looked more into these cases. (I don't plan to address them
>> in this patch)
>>
>> /Robbin
>>
>>> My explanation was that the call instruction has the serializing
>>> effect, just
>>> like in OPTION 1. Note as well the difference between OPTION 1 and 2:
>>> if the new
>>> code is not reached by a jump then a serializing instruction is
>>> required as
>>> well.
>>>
>>>     > That's the "accidental success" that is being avoided by this fix
>>>
>>> Which part of the fix would that be?
>>>
>>> Thanks, Richard.
>>>
>>> -----Original Message-----
>>> From: David Holmes <david.holmes at oracle.com>
>>> Sent: Montag, 18. März 2019 03:07
>>> To: Reingruber, Richard <richard.reingruber at sap.com>; Robbin Ehn
>>> <robbin.ehn at oracle.com>; Doerr, Martin <martin.doerr at sap.com>; Andrew
>>> Haley <aph at redhat.com>; hotspot-dev at openjdk.java.net
>>> Subject: Re: RFR(m): 8220351: Cross-modifying code
>>>
>>> On 15/03/2019 11:04 pm, Reingruber, Richard wrote:
>>>>      > OPTION 1 is for self-modifying code, not cross-modifying code.
>>>>
>>>> I know. I wanted to point out that it seems to work for
>>>> cross-modifying code,
>>>> too, in the sense that patchee java threads don't execute a
>>>> serializing instruction
>>>> before jumping to compiled code or before executing a compiled call
>>>> another java
>>>> thread just resolved.
>>>
>>> That's the "accidental success" that is being avoided by this fix. Today
>>> it all seems to work just fine because the "distance" between the
>>> actions of the patching thread and the patchee thread that wants to use
>>> the new code is "vast".
>>>
>>> Cheers,
>>> David
>>>
>>>> Cheers, Richard.
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Freitag, 15. März 2019 12:55
>>>> To: Reingruber, Richard <richard.reingruber at sap.com>; Robbin Ehn
>>>> <robbin.ehn at oracle.com>; Doerr, Martin <martin.doerr at sap.com>;
>>>> Andrew Haley <aph at redhat.com>; hotspot-dev at openjdk.java.net
>>>> Subject: Re: RFR(m): 8220351: Cross-modifying code
>>>>
>>>> On 15/03/2019 9:08 pm, Reingruber, Richard wrote:
>>>>> Hi Robbin,
>>>>>
>>>>> I'm referring to x86 here.
>>>>>
>>>>> My comment/question about the requirement of an ICache invalidation
>>>>> in the
>>>>> patcher thread was a little "undereducated". After reading recent
>>>>> manuals [1]:
>>>>>
>>>>>       1. I think your right that a serializing instruction after the
>>>>> safepoint poll
>>>>>          is necessary.
>>>>>
>>>>>       2. I doubt that cache flushing is required at all, according
>>>>> to [1]. At least
>>>>>          SPECjvm2008 runs with an AbstractICache::_flush_icache_stub
>>>>> that
>>>>>          immediately returns.
>>>>>
>>>>> In the sense of [1] JIT compilers and JIT runtimes are
>>>>> cross-modifying code,
>>>>> too, when they compile bytecode and resolve compiled calls. But
>>>>> afaik the java
>>>>> threads don't execute a serializing instruction before executing
>>>>> that code. So,
>>>>> if I'm not mistaken, (* OPTION 1 *) of [1] where the patchee
>>>>> threads jump to the
>>>>> modified code is legal as well.
>>>>
>>>> OPTION 1 is for self-modifying code, not cross-modifying code.
>>>>
>>>> David
>>>> -----
>>>>
>>>>> So at least theoretically you should be able to replace the
>>>>> serializing
>>>>> instruction with a call to an address you would load from the
>>>>> thread. The
>>>>> address should lead to the runtime, where the thread would block if
>>>>> the
>>>>> safepoint is still active or just return if it is not.
>>>>>
>>>>> The last point was rather exploring the space of legal solutions
>>>>> than a
>>>>> suggestion to solve the issue. I'd favor your approach if you
>>>>> manage to do the
>>>>> serialization conditionally (BTW: a mechanism similar to
>>>>> JavaThread::*deopt_suspend*
>>>>> would be yet another way to do that).
>>>>>
>>>>> Cheers, Richard.
>>>>>
>>>>> [1] 8.1.3 Handling Self- and Cross-Modifying Code
>>>>>         in Intel(R) 64 and IA-32 ArchitecturesSoftware Developer's
>>>>> Manual. Volume 3
>>>>>         (3A, 3B, 3C & 3D): System Programming Guide
>>>>> https://software.intel.com/sites/default/files/managed/a4/60/325384-sdm-vol-3abcd.pdf
>>>>>         on https://software.intel.com/en-us/articles/intel-sdm
>>>>>
>>>>> -----Original Message-----
>>>>> From: Robbin Ehn <robbin.ehn at oracle.com>
>>>>> Sent: Donnerstag, 14. März 2019 13:24
>>>>> To: Reingruber, Richard <richard.reingruber at sap.com>; Doerr, Martin
>>>>> <martin.doerr at sap.com>; David Holmes <david.holmes at oracle.com>;
>>>>> hotspot-dev at openjdk.java.net
>>>>> Subject: Re: RFR(m): 8220351: Cross-modifying code
>>>>>
>>>>> Hi Richard,
>>>>>
>>>>> On 3/14/19 11:12 AM, Reingruber, Richard wrote:
>>>>>> Hi Robbin,
>>>>>>
>>>>>>        > the reason I heard for not doing invalidate_range/word in
>>>>>> nmethod::oops_do
>>>>>>        > is that it was deemed that a safepoint is 'long enough'.
>>>>>>
>>>>>> This doesn't seem to be a very good reason. I did a little
>>>>>> modification to your
>>>>>> initial figure:
>>>>>
>>>>> That's why we are adding the instruction barrier!
>>>>>
>>>>>>
>>>>>> JavaThread:             | VMThread
>>>>>> StoreLoad               |
>>>>>>                              | update <immediate oop>
>>>>>>                              | ? Global ICache Invalidate ?
>>>>>>                              | disarm
>>>>>> load thread_poll        |
>>>>>>
>>>>>> Question would be: does an ICache flush happen after the immediate
>>>>>> oop update?
>>>>>> I would consider it problematic if not. Even if the safepoint was
>>>>>> very long.
>>>>>
>>>>> The icache flush assumes it can stop speculation and flush
>>>>> instruction cache in
>>>>> patchee thread.
>>>>> Correct me if anything is wrong:
>>>>> - sparc needs flush <adr> in all patchee threads.
>>>>> - arm/aarch64 needs isb + ___clear_cache(adr,..) in all patchee
>>>>> threads.
>>>>> - ppc needs isync in all patchee threads?
>>>>> - s390 needs zarch_sync all in patchee threads?
>>>>> - x86 needs cpuid in all in patchee threads.
>>>>>
>>>>> So a "Global ICache Invalidate" done by patcher thread (VMThread in
>>>>> this case)
>>>>> have no effect in patchee threads!?
>>>>>
>>>>> Adding a few words on x86 ICache::Invalidate/word.
>>>>> We have TSO, so if the patchee thread can see poll is disarmed all
>>>>> other stores
>>>>> are visible. I have seen no case ICache::Invalidate/word would make
>>>>> a difference
>>>>> from a StoreStore/compiler barrier (still talking about platform
>>>>> code for x86) ?
>>>>> The problem is if we prefect an instruction over the load of the poll.
>>>>> If poll is disarmed this perfected instruction can be fully executed.
>>>>> If that happens to contain an immediate oop, we don't know if it's
>>>>> the patched
>>>>> oop or not.
>>>>>
>>>>> /Robbin
>>>>>
>>>>>>
>>>>>> Cheers Richard.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Robbin Ehn <robbin.ehn at oracle.com>
>>>>>> Sent: Donnerstag, 14. März 2019 10:34
>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>>>> <david.holmes at oracle.com>; hotspot-dev at openjdk.java.net
>>>>>> Cc: Reingruber, Richard <richard.reingruber at sap.com>
>>>>>> Subject: Re: RFR(m): 8220351: Cross-modifying code
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 3/14/19 10:25 AM, Doerr, Martin wrote:
>>>>>>> Hi Robbin,
>>>>>>>
>>>>>>>> But that is whole new subject and let's keep that separate from
>>>>>>>> this :)
>>>>>>> Sure, we can keep the modification side discussion separate. This
>>>>>>> change is already difficult enough ��
>>>>>>>
>>>>>>>
>>>>>>> Btw. I haven't sent the ppc/s390 instruction cache barriers. Here
>>>>>>> they are:
>>>>>>>
>>>>>>> orderAccess_aix/linux_ppc.hpp:
>>>>>>> inline void OrderAccess::instruction_pipeline() { inlasm_isync(); }
>>>>>>>
>>>>>>> orderAccess_linux_s390.hpp:
>>>>>>> inline void OrderAccess::instruction_pipeline() {
>>>>>>> inlasm_zarch_sync(); }
>>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>>> nmethod::oops_do calls the closure f->do_oop(r->oop_addr()) for
>>>>>>>> oops encoded into the instruction stream.
>>>>>>>> I couldn't find a call to AbstractICache::invalidate_range.
>>>>>>
>>>>>> I think I miss-understood you here, the reason I heard for not doing
>>>>>> invalidate_range/word in nmethod::oops_do is that it was deemed
>>>>>> that a
>>>>>> safepoint is 'long enough'.
>>>>>>
>>>>>> /Robbin
>>>>>>
>>>>>>>
>>>>>>> Look for ICache::invalidate_range, I see 11 calls on x64.
>>>>>>> And one call to AbstractICache/ICache::invalidate_word.
>>>>>>> Btw I have spent some time looking at x86/icache_x86.cpp...
>>>>>>> But that is whole new subject and let's keep that separate from
>>>>>>> this :)
>>>>>>>
>>>>>>> I'll create issues for some of these topics, which if nothing
>>>>>>> else should be
>>>>>>> discussed.
>>>>>>>
>>>>>>> /Robbin
>>>>>>>
>>>>>>>>
>>>>>>>> Does anybody know where it is or if it's missing?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Martin
>>>>>>>>
>