RFR: 8210498: nmethod entry barriers

Tue Oct 9 18:58:53 UTC 2018

Hi Vladimir,

Thanks for the review!

/Erik

> On 9 Oct 2018, at 19:19, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Looks good.
> 
> Thanks,
> Vladimir
> 
>> On 10/9/18 8:52 AM, Erik Österlund wrote:
>> Hi Vladimir,
>> Sounds like we are in agreement. Thanks for looking into this change.
>> I updated the webrev with the added assembly printing for C2 as requested.
>> Incremental:
>> http://cr.openjdk.java.net/~eosterlund/8210498/webrev.01_02/
>> Full:
>> http://cr.openjdk.java.net/~eosterlund/8210498/webrev.02/
>> Thanks,
>> /Erik
>>> On 2018-10-09 16:42, Vladimir Kozlov wrote:
>>>> On 10/9/18 12:24 AM, Erik Österlund wrote:
>>>> Hi Vladimir,
>>>> 
>>>> Thank you for reviewing this.
>>>> 
>>>>> On 2018-10-08 19:57, Vladimir Kozlov wrote:
>>>>> First, all nmethods have verified entry. MachPrologNode is generated by C2 for all nmethods including OSR (we need stack bang in all cases). OSR does not have *Unverified* entry represented by MachUEPNode. C1 compiled OSR nmethods are different - you may need your changes for it but I am not sure - ask C1 experts.
>>>>> 
>>>>> There could be complications if an OSR nmethods is not executed (after barrier check) due to state of stack on OSR entry but it is different issue.
>>>> 
>>>> I should have been more precise. OSR nmethods have a prologue for building frames, yes. But OSR entries do not have a verified entry offset, in the sense that in e.g. c1_LIRAssembler.cpp, no CodeOffsets::Verified_Entry entry is set for OSR entries (the lir_osr_entry case). It is set only for lir_std_entry. Similarly for C2, while they both have a prologue, there is no CodeOffsets::Verified_Entry set for OSR nmethods. Read for example opto/compile.cpp:
>>>> 
>>>>      if (is_osr_compilation()) {
>>>>        _code_offsets.set_value(CodeOffsets::Verified_Entry, 0);
>>>>        _code_offsets.set_value(CodeOffsets::OSR_Entry, _first_block_size);
>>>>      } else {
>>>>        _code_offsets.set_value(CodeOffsets::Verified_Entry, _first_block_size);
>>>>        _code_offsets.set_value(CodeOffsets::OSR_Entry, 0);
>>>>      }
>>>> 
>>>> The different treatment of OSR methods for the entry barrier is analogous to the different treatment of OSR methods when making them not entrant. OSR nmethods are made not entrant by detaching them from the Method* (so the runtime won't find them), while non-OSR nmethods are made not entrant by patching a jump to the SharedRuntime::get_handle_wrong_method_stub() into the verified entry. And that handler stub has particular ABI requirements on the call site, that do not match the OSR call site.
>>>> 
>>>> Similarly for nmethod entry barriers, if the callee is an OSR nmethod, it is much easier to check in the runtime before the call if said nmethod is toast or not, than it is to check when the nmethod has entered, and then figure out how to deoptimize from inside of the nmethod.
>>> 
>>> Good. Note, I did not argued about changes for OSR nmethods, I was surprise by statement. Your explanation cleared it.
>>> 
>>>> 
>>>>> I am curious why you are using next 2 lines everywhere?
>>>>> 
>>>>> +  BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler();
>>>>> +  bs->nmethod_entry_barrier(this);
>>>>> 
>>>>> instead of one call:
>>>>> 
>>>>> BarrierSet::barrier_set()->nmethod_entry_barrier(this);
>>>> 
>>>> Because we never make straight calls to BarrierSet for inserting barriers. Instead, we use one out of the (currently) 4 helpers that have their own responsibility. The BarrierSetAssembler is responsible for inserting barriers into assembly code.
>>> 
>>> So it is by GC Interface design? Okay.
>>> 
>>>> 
>>>>> The same question about duplicated code in interpreterRuntime.cpp
>>>> 
>>>> Same answer. ;)
>>>> 
>>>>> Would be nice if you also update MachPrologNode::format() for case when barrier is used. It is used in PrintOptoAssembly.
>>>> 
>>>> Yes, that is a good idea. I will adjust the printing code.
>>> 
>>> Thanks,
>>> Vladimir
>>> 
>>>> 
>>>> Thanks,
>>>> /Erik
>>>> 
>>>>> Thanks,
>>>>> Vladimir
>>>>> 
>>>>>> On 10/8/18 1:19 AM, Erik Österlund wrote:
>>>>>> Hi Per and Roman,
>>>>>> 
>>>>>> Thank you for having a look at this.
>>>>>> I went with Roman's suggestion and made this an explicit opt-out instead. But I let ModRef pass in NULL, as none of the barrier sets in that category currently use nmethod entry barriers. When they start to do so, that can be easily changed.
>>>>>> 
>>>>>> Incremental:
>>>>>> http://cr.openjdk.java.net/~eosterlund/8210498/webrev.00_01/
>>>>>> 
>>>>>> Full:
>>>>>> http://cr.openjdk.java.net/~eosterlund/8210498/webrev.01/
>>>>>> 
>>>>>> Thanks,
>>>>>> /Erik
>>>>>> 
>>>>>>> On 2018-10-08 08:48, Roman Kennke wrote:
>>>>>>> I agree. I wouldn't mind to keep it simple and pass NULL though. I'm not
>>>>>>> a big fan of default arguments anyway.
>>>>>>> 
>>>>>>> Roman
>>>>>>> 
>>>>>>>>> On 2018-10-05 17:07, Erik Österlund wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> In order to implement concurrent class unloading for ZGC, a mechanism
>>>>>>>>> will be required to arm nmethods in a safepoint such that the first
>>>>>>>>> entry into an nmethod after the pause triggers a barrier before
>>>>>>>>> continuing.
>>>>>>>>> 
>>>>>>>>> The barrier will be used for:
>>>>>>>>>     * Patching immediate oops, and keeping phantomly reachable oops
>>>>>>>>> alive through the nmethod entry
>>>>>>>>>     * Catching calls from stale IC caches (due to class unloading) into
>>>>>>>>> nmethods that are being unloaded (containing broken oops and
>>>>>>>>> metadata), and lazily clean the IC cache and re-resolve the call.
>>>>>>>>> 
>>>>>>>>> This implementation is for x86_64 and is based on the prototyping work
>>>>>>>>> of Rickard Bäckman. So big thanks to Rickard, who will co-author of
>>>>>>>>> this patch.
>>>>>>>>> The way it works is that there is a cmp #offset(r15_thread),
>>>>>>>>> simm32_epoch); je #continuation; inserted into the verified entry of
>>>>>>>>> nmethods that conditionally calls a stub trampolining into the VM.
>>>>>>>>> When the simm32_epoch is not equal to some TLS-local field, then a
>>>>>>>>> slow path is taken to run some barrier inside of the VM.
>>>>>>>>> By changing this TLS-local field in the safepoint, all verified
>>>>>>>>> entries will take the slow path into the VM when run, where the
>>>>>>>>> barrier is eventually disarmed by patching the simm32_epoch value of
>>>>>>>>> the instruction stream to the current globally correct value.
>>>>>>>>> 
>>>>>>>>> In terms of abstractions, to utilize this mechanism, you need to
>>>>>>>>> create your BarierSet with an BarrierSetNMethod helper object, just
>>>>>>>>> like the BarrierSetAssembler, BarrierSetC1 and BarrierSetC2. The
>>>>>>>>> BarrierSetNMethod base class handles the barrier itself. It has a
>>>>>>>>> virtual function, nmethod_entry_barrier, which gets called by the slow
>>>>>>>>> paths. So each user of this mechanism inherits from BarrierSetNMethod
>>>>>>>>> and puts its barrier in there.
>>>>>>>>> 
>>>>>>>>> The OSR nmethods do not have a verified entry. Therefore, the barrier
>>>>>>>>> is run explicitly in the runtime before calling into OSR nmethods to
>>>>>>>>> achieve the same effect of trapping new calls into nmethods.
>>>>>>>>> 
>>>>>>>>> The nmethod entry barrier returns true/false depending on whether the
>>>>>>>>> called nmethod may be entered or not. This allows lazily cleaning
>>>>>>>>> nmethod IC caches during unloading, which invokes the same helper
>>>>>>>>> method that is patched into the verified entry when made not entrant.
>>>>>>>>> 
>>>>>>>>> Eventually, I would like to also use this mechanism to remove the
>>>>>>>>> nmethod hotness counter maintenance we today perform in safepoint
>>>>>>>>> cleanup instead. These nmethod entry barriers can provide better
>>>>>>>>> heuristics for whether nmethods are used or not. But that would
>>>>>>>>> require buy-in from maintainers of other platforms. So let's take one
>>>>>>>>> step at a time.
>>>>>>>>> 
>>>>>>>>> For the curious reader thinking that this might be an expensive
>>>>>>>>> operation, it is not. This check in the nmethod verified entry is very
>>>>>>>>> predictable and cheap. And because of use of inlining, the checks
>>>>>>>>> become more infrequent. In my evaluation, it has gone completely under
>>>>>>>>> the radar in benchmarking (running this with my current version of
>>>>>>>>> concurrent class unloading in ZGC).
>>>>>>>>> 
>>>>>>>>> Webrev:
>>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8210498/webrev.00/
>>>>>>>> I reviewed this before Erik sent it out, so my comments have already
>>>>>>>> been addressed. Just found one additional thing, the BarrierSet
>>>>>>>> constructor now looks like this:
>>>>>>>> 
>>>>>>>> 96   BarrierSet(BarrierSetAssembler* barrier_set_assembler,
>>>>>>>> 97              BarrierSetC1* barrier_set_c1,
>>>>>>>> 98              BarrierSetC2* barrier_set_c2,
>>>>>>>> 99              const FakeRtti& fake_rtti,
>>>>>>>> 100             BarrierSetNMethod* barrier_set_nmethod = NULL) :
>>>>>>>> 101     _fake_rtti(fake_rtti),
>>>>>>>> 102     _barrier_set_assembler(barrier_set_assembler),
>>>>>>>> 103     _barrier_set_c1(barrier_set_c1),
>>>>>>>> 104     _barrier_set_c2(barrier_set_c2),
>>>>>>>> 105     _barrier_set_nmethod(barrier_set_nmethod) {}
>>>>>>>> 
>>>>>>>> I find it odd that the fake_rtti argument comes in the middle of the
>>>>>>>> barrier_set_* arguments. I can see that you don't want GC barrier sets
>>>>>>>> to have to supply NULL if they don't have a nmethod barrier set, but
>>>>>>>> maybe we can have two constructors instead, like:
>>>>>>>> 
>>>>>>>> BarrierSet(BarrierSetAssembler* barrier_set_assembler,
>>>>>>>>             BarrierSetC1* barrier_set_c1,
>>>>>>>>             BarrierSetC2* barrier_set_c2,
>>>>>>>>             const FakeRtti& fake_rtti) :
>>>>>>>>      _fake_rtti(fake_rtti),
>>>>>>>>      _barrier_set_assembler(barrier_set_assembler),
>>>>>>>>      _barrier_set_c1(barrier_set_c1),
>>>>>>>>      _barrier_set_c2(barrier_set_c2),
>>>>>>>>      _barrier_set_nmethod(NULL) {}
>>>>>>>> 
>>>>>>>> BarrierSet(BarrierSetAssembler* barrier_set_assembler,
>>>>>>>>             BarrierSetC1* barrier_set_c1,
>>>>>>>>             BarrierSetC2* barrier_set_c2,
>>>>>>>>             BarrierSetNMethod* barrier_set_nmethod,
>>>>>>>>             const FakeRtti& fake_rtti,) :
>>>>>>>>      _fake_rtti(fake_rtti),
>>>>>>>>      _barrier_set_assembler(barrier_set_assembler),
>>>>>>>>      _barrier_set_c1(barrier_set_c1),
>>>>>>>>      _barrier_set_c2(barrier_set_c2),
>>>>>>>>      _barrier_set_nmethod(barrier_set_nmethod) {}
>>>>>>>> 
>>>>>>>> What do you think?
>>>>>>>> 
>>>>>>>> Other than that, this looks good to me.
>>>>>>>> 
>>>>>>>> cheers,
>>>>>>>> Per
>>>>>>>> 
>>>>>>>>> Bug ID:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8210498
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> /Erik
>>>>>> 
>>>>