RFR: 8199417: Modularize interpreter GC barriers
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Wed Apr 4 20:05:41 UTC 2018
Thank you for keeping load_heap_oop in the macroAssembler, I like it and
think it should be consistent with the rest of the platforms.
http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/src/hotspot/cpu/x86/templateTable_x86.cpp.udiff.html
+ do_oop_store(_masm, Address(rdx, 0), rax, IN_HEAP_ARRAY);
What happens if someone misses IN_HEAP_ARRAY and puts IN_HEAP instead?
http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp.udiff.html
+ do_oop_store(_masm, element_address, noreg, IN_HEAP | IN_HEAP_ARRAY);
Here it has both for aarch64.
I reviewed the x86 interpreter code, and will review the rest when/if
people say that load_heap_oop is good.
Thanks,
Coleen
On 4/4/18 9:38 AM, Erik Österlund wrote:
> Hi Kim,
>
> Thank you for reviewing this.
>
> I have made a new webrev with proposed changes to the x86/x64 code
> reflecting the concerns you and Coleen have.
> I thought we should settle for something we like in the x86/x64 code
> before going ahead and changing the other platforms too much (I don't
> want to bug Martin and Roman too much before).
>
> New full webrev:
> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/
>
> Incremental:
> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.00_01/
>
> On 2018-04-01 05:12, Kim Barrett wrote:
>>> On Mar 26, 2018, at 5:31 AM, Erik Österlund
>>> <erik.osterlund at oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> The GC barriers for the interpreter are not as modular as they could
>>> be. They currently use switch statements to check which GC barrier
>>> set is being used, and call this or that barrier based on that, in a
>>> way that assumes GCs only use write barriers.
>>>
>>> This patch modularizes this by generating accesses in the
>>> interpreter with declarative semantics. Accesses to the heap may now
>>> use store_at and load_at functions of the BarrierSetAssembler,
>>> passing along appropriate arguments and decorators. Each concrete
>>> BarrierSetAssembler can override the access completely or sprinkle
>>> some appropriate GC barriers as necessary.
>>>
>>> Big thanks go to Martin Doerr and Roman Kennke, who helped plugging
>>> this into S390, PPC and AArch64 respectively.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.00/
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8199417
>>>
>>> Thanks,
>>> /Erik
>> I've only looked at the x86 and generic code so far. I have some
>> comments, but also a bunch of questions. I think I'm missing some
>> things somewhere. I'm sending what I've got so far anyway, and will
>> continue studying.
>>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>> 354 // flatten object address if needed
>> 355 // We do it regardless of precise because we need the registers
>>
>> There is no "precise" in this context.
>
> I thought "precise" was referring to the concept of precise card
> marking, as opposed to the variable 'precise' that used to exist. I
> have reworded the comment to reflect that.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>> 364 #ifndef _LP64
>> 365 InterpreterMacroAssembler *imasm =
>> static_cast<InterpreterMacroAssembler*>(masm);
>> 366 #endif
>>
>> In the old code, the function received an InterpreterMacroAssembler*.
>>
>> What is there about this context that lets us know the downcast is
>> safe here? Is oop_store_at only ever called with an IMA*? If so,
>> then why isn't that the parameter type. If not, then what?
>
> Yes, this is indeed only used by the InterpreterMacroAssembler. But I
> wanted the interface to use MacroAssembler. Why?
>
> 1) It's only 32 bit x86 that relies on this property, and I hope it
> will go away soon, and the save bcp hack with it.
> 2) previous load_heap_oop is used not only in the
> InterpreterMacroAssembler, and I wanted the same assembler in load_at
> and store_at.
>
> So for now it is only used in InterpreterMacroAssembler, and I doubt
> that will change any time soon. I am hoping 32 bit support will be
> dropped before that, and the hack will go away. For now, I have added
> a virtual is_interpreter_masm() call that I use in an assert to make
> sure this is not accidentally used in the wrong context until 32 bit
> support is gone.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
>> 753 bs->load_at(_masm, IN_HEAP | ON_WEAK_OOP_REF, T_OBJECT,
>> 754 rax, field_address, /*tmp1*/ rbx, /*tmp_thread*/ rdx);
>>
>> This needs to distinguish between WEAK and PHANTOM references, else it
>> will break ZGC's concurrent reference processing. (At least so long as
>> we still have to deal with finalization.)
>
> It does. This is ON_WEAK_OOP_REF. The get intrinsic is generated for
> WeakReference (and SoftReference) only. PhantomReference does not have
> a getter. Therefore, in all contexts this is intrinsified, it will
> refer to an ON_WEAK_OOP_REF.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
>> 698 address
>> TemplateInterpreterGenerator::generate_Reference_get_entry(void) {
>>
>> Almost all the code here (and much of the commentary) was/is specific
>> to G1 (or SATB collectors). Shouldn't it all be in the barrier_set's
>> load_at? As it is now, if (say) we're using Parallel GC then I think
>> there's a bunch of additional code being generated and executed here
>> that wasn't present before.
>
> It is true that interpreter Reference.get() intrinsics with
> Parallel/CMS/Serial currently run a few instructions more than they
> need to pay for the abstraction. There is no doubt in my mind that
> this abstraction is worth the cost. Reference.get in the interpreter
> is not a hot path, and does not need to optimize every cycle.
>
> I changed the G1 comments to more general comments instead.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/methodHandles_x86.cpp
>> 170 __ load_heap_oop(method_temp, Address(recv,
>> NONZERO(java_lang_invoke_MethodHandle::form_offset_in_bytes())));
>> =>
>> 175 bs->load_at(_masm, IN_HEAP, T_OBJECT, method_temp,
>> Address(recv,
>> NONZERO(java_lang_invoke_MethodHandle::form_offset_in_bytes())), t
>>
>> This doesn't look like an improvement in code readability to me. Why
>> can't bs->load_at be the implementation of "__ load_heap_oop" ?
>>
>> And the lines are getting really long; so long that webrev is cutting
>> off the right end, so I can't review the new versions of the lines.
>
> Due to popular demand, I have done what you (and Coleen) both
> proposed: retained the load_heap_oop abstraction.
>
> Before:
> * load_heap_oop was the "raw" variant, similar to
> oopDesc::load_decode_heap_oop, which no longer exists.
>
> Now:
> * load_heap_oop calls MacroAssembler::access_load_at
> * access_load_at grabs the active BarrierSetAssembler and calls
> load_at on it.
> ... same idea for stores.
> * What used to be the raw versions for oop loads and stores, has moved
> to BarrierSetAssembler::load_at/store_at.
>
> These accessors now optionally take temporary register and decorator
> parameters than you don't have to supply:
> * supplying temprary register is polite to the GC so its accesses can
> be a bit better
> * supplying decorators can be done if it is not a default access. For
> example, AS_RAW can now be used to perform a raw load instead.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/methodHandles_x86.cpp
>> 363 __ load_heap_oop(temp2_defc, member_clazz);
>> =>
>> 368 BarrierSetAssembler* bs =
>> BarrierSet::barrier_set()->barrier_set_assembler();
>> 369 Register rthread = LP64_ONLY(r15_thread) NOT_LP64(noreg);
>> 370 bs->load_at(_masm, IN_HEAP, T_OBJECT, temp2_defc,
>> member_clazz, noreg, rthread);
>>
>> Similarly, this doesn't seem like an improvement to me. And there are
>> more like this.
>
> Fixed.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/macroAssembler_x86.hpp
>>
>> It would reduce the clutter in this change to leave the as_Address
>> declarations where they were, since that place is now public, rather
>> than moving them to a different public section.
>
> Fixed.
>
>> ------------------------------------------------------------------------------
>>
>>
>> Generally, it seems like it might be nice for the MacroAssembler class
>> to have a reference to the barrier_set, or maybe even the bs's
>> assembler, rather than looking them up in various places, and in
>> different ways. For example, I see each of
>>
>> Universe::heap()->barrier_set()
>> BarrerSet::barrier_set()
>>
>> used in various different places *in new code*. Some consistency
>> would be nice.
>
> I think it seems fine to just fetch them only in the
> access_load/store_at wrappers.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>> 5249 bs->load_at(this, IN_ROOT | ON_PHANTOM_OOP_REF, T_OBJECT,
>> 5250 value, Address(value,
>> -JNIHandles::weak_tag_value), tmp, thread);
>> 5251 verify_oop(value);
>>
>> This puts the verify_oop after the G1 pre-barrier, where in the old
>> code verify_oop was before the pre-barrier. I remember the old order
>> being intentional, though don't off-hand recall how important that
>> was. If for no other reason, it seems like the new order could rarely
>> (due to bad timing) have a bad oop reported somewhere far away during
>> SATB buffer processing, rather than at the point of the load.
>>
>> With the new version's order there could be one verify after the done
>> label. That's just a minor nit.
>
> Conversely, I think the new order is better and more GC agnostic.
> It avoids, for example, sending in bad oops for verification with ZGC.
> The more GC-agnostic interface is that after the load_at
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>> 230 #ifdef _LP64
>> 231 if (c_rarg1 != thread) {
>> 232 __ mov(c_rarg1, thread);
>> 233 }
>> 234 if (c_rarg0 != pre_val) {
>> 235 __ mov(c_rarg0, pre_val);
>> 236 }
>> 237 #else
>> 238 __ push(thread);
>> 239 __ push(pre_val);
>> 240 #endif
>>
>> Old code used:
>>
>> __ pass_arg1(thread);
>> __ pass_arg0(pre_val);
>>
>> Why is this being changed? Oh, pass_argN are static helpers not
>> available outside macroAssembler_x86.cpp. Maybe they should be
>> exposed in some way, like public helpers in the x86 version of the
>> MacroAssembler class, rather than copied inline here (and maybe other
>> places?).
>
> Fixed. (added a wrapper for this to MacroAssembler, as suggested)
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/interp_masm_x86.cpp
>> Deleted these lines:
>> 519 // The resulting oop is null if the reference is not yet resolved.
>> 520 // It is Universe::the_null_sentinel() if the reference
>> resolved to NULL via condy.
>>
>> I don't think those comment lines should be deleted.
>
> Fixed.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/gc/shared/modRefBarrierSetAssembler_x86.cpp
>> 86 void ModRefBarrierSetAssembler::store_at(MacroAssembler* masm,
>> DecoratorSet decorators, BasicType type,
>> 87 Address dst, Register val,
>> Register tmp1, Register tmp2) {
>> 88 if (type == T_OBJECT || type == T_ARRAY) {
>> 89 oop_store_at(masm, decorators, type, dst, val, tmp1, tmp2);
>> 90 } else {
>> 91 BarrierSetAssembler::store_at(masm, decorators, type, dst,
>> val, tmp1, tmp2);
>> 92 }
>> 93 }
>>
>> Doesn't the conditional conversion from store_at to oop_store_at
>> actually indicate a bug in the caller? That is, calling store_at with
>> a oop (T_OBJECT or T_ARRAY) value seems like a bug, and should be
>> asserted against, rather than translating.
>>
>> And oop_store_at should assert that the value *is* an oop value.
>>
>> And should there be any verification that the decorators and the type
>> are consistent?
>>
>> But maybe I'm not understanding the protocol around oop_store_at?
>> There being no documentation, it seems entirely possible that I've
>> misunderstood something here. Maybe I'm being confused by
>> similarities in naming with Access:: that don't hold?
>
> The confusion roots in that Access and BarrierSetAssembler are not
> structured exactly the same.
>
> Access provides store_at and oop_store_at. The reason for providing
> both of these, is that I could not safely infer whether the passed in
> parameters were oops or not without changing oop and narrowOop to
> always be value objects, which I did not dare to do as the generated
> code was worse.
>
> In BarrierSetAssembler, there is no such restriction. The type for the
> access is always passed in to store_at/load_at as BasicType. It is the
> responsibility of ModRefBarrierSetAssembler to filter away non-oop
> references, and call oop_store_at for relevant accesses for ModRef.
> This follows the same pattern that the arraycopy stubs used. This
> allows, e.g. CardTableModRef to only override oop_store_at. This is
> similar to how ModRefBarrierSet::AccessBarrier<decorators,
> BarrierSetT>::oop_store_in_heap calls the concrete barriersets for
> help generating that access. I hope this helps understanding the
> structuring.
>
> This is also the reason why G1BarrierSetAssembler::load_at checks
> whether it is an oop that is loaded or not, because loads are not
> riding on the ModRef layer, which only knows about oop stores.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.cpp
>> 88 void CardTableBarrierSetAssembler::store_check(MacroAssembler*
>> masm, Register obj, Address dst) {
>> 89 // Does a store check for the oop in register obj. The content of
>> 90 // register obj is destroyed afterwards.
>>
>> The comment states obj is an oop, but when called by the precise case
>> of oop_store_at, it is not an oop, but rather a derived pointer. I
>> think it's the comment that's wrong.
>
> I changed the comment to refer to obj as an oop* instead of an oop.
>
>> ------------------------------------------------------------------------------
>>
>>
>> I was surprised that oop_store_at is initially declared at the
>> ModRefBarrierSetAssembler level, rather than at the
>> BarrierSetAssembler level.
>>
>> I was also surprised to not see an oop_load_at.
>>
>> In Access we have _at suffixed operations paired with non-_at suffixed
>> operations, the difference being the _at suffixed operations have a
>> base object, to allow dealing with the brooks-ptr. I'm not finding any
>> such distinction in the xxxAssembler API. Presumably I've missed
>> something somewhere. How is that handled?
>
> I think I previously already explained this one. ModRef does not know
> about oop loads, only stores. Therefore, G1 must override load_at, and
> not oop_load_at.
>
> Thanks,
> /Erik
>
>> ------------------------------------------------------------------------------
>>
>>
>>
>
More information about the hotspot-dev
mailing list