Request for reviews (L): 7063628: Use cbcond on T4
Tom Rodriguez
tom.rodriguez at oracle.com
Thu Jul 14 16:59:29 PDT 2011
On Jul 14, 2011, at 4:43 PM, Vladimir Kozlov wrote:
> Thank you, Tom
>
> Tom Rodriguez wrote:
>> assembler_sparc.hpp:
>> If we wanted to be clever about the delay slot issue we could add an function like maybe_delayed() that would emit the instruction in a side buffer and place it in the right place for these instructions. Basically you could emit it as if it were not delayed and then move it into the delay slot if you actually have one. So instead of:
>> __ br_null(G3_scratch, false, __ pt, skip_fixup, false);
>> __ delayed()->ld_ptr(G5_method, in_bytes(methodOopDesc::interpreter_entry_offset()), G3_scratch);
>> you do __ maybe_delayed()->ld_ptr(G5_method, in_bytes(methodOopDesc::interpreter_entry_offset()), G3_scratch);
>> __ br_null(G3_scratch, false, __ pt, skip_fixup);
>> It's a little tweaky but it would allow cbcond to be used in more places.
>
> I will file RFE for that.
>
>> You need some comments explaining when the instruction emits the delay slot for you. I would think this should only be done for names which are clearly not real instruction names. I don't like the extra argument that controls whether the delay slot is emitted or not. It's a bit mysterious in a casual reading. Maybe the false version should be _no_delay? I'm not sure how I'd want to this work but it seems a bit subtle at a first glance.
>
> Originally I thought to add second version of macroassembler branch instructions with suffix _short. Then old instructions will stay the same and _short will generate cbcond or branch with delayed nop. But then I decided to go with passing additional flag. But I see what you mean, may be I should go to the original idea.
I think being a little more explicit is better.
>
>> Get rid of the emit_delayed_nop stuff in ba. It's just a confusing way to emit "br(always, false, pt, L)" so callers should just use that directly. Maybe instead of being called ba is should br_always so it's clear that it's a macro instruction. You can then leave the existing ba alone.
>
> I am also using emit_delayed_nop flag to control when cbcond could be emited. For the example bellow *update_done is in far distance but it is forward branch and during ba() generation the distance is unknown. In such case the distance set to 0 and cbcond will be emitted.
>
> I will leave the original ba and add new ba_short.
Oh I see. Yes, a ba_short to parallel jccb would make sense. Is there benefit to a short form in other cases?
>
>> cbc doesn't match the docs which use cbcond to refer to them generically. It's also slightly odd that the real instructions are called cwb and cxb but we don't have any way to emit those directly but I guess the generic forms take care of it. I'm not a huge fan of the instructions that take CC as an argument. I might prefer to see cwb, cxb and cb_ptr to cover the ptr_cc case but given the limited number of uses I can live with what you've done.
>
> I will rename cbc to cbcond.
>
>> Why did you remove the Condition argument from br_zero? It can work with the new encoding.
>
> I think it is confusing to have name br_zero and have conditions. And in most cases it was indeed check for zero. With condition we need different name, I think: cmp_zero_and_br?
I guess it was confusingly named. cmp_zero_and_br sounds good.
>
>> c1_LIRAssembler_sparc.cpp:
>> Why isn't this:
>> ! __ ba(false, *update_done);
>> ! __ ba(*update_done, false);
>> __ delayed()->nop();
>> simply
>> __ ba(*update_done);
>> That actually applies to a bunch of other places too.
>
> I answered it above: emit_delayed_nop flag controls when cbcond is emited.
>
>> sharedRuntime_sparc.cpp:
>> I think the annulling here is wrong or at least meaningless. ! __ cmp_and_brx(temp_reg, G5_inline_cache_reg, Assembler::equal, true, Assembler::pt, L);
>> __ brx(Assembler::equal, true, Assembler::pt, L);
>> __ delayed()->nop();
>
> Why? This code trys to not execute nop if branch is not taken.
Annulling doesn't skip the execution, it just keeps the side effects from occurring. In either case it's meaningless when applied to a nop. It was probably just a copy/paste from some other place where it was meaningful.
tom
>
>> cmp_and_brx shouldn't even have an annul argument. Actually none of these new macro instructions should.
>
> Agree.
>
> Thanks,
> Vladimir
>
>> compile.cpp:
>> + if (n->is_Branch() && n->as_Mach()->ideal_Opcode() != Op_Jump) {
>> + // Fake label for branch instruction.
>> + MacroAssembler masm(&buf);
>> + Label fakeL;
>> + masm.bind(fakeL);
>> + n->as_Mach()->label_set(fakeL, 0);
>> + }
>> What guarantees do we get about the lifetime of the storage for fakeL since it doesn't appear to be used? I would assume that at a minimum it should be declared outside of the if.
>> You can clean up all the "l ? (l->loc_pos() - (cbuf.insts_size() + 1)) : 0;" code in x86_*.ad after this fix too. We can even use the MacroAssembler for these instrutions now if we want to though that's probably for another day.
>> Should labelOper::label have an assert that it's non-null when called? That would cover all the other asserts you added though I'm not sure it must be true.
>> The overall change looks good.
>> tom
>> On Jul 13, 2011, at 8:44 PM, Vladimir Kozlov wrote:
>>> http://cr.openjdk.java.net/~kvn/7063628/webrev
>>>
>>> Fixed 7063628: Use cbcond on T4
>>>
>>> Added new MacroAssembler instructions cmp_and_br() and old branch instructions are modified to use cbcond on T4 if distance is small (2K bytes). Most of the rest changes are usage of these new branch instructions in Interpreter and C1.
>>>
>>> The prototype was done by Tom and I took some of his additional fixes. The formssel.cpp change is a bug fix where the MatchNode equality test wasn't recursing so it would mistakenly return true for complex matches. Added a fake label for branches generated in temp buffer by MachNode::emit_size(), added assert into .ad files to check label.
>>>
>>> There was problem in is_in_wdisp16_range() (and in initial implementation of use_cbc()) which calls target(L) before emitting branch. Non-bound (forward branch) labels record current pc() as a branch address and later try to patch instruction (which could be 'cmp') at that address.
>>>
>>> Removed unused code in check_klass_subtype_fast_path().
>>>
>>> Modified vm_version string: has_ prefix is removed and only v8 or v9 will be printed as well only niagara_plus or niagara.
>>>
>>> Tested on T4 with CTW, nsk, VM and java invoke regression tests.
More information about the hotspot-compiler-dev
mailing list