RFR(XS): 8015437: SPARC cbcond offset value out of range
Vladimir Kozlov
vladimir.kozlov at oracle.com
Fri Jun 7 10:29:44 PDT 2013
On 6/7/13 10:11 AM, David Chase wrote:
> My v9 manual might be a little dated, but according to it, annulling the delay slot instruction for an always-taken branch guarantees that the instruction following the branch will not be executed as a consequence of executing the branch (except for ba,a .+4). There's no need to put a nop there. This was certainly, absolutely true on V8 -- I did spend several years working on a code generator for that architecture, though it was 20 years ago.
"Not executed" does not mean we can remove the delay slot (the 4 bytes after
the branch). Put it this way: all SPARC branch instructions are 8 bytes
long, counting the delay slot (except cbcond). From "Control-Transfer Instructions" in the manual:
"Most control transfers are of the delayed variety. The instruction
following a delayed control-transfer instruction is said to be in the
delay slot of the control-transfer instruction."
The difference between V8 and V9 is only in whether the annulled instruction
in the delay slot is fetched; you still have to have something in the delay
slot even on V9 (it could be 0):
"The SPARC V8 architecture specified that the delay instruction was
always fetched, even if annulled, and that an annulled instruction could
not cause any traps. The SPARC V9 architecture does not require the
delay instruction to be fetched if it is annulled."
And it does not matter what you put in the delay slot when it is not
executed. We decided to put a nop there because, until now, we supported V8.
Vladimir
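To make the "something must still be in the delay slot" point concrete, here is a minimal C++ sketch of the SPARC Bicc encoding (format 2: op=00, annul bit at 29, cond at 28:25, op2=010 at 24:22, 22-bit signed word displacement). The names encode_bicc, emit_ba and the constants are hypothetical, not HotSpot code; the sketch only shows that annulling flips one bit of the branch word, while the sequence still reserves a second word for the delay slot, here filled with the canonical nop (sethi 0, %g0).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal sketch of the SPARC Bicc (format 2) encoding; names are hypothetical.
// Layout: op=00 [31:30] | a [29] | cond [28:25] | op2=010 [24:22] | disp22 [21:0]
constexpr std::uint32_t kCondAlways = 0x8;         // cond field for "ba"
constexpr std::uint32_t kNop        = 0x01000000;  // sethi 0, %g0

std::uint32_t encode_bicc(std::uint32_t cond, bool annul, std::int32_t disp_words) {
  // disp22 is a signed word displacement relative to the branch itself.
  assert(disp_words >= -(1 << 21) && disp_words < (1 << 21));
  std::uint32_t word = 0;
  word |= (annul ? 1u : 0u) << 29;                             // annul bit
  word |= (cond & 0xFu) << 25;                                 // condition
  word |= 0x2u << 22;                                          // op2 = 010 (Bicc)
  word |= static_cast<std::uint32_t>(disp_words) & 0x3FFFFFu;  // low 22 bits, two's complement
  return word;
}

// An unconditional branch plus its delay-slot word. Annulling (ba,a) only
// stops the slot from executing; the slot itself still occupies 4 bytes.
std::vector<std::uint32_t> emit_ba(bool annul, std::int32_t disp_words) {
  return {encode_bicc(kCondAlways, annul, disp_words), kNop};
}
```

With a zero displacement this gives 0x10800000 for ba and 0x30800000 for ba,a; either way emit_ba returns two words, i.e. 8 bytes.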
>
> David
>
> On 2013-06-07, at 12:22 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>
>> On 6/7/13 9:15 AM, David Chase wrote:
>>> I think I'm confused. If it's a ba,a (or the newer BPA,a,pt), there's no need for a NOP, since it is never executed.
>>> We can save 4 whole bytes of code space/instruction cache.
>>
>> ALL SPARC branch instructions, except the new cbcond, HAVE a delay slot instruction. Period! You can only control execution of the delayed instruction with the annul bit.
>> That is why cbcond was added on T4: traces showed that most instructions in delay slots are nops.
>>
>> Vladimir
>>
>>>
>>> David
>>>
>>> On 2013-06-07, at 12:08 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>
>>>> On 6/7/13 8:56 AM, David Chase wrote:
>>>>> Wouldn't we want to call it ba_without_delay or something like that?
>>>>
>>>> That could be misleading, since it could be read as "without a delay slot". Our case is a nop in the delay slot that we should not execute.
>>>> We can use the neutral name ba_long(), as the counterpart of ba_short().
>>>>
>>>> Vladimir
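One possible shape for the proposed ba_long() is sketched below: a helper that always emits ba,a followed by a nop, so the delay slot is filled and the nop is never executed. Emitter, ba_long and the constants are hypothetical; the encoding assumes the Bicc format (cond=always is 0x8, annul at bit 29, 22-bit word displacement) and does not attempt to model HotSpot's Label patching.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical ba_long(): annulled unconditional branch + nop in one call,
// so no caller can forget to fill the delay slot.
constexpr std::uint32_t kBaAnnulled = 0x30800000;  // Bicc: a=1, cond=always, disp22=0
constexpr std::uint32_t kNop        = 0x01000000;  // sethi 0, %g0

struct Emitter {
  std::vector<std::uint32_t> words;

  void ba_long(std::int32_t disp_words) {
    // disp22 must fit a signed 22-bit word displacement.
    assert(disp_words >= -(1 << 21) && disp_words < (1 << 21));
    words.push_back(kBaAnnulled | (static_cast<std::uint32_t>(disp_words) & 0x3FFFFFu));
    words.push_back(kNop);  // occupies the delay slot; annulled, so never executed
  }
};
```

Bundling the nop into the helper is the design point: the 15 or so hand-written ba() plus delayed()->nop() pairs would collapse into one call that cannot be half-written.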
>>>>
>>>>>
>>>>> David
>>>>>
>>>>> On 2013-06-07, at 11:47 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>>>
>>>>>> David is right.
>>>>>>
>>>>>> We have a lot of places in the SPARC code where we are sloppy with the annul bit for direct branches and nop() in the delay slot. I count about 10 cases with the br() instruction and 15 with ba(), which is not really bad.
>>>>>>
>>>>>> We need a separate macroassembler instruction for such cases.
>>>>>> ba_with_nop()?
>>>>>>
>>>>>> Good starter task :)
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 6/7/13 4:59 AM, David Chase wrote:
>>>>>>> Huh. ba,a (annulled branch -- if I recall, the sense of the annulled bit is reversed for ba) ought to get the job done without the delay-slot nop.
>>>>>>>
>>>>>>> David
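David's recollection that the annul bit behaves differently for ba can be written down as a small predicate. This is a sketch of the annul rule as described in the SPARC V9 manual (the BranchKind enum and the function name are illustrative only): with a=0 the delay instruction always executes; with a=1 it is annulled for ba and bn, and executed only if the branch is taken for conditional branches.

```cpp
#include <cassert>

// Sketch of the SPARC annul rule; BranchKind and the function name are illustrative.
enum class BranchKind { Always, Never, Conditional };

// Does the delay-slot instruction execute for this branch?
bool delay_slot_executes(BranchKind kind, bool annul, bool taken) {
  if (!annul) return true;         // a = 0: the delay instruction always executes
  switch (kind) {
    case BranchKind::Always:       // ba,a: annulled even though the branch is taken
    case BranchKind::Never:        // bn,a: annulled
      return false;
    case BranchKind::Conditional:  // bcc,a: executes only if the branch is taken
      return taken;
  }
  return true;  // not reached
}
```

Note that this predicate is about execution only; as discussed above, the slot still occupies 4 bytes in every case.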
>>>>>>>
>>>>>>> On 2013-06-07, at 6:54 AM, Morris Meyer <morris.meyer at oracle.com> wrote:
>>>>>>>
>>>>>>>> I needed to have the delayed()->nop() in there as ba() crashed for me.
>>>>>>>>
>>>>>>>>
>>>>>>>> --mm
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jun 7, 2013, at 12:35 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>>>>>>
>>>>>>>>> Morris,
>>>>>>>>>
>>>>>>>>> Why did you use br(always, false, pt, slow_case) instead of ba(slow_case)?
>>>>>>>>>
>>>>>>>>> It is the same but simpler, and it was added specifically to avoid messing with the parameters of the br() instruction.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
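Since ba(L) is described as the same as br(always, false, pt, L), the annul argument is false: the delay slot executes and the caller must fill it. The toy emitter below (ToyAsm and its methods are hypothetical stand-ins, not the HotSpot MacroAssembler; the condition is a plain string "a" rather than Assembler::always) shows both the equivalence and one plausible way a bare ba() can misbehave: whatever instruction is emitted next silently lands in the delay slot and executes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: ToyAsm is a hypothetical stand-in, not HotSpot's MacroAssembler.
struct ToyAsm {
  std::vector<std::string> code;

  // One branch instruction; whatever is emitted next occupies its delay slot.
  void br(const std::string& cond, bool annul, const std::string& predict,
          const std::string& label) {
    code.push_back("b" + cond + (annul ? ",a," : ",") + predict + " " + label);
  }

  // Shorthand: same as br("a", /*annul=*/false, "pt", label).
  void ba(const std::string& label) { br("a", false, "pt", label); }

  // In this toy, delayed() only marks intent; the slot is simply the next word.
  ToyAsm* delayed() { return this; }
  void nop() { code.push_back("nop"); }
  void emit(const std::string& insn) { code.push_back(insn); }

  // The instruction sitting in the delay slot of the branch at index i.
  const std::string& delay_slot_of(std::size_t i) const { return code.at(i + 1); }
};
```

For example, ba("slow_case") followed by delayed()->nop() puts "nop" in the slot, while ba("slow_case") followed directly by some other instruction puts that instruction in the slot instead.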
>>>>>>>>>
>>>>>>>>> On 6/6/13 9:22 PM, Morris Meyer wrote:
>>>>>>>>>> Folks,
>>>>>>>>>>
>>>>>>>>>> Could I get a review for this issue? The problem is that we
>>>>>>>>>> optimistically assign forward branch labels and re-patch them later. When we
>>>>>>>>>> emit a cbcond instruction, we check whether the Label is out of bounds. If it
>>>>>>>>>> is already bound, the check is valid; if it is not bound, a zero offset is
>>>>>>>>>> emitted (which always passes the check) and the branch is patched later.
>>>>>>>>>>
>>>>>>>>>> At Vladimir Kozlov's urging, I made about 4,000 lines of changes to add
>>>>>>>>>> __FILE__ and __LINE__ parameters to every assembler.bind() call and
>>>>>>>>>> checked the return value of every patched branch to find the line and
>>>>>>>>>> location of the offending bind. The bind in
>>>>>>>>>> NewInstanceStub::emit_code() in c1_CodeStubs_sparc.cpp was too far from the
>>>>>>>>>> __ allocate_object macro assembler call in LIRGenerator::new_instance,
>>>>>>>>>> which uses the short branch in MacroAssembler::eden_allocate().
>>>>>>>>>>
>>>>>>>>>> This change has been through JPRT.
>>>>>>>>>>
>>>>>>>>>> --morris
>>>>>>>>>>
>>>>>>>>>> WEBREV - http://cr.openjdk.java.net/~morris/JDK-8015437.01
>>>>>>>>>> JBS - https://jbs.oracle.com/bugs/browse/JDK-8015437
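The failure mode in this report can be reproduced with a toy model of forward-label patching. Assumptions: cbcond carries a 10-bit signed word displacement, so the byte offset must be word-aligned and fit a signed 12-bit field (-2048..2044); ToyCodeBuffer, ToyLabel and fits_cbcond are hypothetical names, not HotSpot's. An unbound label gets a placeholder offset of 0, which always passes the range check at emission time; the real distance is only checked when bind() resolves the pending sites.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model of forward-label patching; all names are hypothetical.
// Assumes cbcond's 10-bit signed word displacement, i.e. a word-aligned
// byte offset in [-2048, 2044].
bool fits_cbcond(std::int64_t byte_offset) {
  return byte_offset % 4 == 0 && byte_offset >= -2048 && byte_offset <= 2044;
}

struct ToyLabel {
  std::optional<std::int64_t> pos;        // set once the label is bound
  std::vector<std::int64_t> patch_sites;  // cbcond sites waiting for it
};

struct ToyCodeBuffer {
  std::int64_t pc = 0;

  // Emits a cbcond; returns the result of the emission-time range check.
  bool emit_cbcond(ToyLabel& L) {
    std::int64_t offset = 0;              // unbound: placeholder offset 0
    if (L.pos) {
      offset = *L.pos - pc;
    } else {
      L.patch_sites.push_back(pc);        // remember the site for later
    }
    pc += 4;                              // cbcond has no delay slot
    return fits_cbcond(offset);           // trivially true for an unbound label
  }

  void emit_other(std::int64_t n_insns) { pc += 4 * n_insns; }

  // Binds L here and checks every pending site; false if any is out of range.
  bool bind(ToyLabel& L) {
    L.pos = pc;
    bool ok = true;
    for (std::int64_t site : L.patch_sites) ok = ok && fits_cbcond(pc - site);
    L.patch_sites.clear();
    return ok;
  }
};
```

For example, a cbcond to an unbound label followed by 1000 other instructions passes the emission-time check, but bind() then reports that the 4004-byte distance is out of range; a backward cbcond over 600 instructions is rejected immediately because the label is already bound.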
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
More information about the hotspot-compiler-dev mailing list