code movement from slow path into fast path

Tom Rodriguez tom.rodriguez at oracle.com
Thu Mar 27 18:40:06 UTC 2014


The initial dominating placement is due to commoning of constants.  Rematerialization isn't disabled as far as I can see.  The implementation has a few problems, though, since there's no def-use info in linear scan.  For instance, the initial materialization can hang around even if all the uses have picked up other materialized copies.  That should be fixable and is the source of some ugly-looking generated code.
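
A minimal sketch of the kind of cleanup I mean (hypothetical types and helpers, not the real LIR classes; the use counts have to be rebuilt by walking the instructions precisely because linear scan keeps no def-use info):

    // Hypothetical cleanup: drop constant materializations whose result is never read,
    // e.g. the original dominating copy once every use has picked up its own copy.
    void removeDeadConstantDefs(List<LIRInstruction> instructions) {
        Map<Variable, Integer> useCounts = new HashMap<>();
        for (LIRInstruction insn : instructions) {
            insn.forEachInput(value -> {
                if (value instanceof Variable) {
                    useCounts.merge((Variable) value, 1, Integer::sum);
                }
            });
        }
        // A constant load whose result has zero remaining uses is dead and can go.
        instructions.removeIf(insn -> insn instanceof LoadConstantOp
                && useCounts.getOrDefault(((LoadConstantOp) insn).getResult(), 0) == 0);
    }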

When this was last being worked on we were also switching ConstantNodeRecordsUsages back and forth, which behaves differently as well and may have led to our current state.  When ConstantNodeRecordsUsages is false it enables a sort of lazy commoning of constants, which I think is probably a better initial cut.  It might work best if it could ensure that it doesn't share uncommon constants and force their materialization onto a common path, as in Tom's example.
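
Roughly, the lazy version in the LIR generator would look something like this (just a sketch with made-up names like loadedConstants and emitLoadConstant, not the actual LIRGenerator code); the point is that the cache is per block, so a constant used only on the slow path gets materialized there instead of being hoisted above the branch:

    // Illustrative per-block constant cache, cleared at each block boundary.
    Map<Constant, Value> loadedConstants = new HashMap<>();

    Value useConstant(Constant c) {
        Value cached = loadedConstants.get(c);
        if (cached != null) {
            return cached;                    // reuse the copy already emitted in this block
        }
        Value loaded = emitLoadConstant(c);   // materialize at the point of use
        loadedConstants.put(c, loaded);
        return loaded;
    }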

So I think smarter sharing logic in LIRGenerator along with some rematerialization fixes should get us where we want to be.

tom

On Mar 27, 2014, at 11:24 AM, Krystal Mok <rednaxelafx at gmail.com> wrote:

> Hi Tom,
> 
> Just curious, is that behavior related to the (disabled) rematerialization of constants in LinearScan?
> 
> Thanks,
> Kris
> 
> 
> On Thu, Mar 27, 2014 at 10:58 AM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
> Constant placement is somewhat suboptimal currently.  We're trying to share constants when we can, so initially they are placed at a point that dominates their usages.  There are probably later uses, though we probably don't end up sharing with them because of spilling.  I've seen some much worse cases and I've looked at it a bit, but I think we need to revisit how we handle constants both during LIR generation and in the register allocator.  It's definitely wrong for your example.
> 
> tom
> 
> On Mar 27, 2014, at 6:17 AM, Deneau, Tom <tom.deneau at amd.com> wrote:
> 
> > question about code movement and fast_path_probability:
> >
> > My snippet looks like this...
> >
> >        Word thread = thread();
> >        Word top = atomicGetAndAddTlabTop(thread, size);
> >        Word end = readTlabEnd(thread);
> >        Word newTop = top.add(size);
> >        if (useTLAB() && probability(FAST_PATH_PROBABILITY, newTop.belowOrEqual(end))) {
> >            // writeTlabTop(thread, newTop) was done by the atomicGetAndAdd
> >            result = formatObject(hub, size, top, prototypeMarkWord, fillContents);
> >        } else {
> >            // slow path requiring eden access, etc.
> >        }
> >
> > The generated HSAIL is shown below.  Why would the moves to the $d8 and $d9 registers, which are used only on the slow path, be moved before the compare instruction?
> >
> >       atomic_add_global_u64   $d4, [$d20 + 136], 24;     // $d20 = thread register
> >       ld_global_s64 $d5, [$d20 + 152];                   // readTlabEnd
> >       add_s64 $d6, $d4, 0x18;                            // newTop = top + size
> >       mov_b64 $d7, 0x100102d58;                          // class info for the class being allocated
> >       mov_b64 $d8, 0x7f001c0223b0;                       // eden-related pointer used only on the slow path
> >       mov_b64 $d9, 0x7f001c022388;                       // ditto
> >       cmp_lt_b1_u64 $c0, $d5, $d6;                       // newTop.belowOrEqual(end)
> >       cbr $c0, @L10;                                     // @L10 = slow path
> > @L26:
> >       ld_global_s64 $d5, [$d7 + 176];                    // fast path object format, etc.
> >       st_global_s64 $d5, [$d4 + 0];
> >       st_global_s32 537003435, [$d4 + 8];
> >       st_global_s32 0, [$d4 + 12];
> >       st_global_s64 0, [$d4 + 16];
> >       [...]
> >
> > -- Tom
> >
> 
> 


