code movement from slow path into fast path

Deneau, Tom tom.deneau at amd.com
Thu Mar 27 13:17:55 UTC 2014


A question about code movement and FAST_PATH_PROBABILITY:

My snippet looks like this...

        Word thread = thread();
        Word top = atomicGetAndAddTlabTop(thread, size);
        Word end = readTlabEnd(thread);
        Word newTop = top.add(size);
        if (useTLAB() && probability(FAST_PATH_PROBABILITY, newTop.belowOrEqual(end))) {
            // writeTlabTop(thread, newTop) was done by the atomicGetAndAdd
            result = formatObject(hub, size, top, prototypeMarkWord, fillContents);
        } else {
            // slow path requiring eden access, etc.
        }
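
For reference outside the snippet machinery, here is a minimal plain-Java sketch of the same shape: long arithmetic instead of Word, stand-in constants, and a hypothetical slowPathAllocate helper. The static-import path for BranchProbabilityNode is what I'd assume for the current tree and may differ elsewhere.

    import static com.oracle.graal.nodes.extended.BranchProbabilityNode.FAST_PATH_PROBABILITY;
    import static com.oracle.graal.nodes.extended.BranchProbabilityNode.probability;

    final class TlabBumpSketch {
        static long bump(long top, long end, long size) {
            long newTop = top + size;
            if (probability(FAST_PATH_PROBABILITY, newTop <= end)) {
                // fast path: newTop fits in the TLAB; the object is formatted at 'top'
                return top;
            } else {
                // slow path: these constants are live only here, yet the generated
                // code materializes them before the compare (cf. $d8/$d9 below)
                long edenTopAddr = 0x7f001c0223b0L;  // stand-in for the $d8 constant
                long edenEndAddr = 0x7f001c022388L;  // stand-in for the $d9 constant
                return slowPathAllocate(edenTopAddr, edenEndAddr, size);
            }
        }

        // placeholder for the eden bump / runtime call on the slow path
        private static long slowPathAllocate(long edenTop, long edenEnd, long size) {
            return 0L;
        }
    }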

The generated HSAIL is shown below.  Why would the moves into $d8 and $d9, registers that are used only on the slow path, be scheduled before the compare instruction?

	atomic_add_global_u64   $d4, [$d20 + 136], 24;     // $d20 = thread register
	ld_global_s64 $d5, [$d20 + 152];                   // readTlabEnd  
	add_s64 $d6, $d4, 0x18;                            // newTop = top + size 
	mov_b64 $d7, 0x100102d58;                          // hub (class info) for the class being allocated
	mov_b64 $d8, 0x7f001c0223b0;                       // eden-related pointer used only on the slow path
	mov_b64 $d9, 0x7f001c022388;                       // ditto
	cmp_lt_b1_u64 $c0, $d5, $d6;                       // end < newTop, i.e. !newTop.belowOrEqual(end)
	cbr $c0, @L10;                                     // @L10 = slow path
@L26:
	ld_global_s64 $d5, [$d7 + 176];                    // fast path object format, etc.
	st_global_s64 $d5, [$d4 + 0];
	st_global_s32 537003435, [$d4 + 8];
	st_global_s32 0, [$d4 + 12];
	st_global_s64 0, [$d4 + 16];
	[...]

-- Tom


