RFR (S): CR 8009120: Fuzz instruction scheduling in Hotspot compilers

Thu Feb 28 06:35:29 PST 2013

Thanks, Vladimir! (applies to both of you :)

On 02/28/2013 12:12 PM, Vladimir Kozlov wrote:
> I was a little confused about this "improvement". You should have said
> "stress testing of instructions scheduling in C2".

Yes, I should have mentioned that. I wasn't using the words
"improvement" anywhere, only CR seems to be listed as RFE. I'll make
this more explicit next time.

> Factor out your new code in one function to calculate fuzz result and
> call it from lcm and gcm (pass fuzz_div as parameter). Then you don't
> need to hijack the current code so drastically. 

Done.

> Your change in lcm is
> incorrect - you missed (choice == n_choice &&). So, please, try to keep
> original code (in lcm and gcm) as it was. There are some hidden
> dependencies we are always forgetting.

Whoa. Thanks, not sure what I was thinking omitting that clause. And
yes, the question about hidden dependencies is what I need your
expertise for.

> You can define FuzzInstructionScheduling as intx with 0 (switched off)
> default value instead of defining it as local constant. It will allow to
> do experiments with different values.

I think there is the confusion over the fuzzing domain constant.
Hopefully I made it clear in the next patch why do we need the domain
constant. I don't think there's much of the value to make this
user-settable (it fact, I'd make it MAX_INT).

> Add FuzzInstructionScheduling flag into opto/c2_globals.hpp instead of
> runtime/globals.hpp since you use it only in C2.

Done.

> There is an other old bug which you may hit if you randomize placement
> in gcm: 6831314. I still did not find a solution which does not
> introduce performance regression. What saves us is placing loads into
> "cheaper" low frequency block (which is most nested block where load's
> result is used).

Ok, on these grounds, and the fact my current concurrency tests are
failing even with LCM changes, I split the FuzzInstructionScheduling
into FuzzLCM and FuzzGCM, so we can fuzz these parts on their own.

Also, GCM selector honors the same condition for "cheaper" low frequency
block to avoid 6831314.

> self->is_iteratively_computed() is not correctness check. It is
> performance check to prevent using 2 registers: one for previous IV
> (loop's increment variable) value (since there will be its uses after
> increment) and one for IV incremented value. I need to look on history
> of this code (before mercurial) to decide if we should skip hoisting for
> it as you suggested.

Ok, thanks. Nevertheless, I reverted the condition back. When we do the
fuzzing, we don't particularly care about performance, so that check is
not done.

Here's the updated webrev:
 http://cr.openjdk.java.net/~shade/8009120/webrev.01/

(works OK on my x86_64 laptop, running JPRT with various fuzzing modes
as we speak; code style issues are welcome).

Is there the preprocessor pragma similar to NOT_PRODUCT, but able to
emit the default value? That way we can do these guys cleaner:

 false NOT_PRODUCT(|| (expression)) -> NOT_PRODUCT(expression, false)
  true NOT_PRODUCT(&& (expression)) -> NOT_PRODUCT(expression, true)

Thanks,
-Aleksey.