RFR (S): CR 8009120: Fuzz instruction scheduling in Hotspot compilers

Thu Feb 28 00:12:20 PST 2013

I was a little confused by this "improvement". You should have called it 
"stress testing of instruction scheduling in C2".

Factor your new code out into one function that calculates the fuzz 
result, and call it from lcm and gcm (passing fuzz_div as a parameter). 
Then you don't need to hijack the current code so drastically. Your 
change in lcm is incorrect: you dropped the (choice == n_choice &&) 
condition. So please try to keep the original code (in lcm and gcm) as 
it was. There are hidden dependencies that we always forget about.
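
A minimal sketch of the refactoring suggested above, written as 
standalone C++ rather than HotSpot code. The names should_fuzz, 
Candidate and prefer_candidate are hypothetical, the comparison is a 
simplified stand-in for the real selection logic in lcm, and reading 
fuzz_div as "randomize roughly one decision in fuzz_div" is an 
assumption about its meaning. The point is only that the original 
condition, including the (choice == n_choice && ...) guard, stays intact 
and the fuzz helper is consulted inside it.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper shared by both schedulers. Returns true when the
// caller should take the randomized decision. Assumption: fuzz_div == 0
// switches fuzzing off, otherwise about one call in fuzz_div returns true.
static bool should_fuzz(std::mt19937& rng, long fuzz_div) {
  assert(fuzz_div >= 0);
  if (fuzz_div == 0) return false;
  std::uniform_int_distribution<long> dist(0, fuzz_div - 1);
  return dist(rng) == 0;
}

// Simplified stand-in for the per-node values compared during selection.
struct Candidate {
  int choice;
  int latency;
};

// Returns true when n should replace best. The priority test on choice is
// kept as-is; fuzzing only affects candidates with equal choice.
static bool prefer_candidate(const Candidate& best, const Candidate& n,
                             std::mt19937& rng, long fuzz_div) {
  return best.choice < n.choice ||
         (best.choice == n.choice &&
          (should_fuzz(rng, fuzz_div) || best.latency < n.latency));
}
```

With this shape a candidate with a lower choice value can never win, 
whatever the fuzz helper returns, which is the property the dropped 
guard was protecting.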

You can define FuzzInstructionScheduling as an intx flag with a default 
value of 0 (switched off) instead of defining it as a local constant. 
That will allow experiments with different values.

Add the FuzzInstructionScheduling flag to opto/c2_globals.hpp instead of 
runtime/globals.hpp, since you use it only in C2.
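
As a rough illustration of an integer flag with an "off" default, here 
is a self-contained miniature of a macro-driven flag table. MY_C2_FLAGS, 
DEFINE_FLAG and fuzzing_enabled are hypothetical names; the real table 
in opto/c2_globals.hpp has more flag kinds and is expanded differently, 
and whether the flag should be a develop, diagnostic or product flag is 
left open here.

```cpp
#include <cassert>
#include <cstdint>

typedef std::intptr_t intx;  // stand-in for HotSpot's pointer-sized intx

// Miniature stand-in for a flag table; MY_C2_FLAGS and DEFINE_FLAG are
// hypothetical names, not the real macros in opto/c2_globals.hpp.
#define MY_C2_FLAGS(develop)                                              \
  develop(intx, FuzzInstructionScheduling, 0,                             \
          "Fuzz instruction scheduling in C2; 0 switches fuzzing off")

// Expand each table entry into a global variable with its default value.
#define DEFINE_FLAG(type, name, value, doc) type name = value;
MY_C2_FLAGS(DEFINE_FLAG)

// Call sites test the flag value instead of a local constant.
static bool fuzzing_enabled() {
  assert(FuzzInstructionScheduling >= 0);
  return FuzzInstructionScheduling != 0;
}
```

Because the value is an integer and not a boolean, the same flag can 
carry the fuzz divisor, so different values can be tried without 
rebuilding.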

There is another old bug which you may hit if you randomize placement 
in gcm: 6831314. I still have not found a solution that does not 
introduce a performance regression. What saves us is placing loads into 
the "cheaper" low-frequency block (which is the most nested block where 
the load's result is used).

self->is_iteratively_computed() is not a correctness check. It is a 
performance check that prevents using 2 registers: one for the previous 
value of the IV (the loop's induction variable), since there will be 
uses of it after the increment, and one for the incremented IV value. I 
need to look at the history of this code (before Mercurial) to decide 
whether we should skip hoisting for it, as you suggested.
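
A source-level analogy of the register issue, not actual C2 output (a 
C++ compiler is free to reorder either loop). The function names are 
made up for the illustration. Both loops compute the same sum; they 
differ only in whether the increment is placed before or after the last 
use of the old IV value.

```cpp
#include <cassert>

// Increment scheduled after the last use of the old value: only one
// value of i is live at any point.
static int sum_increment_last(const int* a, int n) {
  assert(n >= 0);
  int sum = 0;
  for (int i = 0; i < n; ) {
    sum += a[i];        // last use of the current IV value
    i = i + 1;          // the old value is dead here
  }
  return sum;
}

// Increment hoisted above a use of the old value: prev and i are both
// live across the load, so two registers are needed.
static int sum_increment_hoisted(const int* a, int n) {
  assert(n >= 0);
  int sum = 0;
  for (int i = 0; i < n; ) {
    int prev = i;       // previous IV value, still needed below
    i = i + 1;          // incremented IV value
    sum += a[prev];     // use of the old value after the increment
  }
  return sum;
}
```

The results are identical, which is why the check is about performance 
and not correctness.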

Thanks,
Vladimir

On 2/27/13 1:35 AM, Aleksey Shipilev wrote:
> Hi,
>
> I haven't got any response to the last note, so here's a more complete
> changeset. Ping me if something should be done to match the integration
> requirements, if the control flow should be rewired to match the usual
> conventions, etc.
>
> The idea for the fuzzer is quite simple:
>    - LCM schedules the instructions within the basic block, choosing the
> schedule based on the node input count, latency, etc; we can hijack the
> selection process and randomize the choice.
>    - GCM selects the basic block up the dominator tree with the best
> frequency/latency to fit the Node in; we can hijack this and randomize
> the placement.
>
> Webrev:
>    http://cr.openjdk.java.net/~shade/8009120/webrev.00/
>
> I have a question about self->is_iteratively_computed() in GCM though.
> This condition seems to ensure correctness; should it be enforced even
> when LCA_freq < least_freq?
>
> Testing:
>    - full HotSpot JPRT cycle in oob mode (still running, ~80%, no issues)
>    - full HotSpot JPRT cycle with forced -XX:+FuzzInstructionScheduling
>    - ad-hoc java-concurrency-torture with -XX:+FuzzInstructionScheduling
>
> The observation is that all tests are fine with fuzzing turned either
> on or off, so intra-thread semantics are not broken. The concurrency
> torture tests start to fail with fuzzing, indicating that inter-thread
> semantics are violated by the compiler (these are the cases we want to
> catch in those tests). These failures are similar to CR 8007898:
>    http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007898
>
> Thanks,
> -Aleksey.
>