RFR (S): CR 8009120: Fuzz instruction scheduling in Hotspot compilers
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Feb 28 00:12:20 PST 2013
I was a little confused about this "improvement". You should have said
"stress testing of instruction scheduling in C2".
Factor your new code out into one function that calculates the fuzz
result and call it from lcm and gcm (pass fuzz_div as a parameter). Then
you don't need to hijack the current code so drastically. Your change in
lcm is incorrect: you dropped the (choice == n_choice &&) condition. So
please try to keep the original code (in lcm and gcm) as it was. There
are some hidden dependencies that we always forget about.
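A rough sketch of the shape I mean, written as standalone C++ rather
than actual HotSpot code. Everything here is hypothetical: the names
fuzz_choice, Candidate and better, the reading of fuzz_div as "take the
random pick about once in fuzz_div decisions", and the assumption that
the existing comparison orders candidates by choice, then latency, then
score. The point is only that the random decision stays under the
(choice == n_choice &&) guard, so a candidate from a worse class can
never win.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical shared helper, callable from both lcm and gcm.
// Assumption: fuzz_div <= 0 means "off"; otherwise the result is true
// about once in fuzz_div calls.
bool fuzz_choice(int64_t fuzz_div, std::mt19937& rng) {
  if (fuzz_div <= 0) return false;
  std::uniform_int_distribution<int64_t> dist(0, fuzz_div - 1);
  return dist(rng) == 0;
}

// Hypothetical stand-in for the per-node values compared during selection.
struct Candidate {
  int choice;
  int latency;
  int score;
};

// Returns true if candidate n should replace the current pick cur.
bool better(const Candidate& cur, const Candidate& n,
            int64_t fuzz_div, std::mt19937& rng) {
  if (cur.choice < n.choice) return true;       // strictly better class wins
  if (cur.choice != n.choice) return false;     // the (choice == n_choice &&) guard
  if (fuzz_choice(fuzz_div, rng)) return true;  // randomize only among equals
  return cur.latency < n.latency ||
         (cur.latency == n.latency && cur.score < n.score);
}
```

With fuzz_div == 0 the function reduces to the plain deterministic
comparison, so the default behaviour is unchanged.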
You can define FuzzInstructionScheduling as an intx flag with a default
value of 0 (switched off) instead of defining it as a local constant.
That will allow experiments with different values.
Add the FuzzInstructionScheduling flag to opto/c2_globals.hpp instead of
runtime/globals.hpp, since you use it only in C2.
There is another old bug which you may hit if you randomize placement
in gcm: 6831314. I still have not found a solution which does not
introduce a performance regression. What saves us is placing loads into
the "cheaper" low-frequency block (which is the most nested block where
the load's result is used).
self->is_iteratively_computed() is not a correctness check. It is a
performance check to prevent using 2 registers: one for the previous IV
(the loop's increment variable) value (since it will still have uses
after the increment) and one for the incremented IV value. I need to
look at the history of this code (before Mercurial) to decide if we
should skip hoisting for it as you suggested.
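To illustrate the two-register point at source level only (the
scheduler works on IR nodes, and a real compiler is free to reorder
either loop), here is a small standalone C++ sketch. The function names
are made up for the example. Both loops compute the same sum, so the
difference is performance, not correctness: in the second one the
incremented value is computed while the old value still has a use, so
both have to be live at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Increment placed after the last use of the old IV value:
// i can be updated in place.
long sum_increment_last(const std::vector<int>& a) {
  long sum = 0;
  for (std::size_t i = 0; i < a.size(); ) {
    sum += a[i];  // last use of the old IV value
    i = i + 1;    // the old value is dead, its register can be reused
  }
  return sum;
}

// Increment placed above a use of the old IV value:
// i and next are both live across the load.
long sum_increment_hoisted(const std::vector<int>& a) {
  long sum = 0;
  for (std::size_t i = 0; i < a.size(); ) {
    std::size_t next = i + 1;  // incremented IV computed early
    sum += a[i];               // old IV value is still needed here
    i = next;
  }
  return sum;
}
```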
Thanks,
Vladimir
On 2/27/13 1:35 AM, Aleksey Shipilev wrote:
> Hi,
>
> Haven't got any response from the last note, so here's the more complete
> changeset. Ping me if something should be done to match the integration
> requirements, control flow rewired to match usual conventions, etc.
>
> The idea for the fuzzer is quite simple:
> - LCM schedules the instructions within the basic block, choosing the
> schedule based on the node input count, latency, etc; we can hijack the
> selection process and randomize the choice.
> - GCM selects the basic block up the dominator tree with the best
> frequency/latency to fit the Node in; we can hijack this and randomize
> the placement.
>
> Webrev:
> http://cr.openjdk.java.net/~shade/8009120/webrev.00/
>
> I have a question about self->is_iteratively_computed() in GCM though.
> This condition seems to ensure correctness, and should be enforced even
> when LCA_freq < least_freq?
>
> Testing:
> - full HotSpot JPRT cycle in oob mode (still running, ~80%, no issues)
> - full HotSpot JPRT cycle with forced -XX:+FuzzInstructionScheduling
> - ad-hoc java-concurrency-torture with -XX:+FuzzInstructionScheduling
>
> The observation is that all tests pass with fuzzing either turned on
> or off, so intra-thread semantics are not broken. Concurrency torture
> tests start to fail with fuzzing, indicating that inter-thread
> semantics are violated by the compiler (these are the cases we want to
> catch in those tests). These failures are similar to CR 8007898:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007898
>
> Thanks,
> -Aleksey.
>