RFR (S): CR 8009120: Fuzz instruction scheduling in Hotspot compilers

Thu Feb 28 12:46:04 PST 2013

Aleksey,

First, about your fuzzing code. fuzz_accept() will always return 'true' 
for count == 1, after that probability of 'true' will decrease with each 
count's increment. So you will always select first candidate which is 
not random. Why you don't want a uniform distribution? Also why you want 
only one candidate to be selected (if I understand code correctly)?

Could you change "fuzzing", "fuzz" to "stressing", "stress" in flags 
names and comments. It will be in the line with other stress flags/code 
we have already.

I also think fuzz_accept() could be different name, may be 
fuzz_selected(cand) or something?

To avoid next complex code

+        ( (false NOT_PRODUCT(|| ( FuzzGCM && fuzz_accept(cand_cnt))))) 
|| // Or fuzzing is enabled
+        ( (true  NOT_PRODUCT(&& (!FuzzGCM))) && // Or fuzzing is 
disabled, then:

define flags as 'develop' (not 'notproduct') so they will be accessible 
(but not modifiable) in product code. Pre-calculate fuzz candidate 
before check expression:

bool fuzz_candidate = FuzzGCM && fuzz_selected(++fuzz_candidate_cnt);

       if (LCA_freq < least_freq     || // Better Frequency
+         fuzz_candidate ||
           ( !in_latency          &&    // No block containing latency

You don't care about performance so it should be fine to precalculate 
fuzz even if it will not be used.

The same in lcm:

bool fuzz_candidate = FuzzLCM && fuzz_selected(++fuzz_candidate_cnt);

           ( choice == n_choice &&
+           ( fuzz_candidate ||
               latency < n_latency ||

You don't need second check (true NOT_PRODUCT(&& !FuzzLCM)) && in such 
case.

Phase class is bad place for fuzz_accept() method. We usually put such 
methods into Compile class if you want to use this method in other 
places in C2.

Why you need C2_PHASE_ in C2_PHASE_FUZZ_DOMAIN name? Do we have other 
FUZZ_DOMAIN? You don't need to worry about that if you move method body 
and FUZZ_DOMAIN definition into .cpp file.

Also why not choose 2^n value so that '%' could be simple masking.

Thanks,
Vladimir

On 2/28/13 6:35 AM, Aleksey Shipilev wrote:
> Thanks, Vladimir! (applies to both of you :)
>
> On 02/28/2013 12:12 PM, Vladimir Kozlov wrote:
>> I was a little confused about this "improvement". You should have said
>> "stress testing of instructions scheduling in C2".
>
> Yes, I should have mentioned that. I wasn't using the words
> "improvement" anywhere, only CR seems to be listed as RFE. I'll make
> this more explicit next time.
>
>> Factor out your new code in one function to calculate fuzz result and
>> call it from lcm and gcm (pass fuzz_div as parameter). Then you don't
>> need to hijack the current code so drastically.
>
> Done.
>
>> Your change in lcm is
>> incorrect - you missed (choice == n_choice &&). So, please, try to keep
>> original code (in lcm and gcm) as it was. There are some hidden
>> dependencies we are always forgetting.
>
> Whoa. Thanks, not sure what I was thinking omitting that clause. And
> yes, the question about hidden dependencies is what I need your
> expertise for.
>
>> You can define FuzzInstructionScheduling as intx with 0 (switched off)
>> default value instead of defining it as local constant. It will allow to
>> do experiments with different values.
>
> I think there is the confusion over the fuzzing domain constant.
> Hopefully I made it clear in the next patch why do we need the domain
> constant. I don't think there's much of the value to make this
> user-settable (it fact, I'd make it MAX_INT).
>
>> Add FuzzInstructionScheduling flag into opto/c2_globals.hpp instead of
>> runtime/globals.hpp since you use it only in C2.
>
> Done.
>
>> There is an other old bug which you may hit if you randomize placement
>> in gcm: 6831314. I still did not find a solution which does not
>> introduce performance regression. What saves us is placing loads into
>> "cheaper" low frequency block (which is most nested block where load's
>> result is used).
>
> Ok, on these grounds, and the fact my current concurrency tests are
> failing even with LCM changes, I split the FuzzInstructionScheduling
> into FuzzLCM and FuzzGCM, so we can fuzz these parts on their own.
>
> Also, GCM selector honors the same condition for "cheaper" low frequency
> block to avoid 6831314.
>
>> self->is_iteratively_computed() is not correctness check. It is
>> performance check to prevent using 2 registers: one for previous IV
>> (loop's increment variable) value (since there will be its uses after
>> increment) and one for IV incremented value. I need to look on history
>> of this code (before mercurial) to decide if we should skip hoisting for
>> it as you suggested.
>
> Ok, thanks. Nevertheless, I reverted the condition back. When we do the
> fuzzing, we don't particularly care about performance, so that check is
> not done.
>
> Here's the updated webrev:
>   http://cr.openjdk.java.net/~shade/8009120/webrev.01/
>
> (works OK on my x86_64 laptop, running JPRT with various fuzzing modes
> as we speak; code style issues are welcome).
>
> Is there the preprocessor pragma similar to NOT_PRODUCT, but able to
> emit the default value? That way we can do these guys cleaner:
>
>   false NOT_PRODUCT(|| (expression)) -> NOT_PRODUCT(expression, false)
>    true NOT_PRODUCT(&& (expression)) -> NOT_PRODUCT(expression, true)
>
> Thanks,
> -Aleksey.
>