RFR (S): CR 8009120: Fuzz instruction scheduling in Hotspot compilers
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Feb 28 12:46:04 PST 2013
Aleksey,
First, about your fuzzing code. fuzz_accept() will always return 'true'
for count == 1, after that probability of 'true' will decrease with each
count's increment. So you will always select first candidate which is
not random. Why you don't want a uniform distribution? Also why you want
only one candidate to be selected (if I understand code correctly)?
Could you change "fuzzing", "fuzz" to "stressing", "stress" in flags
names and comments. It will be in the line with other stress flags/code
we have already.
I also think fuzz_accept() could be different name, may be
fuzz_selected(cand) or something?
To avoid next complex code
+ ( (false NOT_PRODUCT(|| ( FuzzGCM && fuzz_accept(cand_cnt)))))
|| // Or fuzzing is enabled
+ ( (true NOT_PRODUCT(&& (!FuzzGCM))) && // Or fuzzing is
disabled, then:
define flags as 'develop' (not 'notproduct') so they will be accessible
(but not modifiable) in product code. Pre-calculate fuzz candidate
before check expression:
bool fuzz_candidate = FuzzGCM && fuzz_selected(++fuzz_candidate_cnt);
if (LCA_freq < least_freq || // Better Frequency
+ fuzz_candidate ||
( !in_latency && // No block containing latency
You don't care about performance so it should be fine to precalculate
fuzz even if it will not be used.
The same in lcm:
bool fuzz_candidate = FuzzLCM && fuzz_selected(++fuzz_candidate_cnt);
( choice == n_choice &&
+ ( fuzz_candidate ||
latency < n_latency ||
You don't need second check (true NOT_PRODUCT(&& !FuzzLCM)) && in such
case.
Phase class is bad place for fuzz_accept() method. We usually put such
methods into Compile class if you want to use this method in other
places in C2.
Why you need C2_PHASE_ in C2_PHASE_FUZZ_DOMAIN name? Do we have other
FUZZ_DOMAIN? You don't need to worry about that if you move method body
and FUZZ_DOMAIN definition into .cpp file.
Also why not choose 2^n value so that '%' could be simple masking.
Thanks,
Vladimir
On 2/28/13 6:35 AM, Aleksey Shipilev wrote:
> Thanks, Vladimir! (applies to both of you :)
>
> On 02/28/2013 12:12 PM, Vladimir Kozlov wrote:
>> I was a little confused about this "improvement". You should have said
>> "stress testing of instructions scheduling in C2".
>
> Yes, I should have mentioned that. I wasn't using the words
> "improvement" anywhere, only CR seems to be listed as RFE. I'll make
> this more explicit next time.
>
>> Factor out your new code in one function to calculate fuzz result and
>> call it from lcm and gcm (pass fuzz_div as parameter). Then you don't
>> need to hijack the current code so drastically.
>
> Done.
>
>> Your change in lcm is
>> incorrect - you missed (choice == n_choice &&). So, please, try to keep
>> original code (in lcm and gcm) as it was. There are some hidden
>> dependencies we are always forgetting.
>
> Whoa. Thanks, not sure what I was thinking omitting that clause. And
> yes, the question about hidden dependencies is what I need your
> expertise for.
>
>> You can define FuzzInstructionScheduling as intx with 0 (switched off)
>> default value instead of defining it as local constant. It will allow to
>> do experiments with different values.
>
> I think there is the confusion over the fuzzing domain constant.
> Hopefully I made it clear in the next patch why do we need the domain
> constant. I don't think there's much of the value to make this
> user-settable (it fact, I'd make it MAX_INT).
>
>> Add FuzzInstructionScheduling flag into opto/c2_globals.hpp instead of
>> runtime/globals.hpp since you use it only in C2.
>
> Done.
>
>> There is an other old bug which you may hit if you randomize placement
>> in gcm: 6831314. I still did not find a solution which does not
>> introduce performance regression. What saves us is placing loads into
>> "cheaper" low frequency block (which is most nested block where load's
>> result is used).
>
> Ok, on these grounds, and the fact my current concurrency tests are
> failing even with LCM changes, I split the FuzzInstructionScheduling
> into FuzzLCM and FuzzGCM, so we can fuzz these parts on their own.
>
> Also, GCM selector honors the same condition for "cheaper" low frequency
> block to avoid 6831314.
>
>> self->is_iteratively_computed() is not correctness check. It is
>> performance check to prevent using 2 registers: one for previous IV
>> (loop's increment variable) value (since there will be its uses after
>> increment) and one for IV incremented value. I need to look on history
>> of this code (before mercurial) to decide if we should skip hoisting for
>> it as you suggested.
>
> Ok, thanks. Nevertheless, I reverted the condition back. When we do the
> fuzzing, we don't particularly care about performance, so that check is
> not done.
>
> Here's the updated webrev:
> http://cr.openjdk.java.net/~shade/8009120/webrev.01/
>
> (works OK on my x86_64 laptop, running JPRT with various fuzzing modes
> as we speak; code style issues are welcome).
>
> Is there the preprocessor pragma similar to NOT_PRODUCT, but able to
> emit the default value? That way we can do these guys cleaner:
>
> false NOT_PRODUCT(|| (expression)) -> NOT_PRODUCT(expression, false)
> true NOT_PRODUCT(&& (expression)) -> NOT_PRODUCT(expression, true)
>
> Thanks,
> -Aleksey.
>
More information about the hotspot-compiler-dev
mailing list