Request for reviews (M): 6614597: Performance variability in jvm2008 xml.validation

Vladimir Kozlov Vladimir.Kozlov at Sun.COM
Fri Jan 22 20:59:50 PST 2010


On other hand with this fix we may never get bimorphic code on multi-cores machines.
But without it and with Reason_bimorphic we will compile at least once bimorphic code
optimistically and may really hit bimorphic case.
I see xml.validation score lower with this fix then with Reason_bimorphic.

Vladimir

Vladimir Kozlov wrote:
> Tom,
> 
> I agree with you that we need to cleanup deopt reasons table but it is 
> for an other bug
> unless you want me to fix it in these changes.
> 
> After spending today on investigating the bimorphic problem I found next.
> The reason we always recompile the call site as bimorphic is not atomic 
> increment
> of MDO counters and incorrect morphism set in such case.
> 
> com/sun/tools/javac/code/Types$5.visitClassType()
> 
> uncommon traps in bci 49:
> 
> 49 invokevirtual 50 <getEnclosingType>; <()Lcom/sun/tools/javac/code/Type;>
>   240 bci: 49   VirtualCallData     trap(class_check recompiled) 
> flags(192) count(20871188) entries(2)
>                                     
> "com/sun/tools/javac/code/Type$ClassType"(18296976)
>                                     
> "com/sun/tools/javac/code/Type$ClassType$1"(3378474)
> 
> As you see, total counter (20871188) < (rec0_count (18296976) + 
> rec1_count (3378474)).
> In such case iMethod::call_profile_at_bci(int bci) will incorrectly
> set morphism to 2 and replaces total counter with the sum of 
> rec0_count+rec1_count
> so that counters numbers in LogCompilation looks fine:
> 
> <call method='697' count='21617491' prof_factor='1' virtual='1' 
> inline='1' receiver='667' receiver_count='18249149' receiver2='678' 
> receiver2_count='3368342'/>
> 
> Here is call_profile_at_bci() code:
> 
>         // Determine call site's morphism.
>         // The call site count could be == (receivers_count_total + 1)
>         // not only in the case of a polymorphic call but also in the case
>         // when a method data snapshot is taken after the site count was 
> updated
>         // but before receivers counters were updated.
>         if (morphism == result._limit) {
>            // There were no array klasses and morphism <= MorphismLimit.
>            if (morphism <  ciCallProfile::MorphismLimit ||
>                morphism == ciCallProfile::MorphismLimit &&
>                (receivers_count_total+1) >= count) {  <<<<<<<<<<<!!!!! 
> this causing the problem
>              result._morphism = morphism;
>            }
>         }
>         // Make the count consistent if this is a call profile. If count is
>         // zero or less, presume that this is a typecheck profile and
>         // do nothing.  Otherwise, increase count to be the sum of all
>         // receiver's counts.
>         if (count > 0) {
>           if (count < receivers_count_total) {
>             count = receivers_count_total;
>           }
>         }
>       }
>       result._count = count;
> 
> I did it because I never imaging that total counter will be less then 
> the sum.
> 
> So the fix, I think, should be next to allow receivers_count_total be only
> in the range [count-1, count+1] for bimorphic case:
> 
>                morphism == ciCallProfile::MorphismLimit &&
> -              (receivers_count_total+1) >= count) {
> +              (count <= receivers_count_total+1 && 
> receivers_count_total <= count+1)) {
>              result._morphism = morphism;
> 
> And we don't need Reason_bimorphic with this fix.
> 
> Thanks,
> Vladimir
> 
> 
> Tom Rodriguez wrote:
>> But if I understand reason_recorded_per_bytecode_if_any correctly, its 
>> purpose is to use a counter which is recorded per bci for one that 
>> isn't.  The same bytecode couldn't have both a div0 and null check so 
>> it's use the null check count for the div0 count.  If you convert 
>> Reason_bimorphic to Reason_class_check then it's like you don't a 
>> Reason_bimorphic counter anymore since you'll be reading the 
>> Reason_class_check counter and we're back where we started.  The 
>> compiler needs to be able to distinguish failure of the mono case from 
>> failure of the bi case.  If you want to smear it to a different 
>> counter then why not use a counter which isn't used at call sites at 
>> all like Reason_range_check?
>>
>> Actually I think Reason_RECORDED_LIMIT is computed wrong.  
>> Reason_unloaded is currently given a slot in _trap_hist._array because 
>> of this requirement:
>>
>> assert(DS_REASON_MASK >= Reason_RECORDED_LIMIT, "enough bits");
>>
>>     Reason_RECORDED_LIMIT = Reason_unloaded   // some are not recorded 
>> per 
>> bc                                                                               
>>
>> DS_REASON_MASK is 7 and Reason_unloaded is currently 7 but why would 
>> we want to record Reason_unloaded per_bci?  Why isn't 
>> Reason_RECORDED_LIMIT = Reason_intrinsic?  For that matter why can't 
>> we does Reason_none get a counter?  If we corrected the limits of the 
>> per bci counter then we could really give Reason_bimorphic it's own 
>> per bci counter.
>>
>> tom
>>
>> On Jan 22, 2010, at 3:19 PM, Vladimir Kozlov wrote:
>>
>>> Tom,
>>>
>>> John does not suggest to remove Reason_bimorphic. He suggests next:
>>>
>>> src/share/vm/runtime/deoptimization.hpp
>>>
>>>  static DeoptReason reason_recorded_per_bytecode_if_any(DeoptReason 
>>> reason) {
>>>    if (reason_is_recorded_per_bytecode(reason))
>>>      return reason;
>>>    else if (reason == Reason_div0_check) // null check due to 
>>> divide-by-zero?
>>>      return Reason_null_check;           // recorded per BCI as a 
>>> null check
>>> +   else if (reason == Reason_bimorphic)  // Piggy back 
>>> Reason_bimorphic on the
>>> +     return Reason_class_check;          // Reason_class_check per 
>>> bytecode count.
>>>    else
>>>      return Reason_none;
>>>  }
>>>
>>> Which will allow only to remove the next new code in deoptimization.cpp:
>>>
>>> +        if (reason == Reason_bimorphic) {
>>> +          // This isn't recorded per bci because of MDO limitations
>>> +          // but lets piggy back the invalidation on the
>>> +          // Reason_class_check count.
>>> +          uint prior_trap_count = 
>>> trap_mdo->trap_count(Reason_bimorphic);
>>> +          if (prior_trap_count >= (uint)PerBytecodeTrapLimit) {
>>> +            make_not_entrant = true;
>>> +          }
>>> +        } else {
>>>
>>> But since reason_recorded_per_bytecode_if_any() is used in other 
>>> places I am looking if it is OK there also.
>>>
>>> Vladimir
>>>
>>> John Rose wrote:
>>>> On Jan 22, 2010, at 2:59 PM, Tom Rodriguez wrote:
>>>>> I don't see how this will work right.  The whole point of 
>>>>> introducing a new reason was to get a separate counter to 
>>>>> distinguish between the monomorphic and bimorphic predicted call 
>>>>> cases.  Otherwise you can never make the mono to bi transition.  
>>>>> Maybe I don't understand what changing 
>>>>> reason_recorded_per_bytecode_if_any in this way would do.  I'd 
>>>>> prefer to have a per bytecode counter for bimorphic but we didn't 
>>>>> have any left.
>>>> It's a little better than the previous code, because at least the 
>>>> state changes are distinct now, because of the reasons.
>>>> There is also a small distinction between counter states.  If the 
>>>> per-nmethod counter for Reason_bimorphic is zero, then the 
>>>> per-bytecode count is disregarded, even though it is non-zero due to 
>>>> aliased Reason_class_check failures.  Note that maybe_prior_trap 
>>>> returned by query_update_method_data will be false even though the 
>>>> per-bytecode count (which is a single bit) is non-zero.
>>>> This allows one free bimorphism deopt per nmethod.  If this isn't 
>>>> enough, maybe we can downgrade one of the other one-bit counts to a 
>>>> per-nmethod count.
>>>> -- John
>>


More information about the hotspot-compiler-dev mailing list