Request for reviews (M): 6614597: Performance variability in jvm2008 xml.validation
Vladimir Kozlov
Vladimir.Kozlov at Sun.COM
Fri Jan 22 20:59:50 PST 2010
On other hand with this fix we may never get bimorphic code on multi-cores machines.
But without it and with Reason_bimorphic we will compile at least once bimorphic code
optimistically and may really hit bimorphic case.
I see xml.validation score lower with this fix then with Reason_bimorphic.
Vladimir
Vladimir Kozlov wrote:
> Tom,
>
> I agree with you that we need to cleanup deopt reasons table but it is
> for an other bug
> unless you want me to fix it in these changes.
>
> After spending today on investigating the bimorphic problem I found next.
> The reason we always recompile the call site as bimorphic is not atomic
> increment
> of MDO counters and incorrect morphism set in such case.
>
> com/sun/tools/javac/code/Types$5.visitClassType()
>
> uncommon traps in bci 49:
>
> 49 invokevirtual 50 <getEnclosingType>; <()Lcom/sun/tools/javac/code/Type;>
> 240 bci: 49 VirtualCallData trap(class_check recompiled)
> flags(192) count(20871188) entries(2)
>
> "com/sun/tools/javac/code/Type$ClassType"(18296976)
>
> "com/sun/tools/javac/code/Type$ClassType$1"(3378474)
>
> As you see, total counter (20871188) < (rec0_count (18296976) +
> rec1_count (3378474)).
> In such case iMethod::call_profile_at_bci(int bci) will incorrectly
> set morphism to 2 and replaces total counter with the sum of
> rec0_count+rec1_count
> so that counters numbers in LogCompilation looks fine:
>
> <call method='697' count='21617491' prof_factor='1' virtual='1'
> inline='1' receiver='667' receiver_count='18249149' receiver2='678'
> receiver2_count='3368342'/>
>
> Here is call_profile_at_bci() code:
>
> // Determine call site's morphism.
> // The call site count could be == (receivers_count_total + 1)
> // not only in the case of a polymorphic call but also in the case
> // when a method data snapshot is taken after the site count was
> updated
> // but before receivers counters were updated.
> if (morphism == result._limit) {
> // There were no array klasses and morphism <= MorphismLimit.
> if (morphism < ciCallProfile::MorphismLimit ||
> morphism == ciCallProfile::MorphismLimit &&
> (receivers_count_total+1) >= count) { <<<<<<<<<<<!!!!!
> this causing the problem
> result._morphism = morphism;
> }
> }
> // Make the count consistent if this is a call profile. If count is
> // zero or less, presume that this is a typecheck profile and
> // do nothing. Otherwise, increase count to be the sum of all
> // receiver's counts.
> if (count > 0) {
> if (count < receivers_count_total) {
> count = receivers_count_total;
> }
> }
> }
> result._count = count;
>
> I did it because I never imaging that total counter will be less then
> the sum.
>
> So the fix, I think, should be next to allow receivers_count_total be only
> in the range [count-1, count+1] for bimorphic case:
>
> morphism == ciCallProfile::MorphismLimit &&
> - (receivers_count_total+1) >= count) {
> + (count <= receivers_count_total+1 &&
> receivers_count_total <= count+1)) {
> result._morphism = morphism;
>
> And we don't need Reason_bimorphic with this fix.
>
> Thanks,
> Vladimir
>
>
> Tom Rodriguez wrote:
>> But if I understand reason_recorded_per_bytecode_if_any correctly, its
>> purpose is to use a counter which is recorded per bci for one that
>> isn't. The same bytecode couldn't have both a div0 and null check so
>> it's use the null check count for the div0 count. If you convert
>> Reason_bimorphic to Reason_class_check then it's like you don't a
>> Reason_bimorphic counter anymore since you'll be reading the
>> Reason_class_check counter and we're back where we started. The
>> compiler needs to be able to distinguish failure of the mono case from
>> failure of the bi case. If you want to smear it to a different
>> counter then why not use a counter which isn't used at call sites at
>> all like Reason_range_check?
>>
>> Actually I think Reason_RECORDED_LIMIT is computed wrong.
>> Reason_unloaded is currently given a slot in _trap_hist._array because
>> of this requirement:
>>
>> assert(DS_REASON_MASK >= Reason_RECORDED_LIMIT, "enough bits");
>>
>> Reason_RECORDED_LIMIT = Reason_unloaded // some are not recorded
>> per
>> bc
>>
>> DS_REASON_MASK is 7 and Reason_unloaded is currently 7 but why would
>> we want to record Reason_unloaded per_bci? Why isn't
>> Reason_RECORDED_LIMIT = Reason_intrinsic? For that matter why can't
>> we does Reason_none get a counter? If we corrected the limits of the
>> per bci counter then we could really give Reason_bimorphic it's own
>> per bci counter.
>>
>> tom
>>
>> On Jan 22, 2010, at 3:19 PM, Vladimir Kozlov wrote:
>>
>>> Tom,
>>>
>>> John does not suggest to remove Reason_bimorphic. He suggests next:
>>>
>>> src/share/vm/runtime/deoptimization.hpp
>>>
>>> static DeoptReason reason_recorded_per_bytecode_if_any(DeoptReason
>>> reason) {
>>> if (reason_is_recorded_per_bytecode(reason))
>>> return reason;
>>> else if (reason == Reason_div0_check) // null check due to
>>> divide-by-zero?
>>> return Reason_null_check; // recorded per BCI as a
>>> null check
>>> + else if (reason == Reason_bimorphic) // Piggy back
>>> Reason_bimorphic on the
>>> + return Reason_class_check; // Reason_class_check per
>>> bytecode count.
>>> else
>>> return Reason_none;
>>> }
>>>
>>> Which will allow only to remove the next new code in deoptimization.cpp:
>>>
>>> + if (reason == Reason_bimorphic) {
>>> + // This isn't recorded per bci because of MDO limitations
>>> + // but lets piggy back the invalidation on the
>>> + // Reason_class_check count.
>>> + uint prior_trap_count =
>>> trap_mdo->trap_count(Reason_bimorphic);
>>> + if (prior_trap_count >= (uint)PerBytecodeTrapLimit) {
>>> + make_not_entrant = true;
>>> + }
>>> + } else {
>>>
>>> But since reason_recorded_per_bytecode_if_any() is used in other
>>> places I am looking if it is OK there also.
>>>
>>> Vladimir
>>>
>>> John Rose wrote:
>>>> On Jan 22, 2010, at 2:59 PM, Tom Rodriguez wrote:
>>>>> I don't see how this will work right. The whole point of
>>>>> introducing a new reason was to get a separate counter to
>>>>> distinguish between the monomorphic and bimorphic predicted call
>>>>> cases. Otherwise you can never make the mono to bi transition.
>>>>> Maybe I don't understand what changing
>>>>> reason_recorded_per_bytecode_if_any in this way would do. I'd
>>>>> prefer to have a per bytecode counter for bimorphic but we didn't
>>>>> have any left.
>>>> It's a little better than the previous code, because at least the
>>>> state changes are distinct now, because of the reasons.
>>>> There is also a small distinction between counter states. If the
>>>> per-nmethod counter for Reason_bimorphic is zero, then the
>>>> per-bytecode count is disregarded, even though it is non-zero due to
>>>> aliased Reason_class_check failures. Note that maybe_prior_trap
>>>> returned by query_update_method_data will be false even though the
>>>> per-bytecode count (which is a single bit) is non-zero.
>>>> This allows one free bimorphism deopt per nmethod. If this isn't
>>>> enough, maybe we can downgrade one of the other one-bit counts to a
>>>> per-nmethod count.
>>>> -- John
>>
More information about the hotspot-compiler-dev
mailing list