RFR(M): 8031754: Type speculation should favor profile data from outermost inlined method
Vladimir Kozlov
vladimir.kozlov at oracle.com
Mon Feb 3 22:29:19 PST 2014
Hi Roland,
On 2/3/14 1:09 PM, Roland Westrelin wrote:
>
> Hi Vladimir,
>
> Answers are inlined below
>
>>>> Yes, intuitively it looks correct but I am still not sure that we should do that.
>>>>
>>>> How big affect this has on score? I don't see any data.
>>>> Can you write microbenchmark to show such behavior and benefits from these changes?
>>>
>>> This is a test that triggers the problem:
>>>
>>> http://cr.openjdk.java.net/~roland/8031754/TestInlineDepth.java
>>>
>>> m3() is the method that is compiled early. It doesn’t use the profile so there’s no uncommon trap and it doesn’t get a chance to run interpreted again and so update the profile.
>>> m5() is inlined in m1() and ideally inlines C.m(). Without this patch, the profile from m3() is the one that is used for inlining in m5() and A.m() is inlined. That causes an uncommon trap when the compile code for m1() runs and so m1() is recompiled. It inlines A.m() again but this time with a slow case of doing the call.
>>> With the change that I’m proposing, C.m() is inlined in m1(). There’s no uncommon trap.
>>
>> Very nice test, thanks!
>>
>>> Still no numbers. But with this test case it would be a simple matter of finding code that optimizes very well once C.m() is inlined to make it look very good, right?
>>
>> I don't think it would be "simple matter" but agree that we can get such case in some applications.
>>
>>>
>>> With nashorn, one run of the richards.js benchmark was be good and the next one would be 40% slower because of a missed optimization. I believe this is one of the causes (the big one being the trap change that was reviewed recently). But for some reason, I can’t reproduce it anymore as I said in a previous mail.
>>
>> Can you put the code in graphKit.cpp under flag and try to run the benchmark with it on and off?
>
> The nashorn benchmark?
> I don’t expect I’ll see a difference because something apparently changed in the way I run the benchmark and I can't seem to figure out what it is. So I can’t reproduce this problem anymore. So I expect the run with it on and off will show the same results. I can do it anyway if you like.
Okay. I assume there is no any regression with these changes. Right?
>
>> In general I am comfortable to have inline_depth as additional type's attribute. But only for speculative types when it make sense.
>>
>> So why you keep _inline_depth in remove_speculative()?
>
> Because it’s never used in practice with non speculative types (it should always be InlineDepthBottom for a non speculative type). The only place where it changes the behavior of compilation is in GraphKit::record_profile_for_speculation() where it’s used only for speculative types. Anyway, I can reset it to InlineDepthBottom in remove_speculative() if you like.
Yes, please, set to InlineDepthBottom. It will be consistent with types without speculative types.
>
>> Also inline_depth is not type so you need to be careful with meet and dual code. Can you explain why you decided to use negation and min?
>> For example, instance_id is done differently.
>
> I use min because when this happens:
>
> 2144 Node* cast = new(C) CheckCastPPNode(control(), n, current_type->remove_speculative()->join_speculative(spec_type));
>
> I need the inline depth of spec_type to be kept. current_type->remove_speculative() has inline depth InlineDepthBottom. The meet of the dual is min(-InlineDepthBottom, -inline_depth of spec_type) which is - inline_depth of spec_type. Then I use negation because it keeps the type system symmetric. And the InlineDepthTop and InlineDepthBottom are chosen so that meet(depth, InlineDepthTop) = depth and meet(depth, InlineDepthBottom) = InlineDepthBottom. Also it seemed to make sense that when 2 inline depths that are different and not InlineDepthBottom meet, we don’t loose the fact that they are non InlineDepthBottom depth.
But in this case you are lowing inline depth when meeting 2 speculative type which allow types from deeper inlining pass
the test in record_profile_for_speculation(). max could be more appropriate and in such case you can use positive value
for InlineDepthBottom which is more reasonable.
Thanks,
Vladimir
>
> Anyway it’s largely arbitrary, keeps the type system happy and seems good enough.
>
> Roland.
>
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Roland.
>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 1/27/14 9:58 AM, Roland Westrelin wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>>>> Thanks for looking at this, Vladimir.
>>>>>>>
>>>>>>>>> My wild guess is that the outermost method generally gives you the best
>>>>>>>>> (cleanest) context information. The inner callees may have been called
>>>>>>>>> from various callers, and so their type profiles may have been polluted
>>>>>>>>> by other callers.
>>>>>>>>
>>>>>>>> They can't be polluted. We record only one speculative type. If Interpreter see a different type it will set flags and that type information will not be used. At least that is how I understand it works. That is why I am asking Roland to clarify this.
>>>>>>>
>>>>>>> That’s the way they work. What can happen for this (and what happens sometimes I believe):
>>>>>>>
>>>>>>> m1() {
>>>>>>> m3();
>>>>>>> }
>>>>>>>
>>>>>>> m() {
>>>>>>> m1();
>>>>>>> m2();
>>>>>>> }
>>>>>>>
>>>>>>> is that m3() is a heavily used method called from some other place early during the application run. Some profile is recorded. The method becomes so hot that it gets compiled with c2. So we stop profiling. Then m3() is called from m1() but because it’s already compiled we don’t record the conflicting profile. When m() is compiled we are stuck with an incorrect profile and we may miss some optimization opportunities.
>>>>>>
>>>>>> Okay, it makes sense. So m3() has totally unrelated to current compilation type information. But you should have some uncommon traps which will force deoptimization of m3() if type is wrong when you run m1(). On other hand deoptimization may not happen fast enough before we compile m(). I agree we can have such case.
>>>>>>
>>>>>> What about case when m3() calls m4() only when m3() is called from m()->m1()? In such situation m4() will collect correct type for m() compilation(). But it will be overwritten by bad type from m3().
>>>>>
>>>>> But in this case, for m4() to have profile, the branch that calls m4() must have been run and the profile there should be consistent with the profile in m3()?
>>>>>
>>>>>>>
>>>>>>> As Krys said, intuitively "the outermost method generally gives you the best (cleanest) context information”. It’s only a heuristic and as such, it’s certainly easy to build a test case for which this heuristic doesn’t do a good job. It does help for experiments I made with nashorn so I’d like to see it used to confirm it does help.
>>>>>>
>>>>>> Is it possible to detect type discrepancy in m3() and ignore (by special marking) all non-matching speculative types in m3()?
>>>>>>
>>>>>> You already do prefer speculative type from call's parameters over type collected for input arguments. Right? May be use that as indication that types in inlined method could be wrong.
>>>>>
>>>>> That doesn’t sound very robust to me. If profile is inaccurate in the callee, most likely we haven’t run the caller long enough to have profiling, right?
>>>>>
>>>>> Another thing that could be done is record the type that causes the deoptimization in the speculative trap. Then, presumably if we hit a trap everything runs interpreted again, profile is updated and when we recompile and we see that a trap was hit, we can compare the new type with the type that caused the deoptimization and if they differ, try optimizing again. That’s more space required in the method data.
>>>>>
>>>>> I found this problem when running nashorn but I don’t seem to able to hit it again for some reason. So I could withdraw this change for now. I’m concerned it will show up again sooner or later anyway and time will be needed to figure out what is happening.
>>>>>
>>>>> Roland.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>>>
>>>>>>> Roland.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> If you have merging case:
>>>>>>>>
>>>>>>>> if (x)
>>>>>>>> m1()
>>>>>>>> else
>>>>>>>> m2()
>>>>>>>>
>>>>>>>> I don't understand why at merge point information from m2 will be more precise then from m3() called from m1().
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>>>
>>>>>>>>> - Kris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 22, 2014 at 7:51 PM, Vladimir Kozlov
>>>>>>>>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
>>>>>>>>>
>>>>>>>>> Roland,
>>>>>>>>>
>>>>>>>>> I don't see how inlining depth can define type's accuracy in general
>>>>>>>>> case. why it can't be reverse: more accurate type from most deeply
>>>>>>>>> inlined method?
>>>>>>>>>
>>>>>>>>> I thought you only have speculative type if it is the only one type
>>>>>>>>> record in MDO. How in your case you can have different types?
>>>>>>>>>
>>>>>>>>> Can you be more specific?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/22/14 1:42 AM, Roland Westrelin wrote:
>>>>>>>>>
>>>>>>>>> When a node already has a speculative type, and parsing
>>>>>>>>> encounters extra profiling data, the new profiling data is
>>>>>>>>> ignored. So profiling data coming from profile points closer to
>>>>>>>>> the root of the compilation is favored which I think makes
>>>>>>>>> sense: it's the data that is most specific to the context of
>>>>>>>>> this compilation.
>>>>>>>>>
>>>>>>>>> During runs, profile data is not always entirely coherent so we
>>>>>>>>> may hit something like this:
>>>>>>>>> m1() {
>>>>>>>>> m3();
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> m() {
>>>>>>>>> m1();
>>>>>>>>> m2();
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> With: m3() and m2() have profile data for the same node. The
>>>>>>>>> first profile data to be encountered during parsing is from m3()
>>>>>>>>> and profile data from m2() is ignored but profile data from m2()
>>>>>>>>> is the one that is actually the most specific and is the one
>>>>>>>>> that should be favored.
>>>>>>>>>
>>>>>>>>> When a speculative type is created, this change records the
>>>>>>>>> inline depth at which the profile point is. The inline depth is
>>>>>>>>> then propagated together with the rest of the type information.
>>>>>>>>> When new profile data is available for a node that already has a
>>>>>>>>> speculative type, the current inline depth and the inline depth
>>>>>>>>> of the current speculative type are used to decide whether the
>>>>>>>>> new data should be used to replace the existing speculative type.
>>>>>>>>>
>>>>>>>>> This change helps stabilize performance with nashorn.
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~__roland/8031754/webrev.00/
>>>>>>>>> <http://cr.openjdk.java.net/~roland/8031754/webrev.00/>
>>>>>>>>>
>>>>>>>>> Roland.
>>>>>
>>>
>
More information about the hotspot-compiler-dev
mailing list