RFR (S): 8067014: LinearScan::is_sorted significantly slows down fastdebug builds' performance
Christian Thalinger
christian.thalinger at oracle.com
Mon Feb 22 19:39:40 UTC 2016
> On Feb 18, 2016, at 10:05 PM, Filipp Zhinkin <filipp.zhinkin at gmail.com> wrote:
>
> Hi Vladimir,
>
> On Thu, Feb 18, 2016 at 7:08 PM, Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com> wrote:
>> Thanks, Filipp.
>>
>> I suggest to address CommentedAssembly separately.
>>
>> One question: why don't you simply typedef IntervalArray/IntervalList to
>> GrowableArray<Interval*>? It will eliminate numerous renamings you did.
>
> Well, I'd prefer to explicitly declare type (unless there are numerous
> nested template args).
> But I'm fine with using typedef there.
Please don’t. There is no good reason to use a typedef and it’s just confusing to the reader. To avoid renaming changes should never be a reason for not doing something.
>
> Here is updated webrev:
> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.03/
>
> Thanks,
> Filipp.
>
>>
>> Otherwise, looks good.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>
>> On 2/18/16 6:31 PM, Filipp Zhinkin wrote:
>>>
>>> Hi,
>>>
>>> I've looked at how frequently misses are actually occur and
>>> how far false positives are from the interval we're looking for.
>>>
>>> Also I've tried to implement interval_cmp so that it returns 0
>>> if difference between interval "from" values is below some threshold:
>>> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.02/stats.txt
>>>
>>> All those misses with distance greater than 64 came from
>>> javax.swing.plaf.synth.SynthStyle::populateDefaultValues [1].
>>>
>>> I've also looked to another possible slowness sources and
>>> we spend about 10% of time in LinearScan's verify_intervals method
>>> which checks that every two intervals don't simultaneously intersect
>>> and share the same register [2].
>>>
>>> I don't see a way to significantly speed up such verification,
>>> but I've slightly improved performance by rearranging some expressions.
>>>
>>> Here is an updated webrev:
>>> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.02/
>>>
>>> Also, unless CommentedAssembly flag is explicitly turned off,
>>> we're generating comments for stubs even if we're not going to print it
>>> out.
>>> Avoiding comments generation in such case will speed up compilation a bit
>>> more,
>>> but I think it would be better to deal with it in a separate RFE.
>>> Difference in code emission time is about 30% when CommentedAssembly is
>>> off
>>> (~ 40s w/ CommentedAssembly, ~ 25s w/o CommentedAssembly).
>>>
>>> [1]
>>> http://hg.openjdk.java.net/jdk9/hs-comp/jdk/file/6c649a7ac744/src/java.desktop/share/classes/javax/swing/plaf/synth/SynthStyle.java#l68
>>> [2]
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/cffca6de2c45/src/share/vm/c1/c1_LinearScan.cpp#l3226
>>>
>>> On Fri, Feb 12, 2016 at 7:08 PM, Filipp Zhinkin
>>> <filipp.zhinkin at gmail.com> wrote:
>>>>
>>>> Hi Aleksey,
>>>>
>>>> On Fri, Feb 12, 2016 at 3:24 PM, Aleksey Shipilev
>>>> <aleksey.shipilev at oracle.com> wrote:
>>>>>
>>>>> Hi Filipp,
>>>>>
>>>>> On 02/12/2016 02:47 PM, Filipp Zhinkin wrote:
>>>>>>
>>>>>> here is a new webrev:
>>>>>> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.01/
>>>>>
>>>>>
>>>>> The webrev seems incomplete: it has only hotspot.patch in it, but no
>>>>> other views?
>>>>
>>>>
>>>> It seems like only wdiff's are empty for some reason.
>>>> What else is missed out there?
>>>>
>>>>>
>>>>> I wonder how many intervals have the same "from", prompting you to
>>>>> wiggle around looking for the exact interval?
>>>>
>>>>
>>>> Well, there should be (relatively) many intervals with "from" == 0
>>>> created for physical registers.
>>>> For virtual registers there could be few intervals that share the same
>>>> "from" value:
>>>> it depends on amount of temporary registers required by an operation
>>>> and amount of outputs it produces.
>>>>
>>>> So we may simply scan intervals from beginning if key's from value is 0.
>>>>
>>>>> Can we define
>>>>> "interval_cmp" so that "(interval_cmp(i1, i2) == 0) iff (i1 == i2)",
>>>>
>>>>
>>>> No, unfortunately we can't, because intervals are ordered only by "from"
>>>> value.
>>>>
>>>>> or at least make the false positives less frequent with more extensive
>>>>> interval key (assuming collisions are indeed problematic)?
>>>>>
>>>>
>>>> Not sure that I've got you.
>>>>
>>>> Nevertheless, I'll run CTW and check how many false positives are
>>>> actually found.
>>>>
>>>>>
>>>>>> I've hacked VM sources a bit to run CTW with product bits and C1
>>>>>> compilation time on my x86_64 linux laptop
>>>>>> slowed down by 0.4% (from 51029 ± 306 ms to 51230 ± 293 ms). Please
>>>>>> let me know if it too slow.
>>>>>
>>>>>
>>>>> I think this is within the error margin, and therefore statistically
>>>>> insignificant. Even if it was significant, 0.4% is okay for compilation
>>>>> time regression in C1.
>>>>>
>>>>>> With fastdebug bits provided patch allow to reduce C1 compilation time
>>>>>> twice.
>>>>>
>>>>>
>>>>> This is a very good improvement, but we need to see if that's the end of
>>>>> it, or we can squeeze even more with a few changes. I would suggest
>>>>> running the CTW scenario under Solaris Studio Performance Analyzer (see
>>>>> e.g.
>>>>>
>>>>> http://shipilev.net/blog/2016/arrays-wisdom-ancients/#_meet_solaris_studio_performance_analyzer).
>>>>
>>>>
>>>> Thank you for the suggestion, I'll check it.
>>>>
>>>> Regards,
>>>> Filipp.
>>>>
>>>>>
>>>>> Thanks,
>>>>> -Aleksey
>>>>>
>>>>>
>>
More information about the hotspot-compiler-dev
mailing list