RFR (S): 8067014: LinearScan::is_sorted significantly slows down fastdebug builds' performance

Mon Feb 22 19:39:40 UTC 2016

> On Feb 18, 2016, at 10:05 PM, Filipp Zhinkin <filipp.zhinkin at gmail.com> wrote:
> 
> Hi Vladimir,
> 
> On Thu, Feb 18, 2016 at 7:08 PM, Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com> wrote:
>> Thanks, Filipp.
>> 
>> I suggest to address CommentedAssembly separately.
>> 
>> One question: why don't you simply typedef IntervalArray/IntervalList to
>> GrowableArray<Interval*>? It will eliminate numerous renamings you did.
> 
> Well, I'd prefer to explicitly declare type (unless there are numerous
> nested template args).
> But I'm fine with using typedef there.

Please don’t.  There is no good reason to use a typedef and it’s just confusing to the reader.  To avoid renaming changes should never be a reason for not doing something.

> 
> Here is updated webrev:
> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.03/
> 
> Thanks,
> Filipp.
> 
>> 
>> Otherwise, looks good.
>> 
>> Best regards,
>> Vladimir Ivanov
>> 
>> 
>> On 2/18/16 6:31 PM, Filipp Zhinkin wrote:
>>> 
>>> Hi,
>>> 
>>> I've looked at how frequently misses are actually occur and
>>> how far false positives are from the interval we're looking for.
>>> 
>>> Also I've tried to implement interval_cmp so that it returns 0
>>> if difference between interval "from" values is below some threshold:
>>> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.02/stats.txt
>>> 
>>> All those misses with distance greater than 64 came from
>>> javax.swing.plaf.synth.SynthStyle::populateDefaultValues [1].
>>> 
>>> I've also looked to another possible slowness sources and
>>> we spend about 10% of time in LinearScan's verify_intervals method
>>> which checks that every two intervals don't simultaneously intersect
>>> and share the same register [2].
>>> 
>>> I don't see a way to significantly speed up such verification,
>>> but I've slightly improved performance by rearranging some expressions.
>>> 
>>> Here is an updated webrev:
>>> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.02/
>>> 
>>> Also, unless CommentedAssembly flag is explicitly turned off,
>>> we're generating comments for stubs even if we're not going to print it
>>> out.
>>> Avoiding comments generation in such case will speed up compilation a bit
>>> more,
>>> but I think it would be better to deal with it in a separate RFE.
>>> Difference in code emission time is about 30% when CommentedAssembly is
>>> off
>>> (~ 40s w/ CommentedAssembly, ~ 25s w/o CommentedAssembly).
>>> 
>>> [1]
>>> http://hg.openjdk.java.net/jdk9/hs-comp/jdk/file/6c649a7ac744/src/java.desktop/share/classes/javax/swing/plaf/synth/SynthStyle.java#l68
>>> [2]
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/cffca6de2c45/src/share/vm/c1/c1_LinearScan.cpp#l3226
>>> 
>>> On Fri, Feb 12, 2016 at 7:08 PM, Filipp Zhinkin
>>> <filipp.zhinkin at gmail.com> wrote:
>>>> 
>>>> Hi Aleksey,
>>>> 
>>>> On Fri, Feb 12, 2016 at 3:24 PM, Aleksey Shipilev
>>>> <aleksey.shipilev at oracle.com> wrote:
>>>>> 
>>>>> Hi Filipp,
>>>>> 
>>>>> On 02/12/2016 02:47 PM, Filipp Zhinkin wrote:
>>>>>> 
>>>>>> here is a new webrev:
>>>>>> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.01/
>>>>> 
>>>>> 
>>>>> The webrev seems incomplete: it has only hotspot.patch in it, but no
>>>>> other views?
>>>> 
>>>> 
>>>> It seems like only wdiff's are empty for some reason.
>>>> What else is missed out there?
>>>> 
>>>>> 
>>>>> I wonder how many intervals have the same "from", prompting you to
>>>>> wiggle around looking for the exact interval?
>>>> 
>>>> 
>>>> Well, there should be (relatively) many intervals with "from" == 0
>>>> created for physical registers.
>>>> For virtual registers there could be few intervals that share the same
>>>> "from" value:
>>>> it depends on amount of temporary registers required by an operation
>>>> and amount of outputs it produces.
>>>> 
>>>> So we may simply scan intervals from beginning if key's from value is 0.
>>>> 
>>>>> Can we define
>>>>> "interval_cmp" so that "(interval_cmp(i1, i2) == 0) iff (i1 == i2)",
>>>> 
>>>> 
>>>> No, unfortunately we can't, because intervals are ordered only by "from"
>>>> value.
>>>> 
>>>>> or at least make the false positives less frequent with more extensive
>>>>> interval key (assuming collisions are indeed problematic)?
>>>>> 
>>>> 
>>>> Not sure that I've got you.
>>>> 
>>>> Nevertheless, I'll run CTW and check how many false positives are
>>>> actually found.
>>>> 
>>>>> 
>>>>>> I've hacked VM sources a bit to run CTW with product bits and C1
>>>>>> compilation time on my x86_64 linux laptop
>>>>>> slowed down by 0.4% (from 51029 ± 306 ms to 51230 ± 293 ms). Please
>>>>>> let me know if it too slow.
>>>>> 
>>>>> 
>>>>> I think this is within the error margin, and therefore statistically
>>>>> insignificant. Even if it was significant, 0.4% is okay for compilation
>>>>> time regression in C1.
>>>>> 
>>>>>> With fastdebug bits provided patch allow to reduce C1 compilation time
>>>>>> twice.
>>>>> 
>>>>> 
>>>>> This is a very good improvement, but we need to see if that's the end of
>>>>> it, or we can squeeze even more with a few changes. I would suggest
>>>>> running the CTW scenario under Solaris Studio Performance Analyzer (see
>>>>> e.g.
>>>>> 
>>>>> http://shipilev.net/blog/2016/arrays-wisdom-ancients/#_meet_solaris_studio_performance_analyzer).
>>>> 
>>>> 
>>>> Thank you for the suggestion, I'll check it.
>>>> 
>>>> Regards,
>>>> Filipp.
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> -Aleksey
>>>>> 
>>>>> 
>>