RFR (S): 8067014: LinearScan::is_sorted significantly slows down fastdebug builds' performance

Fri Feb 19 08:05:48 UTC 2016

Hi Vladimir,

On Thu, Feb 18, 2016 at 7:08 PM, Vladimir Ivanov
<vladimir.x.ivanov at oracle.com> wrote:
> Thanks, Filipp.
>
> I suggest addressing CommentedAssembly separately.
>
> One question: why don't you simply typedef IntervalArray/IntervalList to
> GrowableArray<Interval*>? It would eliminate the numerous renamings you did.

Well, I'd prefer to declare the type explicitly (unless there are numerous
nested template args).
But I'm fine with using a typedef there.
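
For reference, a minimal sketch of the typedef approach. The GrowableArray
below is a hypothetical stand-in (not HotSpot's class), Interval is reduced to
a single field, and count_starting_at is an illustrative helper; only the
alias pattern is the point, not the actual webrev code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for HotSpot's GrowableArray; only what the sketch needs.
template <typename E>
class GrowableArray {
  std::vector<E> _data;

 public:
  void append(const E& e) { _data.push_back(e); }
  int length() const { return static_cast<int>(_data.size()); }
  E at(int i) const { return _data[static_cast<std::size_t>(i)]; }
};

// Simplified Interval: just the start position.
struct Interval {
  int from;
};

// Both aliases name the same type, so code using the old names
// keeps compiling without any renaming.
typedef GrowableArray<Interval*> IntervalArray;
typedef GrowableArray<Interval*> IntervalList;

// Illustrative helper written against one alias; it accepts the other too.
int count_starting_at(const IntervalArray& intervals, int from) {
  int n = 0;
  for (int i = 0; i < intervals.length(); i++) {
    if (intervals.at(i)->from == from) n++;
  }
  return n;
}
```

Since IntervalArray and IntervalList are aliases of one type, an IntervalList
can be passed wherever an IntervalArray is expected.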

Here is the updated webrev:
http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.03/

Thanks,
Filipp.

>
> Otherwise, looks good.
>
> Best regards,
> Vladimir Ivanov
>
>
> On 2/18/16 6:31 PM, Filipp Zhinkin wrote:
>>
>> Hi,
>>
>> I've looked at how frequently misses actually occur and
>> how far the false positives are from the interval we're looking for.
>>
>> I've also tried implementing interval_cmp so that it returns 0
>> if the difference between the intervals' "from" values is below some threshold:
>> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.02/stats.txt
>>
>> All those misses with distance greater than 64 came from
>> javax.swing.plaf.synth.SynthStyle::populateDefaultValues [1].
>>
>> I've also looked at other possible sources of slowness:
>> we spend about 10% of the time in LinearScan's verify_intervals method,
>> which checks that no two intervals both intersect
>> and share the same register [2].
>>
>> I don't see a way to significantly speed up such verification,
>> but I've slightly improved performance by rearranging some expressions.
>>
>> Here is an updated webrev:
>> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.02/
>>
>> Also, unless the CommentedAssembly flag is explicitly turned off,
>> we generate comments for stubs even if we're not going to print them out.
>> Avoiding comment generation in that case would speed up compilation a bit
>> more, but I think it would be better to deal with it in a separate RFE.
>> The difference in code emission time is about 30% when CommentedAssembly
>> is off (~ 40s w/ CommentedAssembly, ~ 25s w/o CommentedAssembly).
>>
>> [1]
>> http://hg.openjdk.java.net/jdk9/hs-comp/jdk/file/6c649a7ac744/src/java.desktop/share/classes/javax/swing/plaf/synth/SynthStyle.java#l68
>> [2]
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/cffca6de2c45/src/share/vm/c1/c1_LinearScan.cpp#l3226
>>
>> On Fri, Feb 12, 2016 at 7:08 PM, Filipp Zhinkin
>> <filipp.zhinkin at gmail.com> wrote:
>>>
>>> Hi Aleksey,
>>>
>>> On Fri, Feb 12, 2016 at 3:24 PM, Aleksey Shipilev
>>> <aleksey.shipilev at oracle.com> wrote:
>>>>
>>>> Hi Filipp,
>>>>
>>>> On 02/12/2016 02:47 PM, Filipp Zhinkin wrote:
>>>>>
>>>>> here is a new webrev:
>>>>> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.01/
>>>>
>>>>
>>>> The webrev seems incomplete: it has only hotspot.patch in it, but no
>>>> other views?
>>>
>>>
>>> It seems that only the wdiffs are empty for some reason.
>>> What else is missing there?
>>>
>>>>
>>>> I wonder how many intervals have the same "from", prompting you to
>>>> wiggle around looking for the exact interval?
>>>
>>>
>>> Well, there should be (relatively) many intervals with "from" == 0
>>> created for physical registers.
>>> For virtual registers there could be a few intervals that share the same
>>> "from" value:
>>> it depends on the number of temporary registers required by an operation
>>> and the number of outputs it produces.
>>>
>>> So we may simply scan intervals from the beginning if the key's "from"
>>> value is 0.
>>>
>>>> Can we define
>>>> "interval_cmp" so that "(interval_cmp(i1, i2) == 0) iff (i1 == i2)",
>>>
>>>
>>> No, unfortunately we can't, because intervals are ordered only by "from"
>>> value.
>>>
>>>> or at least make the false positives less frequent with more extensive
>>>> interval key (assuming collisions are indeed problematic)?
>>>>
>>>
>>> Not sure that I've got you.
>>>
>>> Nevertheless, I'll run CTW and check how many false positives are
>>> actually found.
>>>
>>>>
>>>>> I've hacked the VM sources a bit to run CTW with product bits, and C1
>>>>> compilation time on my x86_64 Linux laptop
>>>>> slowed down by 0.4% (from 51029 ± 306 ms to 51230 ± 293 ms). Please
>>>>> let me know if it is too slow.
>>>>
>>>>
>>>> I think this is within the error margin, and therefore statistically
>>>> insignificant. Even if it was significant, 0.4% is okay for compilation
>>>> time regression in C1.
>>>>
>>>>> With fastdebug bits, the provided patch reduces C1 compilation time
>>>>> by half.
>>>>
>>>>
>>>> This is a very good improvement, but we need to see if that's the end of
>>>> it, or we can squeeze even more with a few changes. I would suggest
>>>> running the CTW scenario under Solaris Studio Performance Analyzer (see
>>>> e.g.
>>>>
>>>> http://shipilev.net/blog/2016/arrays-wisdom-ancients/#_meet_solaris_studio_performance_analyzer).
>>>
>>>
>>> Thank you for the suggestion, I'll check it.
>>>
>>> Regards,
>>> Filipp.
>>>
>>>>
>>>> Thanks,
>>>> -Aleksey
>>>>
>>>>
>
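
The lookup discussed in the quoted thread (intervals ordered only by "from",
so a comparison result of 0 may be a false positive) can be sketched roughly
as follows. This is an illustration under assumptions, not the code from the
webrev: Interval is reduced to one field, std::vector stands in for the
HotSpot container, and find_interval is a hypothetical name.

```cpp
#include <vector>

// Simplified Interval: just the start position.
struct Interval {
  int from;
};

// Orders intervals by "from" only, so a result of 0 does not imply identity.
int interval_cmp(const Interval* a, const Interval* b) {
  return (a->from > b->from) - (a->from < b->from);
}

// Returns the index of `key` in `sorted` (ascending by "from"), or -1.
int find_interval(const std::vector<Interval*>& sorted, const Interval* key) {
  const int n = static_cast<int>(sorted.size());
  int lo = 0;
  int hi = n - 1;
  while (lo <= hi) {
    const int mid = lo + (hi - lo) / 2;
    const int c = interval_cmp(sorted[mid], key);
    if (c < 0) {
      lo = mid + 1;
    } else if (c > 0) {
      hi = mid - 1;
    } else {
      // Same "from" as the key, but possibly a different interval:
      // scan the run of equal "from" values in both directions.
      for (int i = mid; i >= 0 && sorted[i]->from == key->from; i--) {
        if (sorted[i] == key) return i;
      }
      for (int i = mid + 1; i < n && sorted[i]->from == key->from; i++) {
        if (sorted[i] == key) return i;
      }
      return -1;
    }
  }
  return -1;
}
```

The cost of the extra scan is bounded by the number of intervals sharing the
key's "from" value, which is why the long run of "from" == 0 intervals for
physical registers is the case worth treating specially.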