RFR (S): 8067014: LinearScan::is_sorted significantly slows down fastdebug builds' performance

Thu Feb 18 16:08:08 UTC 2016

Thanks, Filipp.

I suggest addressing CommentedAssembly separately.

One question: why don't you simply typedef IntervalArray/IntervalList to 
GrowableArray<Interval*>? That would eliminate the numerous renamings you did.
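
A minimal sketch of the typedef idea, assuming a simplified stand-in for GrowableArray (the class below is hypothetical and backed by std::vector; it is not the HotSpot implementation, and the Interval struct is reduced to a single field):

```cpp
#include <vector>

// Hypothetical, reduced Interval: only the "from" position is kept.
struct Interval {
  int from;
};

// Stand-in for HotSpot's GrowableArray, exposing only a few
// illustrative methods. The real class has a much larger interface.
template <typename E>
class GrowableArray {
 public:
  void append(E e) { data_.push_back(e); }
  E at(int i) const { return data_[static_cast<std::size_t>(i)]; }
  int length() const { return static_cast<int>(data_.size()); }

 private:
  std::vector<E> data_;
};

// Both legacy names become aliases of the same instantiation, so code
// written against IntervalArray or IntervalList keeps compiling as is.
typedef GrowableArray<Interval*> IntervalArray;
typedef GrowableArray<Interval*> IntervalList;
```

With aliases like these, call sites that mention the old names would not need to be touched, which is the point of the question above.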

Otherwise, looks good.

Best regards,
Vladimir Ivanov

On 2/18/16 6:31 PM, Filipp Zhinkin wrote:
> Hi,
>
> I've looked at how frequently misses actually occur and
> how far false positives are from the interval we're looking for.
>
> Also I've tried to implement interval_cmp so that it returns 0
> if the difference between the intervals' "from" values is below some threshold:
> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.02/stats.txt
>
> All those misses with distance greater than 64 came from
> javax.swing.plaf.synth.SynthStyle::populateDefaultValues [1].
>
> I've also looked at other possible sources of slowness:
> we spend about 10% of the time in LinearScan's verify_intervals method,
> which checks that no two intervals both intersect
> and share the same register [2].
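
A minimal sketch of such a pairwise check, assuming a simplified interval made of one half-open range and a register number (SimpleInterval, intersects and verify_no_conflicts are hypothetical names, not the code in c1_LinearScan.cpp). Testing the register first is only one possible ordering of the expressions; the message does not say which expressions were actually rearranged.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified interval: one half-open range [from, to)
// plus the register assigned to it.
struct SimpleInterval {
  int from;
  int to;
  int reg;
};

// True when the two half-open ranges overlap.
inline bool intersects(const SimpleInterval& a, const SimpleInterval& b) {
  return a.from < b.to && b.from < a.to;
}

// Quadratic check over all pairs: returns false as soon as two
// intervals share a register and overlap. The cheap register
// comparison is evaluated first, so the range test is skipped for
// pairs that cannot conflict.
inline bool verify_no_conflicts(const std::vector<SimpleInterval>& intervals) {
  for (std::size_t i = 0; i < intervals.size(); i++) {
    for (std::size_t j = i + 1; j < intervals.size(); j++) {
      if (intervals[i].reg == intervals[j].reg &&
          intersects(intervals[i], intervals[j])) {
        return false;
      }
    }
  }
  return true;
}
```

The loop stays quadratic in the number of intervals, which is consistent with the remark that the verification itself is hard to speed up significantly.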
>
> I don't see a way to significantly speed up such verification,
> but I've slightly improved performance by rearranging some expressions.
>
> Here is an updated webrev:
> http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.02/
>
> Also, unless the CommentedAssembly flag is explicitly turned off,
> we generate comments for stubs even if we're not going to print them out.
> Avoiding comment generation in that case would speed up compilation a bit more,
> but I think it would be better to deal with it in a separate RFE.
> The difference in code emission time is about 30% when CommentedAssembly is off
> (~ 40s w/ CommentedAssembly, ~ 25s w/o CommentedAssembly).
>
> [1] http://hg.openjdk.java.net/jdk9/hs-comp/jdk/file/6c649a7ac744/src/java.desktop/share/classes/javax/swing/plaf/synth/SynthStyle.java#l68
> [2] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/cffca6de2c45/src/share/vm/c1/c1_LinearScan.cpp#l3226
>
> On Fri, Feb 12, 2016 at 7:08 PM, Filipp Zhinkin
> <filipp.zhinkin at gmail.com> wrote:
>> Hi Aleksey,
>>
>> On Fri, Feb 12, 2016 at 3:24 PM, Aleksey Shipilev
>> <aleksey.shipilev at oracle.com> wrote:
>>> Hi Filipp,
>>>
>>> On 02/12/2016 02:47 PM, Filipp Zhinkin wrote:
>>>> here is a new webrev: http://cr.openjdk.java.net/~fzhinkin/8067014/webrev.01/
>>>
>>> The webrev seems incomplete: it has only hotspot.patch in it, but no
>>> other views?
>>
>> It seems that only the wdiffs are empty for some reason.
>> What else is missing there?
>>
>>>
>>> I wonder how many intervals have the same "from", prompting you to
>>> wiggle around looking for the exact interval?
>>
>> Well, there should be (relatively) many intervals with "from" == 0
>> created for physical registers.
>> For virtual registers there could be a few intervals that share the same
>> "from" value:
>> it depends on the number of temporary registers required by an operation
>> and the number of outputs it produces.
>>
>> So we may simply scan intervals from the beginning if the key's "from" value is 0.
>>
>>> Can we define
>>> "interval_cmp" so that "(interval_cmp(i1, i2) == 0) iff (i1 == i2)",
>>
>> No, unfortunately we can't, because intervals are ordered only by "from" value.
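
A minimal sketch of why the exact interval has to be searched for among neighbours, assuming intervals are kept in a std::vector sorted by "from" only (from_less and find_interval are hypothetical names, and std::vector stands in for the real container; this is not the code from the webrev). A comparison on "from" can only narrow the search to a run of equal keys, and pointer identity then picks the exact element.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, reduced Interval: only the sort key is kept.
struct Interval {
  int from;
};

// Ordering on "from" only, so distinct intervals may compare equal.
inline bool from_less(const Interval* a, const Interval* b) {
  return a->from < b->from;
}

// Returns the index of `key` in `sorted`, or -1 if it is absent.
// The binary search finds the run of intervals whose "from" equals
// the key's; a linear scan over that run compares pointers.
inline int find_interval(const std::vector<Interval*>& sorted,
                         const Interval* key) {
  auto range = std::equal_range(sorted.begin(), sorted.end(), key, from_less);
  for (auto it = range.first; it != range.second; ++it) {
    if (*it == key) {
      return static_cast<int>(it - sorted.begin());
    }
  }
  return -1;
}
```

The cost of the scan depends on how many intervals share one "from" value, which is why the number of such collisions matters in this discussion.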
>>
>>> or at least make the false positives less frequent with more extensive
>>> interval key (assuming collisions are indeed problematic)?
>>>
>>
>> Not sure that I've got you.
>>
>> Nevertheless, I'll run CTW and check how many false positives are
>> actually found.
>>
>>>
>>>> I've hacked the VM sources a bit to run CTW with product bits, and C1
>>>> compilation time on my x86_64 Linux laptop
>>>> slowed down by 0.4% (from 51029 ± 306 ms to 51230 ± 293 ms). Please
>>>> let me know if it is too slow.
>>>
>>> I think this is within the error margin, and therefore statistically
>>> insignificant. Even if it was significant, 0.4% is okay for compilation
>>> time regression in C1.
>>>
>>>> With fastdebug bits, the provided patch cuts C1 compilation time in half.
>>>
>>> This is a very good improvement, but we need to see if that's the end of
>>> it, or we can squeeze even more with a few changes. I would suggest
>>> running the CTW scenario under Solaris Studio Performance Analyzer (see
>>> e.g.
>>> http://shipilev.net/blog/2016/arrays-wisdom-ancients/#_meet_solaris_studio_performance_analyzer).
>>
>> Thank you for the suggestion, I'll check it.
>>
>> Regards,
>> Filipp.
>>
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>>