RFR 8134802 - LCM register pressure scheduling
Vladimir Kozlov
vladimir.kozlov at oracle.com
Fri Sep 11 01:04:20 UTC 2015
On 9/10/15 12:11 PM, Berg, Michael C wrote:
> OK, I can make is_reduction and is_scheduled have the same value. Since I'm clearing it during init processing, that will work quite well. Nobody downstream processes reductions.
>
> Problem:
>
> In C++ an enum is typically int-sized, so we should union _flags with NodeFlags and widen NodeFlags to juint. We would actually decrease the amount of storage in Node by doing so, since right now the storage for NodeFlags is additive with _flags. We would get 16 more flag slots and make Node smaller.
NodeFlags is a type; there is no field in the Node class with the NodeFlags
type. NodeFlags is only used to define flag values, which are used to
set bits in _flags. So I am not sure what you are proposing.
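The distinction drawn here can be shown with a minimal C++ sketch. It assumes a heavily trimmed, hypothetical class (FlagsOnlyNode) with made-up bit values, and uses uint16_t as a stand-in for HotSpot's jushort; it is not C2's actual Node declaration. The enum only names bit values, and the static_assert checks that declaring it adds no storage.

```cpp
#include <cassert>
#include <cstdint>

typedef uint16_t jushort;  // stand-in for HotSpot's 16-bit jushort

// Hypothetical, trimmed Node: NodeFlags only names bit values; the bits
// themselves live in the single 16-bit _flags field.
class FlagsOnlyNode {
 public:
  enum NodeFlags {
    Flag_is_macro     = 1 << 0,  // illustrative bit positions only
    Flag_is_expensive = 1 << 1,
    Flag_is_reduction = 1 << 2
  };
  void add_flag(NodeFlags f) {
    assert((f & (f - 1)) == 0 && "set one flag bit at a time");
    _flags = static_cast<jushort>(_flags | f);
  }
  bool has_flag(NodeFlags f) const { return (_flags & f) != 0; }

 private:
  jushort _flags = 0;
};

// Same data members without the enum.
struct NoEnumNode {
  jushort _flags = 0;
};

// Declaring a nested enum type adds no data member, so sizes match.
static_assert(sizeof(FlagsOnlyNode) == sizeof(NoEnumNode),
              "a nested enum type declaration adds no storage");
```

Widening the enum's range would therefore not change the size of Node; only the type of _flags determines how many flag bits are available.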
Thanks,
Vladimir
>
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, September 09, 2015 8:29 PM
> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR 8134802 - LCM register pressure scheduling
>
> We only have 3 bits left, since the total is 16:
>
> jushort _flags;
>
> You have Flag_is_reduction, which is used only in loop opts/superword, so you can overlap these flags.
>
> We need to clean this up (not you, Michael). We have flags which are used only by Ideal nodes (Flag_is_macro, Flag_is_expensive) and flags used only by Mach nodes (5 flags). We may try to overlap them.
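The overlapping suggested here, together with the plan to clear the shared bit during scheduler init, might look like the following sketch. The bit positions, FlagNode, init_for_scheduling and Flag_mach_example are hypothetical; only the other flag names come from this thread. It assumes the phases that use a shared bit never need it on the same node at the same time.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

typedef uint16_t jushort;  // stand-in for HotSpot's 16-bit jushort
static_assert(std::numeric_limits<jushort>::digits == 16,
              "16 flag bits in total");

// Hypothetical bit assignments, not HotSpot's actual values.
enum NodeFlags : jushort {
  // Ideal-only flags.
  Flag_is_macro     = 1 << 0,
  Flag_is_expensive = 1 << 1,
  // Used only by loop opts / superword.
  Flag_is_reduction = 1 << 2,
  // Used only by LCM scheduling, which runs later; it reuses the same bit
  // and clears it before use.
  Flag_is_scheduled = Flag_is_reduction,
  // Hypothetical Mach-only flag sharing an Ideal-only bit (assumes no node
  // needs both meanings at once).
  Flag_mach_example = Flag_is_macro
};

class FlagNode {
 public:
  void add_flag(NodeFlags f)   { _flags = static_cast<jushort>(_flags | f); }
  void clear_flag(NodeFlags f) { _flags = static_cast<jushort>(_flags & ~f); }
  bool has_flag(NodeFlags f) const { return (_flags & f) != 0; }

 private:
  jushort _flags = 0;
};

// Scheduler setup: drop any stale reduction mark before the bit is
// reinterpreted as "is scheduled".
void init_for_scheduling(FlagNode& n) {
  n.clear_flag(Flag_is_scheduled);
  assert(!n.has_flag(Flag_is_scheduled));
}
```

Without the init step, a node marked as a reduction by superword would look already scheduled to LCM, which is why clearing the bit at init is what makes the aliasing safe.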
>
> Vladimir
>
> On 9/9/15 7:34 PM, Berg, Michael C wrote:
>> All, please see the link:
>> https://bugs.openjdk.java.net/browse/JDK-8134802
>>
>> I have uploaded a performance report for data collected with and without register pressure scheduling. I would like to keep the node flag in place: we have room for 15 more flags after this one is added, and this is a formal phase of C2, so it is a good use of one of the flags. Using a VectorSet instead would incrementally raise the overhead of the algorithm. Please have a look and comment as needed.
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Friday, September 04, 2015 6:42 PM
>> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR 8134802 - LCM register pressure scheduling
>>
>> Impressive work. Thank you for reusing current RA functionality.
>>
>> "is very minimal" - how minimal? 2% or 10%?
>>
>> Did it give any performance improvement? The changes are significant and should be justified.
>>
>> The changes look reasonable. I noticed only one thing:
>> Flag bits in Node are very precious for tracking a node's state. Why not use a VectorSet?
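For comparison, the side-table approach suggested here can be sketched as a bit set indexed by a node's unique index. This minimal sketch uses std::vector<bool> as a stand-in; ScheduledSet and its methods are hypothetical and do not reproduce HotSpot's VectorSet API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical side table of "already scheduled" marks, indexed by a node's
// unique index; std::vector<bool> stands in for a real bit set.
class ScheduledSet {
 public:
  explicit ScheduledSet(uint32_t max_idx) : _bits(max_idx, false) {}
  void set(uint32_t idx) {
    assert(idx < _bits.size());
    _bits[idx] = true;
  }
  bool test(uint32_t idx) const {
    assert(idx < _bits.size());
    return _bits[idx];
  }
  // Resetting touches every entry: O(number of nodes).
  void clear() { _bits.assign(_bits.size(), false); }

 private:
  std::vector<bool> _bits;
};
```

The trade-off discussed in this thread is visible in the sketch: no Node flag bit is consumed, but the set has to be allocated and sized to the node count, each query is an indexed lookup, and clear() touches every entry.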
>>
>> Thanks,
>> Vladimir
>>
>> On 9/4/15 1:33 PM, Berg, Michael C wrote:
>>> Hi Folks,
>>>
>>> I would like to contribute LCM register pressure scheduling. I need
>>> two reviewers to examine this patch and comment as needed:
>>>
>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8134802
>>>
>>> webrev:
>>>
>>> http://cr.openjdk.java.net/~mcberg/8134802/webrev.01/
>>>
>>> These changes calculate register pressure at the entry of a basic
>>> block, at its end, and incrementally while we are scheduling. They use
>>> an efficient algorithm for recalculating register pressure on an
>>> as-needed basis. The algorithm uses heuristics to switch to a
>>> pressure-based algorithm to reduce spills for int and float registers,
>>> using a threshold for each. It also uses weights, counted on a per
>>> register class basis, to bias the choice among ready-list candidates
>>> while scheduling so that we reduce register pressure when possible.
>>> Once we exceed either threshold, we start trying to mitigate pressure
>>> on the affected class of registers that is over the limit. This can
>>> happen for both register classes together or for each separately. We
>>> switch back to latency scheduling when pressure is alleviated. As
>>> before, we obey hard constraints such as barriers and fences.
>>> The overhead for constructing and providing liveness information and for the
>>> additional algorithmic work is very minimal, so compile time is affected minimally.
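A much-simplified model of the scheduling policy described above can be sketched as top-down list scheduling over one block. Everything here is an assumption made for illustration: the SNode/Block/Result types, treating live-in pressure as a constant entry value, modeling barriers and fences as ordinary dependence edges, and using a candidate's net per-class pressure change as a stand-in for C2's per-class weights. It is not the webrev's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one basic block; not C2's Node/Block classes.
enum RegClass { kInt = 0, kFloat = 1, kNone = 2 };

struct SNode {
  RegClass def_class;       // class of the value this node defines, or kNone
  std::vector<int> inputs;  // in-block nodes whose values this node uses
  int latency;              // latency priority: higher is more urgent
};

struct Block {
  std::vector<SNode> nodes;
  std::vector<bool> live_out;  // live_out[i]: value of node i live at exit
};

struct Result {
  std::vector<int> order;
  int max_pressure[2];
};

// Net change in int/float pressure if node i were scheduled next.
static void pressure_delta(const Block& b, const std::vector<int>& remaining_uses,
                           int i, int delta[2]) {
  delta[kInt] = delta[kFloat] = 0;
  const SNode& nd = b.nodes[i];
  // The defined value becomes live if anything still needs it.
  if (nd.def_class != kNone && (remaining_uses[i] > 0 || b.live_out[i]))
    delta[nd.def_class]++;
  for (size_t k = 0; k < nd.inputs.size(); k++) {
    int in = nd.inputs[k];
    bool seen = false;  // handle each distinct input once
    for (size_t j = 0; j < k; j++) seen = seen || nd.inputs[j] == in;
    if (seen) continue;
    int uses_here = 0;
    for (int x : nd.inputs) uses_here += (x == in);
    RegClass c = b.nodes[in].def_class;
    // The input dies if this node holds all of its remaining uses.
    if (c != kNone && remaining_uses[in] == uses_here && !b.live_out[in])
      delta[c]--;
  }
}

// Latency-driven until a class exceeds its threshold, then pressure-driven
// for the over-limit class(es); re-checked after every pick.
Result schedule(const Block& b, const int entry[2], const int threshold[2]) {
  size_t n = b.nodes.size();
  std::vector<int> remaining_uses(n, 0), preds_left(n, 0), ready;
  std::vector<std::vector<int>> users(n);
  for (size_t i = 0; i < n; i++)
    for (int in : b.nodes[i].inputs) {
      remaining_uses[in]++;
      preds_left[i]++;
      users[in].push_back(static_cast<int>(i));
    }
  for (size_t i = 0; i < n; i++)
    if (preds_left[i] == 0) ready.push_back(static_cast<int>(i));

  Result res;
  int pressure[2] = {entry[kInt], entry[kFloat]};
  res.max_pressure[kInt] = pressure[kInt];
  res.max_pressure[kFloat] = pressure[kFloat];
  while (!ready.empty()) {
    bool over[2] = {pressure[kInt] > threshold[kInt],
                    pressure[kFloat] > threshold[kFloat]};
    bool pressure_mode = over[kInt] || over[kFloat];
    size_t best = 0;
    int best1 = 0, best2 = 0;
    for (size_t r = 0; r < ready.size(); r++) {
      int d[2];
      pressure_delta(b, remaining_uses, ready[r], d);
      int lat = b.nodes[ready[r]].latency;
      // Only over-limit classes count; latency breaks ties.
      int key1 = pressure_mode
          ? -((over[kInt] ? d[kInt] : 0) + (over[kFloat] ? d[kFloat] : 0))
          : lat;
      if (r == 0 || key1 > best1 || (key1 == best1 && lat > best2)) {
        best = r; best1 = key1; best2 = lat;
      }
    }
    int pick = ready[best];
    ready.erase(ready.begin() + static_cast<std::ptrdiff_t>(best));
    int d[2];
    pressure_delta(b, remaining_uses, pick, d);
    for (int c = 0; c < 2; c++) {
      pressure[c] += d[c];
      if (pressure[c] > res.max_pressure[c]) res.max_pressure[c] = pressure[c];
    }
    res.order.push_back(pick);
    for (int in : b.nodes[pick].inputs) remaining_uses[in]--;
    for (int u : users[pick])
      if (--preds_left[u] == 0) ready.push_back(u);
  }
  assert(res.order.size() == n && "dependence cycle in block");
  return res;
}
```

For two int loads (nodes 0 and 1, latency 5), each feeding a store (nodes 2 and 3, latency 1), an int threshold of 0 makes the sketch place each store right after its load (order 0, 2, 1, 3, peak int pressure 1), while a loose threshold issues both loads first by latency (order 0, 1, 2, 3, peak 2).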
>>>
>>> Thanks,
>>>
>>> Michael
>>>