RFR 8134802 - LCM register pressure scheduling
Berg, Michael C
michael.c.berg at intel.com
Fri Sep 11 17:43:18 UTC 2015
Vladimir, please see the latest update at:
http://cr.openjdk.java.net/~mcberg/8134802/webrev.02/
I have made the node change discussed below so that the reduction and scheduling flags share one definition.
I also added code to screen out methods with only small blocks for live range analysis and register pressure scheduling.
For methods which have some larger blocks we now screen out the small blocks as well. As a result, overhead
is by and large not an issue: I see x64 and x86 C2 compile time unaffected by my algorithm, with any scheduling cost being offset by time no longer spent in register allocation.
Thanks,
Michael
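
The two-level screening described above can be pictured with a small sketch. It is not taken from the webrev: the threshold value and the names `kMinBlockSize`, `block_qualifies` and `method_qualifies` are assumptions made up for illustration, and block size is modelled simply as a node count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cutoff: blocks with fewer nodes than this are skipped.
// The real threshold used by the patch is not stated in this thread.
const std::size_t kMinBlockSize = 10;

// Block-level screen: only larger blocks get liveness analysis and
// register pressure scheduling.
bool block_qualifies(std::size_t node_count) {
  return node_count >= kMinBlockSize;
}

// Method-level screen: a method with only small blocks is skipped entirely.
bool method_qualifies(const std::vector<std::size_t>& block_sizes) {
  for (std::size_t i = 0; i < block_sizes.size(); i++) {
    if (block_qualifies(block_sizes[i])) {
      return true;
    }
  }
  return false;
}
```

With this shape, the extra analysis is paid for only by blocks that pass the block-level check, inside methods that pass the method-level check.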
-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Thursday, September 10, 2015 6:04 PM
To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR 8134802 - LCM register pressure scheduling
On 9/10/15 12:11 PM, Berg, Michael C wrote:
> Ok, I can make is_reduction and is_scheduled have the same value. Since I'm clearing it during init processing that will work quite well. Nobody downstream processes reductions.
>
> Problem:
>
> The C++ standard implements enum as int-sized, so we should union _flags with NodeFlags and widen NodeFlags to juint. We would actually decrease the amount of storage in node by doing so, since right now storage for NodeFlags is additive with _flags. We would get 16 more flag slots and make node smaller.
NodeFlags is a type; there is no field in the Node class with NodeFlags type. NodeFlags is only used to define flag values, which are used to set bits in _flags. So I am not sure what you are proposing.
Thanks,
Vladimir
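
To illustrate the point about NodeFlags being a type rather than a field, here is a minimal sketch. It is a simplified model, not the actual HotSpot source: `SketchNode` and the bit positions are assumptions, and `jushort` is modelled as a 16-bit unsigned integer.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint16_t jushort;  // stand-in for HotSpot's 16-bit unsigned type

class SketchNode {
public:
  // The enum only names bit values. Declaring it inside the class does not
  // add a member to each object. Bit positions here are assumptions.
  enum NodeFlags {
    Flag_is_macro     = 1 << 0,
    Flag_is_expensive = 1 << 1
  };

  SketchNode() : _flags(0) {}

  void add_flag(jushort fl) { _flags = static_cast<jushort>(_flags | fl); }
  bool has_flag(jushort fl) const { return (_flags & fl) != 0; }

private:
  jushort _flags;  // the only per-object storage used for flags
};

// Holds on mainstream ABIs: the nested enum type costs no per-object space.
static_assert(sizeof(SketchNode) == sizeof(jushort),
              "nested enum type adds no storage");
```

In this model the object is exactly as large as `_flags`, so there is no separate NodeFlags storage that a union could reclaim.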
>
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, September 09, 2015 8:29 PM
> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR 8134802 - LCM register pressure scheduling
>
> We only have 3 bits left since total is 16:
>
> jushort _flags;
>
> You have Flag_is_reduction which is used only in loop opts/superword. So you can overlap these flags.
>
> We need to clean this up (not you, Michael). We have flags which are used only by Ideal nodes (Flag_is_macro, Flag_is_expensive), and flags used only by Mach nodes (5 flags). We may try to overlap them.
>
> Vladimir
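
The overlap suggested here, together with the "3 bits left" arithmetic, can be sketched as follows. This is an illustrative model only: `PhaseNode`, the helper functions, the bit position and the `0x1FFF` mask (13 of 16 bits in use) are assumptions, not values taken from node.hpp.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint16_t jushort;  // stand-in for HotSpot's 16-bit unsigned type

enum PhaseFlags {
  Flag_is_reduction = 1 << 0,            // hypothetical bit position
  Flag_is_scheduled = Flag_is_reduction  // same bit, used in a later phase
};

struct PhaseNode {
  jushort flags;
  PhaseNode() : flags(0) {}
};

void mark_reduction(PhaseNode& n) {
  n.flags = static_cast<jushort>(n.flags | Flag_is_reduction);
}

// Must run before the scheduler reads the bit: a leftover reduction mark
// would otherwise look like "already scheduled".
void init_scheduling(PhaseNode& n) {
  n.flags = static_cast<jushort>(n.flags & ~Flag_is_scheduled);
}

void mark_scheduled(PhaseNode& n) {
  n.flags = static_cast<jushort>(n.flags | Flag_is_scheduled);
}

bool is_scheduled(const PhaseNode& n) {
  return (n.flags & Flag_is_scheduled) != 0;
}

// Counts the unused bits of a 16-bit flag word given a mask of used bits.
int free_flag_bits(jushort used_mask) {
  int free_bits = 0;
  for (int bit = 0; bit < 16; bit++) {
    if ((used_mask & (1u << bit)) == 0) {
      free_bits++;
    }
  }
  return free_bits;
}
```

Sharing a bit is only safe because the two meanings are never live at the same time, which is why the clearing step at scheduling init matters.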
>
> On 9/9/15 7:34 PM, Berg, Michael C wrote:
>> All, please see the link:
>> https://bugs.openjdk.java.net/browse/JDK-8134802
>>
>> I have uploaded a performance report there with data collected with and without register pressure scheduling. I would like to keep the node flag in place: we have room for 15 more flags after this one is added, and this is a formal phase of C2 and so a good use of one of the flags. The addition of VectorSet would incrementally raise the overhead of the algorithm. Please have a look and comment as needed.
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Friday, September 04, 2015 6:42 PM
>> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR 8134802 - LCM register pressure scheduling
>>
>> Impressive work. Thank you for reusing current RA functionality.
>>
>> "is very minimal" - how minimal? 2% or 10%?
>>
>> Did it give any performance improvement? Changes are significant and should be justified.
>>
>> Changes look reasonable. I only notice one thing:
>> Flag bits in Node are too precious to use for tracking a node's state. Why not use VectorSet?
>>
>> Thanks,
>> Vladimir
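
The alternative raised here, tracking state in a side set instead of a flag bit, might look roughly like the sketch below. `IndexSet` is a hypothetical stand-in built on `std::vector<bool>` and keyed by a node index; it is not HotSpot's VectorSet API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical growable bit set keyed by node index. It keeps the
// "scheduled" state outside the node, so no flag bit is consumed.
class IndexSet {
public:
  // Marks idx as a member, growing the backing store when needed.
  void set(std::size_t idx) {
    if (idx >= _bits.size()) {
      _bits.resize(idx + 1, false);
    }
    _bits[idx] = true;
  }

  // Indices beyond the current size are simply not members.
  bool test(std::size_t idx) const {
    return idx < _bits.size() && _bits[idx];
  }

private:
  std::vector<bool> _bits;
};
```

The trade-off under discussion is visible even in this small model: the side set costs an extra lookup and some memory per query, while a flag bit is free to read but draws on a scarce per-node resource.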
>>
>> On 9/4/15 1:33 PM, Berg, Michael C wrote:
>>> Hi Folks,
>>>
>>> I would like to contribute LCM register pressure scheduling. I need
>>> two reviewers to examine this patch and comment as needed:
>>>
>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8134802
>>>
>>> webrev:
>>>
>>> http://cr.openjdk.java.net/~mcberg/8134802/webrev.01/
>>>
>>> These changes calculate register pressure at the entry of a basic
>>> block, at the end, and incrementally while we are scheduling. They use
>>> an efficient algorithm for recalculating register pressure on an
>>> as-needed basis. The algorithm uses heuristics to switch to a
>>> pressure-based algorithm to reduce spills for int and float registers,
>>> using thresholds for each. It also uses weights, counted on a
>>> per-register-class basis, to bias ready list candidate choice while
>>> scheduling so that we reduce register pressure when possible. Once
>>> we exceed either threshold, we start trying to mitigate pressure
>>> on the affected class of registers which is over the limit. This
>>> can happen for both register classes at once or separately for each.
>>> We switch back to latency scheduling when pressure is alleviated. As
>>> before, we obey hard constraints such as barriers and fences.
>>> Overhead for constructing and providing liveness information and the
>>> additional algorithmic usage is very minimal, so compile time is
>>> affected minimally.
>>>
>>> Thanks,
>>>
>>> Michael
>>>
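
The switching behaviour described in this proposal can be sketched as a ready-list selection routine. This is a rough illustration under stated assumptions, not the patch's algorithm: the `Candidate` and `Pressure` structures, the per-class deltas and the tie-breaking rule are all made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ready-list entry.
struct Candidate {
  int latency;      // larger means more urgent under latency scheduling
  int int_delta;    // net change in live int registers if scheduled now
  int float_delta;  // net change in live float registers if scheduled now
};

// Hypothetical pressure state with one threshold per register class.
struct Pressure {
  int int_live;
  int float_live;
  int int_limit;
  int float_limit;
  bool int_over() const { return int_live > int_limit; }
  bool float_over() const { return float_live > float_limit; }
};

// Only register classes that are over their limit contribute to the cost.
static int pressure_cost(const Candidate& c, const Pressure& p) {
  int cost = 0;
  if (p.int_over()) cost += c.int_delta;
  if (p.float_over()) cost += c.float_delta;
  return cost;
}

// Lower pressure cost wins; on a tie, fall back to latency.
static bool better(const Candidate& a, const Candidate& b, const Pressure& p) {
  int cost_a = pressure_cost(a, p);
  int cost_b = pressure_cost(b, p);
  if (cost_a != cost_b) return cost_a < cost_b;
  return a.latency > b.latency;
}

// Returns the index of the chosen candidate; ready must be non-empty.
std::size_t choose(const std::vector<Candidate>& ready, const Pressure& p) {
  assert(!ready.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < ready.size(); i++) {
    if (better(ready[i], ready[best], p)) best = i;
  }
  return best;
}
```

When neither class is over its limit, every candidate has cost zero and the choice reduces to plain latency ordering, which mirrors the switch back to latency scheduling once pressure is alleviated.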