RFR 8134802 - LCM register pressure scheduling
Berg, Michael C
michael.c.berg at intel.com
Mon Sep 14 07:50:26 UTC 2015
OK, I can add that; it is better anyway to avoid relying on the vector size.
-----Original Message-----
From: Berg, Michael C
Sent: Monday, September 14, 2015 12:48 AM
To: 'Vladimir Kozlov'; hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR 8134802 - LCM register pressure scheduling
And have that live in all the .ad files...
-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Monday, September 14, 2015 12:45 AM
To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR 8134802 - LCM register pressure scheduling
SPARC and ARM64 :)
The problem is that the UseAVX flag is not defined on those CPUs, so you can't use it in shared code.
The correct solution would be to add a new Matcher function float_reg_pressure_scale() (similar to
Matcher::max_vector_size()) on all platforms, with its definition in the corresponding .ad files. Then you can simply do:
float_pressure *= Matcher::float_reg_pressure_scale();
without condition.
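A minimal C++ sketch of the shape Vladimir proposes, assuming that float_pressure is the float-register pressure threshold and that each platform supplies exactly one definition of the hook. The stripped-down Matcher struct, the SKETCH_X86_EVEX macro, and the return values are illustrative assumptions; in HotSpot the definition would come from the platform's .ad file, not from a macro.

```cpp
#include <cassert>

// Assumption: this macro stands in for "which platform's .ad file was built".
// HotSpot does not use a macro like this.
#define SKETCH_X86_EVEX 1

// Hypothetical minimal Matcher: only the proposed hook is modeled.
struct Matcher {
  static int float_reg_pressure_scale();
};

#if SKETCH_X86_EVEX
// x86 with EVEX: twice as many XMM registers, so allow twice the float pressure.
int Matcher::float_reg_pressure_scale() { return 2; }
#else
// Other platforms (e.g. SPARC, ARM64): no scaling.
int Matcher::float_reg_pressure_scale() { return 1; }
#endif

// Shared code: scales the float threshold without naming any CPU-specific flag.
int scaled_float_threshold(int float_pressure) {
  assert(float_pressure > 0);
  float_pressure *= Matcher::float_reg_pressure_scale();
  return float_pressure;
}
```

The design point is that shared code calls the hook unconditionally; only the per-platform definition knows about EVEX.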
Regards,
Vladimir
On 9/13/15 11:29 PM, Berg, Michael C wrote:
> Vladimir, I need to know some details about your run: machine spec, which compiler (x86 or x64), etc.
> Sure, I will run the nashorn metric. Further guarding the code will not buy us much in overhead avoidance (as in the suggestion below), but I will see what I can do.
> For now the vector size check will work, but as soon as some other uarch has a Z vector, we will have to revisit this.
> The reason I need it is for EVEX-enabled uarch machines, which have twice as many XMM registers.
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, September 11, 2015 8:58 PM
> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR 8134802 - LCM register pressure scheduling
>
> Looks good.
>
> I looked at the performance data: for scimark.lu.large, C2 compile time increased significantly (~39%) while the score did not improve (0.18%).
> I can accept a compilation time regression if it gives a performance improvement, as with crypto.aes. Otherwise we need to investigate why it happens.
>
> Can you rerun this sub-benchmark to see if it repeats?
>
> Also, please, do performance run for nashorn as Aleksey suggested.
>
> The RA code at the beginning of gcm.cpp is not guarded by OptoRegScheduling.
> I think you can put a guard around all that new code, including:
> _regalloc = &regalloc;
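A sketch of the suggested guard, assuming a plain global boolean in place of the real OptoRegScheduling flag and simplified stand-in classes (CFGSketch, RegAllocSketch, and build_block_liveness are hypothetical names, not HotSpot code). Everything the new RA setup touches, including the _regalloc pointer hookup, sits inside the flag check, so a disabled flag costs a single branch.

```cpp
#include <cassert>

// Hypothetical stand-in for the OptoRegScheduling flag.
bool OptoRegSchedulingSketch = true;

// Hypothetical stand-in for the register allocator state used by scheduling.
struct RegAllocSketch {
  int liveness_builds = 0;
  void build_block_liveness() { liveness_builds++; }
};

struct CFGSketch {
  RegAllocSketch* _regalloc = nullptr;

  void global_code_motion(RegAllocSketch& regalloc) {
    if (OptoRegSchedulingSketch) {
      // All new RA setup, including the pointer hookup, is behind the flag.
      _regalloc = &regalloc;
      regalloc.build_block_liveness();
    }
    // ... local scheduling would follow; with the flag off, _regalloc stays null.
  }
};
```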
>
> Also JPRT reported build failures:
>
> hotspot/src/share/vm/opto/lcm.cpp:999:9: error: 'UseAVX' was not
> declared in this scope
>
> if (UseAVX > 2) {
> float_pressure *= 2;
>
> UseAVX is x86 platform-specific. Why do you need to increase float_pressure? If you really need it, you can check:
>
> if (Matcher::max_vector_size(T_DOUBLE) > 4)
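For reference, the arithmetic behind that check, assuming Matcher::max_vector_size(T_DOUBLE) returns the number of 64-bit double lanes in the widest vector register C2 will use: more than 4 lanes means 512-bit (ZMM) vectors. The helper names below are hypothetical.

```cpp
#include <cassert>

constexpr int kBitsPerDouble = 64;

// Width in bits of a vector holding max_double_lanes doubles.
constexpr int vector_width_bits(int max_double_lanes) {
  return max_double_lanes * kBitsPerDouble;
}

static_assert(vector_width_bits(2) == 128, "SSE: XMM");
static_assert(vector_width_bits(4) == 256, "AVX/AVX2: YMM");
static_assert(vector_width_bits(8) == 512, "AVX-512 (EVEX): ZMM");

// The suggested shared-code test: true only when 512-bit vectors are in use.
bool wide_vectors_available(int max_double_lanes) {
  assert(max_double_lanes > 0);
  return max_double_lanes > 4;
}
```

Vector width is only a proxy for the doubled XMM register file, which is why the reply above notes the check would need revisiting once another microarchitecture has vectors that wide.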
>
> Thanks,
> Vladimir
>
> On 9/11/15 10:43 AM, Berg, Michael C wrote:
>> Vladimir, please see the latest update at:
>>
>> http://cr.openjdk.java.net/~mcberg/8134802/webrev.02/
>>
>> I have made the node change from below to share flag definitions (reduction/scheduling).
>> I also added code to screen out methods with only small blocks for live range analysis and register pressure scheduling.
>> For methods that have some larger blocks, we now screen out the small
>> blocks as well. As a result, overhead is by and large not an issue: I see x64 and x86 C2 time unaffected by my algorithm, with any scheduling cost being offset by time not spent in register allocation.
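One way such screening could look, as a sketch only: the cutoff value and function names are assumptions, since the thread does not give the real threshold. A method whose blocks are all small skips liveness construction entirely; in a method that does qualify, small blocks still get plain latency scheduling.

```cpp
#include <cassert>
#include <vector>

// Hypothetical cutoff (instructions per block); not the value used in the webrev.
constexpr int kMinBlockSizeForPressure = 10;

// Per-block decision: small blocks keep plain latency scheduling.
bool block_needs_pressure_sched(int block_size) {
  assert(block_size >= 0);
  return block_size >= kMinBlockSizeForPressure;
}

// Method-level screen: skip liveness/pressure setup if no block qualifies.
bool method_needs_pressure_sched(const std::vector<int>& block_sizes) {
  for (int size : block_sizes) {
    if (block_needs_pressure_sched(size)) return true;
  }
  return false;
}
```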
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, September 10, 2015 6:04 PM
>> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR 8134802 - LCM register pressure scheduling
>>
>> On 9/10/15 12:11 PM, Berg, Michael C wrote:
>>> OK, I can make is_reduction and is_scheduled have the same value. Since I clear it during init processing, that will work well. Nothing downstream processes reductions.
>>>
>>> Problem:
>>>
>>> The C++ standard implements enums as int-sized, so we should union _flags with NodeFlags and widen NodeFlags to juint. We would actually decrease the amount of storage in Node by doing so, since right now the storage for NodeFlags is additive with _flags. We would get 16 more flag slots and make Node smaller.
>>
>> NodeFlags is a type; there is no field in the Node class with the NodeFlags type. NodeFlags is only used to define the flag values that set bits in _flags. So I am not sure what you are proposing.
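To illustrate Vladimir's point with a simplified stand-in (NodeSketch and the flag bit positions are hypothetical, not HotSpot's): a nested enum only names compile-time constants, so the only per-node storage for flags is the 16-bit _flags field.

```cpp
#include <cassert>
#include <cstdint>

typedef uint16_t jushort;  // HotSpot's 16-bit unsigned type, redefined for the sketch

class NodeSketch {
 public:
  // NodeFlags is only a type: its enumerators add no storage to each object.
  enum NodeFlags {
    Flag_is_macro     = 1 << 0,
    Flag_is_expensive = 1 << 1,
    Flag_is_reduction = 1 << 2,
  };

  void add_flag(jushort f) { _flags = static_cast<jushort>(_flags | f); }
  bool has_flag(jushort f) const { return (_flags & f) != 0; }

 private:
  jushort _flags = 0;  // the only per-node flag storage
};

static_assert(sizeof(NodeSketch) == sizeof(jushort),
              "the enum type adds no per-object storage");
```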
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, September 09, 2015 8:29 PM
>>> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR 8134802 - LCM register pressure scheduling
>>>
>>> We only have 3 bits left, since the total is 16:
>>>
>>> jushort _flags;
>>>
>>> You have Flag_is_reduction, which is used only in loop opts/SuperWord, so you can overlap these flags.
>>>
>>> We need to clean this up (not you, Michael). We have flags that are used only by Ideal nodes (Flag_is_macro, Flag_is_expensive) and flags used by Mach nodes (5 flags). We may try to overlap them.
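A sketch of the overlap being discussed, with hypothetical bit positions and a hypothetical Flag_is_scheduled name. Two enumerators may share a value, so the scheduling flag reuses the reduction bit; this is only safe because the bit is cleared when scheduling initializes and, per Michael, nothing downstream reads the reduction flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit positions; only the overlap pattern matters.
enum NodeFlagsSketch : uint16_t {
  Flag_is_expensive = 1u << 10,
  Flag_is_reduction = 1u << 11,  // used only in loop opts / SuperWord
  // Reuses the reduction bit: scheduling runs after loop opts are finished.
  Flag_is_scheduled = Flag_is_reduction,
};

struct FlagNode {
  uint16_t _flags = 0;
  void add_flag(uint16_t f)   { _flags = static_cast<uint16_t>(_flags | f); }
  void clear_flag(uint16_t f) { _flags = static_cast<uint16_t>(_flags & ~f); }
  bool has_flag(uint16_t f) const { return (_flags & f) != 0; }
};

// Scheduling init: drop any stale reduction marking before reusing the bit.
void init_scheduling(FlagNode& n) { n.clear_flag(Flag_is_scheduled); }
```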
>>>
>>> Vladimir
>>>
>>> On 9/9/15 7:34 PM, Berg, Michael C wrote:
>>>> All, please see the link:
>>>> https://bugs.openjdk.java.net/browse/JDK-8134802
>>>>
>>>> I have uploaded a performance report for data collected with and without register pressure scheduling. I would like to keep the node flag in place: we have room for 15 more flags after this one is added, and this is a formal phase of C2, so it is a good use of one of the flags. Adding a VectorSet would incrementally raise the overhead of the algorithm. Please have a look and comment as needed.
>>>>
>>>> Thanks,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Friday, September 04, 2015 6:42 PM
>>>> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: RFR 8134802 - LCM register pressure scheduling
>>>>
>>>> Impressive work. Thank you for reusing current RA functionality.
>>>>
>>>> "is very minimal" - how minimal? 2% or 10%?
>>>>
>>>> Did it give any performance improvement? The changes are significant and should be justified.
>>>>
>>>> Changes look reasonable. I only notice one thing:
>>>> Flag bits in Node are very precious for tracking a node's state. Why not use VectorSet?
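A minimal sketch of the side-table alternative, using std::vector<bool> as a stand-in for C2's VectorSet (the class below is hypothetical and does not mirror VectorSet's real API). It frees the node flag bit at the cost of a separate allocation and an indexed lookup per query, which is the incremental overhead Michael cites in his reply.

```cpp
#include <cassert>
#include <vector>

// One bit per node index, stored outside the nodes instead of in _flags.
class ScheduledSetSketch {
 public:
  void set(unsigned idx) {
    if (idx >= bits_.size()) bits_.resize(idx + 1, false);
    bits_[idx] = true;
  }
  bool test(unsigned idx) const { return idx < bits_.size() && bits_[idx]; }
  void clear() { bits_.clear(); }

 private:
  std::vector<bool> bits_;
};
```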
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 9/4/15 1:33 PM, Berg, Michael C wrote:
>>>>> Hi Folks,
>>>>>
>>>>> I would like to contribute LCM register pressure scheduling. I
>>>>> need two reviewers to examine this patch and comment as needed:
>>>>>
>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8134802
>>>>>
>>>>> webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~mcberg/8134802/webrev.01/
>>>>>
>>>>> These changes calculate register pressure at the entry of a basic
>>>>> block, at its end, and incrementally while we are scheduling. They
>>>>> use an efficient algorithm for recalculating register pressure on
>>>>> an as-needed basis. The algorithm uses heuristics to switch to a
>>>>> pressure-based algorithm that reduces spills for int and float
>>>>> registers, using a threshold for each. It also uses weights, counted
>>>>> on a per-register-class basis, to bias the choice among ready-list
>>>>> candidates while scheduling, so that we reduce register pressure when
>>>>> possible. Once we cross either threshold, we start trying to
>>>>> mitigate pressure on the affected class of registers that is
>>>>> over the limit. This can happen for both register classes together or
>>>>> for each separately. We switch back to latency scheduling when
>>>>> pressure is alleviated. As before, we obey hard constraints such as barriers and fences.
>>>>> The overhead of constructing and providing liveness information and
>>>>> of the additional algorithmic work is minimal, so compile time is affected only minimally.
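The selection policy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the webrev's code: two register classes, a node's pressure effect modeled as values it defines minus live ranges it ends (top-down scheduling), and a weight that counts only classes currently over their threshold. All names and numbers are hypothetical.

```cpp
#include <cassert>
#include <vector>

enum RegClass { INT_CLASS = 0, FLOAT_CLASS = 1 };

struct Candidate {
  int latency;       // lower = more urgent under latency scheduling
  int defs[2];       // live ranges this node starts, per class
  int last_uses[2];  // live ranges this node ends, per class
};

struct PressureState {
  int current[2];    // live values per class at the current schedule point
  int threshold[2];  // per-class limit before switching heuristics
  bool over(int c) const { return current[c] > threshold[c]; }
};

// Net change in pressure for class c if n is scheduled next.
int delta(const Candidate& n, int c) { return n.defs[c] - n.last_uses[c]; }

// Latency order unless some class is over its threshold; then prefer the
// candidate that most lowers pressure in the over-limit class(es).
int select(const std::vector<Candidate>& ready, const PressureState& ps) {
  bool pressure_mode = ps.over(INT_CLASS) || ps.over(FLOAT_CLASS);
  int best = -1, best_weight = 0, best_latency = 0;
  for (int i = 0; i < (int)ready.size(); i++) {
    const Candidate& n = ready[i];
    int weight = 0;
    if (pressure_mode) {
      for (int c = 0; c < 2; c++) {
        if (ps.over(c)) weight += delta(n, c);  // only over-limit classes count
      }
    }
    if (best < 0 || weight < best_weight ||
        (weight == best_weight && n.latency < best_latency)) {
      best = i;
      best_weight = weight;
      best_latency = n.latency;
    }
  }
  return best;
}

// Incremental update after scheduling n: no full liveness recomputation.
void commit(PressureState& ps, const Candidate& n) {
  for (int c = 0; c < 2; c++) ps.current[c] += delta(n, c);
}
```

Because the pressure check is re-evaluated on every pick, the sketch falls back to pure latency ordering as soon as both classes drop back under their thresholds.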
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Michael
>>>>>