CR for RFR 8153998
Vladimir Kozlov
vladimir.kozlov at oracle.com
Mon Apr 18 07:00:28 UTC 2016
This looks good. I will start our testing for it.
Thanks,
Vladimir
On 4/15/16 9:20 PM, Berg, Michael C wrote:
> Vladimir/Christian:
>
> I believe I have addressed all concerns in this update:
>
> Webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/
>
> Regards,
> Michael
>
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
> Sent: Friday, April 15, 2016 2:04 AM
> To: 'Vladimir Kozlov' <vladimir.kozlov at oracle.com>
> Cc: 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Subject: RE: CR for RFR 8153998
>
> Vladimir, the code has been updated and is available at:
>
> webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: Berg, Michael C
> Sent: Thursday, April 14, 2016 5:54 PM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: RE: CR for RFR 8153998
>
> Ok, now we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32bit and 64bit add files for CountedLoopEnd.
> It will be clean when next you see the code.
>
> -Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, April 14, 2016 5:52 PM
> To: Berg, Michael C <michael.c.berg at intel.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8153998
>
> On 4/14/16 5:12 PM, Berg, Michael C wrote:
>> The restore mark is sizeless, it restores to the know global configuration value for k1 which is used automatically in all of code gen.
>
> How it is sizeless when it generates kmovwl() instruction?
> Do you mean it does not have side effects (no flags modified)?
>
> Vladimir
>
>>
>> Ok, I will try the pattern match method.
>>
>> Thanks
>> -Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, April 14, 2016 5:02 PM
>> To: Berg, Michael C <michael.c.berg at intel.com>; Christian Thalinger
>> <christian.thalinger at oracle.com>
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: CR for RFR 8153998
>>
>> On 4/14/16 4:38 PM, Berg, Michael C wrote:
>>> Vladimir,
>>>
>>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization.
>>>
>>> I tried something like that early on with CountedLoopEnd.
>>
>> In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically).
>> I don't see any side effects for restoremask in your code. What are you talking about?
>>
>> I am suggesting something like next:
>>
>> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>> predicate(n->has_vect_mask_set());
>> match(CountedLoopEnd cop cr);
>> effect(USE labl);
>>
>> ins_cost(400);
>> format %{ "j$cop $labl\t# loop end\n\t"
>> "restoremask \t# vector mask restore for loops"
>> %}
>> ins_encode %{
>> Label* L = $labl$$label;
>> __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>> __ restoremask();
>> %}
>> ins_pipe(pipe_jcc);
>> %}
>>
>> Vladimir
>>
>>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would like to do then, process via flag or how I do it now? We would basically be doing it in the same place.
>>>
>>> -Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Thursday, April 14, 2016 4:27 PM
>>> To: Christian Thalinger <christian.thalinger at oracle.com>; Berg,
>>> Michael C <michael.c.berg at intel.com>
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: CR for RFR 8153998
>>>
>>> On 4/14/16 3:35 PM, Christian Thalinger wrote:
>>>>
>>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>
>>>>> Christian,
>>>>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement.
>>>>
>>>> That’s unfortunate but I understand. I’m fine with it then.
>>>
>>> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with on vector iteration - there should not be any other branches there.
>>>
>>> Vladimir
>>>
>>>>
>>>>> Regards,
>>>>> Michael
>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C
>>>>> <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net
>>>>> <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>> *Subject:*Re: CR for RFR 8153998
>>>>>
>>>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>> See below for context.
>>>>> Regards,
>>>>> Michael
>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>> *Sent:*Wednesday, April 13, 2016 2:08 PM
>>>>> *To:*Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>> *Subject:*Re: CR for RFR 8153998
>>>>>
>>>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>> Hi Folks,
>>>>>
>>>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See:https://bugs.openjdk.java.net/browse/JDK-8151573for the first half of the implementation.
>>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
>>>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x
>>>>> performance and has been modeled over a large number of loop lengths and forms of loops.
>>>>> This code was tested as follows(see jbs entry below):
>>>>>
>>>>> Bug-id:https://bugs.openjdk.java.net/browse/JDK-8153998
>>>>>
>>>>> webrev:
>>>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>>>>
>>>>>
>>>>> +//------------------------------MachMskNode-----------------------
>>>>> +-
>>>>> +-
>>>>> ----------
>>>>>
>>>>> +// Machine function Msk Node
>>>>>
>>>>> +class MachMskNode : public MachIdealNode {
>>>>>
>>>>> Does “Msk” mean mask? Then we should call it MachMaskNode.
>>>>> <MCB> Ok, that’s easy enough.
>>>>> Also, I don’t quite understand why we have:
>>>>>
>>>>> +instruct set_mask(rRegI dst, rRegI src) %{
>>>>>
>>>>> + predicate(VM_Version::supports_avx512vl());
>>>>>
>>>>> + match(Set dst (MaskCreateI src));
>>>>>
>>>>> + effect(TEMP dst);
>>>>>
>>>>> + format %{ "createmsk $dst, $src" %}
>>>>>
>>>>> + ins_encode %{
>>>>>
>>>>> + __ createmsk($dst$$Register, $src$$Register);
>>>>>
>>>>> + %}
>>>>>
>>>>> but:
>>>>>
>>>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*)
>>>>> const {
>>>>>
>>>>> + MacroAssembler _masm(&cbuf);
>>>>>
>>>>> + __ restoremsk();
>>>>>
>>>>> + }
>>>>>
>>>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects.
>>>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop.
>>>>> The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
>>>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions.
>>>>>
>>>>> Hmm. So, there is no way we can have a RestoreMaskINode?
>>>>>
>>>>> Thanks,
>>>>> Michael
>>>>
More information about the hotspot-compiler-dev
mailing list